2SCE-4SL: a 2-stage causality extraction framework for scientific literature
Yujie Zhang (),
Rujiang Bai (),
Ling Kong and
Xiaoyue Wang
Additional contact information
Yujie Zhang: Nanjing University
Rujiang Bai: Shandong University of Technology
Ling Kong: Nanjing Agricultural University
Xiaoyue Wang: Shandong University of Technology
Scientometrics, 2024, vol. 129, issue 11, No 31, 7175-7195
Abstract:
Abstract Extracting causality from scientific literature is a crucial task that underpins many downstream knowledge-driven applications. To this end, this paper presents a novel causality extraction framework for scientific literature, called 2-Stage Causality Extraction for Scientific Literature (2SCE-4SL). The framework consists of two stages: in the stage 1, terms and causal trigger words are identified from causal sentences in the literature, and noisy causal triplets are then collocated. In the stage 2, we propose a Denoising AutoEncoder based on Transformer to represent the causal sentences. This approach is used to learn the causal dependency and contextual information of sentences, incorporating causal trigger word tagging and noise elimination, as well as injecting domain-specific knowledge. By combining the causality structure of stage 1 and the causality representation of stage 2, the true causal triplets are identified from the noisy causal triplets. We conducted experiments on an open access scientific literature dataset, comparing the performance of different disciplines, different training data volume, different document length and whether causality representation. We found that the average precision of 2SCE-4SL was 0.8146, and the average F1 was 0.8308, with the best performance achieved on full-text data. We also verified the effectiveness of the causality representation in stage 2, demonstrating that the architecture can capture the causal dependency of sentences and achieve good performance on two related tasks. Overall, detailed comparative and ablation experiments revealed that 2SCE-4SL requires only a small amount of annotated data to achieve better performance and domain adaptability in scientific literature.
Keywords: 2SCE-4SL; Causality extraction; Causality representation; Scientific literature mining (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
http://link.springer.com/10.1007/s11192-023-04817-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:11:d:10.1007_s11192-023-04817-z
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-023-04817-z
Access Statistics for this article
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().