CAD: an algorithm for citation-anchors detection in research papers

Ahmad, Riaz; Afzal, Muhammad Tanvir

CAD: an algorithm for citation-anchors detection in research papers

Riaz Ahmad () and Muhammad Tanvir Afzal ()
Additional contact information
Riaz Ahmad: Capital University of Science & Technology
Muhammad Tanvir Afzal: Capital University of Science & Technology

Scientometrics, 2018, vol. 117, issue 3, No 5, 1405-1423

Abstract: Abstract Citations are very important parameters and are used to take many important decisions like ranking of researchers, institutions, countries, and to measure the relationship between research papers. All of these require accurate counting of citations and their occurrence (in-text citation counts) within the citing papers. Citation anchors refer to the citation made within the full text of the citing paper for example: ‘[1]’, ‘(Afzal et al, 2015)’, ‘[Afzal, 2015]’ etc. Identification of citation-anchors from the plain-text is a very challenging task due to the various styles and formats of citations. Recently, Shahid et al. highlighted some of the problems such as commonality in content, wrong allotment, mathematical ambiguities, and string variations etc in automatically identifying the in-text citation frequencies. The paper proposes an algorithm, CAD, for identification of citation-anchors and its in-text citation frequency based on different rules. For a comprehensive analysis, the dataset of research papers is prepared: on both Journal of Universal Computer Science (J.UCS) and (2) CiteSeer digital libraries. In experimental study, we conducted two experiments. In the first experiment, the proposed approach is compared with state-of-the-art technique over both datasets. The J.UCS dataset consists of 1200 research papers with 16,000 citation strings or references while the CiteSeer dataset consists of 52 research papers with 1850 references. The total dataset size becomes 1252 citing documents and 17,850 references. The experiments showed that CAD algorithm improved F-score by 44% and 37% respectively on both J.UCS and CiteSeer dataset over the contemporary technique (Shahid et al. in Int J Arab Inf Technol 12:481–488, 2014). The average score is 41% on both datasets. In the second experiment, the proposed approach is further analyzed against the existing state-of-the-art tools: CERMINE and GROBID. According to our results, the proposed approach is best performing with F1 of 0.99, followed by GROBID (F1 0.89) and CERMINE (F1 0.82).

Keywords: In-text citation analysis; Citation string; Citation-anchor; Citation-tag; Citation frequency; Research papers (search for similar items in EconPapers)
Date: 2018
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1007/s11192-018-2920-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:117:y:2018:i:3:d:10.1007_s11192-018-2920-6

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-018-2920-6

Access Statistics for this article

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().