Flagging incorrect nucleotide sequence reagents in biomedical papers: To what extent does the leading publication format impede automatic error detection?
Cyril Labbé,
Guillaume Cabanac,
Rachael A. West,
Thierry Gautier,
Bertrand Favier and
Jennifer A. Byrne
Additional contact information
Cyril Labbé: University of Grenoble Alpes
Guillaume Cabanac: University of Toulouse
Rachael A. West: The University of Sydney
Thierry Gautier: Institute for Advanced Biology
Bertrand Favier: University of Grenoble Alpes
Jennifer A. Byrne: The University of Sydney
Scientometrics, 2020, vol. 124, issue 2, No 16, 1139-1156
Abstract:
In an idealised vision of science, the scientific literature is error-free. Errors reported during peer review are supposed to be corrected prior to publication, as further research establishes new knowledge based on the body of literature. It happens, however, that errors pass through peer review, and in a minority of cases errata and retractions follow. Automated screening software can be applied to detect errors in manuscripts and publications. The contribution of this paper is twofold. First, we designed the erroneous reagent checking (ERC) benchmark to assess the accuracy of fact-checkers screening biomedical publications for dubious mentions of nucleotide sequence reagents. It comes with a test collection comprising 1679 nucleotide sequence reagents that were curated by biomedical experts. Second, we benchmarked our own screening software called Seek&Blastn with three input formats to assess the extent of performance loss when operating on various publication formats. Our findings stress the superiority of markup formats (a 79% detection rate on XML and HTML) over the prominent PDF format (a 69% detection rate at most) on an error-flagging task. This is the first published baseline on error detection involving reagents reported in biomedical scientific publications. The ERC benchmark is designed to facilitate the development and validation of software bricks to enhance the reliability of the peer review process.
Keywords: Scientific text; Biomedical literature; Fact-checking; Errors; Nucleotide sequences; Reagents; Genes; Benchmark; PDF
Date: 2020
Downloads:
http://link.springer.com/10.1007/s11192-020-03463-z Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:124:y:2020:i:2:d:10.1007_s11192-020-03463-z
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-020-03463-z
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.