Manual versus machine: An evaluation of the performance of the Medical Text Indexer (MTI) at classifying different document types by disease area
Duncan A.Q. Moore, Ohid Yaqub and Bhaven N. Sampat
No b75fr, SocArXiv from Center for Open Science
Abstract:
The Medical Subject Headings (MeSH) thesaurus, a controlled vocabulary, is increasingly being used by those who study research and innovation. While classification was once entirely manual, human indexers are now assisted by algorithmic suggestions in an effort to automate some of the indexing process. A version of this algorithm, the Medical Text Indexer (MTI), has been made available, allowing for classification of arbitrary text into MeSH categories. Potentially, this opens up other document classes to MeSH assignment for research and innovation studies. However, it remains unclear whether the MTI, a tool designed to categorize publications for indexing purposes, can be reliably extended to other document classes. To assess the MTI’s performance for different classes of documents, we collected text from grant descriptions, patent claims, and drug indications, and compared the MTI’s categorisation to that of a qualified human classifier. We also tested whether MTI performance varied with text length or score thresholding. Our results suggest that researchers can proceed with confidence that the MTI reliably captures the diseases contained in a text (recall), and that its scoring can be used to guard against false diseases in its outputs (precision).
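As a rough illustration of the two measures named in the abstract, the sketch below computes precision and recall for a set of machine-assigned disease terms against a human reference set, and shows how a score threshold might trade one against the other. It is a minimal sketch, not the authors' evaluation code: the function names, the term scores, and the labels are all hypothetical, and the scores are not real MTI output.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of disease labels."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Assumption: report 0.0 when a denominator is empty (the measure is
    # strictly undefined in that case).
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def filter_by_score(scored_terms, threshold):
    """Keep only the terms whose score is at least the threshold."""
    return {term for term, score in scored_terms.items() if score >= threshold}


# Hypothetical example: made-up scores and labels, not data from the paper.
machine_output = {"Diabetes Mellitus": 900, "Hypertension": 400, "Asthma": 50}
human_labels = {"Diabetes Mellitus", "Hypertension"}

for threshold in (0, 100, 500):
    kept = filter_by_score(machine_output, threshold)
    p, r = precision_recall(kept, human_labels)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this toy case, raising the threshold first removes the spurious term (improving precision) and then starts to drop a correct one (lowering recall), which is the kind of trade-off that score thresholding is meant to expose.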
Date: 2023-02-25
New Economics Papers: this item is included in nep-cmp and nep-hea
Downloads: (external link)
https://osf.io/download/63f8bbcbbbc5e5027ff804a9/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:b75fr
DOI: 10.31219/osf.io/b75fr