An expert‐in‐the‐loop method for domain‐specific document categorization based on small training data
Kanyao Han,
Rezvaneh Rezapour,
Katia Nakamura,
Dikshya Devkota,
Daniel C. Miller and
Jana Diesner
Journal of the Association for Information Science & Technology, 2023, vol. 74, issue 6, 669-684
Abstract:
Automated text categorization methods are of broad relevance for domain experts since they free researchers and practitioners from manual labeling, save their resources (e.g., time, labor), and enrich the data with information helpful to study substantive questions. Despite a variety of newly developed categorization methods that require substantial amounts of annotated data, little is known about how to build models when (a) labeling texts with categories requires substantial domain expertise and/or in‐depth reading, (b) only a few annotated documents are available for model training, and (c) no relevant computational resources, such as pretrained models, are available. In a collaboration with environmental scientists who study the socio‐ecological impact of funded biodiversity conservation projects, we develop a method that integrates deep domain expertise with computational models to automatically categorize project reports based on a small sample of 93 annotated documents. Our results suggest that domain expertise can improve automated categorization and that the magnitude of these improvements is influenced by the experts' understanding of categories and their confidence in their annotation, as well as data sparsity and additional category characteristics such as the portion of exclusive keywords that can identify a category.
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://doi.org/10.1002/asi.24714
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:74:y:2023:i:6:p:669-684
Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=2330-1635
Access Statistics for this article
More articles in Journal of the Association for Information Science & Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().