Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE
Christian M{\o}ller Dahl,
Torben Johansen and
Christian Vedel
Papers from arXiv.org
Abstract:
This paper introduces a new tool, OccCANINE, to automatically transform occupational descriptions into the HISCO classification system. The manual work involved in processing and classifying occupational descriptions is error-prone, tedious, and time-consuming. We finetune a preexisting language model (CANINE) to do this automatically, thereby performing in seconds and minutes what previously took days and weeks. The model is trained on 14 million pairs of occupational descriptions and HISCO codes in 13 different languages contributed by 22 different sources. Our approach is shown to have accuracy, recall, and precision above 90 percent. Our tool breaks the metaphorical HISCO barrier and makes this data readily available for analysis of occupational structures with broad applicability in economics, economic history, and various related disciplines.
Date: 2024-02, Revised 2024-04
New Economics Papers: this item is included in nep-his
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)
Downloads: (external link)
http://arxiv.org/pdf/2402.13604 Latest version (application/pdf)
Related works:
Working Paper: Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE (2024) 
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2402.13604
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().