Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE

Christian Møller Dahl, Torben Johansen and Christian Vedel
Additional contact information
Christian Møller Dahl: University of Southern Denmark
Torben Johansen: University of Southern Denmark

No 255, Working Papers from European Historical Economics Society (EHES)

Abstract: This paper introduces a new tool, OccCANINE, to automatically transform occupational descriptions into the HISCO classification system. The manual work involved in processing and classifying occupational descriptions is error-prone, tedious, and time-consuming. We fine-tune a preexisting language model (CANINE) to do this automatically, thereby performing in seconds and minutes what previously took days and weeks. The model is trained on 14 million pairs of occupational descriptions and HISCO codes in 13 different languages contributed by 22 different sources. Our approach is shown to have accuracy, recall, and precision above 90 percent. Our tool breaks the metaphorical HISCO barrier and makes this data readily available for analysis of occupational structures with broad applicability in economics, economic history, and various related disciplines.
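
As a rough illustration of the three metrics named in the abstract, the sketch below scores predicted code sets against hand-labelled ones. It is not taken from the paper: the function name `evaluate`, the placeholder code strings, and the choice of exact-match accuracy with micro-averaged precision and recall are all assumptions made for this example, and the authors may define their metrics differently.

```python
from typing import Dict, List, Set


def evaluate(predicted: List[Set[str]], gold: List[Set[str]]) -> Dict[str, float]:
    """Score predicted code sets against gold code sets.

    Assumption (not from the paper): accuracy is exact set match per
    description; precision and recall are micro-averaged over all codes.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Descriptions whose predicted set matches the gold set exactly.
    exact = sum(p == g for p, g in zip(predicted, gold))
    # Correctly predicted codes, summed over all descriptions.
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)

    return {
        "accuracy": exact / len(gold) if gold else 0.0,
        "precision": true_pos / n_pred if n_pred else 0.0,
        "recall": true_pos / n_gold if n_gold else 0.0,
    }


# Placeholder labels, not real HISCO codes.
predicted = [{"code_a"}, {"code_b", "code_c"}, {"code_a"}]
gold = [{"code_a"}, {"code_b"}, {"code_c"}]
metrics = evaluate(predicted, gold)
print(metrics)
```

Sets are used here on the assumption that one description can map to more than one code, in which case a prediction can be partly right: it then counts toward precision and recall but not toward exact-match accuracy.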

Keywords: Occupational Standardization; HISCO Classification System; Machine Learning in Economic History; Language Models
JEL-codes: C55 C81 J1 N01 N3 N6 O33
Pages: 27
Date: 2024-04
New Economics Papers: this item is included in nep-big and nep-his

Downloads:
https://www.ehes.org/wp/EHES_255.pdf (application/pdf)

Related works:
Working Paper: Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE (2024)

Persistent link: https://EconPapers.repec.org/RePEc:hes:wpaper:0255

Bibliographic data for series maintained by Paul Sharp.

 
Page updated 2025-03-24
Handle: RePEc:hes:wpaper:0255