
Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE

Christian Møller Dahl, Torben Johansen and Christian Vedel

Papers from arXiv.org

Abstract: This paper introduces OccCANINE, an open-source tool that maps occupational descriptions to HISCO codes. Manual coding is slow and error-prone; OccCANINE replaces weeks of manual work with results in minutes. We fine-tune CANINE on 15.8 million description-code pairs from 29 sources in 13 languages. The model achieves 96 percent accuracy, precision, and recall. We also show that the approach generalizes to three other classification systems (OCC1950, OCCICEM, and ISCO-68), and we release the corresponding models as open source. By breaking the "HISCO barrier," OccCANINE democratizes access to high-quality occupational coding, enabling broader research in economics, economic history, and related disciplines.

Date: 2024-02, Revised 2026-02
New Economics Papers: this item is included in nep-his
References: reference list available in EconPapers and CitEc
Citations: 4 (EconPapers)

Download: http://arxiv.org/pdf/2402.13604 (latest version, PDF)

Related works:
Working Paper: Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE (2024)


Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2402.13604


Bibliographic data for series maintained by arXiv administrators.

 
Page updated 2026-02-26
Handle: RePEc:arx:papers:2402.13604