Three Methods for Occupation Coding Based on Statistical Learning

Gweon Hyukjun, Schonlau Matthias, Kaczmirek Lars, Blohm Michael and Steiner Stefan
Additional contact information
Gweon Hyukjun: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada
Schonlau Matthias: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada
Kaczmirek Lars: GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany
Blohm Michael: GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany
Steiner Stefan: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada

Journal of Official Statistics, 2017, vol. 33, issue 1, 101-122

Abstract: Occupation coding, an important task in official statistics, refers to coding a respondent’s text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
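
The hybrid idea in the abstract (reuse the code of a matching training answer when a duplicate exists, otherwise defer to a statistical learning algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of word unigrams as the n-gram variables, the majority-vote rule among duplicates, and the placeholder codes "A" and "B" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def exact_key(text):
    """Duplicate definition 1: the raw answer string must match exactly."""
    return text


def ngram_key(text, n=1):
    """Duplicate definition 2 (assumed): two answers are duplicates when
    they contain the same set of lowercased word n-grams."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return frozenset(grams)


def build_duplicate_index(train_texts, train_codes, key_fn):
    """Map each duplicate key to a count of the codes seen with it."""
    index = defaultdict(Counter)
    for text, code in zip(train_texts, train_codes):
        index[key_fn(text)][code] += 1
    return index


def hybrid_predict(text, index, key_fn, fallback):
    """Use the most frequent code among duplicates if any exist;
    otherwise call the fallback, which stands in for any trained
    statistical learning model (hypothetical here)."""
    counts = index.get(key_fn(text))
    if counts:
        return counts.most_common(1)[0][0]
    return fallback(text)
```

Under these assumptions, an answer such as "engineer software" has no exact-string duplicate in a training set containing "software engineer", so it falls through to the fallback model, whereas the unigram-based key treats the two answers as duplicates and reuses the training code.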

Keywords: Automated coding; Machine learning; ISCO-88; ALLBUS
Date: 2017
Persistent link: https://EconPapers.repec.org/RePEc:vrs:offsta:v:33:y:2017:i:1:p:101-122:n:6

DOI: 10.1515/jos-2017-0006

Journal of Official Statistics is currently edited by Annica Isaksson and Ingegerd Jansson

More articles in Journal of Official Statistics from Sciendo
Bibliographic data for series maintained by Peter Golla.

Handle: RePEc:vrs:offsta:v:33:y:2017:i:1:p:101-122:n:6