Three Methods for Occupation Coding Based on Statistical Learning
Gweon Hyukjun,
Schonlau Matthias,
Kaczmirek Lars,
Blohm Michael and
Steiner Stefan
Additional contact information
Gweon Hyukjun: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada
Schonlau Matthias: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada
Kaczmirek Lars: GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany
Blohm Michael: GESIS – Leibniz-Institute for the Social Sciences, PO Box 12 21 55, D-68072 Mannheim, Germany
Steiner Stefan: Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1 Canada
Journal of Official Statistics, 2017, vol. 33, issue 1, 101-122
Abstract:
Occupation coding, an important task in official statistics, refers to coding a respondent’s text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: a method that combines separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
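The difference between the two duplicate definitions, and the general idea of a hybrid method, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that two answers count as n-gram duplicates when they share the same set of word unigrams, and that the hybrid rule uses the majority code among duplicates and otherwise falls back to a model; the paper's actual rules may differ. All function names, the toy answers, and the placeholder codes ("code_A", "code_B", "code_unknown") are hypothetical and are not taken from ALLBUS or ISCO-88.

```python
from collections import Counter, defaultdict


def word_ngrams(text, n_max=1):
    """Set of word n-grams (n = 1..n_max) of a lowercased answer.

    Punctuation is replaced by spaces, so word order (for unigrams)
    and punctuation do not affect the result.
    """
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    tokens = cleaned.split()
    return frozenset(
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    )


def exact_string(text):
    """Duplicate key for the exact-string-match definition."""
    return text


def build_duplicate_index(training, key):
    """Map each duplicate key to a count of the codes seen with it."""
    index = defaultdict(Counter)
    for text, code in training:
        index[key(text)][code] += 1
    return index


def hybrid_code(text, index, key, fallback):
    """Assumed hybrid rule: majority code among duplicates, else the model."""
    votes = index.get(key(text))
    if votes:
        return votes.most_common(1)[0][0], "duplicate"
    return fallback(text), "model"


# Toy training data with placeholder codes (hypothetical, not real ISCO-88).
training = [
    ("hospital nurse", "code_A"),
    ("Hospital nurse", "code_A"),
    ("teacher", "code_B"),
]


def fallback(text):
    """Stand-in for a trained statistical learning algorithm."""
    return "code_unknown"


exact_index = build_duplicate_index(training, exact_string)
ngram_index = build_duplicate_index(training, word_ngrams)

new_answer = "Nurse (hospital)"
exact_result = hybrid_code(new_answer, exact_index, exact_string, fallback)
ngram_result = hybrid_code(new_answer, ngram_index, word_ngrams, fallback)
print(exact_result)
print(ngram_result)
```

In this toy case the exact-string definition finds no duplicate for "Nurse (hospital)" and hands the answer to the fallback model, whereas the n-gram definition matches it to the two training answers that contain the same words.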
Keywords: Automated coding; Machine learning; ISCO-88; ALLBUS
Date: 2017
Download: https://doi.org/10.1515/jos-2017-0006
Persistent link: https://EconPapers.repec.org/RePEc:vrs:offsta:v:33:y:2017:i:1:p:101-122:n:6
DOI: 10.1515/jos-2017-0006
Journal of Official Statistics is currently edited by Annica Isaksson and Ingegerd Jansson
More articles in Journal of Official Statistics from Sciendo