Automating survey coding for occupation

Malte Schierholz
Additional contact information
Malte Schierholz: Institute for Employment Research (IAB), Nuremberg, Germany; Universität Mannheim, MZES

No 201410 (en), FDZ-Methodenreport from Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany]

Abstract: "Currently, most surveys ask for occupation with open-ended questions. The verbatim responses are coded afterwards into a classification with hundreds of categories and thousands of jobs, which is an error-prone, time-consuming, and costly task. Research related to the coding of occupations is summarized with an international literature review. Special attention is paid to our main topic, the automation of coding. A prominent approach for automated coding is to consult a dictionary on the correct code. In contrast, we focus on data-based methods where codes for new answers are predicted from those answers that are already coded. Four different coding methods are tested on two data sets: (1) Rule-based Coding that consults a dictionary, (2) data-based Naive Bayes that allows coding for text answers with multiple words, (3) data-based Bayesian Categorical is used to improve performance when relatively few answers were coded before, and (4) Combined Methods (Boosting) combining predictions from the first three methods. The proposed Bayesian Categorical model is able to code 38% of all answers at 3% error rate without human interaction. In all remaining cases or for higher quality human intellect is needed to decide on the correct code and computer software can only assist by suggesting possible job codes. With the prototype software we developed for this task, we expect that for 74% of all answers the correct category is provided within the top five code suggestions. The training data used for prediction consists of only 32882 coded answers which is small compared to other systems with similar purpose. The proportions given above are expected to improve with additional training data." (Author's abstract, IAB-Doku) ((en))
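
The data-based idea described in the abstract (predicting codes for new answers from answers that are already coded, coding automatically only when the prediction is confident, and otherwise offering a ranked list of suggestions) can be illustrated with a minimal sketch. This is not the report's implementation: it is a generic multinomial Naive Bayes with Laplace smoothing, and the class name `NaiveBayesCoder`, the methods `suggest` and `code_automatically`, the threshold values, the category labels, and the toy training answers are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesCoder:
    """Toy multinomial Naive Bayes over words (hypothetical helper, not the report's model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.code_counts = Counter()             # number of coded answers per code
        self.word_counts = defaultdict(Counter)  # word frequencies per code
        self.vocab = set()

    def fit(self, answers, codes):
        # Learn from verbatim answers that have already been coded.
        for text, code in zip(answers, codes):
            self.code_counts[code] += 1
            for word in text.lower().split():
                self.word_counts[code][word] += 1
                self.vocab.add(word)
        return self

    def posterior(self, text):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in text.lower().split() if w in self.vocab]
        total = sum(self.code_counts.values())
        log_scores = {}
        for code, n in self.code_counts.items():
            denom = sum(self.word_counts[code].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[code][w] + self.alpha) / denom)
            log_scores[code] = score
        # Normalize in log space so the scores become probabilities.
        top = max(log_scores.values())
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}

    def suggest(self, text, k=5):
        # Ranked suggestions for a human coder (top k codes by probability).
        ranked = sorted(self.posterior(text).items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]

    def code_automatically(self, text, threshold=0.9):
        # Assign a code without human interaction only if the model is confident.
        code, prob = self.suggest(text, k=1)[0]
        return code if prob >= threshold else None


# Made-up training answers and labels, for illustration only.
answers = ["nurse", "hospital nurse", "registered nurse",
           "truck driver", "bus driver", "taxi driver",
           "school teacher", "primary school teacher"]
codes = ["NURSE", "NURSE", "NURSE",
         "DRIVER", "DRIVER", "DRIVER",
         "TEACHER", "TEACHER"]

coder = NaiveBayesCoder().fit(answers, codes)
print(coder.suggest("hospital nurse", k=2))
print(coder.code_automatically("driver teacher", threshold=0.9))
```

Raising the threshold trades coverage for accuracy: fewer answers are coded automatically, and more are passed to a human coder together with the ranked suggestions.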

Keywords: Federal Republic of Germany; occupational classification; coders; methodological literature
Pages: 65
Date: 2014
Downloads: (external link)
https://doku.iab.de/fdz/reporte/2014/MR_10-14_EN.pdf

Persistent link: https://EconPapers.repec.org/RePEc:iab:iabfme:201410(en)

Page updated 2025-04-16
Handle: RePEc:iab:iabfme:201410(en)