A Supervised Multiclass Classifier for an Autocoding System
Yukako Toko, Kazumi Wada and Mariko Kawano
Additional contact information
Yukako Toko: National Statistics Center, Research and Development Division, Japan
Kazumi Wada: National Statistics Center, Research and Development Division, Japan
Mariko Kawano: National Statistics Center, Research and Development Division, Japan
Romanian Statistical Review, 2017, vol. 65, issue 4, 29-39
Abstract:
Classification is required in many contexts, including official statistics. In a previous study, we developed a multiclass classifier that classifies short text descriptions with high accuracy. The algorithm borrows the concept of the naive Bayes classifier and is simple enough that its structure is easy to understand. The proposed classifier has two advantages. First, the processing times for both learning and classification are short enough to be practical. Second, it yields high-accuracy results for a large portion of a dataset. We previously developed an autocoding system for the Family Income and Expenditure Survey in Japan that incorporates a better-performing classifier. The original system was developed in Perl to improve the efficiency of coding short Japanese texts. The proposed system is instead implemented in the R programming language to explore versatility, and it is modified to be easily applicable to English text descriptions, in consideration of the increasing number of R users in official statistics. We plan to publish the proposed classifier as an R package. The proposed classifier should be generally applicable to other classification tasks, including coding activities in official statistics, and would contribute greatly to improving their efficiency.
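To illustrate the general idea of classifying short text descriptions with a naive Bayes approach, the following is a minimal sketch in Python. It is a textbook multinomial naive Bayes with add-one smoothing, not the authors' algorithm (which only borrows the naive Bayes concept and is implemented in R). The function names, the whitespace tokenizer, the toy descriptions and the category codes "FOOD" and "TRANSPORT" are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count class and per-class word frequencies from (text, code) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        class_counts[code] += 1
        for word in tokenize(text):
            word_counts[code][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the code with the highest smoothed log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, n_docs in class_counts.items():
        # Log prior: share of training descriptions carrying this code
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[code].values())
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not rule out a class
            count = word_counts[code][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical toy data: short descriptions with made-up category codes
training = [
    ("fresh milk", "FOOD"),
    ("white bread", "FOOD"),
    ("bus fare", "TRANSPORT"),
    ("train ticket", "TRANSPORT"),
]
model = train(training)
predicted = classify("milk and bread", model)
```

Training here is a single counting pass and classification is a sum of logarithms per candidate code, which is one reason naive Bayes style methods tend to be fast for both steps.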
Keywords: Coding; Text classification; Naive Bayes; Machine learning
JEL-codes: C38
Date: 2017
Downloads: (external link)
http://www.revistadestatistica.ro/wp-content/uploads/2017/11/RRS-4_2017_A02.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:rsr:journl:v:65:y:2017:i:4:p:29-39