Institutional sector classifier, a machine learning approach
Paolo Massaro,
Ilaria Vannini and
Oliver Giudice
Additional contact information
Oliver Giudice: Bank of Italy
No 548, Questioni di Economia e Finanza (Occasional Papers) from Bank of Italy, Economic Research and International Relations Area
Abstract:
We implement machine learning techniques to obtain an automatic classification by sector of economic activity of the Italian companies recorded in the Bank of Italy Entities Register. To this end, we first extract a sample of correctly classified corporations from the universe of Italian companies. Second, we select a set of features that are related to the sector of economic activity code and use these to implement supervised approaches to predict that code. We choose a multi-step approach based on the hierarchical structure of the sector classification. Because of the imbalance in the target classes, at each step, we first apply two resampling procedures – random oversampling and the Synthetic Minority Over-sampling Technique – to get a more balanced training set. Then, we fit Gradient Boosting and Support Vector Machine models. Overall, our multi-step classifier yields very reliable predictions of the sector code. This approach can be employed to make the whole classification process more efficient by reducing the area of manual intervention.
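The resampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which software was used, and the function names, the toy feature rows and the class labels ("financial", "non_financial") below are hypothetical. The first function performs random oversampling by duplicating minority rows. The second generates synthetic rows in the spirit of the Synthetic Minority Over-sampling Technique, by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like(rows, labels, minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows, each a random point on the
    segment between a minority row and one of its k nearest minority
    neighbours (squared Euclidean distance). Needs >= 2 minority rows."""
    rng = random.Random(seed)
    members = [r for r, l in zip(rows, labels) if l == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(members)
        neighbours = sorted(
            (m for m in members if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical toy data: four majority rows, two minority rows.
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [5.0, 5.1], [5.2, 4.9]]
labels = ["non_financial"] * 4 + ["financial"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
synthetic = smote_like(rows, labels, "financial", n_new=2, k=1)
print(Counter(balanced_labels))
print(synthetic)
```

In a multi-step setting such as the one described above, a resampling step of this kind would be applied to the training set at each level of the hierarchy before fitting a classifier. Only the training data should be resampled, so that the evaluation data keep the original class proportions.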
Keywords: machine learning; entities register; classification by institutional sector
JEL-codes: C18 C81 G21
Date: 2020-03
New Economics Papers: this item is included in nep-big and nep-cmp
Citations: View citations in EconPapers (2)
Downloads:
https://www.bancaditalia.it/pubblicazioni/qef/2020-0548/QEF_548_20.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:bdi:opques:qef_548_20