Measuring a country’s digital industrial structure: commercial websites and weakly supervised classification to the rescue
Giulia Occhini,
Emmanouil Tranos and
Levi John Wolf
Additional contact information
Levi John Wolf: University of Bristol
No h572n, SocArXiv from Center for Open Science
Abstract:
In this paper we propose using commercial websites and a contextualized weak supervision framework as an alternative to industrial taxonomies for identifying and classifying digital industrial activity. Despite the crucial importance of industrial taxonomies for government and research, their static nature leaves them unable to accurately capture a country’s industrial structure. This is particularly problematic for firms producing novel, digital outputs, which are currently classified into the wrong industrial sectors and thus rendered almost invisible to official statistics. To address this issue, we show how commercial websites can complement, or even substitute for, industrial classification surveys and ultimately yield a more complete, up-to-date understanding of how a country’s industrial structure evolves. In the process, we compare classification results obtained using only a commercial website’s landing page with those obtained using the full website, finding that a company’s landing page is a better predictor of industrial classes than its full website. We also suggest that our framework could support longitudinal analyses by proposing a pipeline using archival websites. This method can be used by policymakers to identify classes of industries from a bottom-up perspective; at the same time, we advocate for the use of state-of-the-art NLP techniques in economics and business research.
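The landing-page versus full-website comparison described in the abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline: plain seed-keyword counting stands in for the contextualized weak supervision framework, and the class names, seed words, and website text are all hypothetical. It only shows how one such comparison could be set up, and how text from inner pages might pull a weak label away from the one suggested by the landing page.

```python
import re
from collections import Counter

# Hypothetical seed keywords per industrial class (assumed, not from the paper).
SEED_WORDS = {
    "fintech": {"payments", "banking", "lending"},
    "ecommerce": {"shop", "cart", "delivery"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weak_label(text, seeds=SEED_WORDS):
    """Return the class whose seed words occur most often in the text.

    Returns None when no seed word matches. Ties go to the class
    listed first in `seeds`.
    """
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in seeds.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy website (invented): one landing page plus one inner page.
site = {
    "landing": "Fast online payments and business banking for startups.",
    "pages": [
        "Visit our merch shop. Add to cart. Free delivery on every shop order.",
    ],
}

# Weak label from the landing page only.
landing_label = weak_label(site["landing"])

# Weak label from the landing page and all inner pages combined.
full_label = weak_label(" ".join([site["landing"], *site["pages"]]))

print("landing page only:", landing_label)
print("full website:", full_label)
```

In this invented case the landing page is labelled from its payments and banking vocabulary, while the merchandise page adds enough retail vocabulary to change the label for the full site. Whether this kind of dilution accounts for the result reported in the abstract is not something the toy example can establish.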
Date: 2023-03-07
Downloads: (external link)
https://osf.io/download/6405effec74723023d10b56b/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:h572n
DOI: 10.31219/osf.io/h572n