Building a Sample Frame of SMEs Using Patent, Search Engine, and Website Data
Arora Sanjay K.,
Kelley Sarah and
Madhavan Sarvothaman
Additional contact information
Arora Sanjay K.: Ernst & Young, LLP, 1101 New York Ave NW, Washington, D.C., 20005, U.S.A.
Kelley Sarah: Child Trends, 7315 Wisconsin Avenue, Suite 1200W, Bethesda, MD, 20814, U.S.A.
Madhavan Sarvothaman: American Institutes for Research, Washington, D.C., 20007, U.S.A.
Journal of Official Statistics, 2021, vol. 37, issue 1, 1-30
Abstract:
This research outlines the process of building a sample frame of US SMEs. The method starts with a list of patenting organizations and defines the boundaries of the population and subsequent frame using free to low-cost data sources, including search engines and websites. Generating high-quality data is of key importance throughout the process of building the frame and subsequent data collection; at the same time, there is too much data to curate by hand. Consequently, we turn to machine learning and other computational methods to apply a number of data matching, filtering, and cleaning routines. The results show that it is possible to generate a sample frame of innovative SMEs with reasonable accuracy for use in subsequent research: Our method provides data for 79% of the frame. We discuss implications for future work for researchers and NSIs alike and contend that the challenges associated with big data collections require not only new skillsets but also a new mode of collaboration.
Keywords: Sample frame; administrative and big data; machine learning; bias; small and medium-sized enterprises
Date: 2021
Downloads:
https://doi.org/10.2478/jos-2021-0001 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:vrs:offsta:v:37:y:2021:i:1:p:1-30:n:6
DOI: 10.2478/jos-2021-0001
Journal of Official Statistics is currently edited by Annica Isaksson and Ingegerd Jansson
More articles in Journal of Official Statistics from Sciendo
Bibliographic data for series maintained by Peter Golla.