Data Science in Strategy: Machine learning and text analysis in the study of firm growth
Daan Kolkman and Arjen van Witteloostuijn
Daan Kolkman: Technical University Eindhoven
Arjen van Witteloostuijn: Vrije Universiteit Amsterdam
No 19-066/VI, Tinbergen Institute Discussion Papers from Tinbergen Institute
Abstract:
This study examines the applicability of modern Data Science techniques in the domain of Strategy. We apply novel techniques from the fields of machine learning and text analysis, and proceed in two steps. First, we compare different machine learning techniques with traditional regression methods in terms of their goodness-of-fit, using a dataset of 168,055 firms that includes only basic demographic and financial information. The novel methods fare three to four times better, with the random forest technique achieving the best goodness-of-fit. Second, based on 8,163 informative websites of Dutch SMEs, we construct four additional proxies for personality and strategy variables. Including these four text-analyzed variables adds about 2.5 per cent to the R². Together, these two contributions provide evidence for the large potential of applying modern Data Science techniques in Strategy research. We reflect on this potential in light of the common critique that machine learning offers increased predictive accuracy at the expense of explanatory insight. In particular, we argue and illustrate why and how machine learning can be a productive element in the abductive theory-building cycle.
JEL-codes: L1
Date: 2019-09-20
New Economics Papers: this item is included in nep-big, nep-cmp, nep-ent and nep-sbm
Downloads:
https://papers.tinbergen.nl/19066.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:tin:wpaper:20190066
Bibliographic data for series maintained by Tinbergen Office +31 (0)10-4088900.