A Machine Learning Approach Towards Startup Success Prediction
Cemre Ünal and Ioana Ceasu
No 2019-022, IRTG 1792 Discussion Papers from Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series"
Abstract:
The importance of startups for a dynamic, innovative and competitive economy has already been acknowledged in the scientific and business literature. The highly uncertain and volatile nature of the startup ecosystem makes the evaluation of startup success through analysis and interpretation of information very time-consuming and computationally intensive. This prediction problem brings forward the need for a quantitative model, which should enable an objective and fact-based approach to startup success prediction. This paper presents a series of reproducible models for startup success prediction, using machine learning methods. The data used for this purpose was obtained from the online investor platform crunchbase.com. The data has been pre-processed for sampling bias and imbalance by using the oversampling approach ADASYN. A total of six different models are implemented to predict startup success. Using goodness-of-fit measures applicable to each model, the best models selected are the ensemble methods random forest and extreme gradient boosting, with test set prediction accuracies of 94.1% and 94.5% and AUCs of 92.22% and 92.91%, respectively. The top variables in these models are last funding to date, first funding lag and company age. The models presented in this study can be used to predict the success rate of future new firms/ventures in a repeatable way.
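The abstract names ADASYN as the oversampling step but this record does not show the authors' implementation. The following is only a minimal pure-Python sketch of the general ADASYN idea (more synthetic minority rows are generated near minority rows that are surrounded by majority rows), run on made-up toy data rather than Crunchbase data. The function name `adasyn_sketch`, its parameters and the toy features are assumptions for illustration, and the sketch simplifies the published algorithm by always interpolating towards the nearest minority rows.

```python
import math
import random


def adasyn_sketch(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority rows generated in an ADASYN-like way.

    X: list of numeric feature lists; y: list of 0/1 labels (1 = minority).
    Hypothetical helper for illustration, not the paper's code.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]

    # Number of synthetic rows to create; beta=1.0 aims for balanced classes.
    total = int(round((len(majority) - len(minority)) * beta))
    if total <= 0 or len(minority) < 2:
        return []

    def nearest(i, candidates, n):
        # Indices of the n rows in `candidates` closest to row i (Euclidean).
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:n]

    # r_i: share of majority rows among the k nearest neighbours of each
    # minority row. A high share marks a row that is harder to learn.
    ratios = []
    for i in minority:
        neighbours = nearest(i, range(len(X)), k)
        ratios.append(sum(y[j] == 0 for j in neighbours) / len(neighbours))

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        # No minority row has majority neighbours: spread rows evenly.
        weights = [1.0 / len(minority)] * len(minority)
    else:
        weights = [r / ratio_sum for r in ratios]

    synthetic = []
    for i, w in zip(minority, weights):
        # Simplification: interpolate towards the k nearest minority rows.
        partners = nearest(i, minority, k)
        for _ in range(int(round(w * total))):
            j = rng.choice(partners)
            lam = rng.random()
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data: 40 majority rows and 8 minority rows with two made-up features.
toy_rng = random.Random(42)
X = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(40)]
X += [[toy_rng.gauss(1.5, 1), toy_rng.gauss(1.5, 1)] for _ in range(8)]
y = [0] * 40 + [1] * 8

new_rows = adasyn_sketch(X, y, k=5)
print(len(new_rows), "synthetic minority rows generated")
```

Because the per-row counts are rounded, the number of generated rows is close to, but not necessarily equal to, the class gap. In practice a maintained library implementation would normally be preferred over a hand-written version like this one.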
Keywords: Machine learning
JEL-codes: C00
Date: 2019
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/230798/1/irtg1792dp2019-022.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:irtgdp:2019022
Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.