Tobacco spending in Georgia: Machine learning approach
Maksym Obrizan, Karine Torosyan and Norberto Pignatti
No 3184, Working Papers from Research Consulting and Development
The purpose of this study is to analyze tobacco spending in Georgia using various machine learning methods applied to a sample of 10,757 households from the Integrated Household Survey collected by GeoStat in 2016. Previous research has shown that smoking is the leading cause of death among people aged 35 to 69. In addition, tobacco expenditures may constitute as much as 17% of the household budget. Five different algorithms (ordinary least squares, random forest, two gradient boosting methods and deep learning) were applied to the 8,173 households (76.0%) in the train set. Out-of-sample predictions were then obtained for the remaining 2,584 households in the test set. Under the default settings, the random forest algorithm showed the best performance, with more than a 10% improvement in root-mean-square error (RMSE). The improved accuracy and the availability of machine learning tools in R call for active use of these methods by policy makers and scientists in health economics, public health and related fields.
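The evaluation protocol described in the abstract (fit on a train set, predict out of sample, compare RMSE) can be illustrated with a minimal Python sketch. The paper itself works in R with the GeoStat survey data; everything below is an assumption made for illustration. The data are synthetic, the function names (`rmse`, `split_train_test`, `fit_mean`, `fit_simple_ols`, `rmse_improvement_pct`) are hypothetical, and a mean predictor and a one-regressor least squares fit stand in for the five algorithms of the study. Only the split sizes (8,173 and 2,584 households) are taken from the abstract.

```python
import math
import random


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def split_train_test(rows, train_size, seed):
    """Shuffle rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def fit_mean(train):
    """Stand-in baseline: always predict the mean outcome of the train set."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_simple_ols(train):
    """Stand-in model: least squares y = a + b * x, solved in closed form."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse_improvement_pct(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Synthetic (x, y) pairs; NOT the survey data used in the paper.
rng = random.Random(0)
rows = []
for _ in range(10757):
    x = rng.uniform(0.0, 100.0)
    y = 2.0 + 0.1 * x + rng.gauss(0.0, 1.0)
    rows.append((x, y))

# Split sizes follow the abstract: 8,173 train and 2,584 test households.
train, test = split_train_test(rows, train_size=8173, seed=1)
test_y = [y for _, y in test]

# Fit on the train set only, then score on the held-out test set.
scores = {}
for name, fit in (("mean", fit_mean), ("ols", fit_simple_ols)):
    model = fit(train)
    scores[name] = rmse(test_y, [model(x) for x, _ in test])

improvement = rmse_improvement_pct(scores["mean"], scores["ols"])
print(f"test RMSE: {scores}, improvement over mean: {improvement:.1f}%")
```

Expressing the gain as a percentage of the baseline RMSE is one common convention; the abstract does not state which baseline the reported improvement of more than 10% refers to.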
Keywords: Tobacco Spending; Household Survey; Georgia; Machine Learning
JEL-codes: I12 L66 D12
Pages: 7
New Economics Papers: this item is included in nep-agr, nep-big, nep-cis, nep-cmp, nep-cwa and nep-hea
Downloads: (external link)
http://rcd.org.ua/RePEc/files/WP3184.pdf First version, 2018 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:rcd:wpaper:3184
Bibliographic data for series maintained by Olena Solovyova.