Big Data in Stata
Andrew Maurer
United Kingdom Stata Users' Group Meetings 2015 from Stata Users Group
Abstract:
As organizations across industries, from academia to health care to banking, store more and more data, and as storage and RAM costs plummet, there is a growing need for tools to analyze “big data”. The world is moving from needing to analyze megabytes of data to needing to analyze many gigabytes. While Stata is very user-friendly, many of its most basic commands, such as summarize, sample, collapse, and encode, are not optimized for speed. As of Stata 14, these commands all rely on sorting, which makes them tens of times slower (or, in the case of sample, hundreds of times slower) than what is possible with better algorithms. In this presentation I illustrate alternative algorithms, along with coded examples in Stata, Mata, and C++ plugins, that may be used to analyze big data more quickly. fastsample and fastcollapse are available from the SSC.
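To make the contrast with sort-based sampling concrete, here is a minimal C++ sketch of one well-known sort-free approach, selection sampling (Knuth's Algorithm S). It is an illustration only: it is not taken from fastsample, whose internals are not described in this abstract, and the function name sample_rows and its parameters are hypothetical. It draws k of n row indices without replacement in a single pass, so the work grows linearly with n instead of requiring a sort of all n rows.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Stata or fastsample): draw k of n row
// indices uniformly at random, without replacement, in one pass and
// without sorting. Indices come back in ascending order.
std::vector<std::size_t> sample_rows(std::size_t n, std::size_t k,
                                     std::mt19937_64& rng) {
    if (k > n) throw std::invalid_argument("k must not exceed n");

    std::vector<std::size_t> picked;
    picked.reserve(k);

    for (std::size_t i = 0; i < n && picked.size() < k; ++i) {
        const std::size_t remaining_rows = n - i;              // rows not yet visited
        const std::size_t still_needed   = k - picked.size();  // picks still to make

        // Keep row i with probability still_needed / remaining_rows.
        // When the two are equal, the draw is always below still_needed,
        // so exactly k rows are selected by the end of the pass.
        std::uniform_int_distribution<std::size_t> draw(0, remaining_rows - 1);
        if (draw(rng) < still_needed) picked.push_back(i);
    }

    assert(picked.size() == k);
    return picked;
}
```

A caller would seed a std::mt19937_64 generator once, call sample_rows(n, k, rng), and keep only the returned rows. Because the indices are produced in ascending order, the retained observations stay in their original order without any sorting step.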
Date: 2015-09-16
Downloads: (external link)
http://repec.org/usug2015/maurer_uksug15.pdf presentation slides (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:boc:usug15:09