Fitting generalized linear models when the data exceeds available memory
Joseph Canner and
Krisztian Sebestyen
Additional contact information
Joseph Canner: Johns Hopkins University School of Medicine, Department of Surgery
Krisztian Sebestyen: Johns Hopkins University School of Medicine, Department of Surgery
2019 Stata Conference from Stata Users Group
Abstract:
Although random access memory (RAM) has grown in capacity and fallen in price since Stata was first released, data sets have grown as well and can still exceed available RAM. This is particularly true for those who use Stata on a personal laptop or desktop rather than an enterprise server. Accordingly, there is a need for statistical tools that can read small chunks of data from disk, perform calculations on those chunks, accumulate intermediate results, and produce final results identical to those obtained by performing the entire calculation in memory. The generalized linear model (GLM) is among the most widely used statistical methods, and mathematical methods for updating the QR or Cholesky decomposition matrices with small chunks of data have been available for many years. Thomas Lumley’s R command bigglm uses Fortran functions published by Alan J. Miller in 1992 and freely available as Algorithm AS 274. We have developed bigglm for Stata using the same functions and have expanded the library of available family and link functions. The current version can read Stata datasets as well as import data from an ODBC source. In the presentation, we will discuss the limitations of the current approach and suggest areas for improvement.
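The chunk-and-accumulate idea in the abstract can be illustrated with a minimal Python sketch. It is not the AS 274 Fortran code and not the authors' bigglm implementation; it only shows, for the simplest case (Gaussian family, identity link, i.e. ordinary least squares), how a triangular factor can be updated one observation at a time with Givens rotations so that only a p x p matrix is ever held in memory. The function names (update_qr, solve_upper, fit_in_chunks, read_chunks) and the toy data are hypothetical, and read_chunks merely stands in for reading blocks from disk or an ODBC source.

```python
import math

def update_qr(R, qty, x, y):
    """Fold one observation (x, y) into the upper-triangular factor R (p x p)
    and the rotated response qty (length p) using Givens rotations.
    Returns the leftover response component, whose square adds to the
    residual sum of squares."""
    x = list(x)  # working copy; entries are rotated to zero one by one
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue  # nothing to eliminate in this column
        r = math.hypot(R[i][i], x[i])
        c, s = R[i][i] / r, x[i] / r
        for j in range(i, p):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj
            x[j] = -s * rij + c * xj
        qi = qty[i]
        qty[i] = c * qi + s * y
        y = -s * qi + c * y
    return y

def solve_upper(R, qty):
    """Back-substitution for R * beta = qty (assumes a nonzero diagonal)."""
    p = len(qty)
    beta = [0.0] * p
    for i in reversed(range(p)):
        acc = qty[i] - sum(R[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = acc / R[i][i]
    return beta

def fit_in_chunks(chunks, p):
    """Least-squares fit that sees the data one chunk at a time."""
    R = [[0.0] * p for _ in range(p)]
    qty = [0.0] * p
    rss = 0.0
    for chunk in chunks:
        for x, y in chunk:
            e = update_qr(R, qty, x, y)
            rss += e * e
    return solve_upper(R, qty), rss

def read_chunks(rows, size):
    """Hypothetical stand-in for reading blocks of rows from disk."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Toy data: intercept column plus one covariate, with a small alternating error.
rows = [([1.0, float(i)], 1.0 + 2.0 * i + (0.5 if i % 2 == 0 else -0.5))
        for i in range(10)]

beta, rss = fit_in_chunks(read_chunks(rows, 3), p=2)
print("coefficients:", beta)
print("residual sum of squares:", rss)
```

Because each row is folded into the same small factor, the chunk size affects only how much data is in memory at once, not the fitted coefficients. For families other than the Gaussian, a pass like this would presumably be repeated over the data on each iteratively reweighted least squares step, with working weights and a working response in place of the raw values.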
Date: 2019-08-02
Downloads: (external link)
http://fmwww.bc.edu/repec/scon2019/chicago19_Canner.pdf
Persistent link: https://EconPapers.repec.org/RePEc:boc:scon19:49
Bibliographic data for series maintained by Christopher F Baum.