Using Stata to Manage and Create a Research Data Bank
Frederick Wolfe and Kaleb Michaud
Affiliations:
Frederick Wolfe: National Data Bank for Rheumatic Diseases
Kaleb Michaud: National Data Bank for Rheumatic Diseases
No 11, North American Stata Users' Group Meetings 2003 from Stata Users Group
Abstract:
We manage a longitudinal research data bank containing 3,000 variables, to which 25,000 observations are added each year. Data are batch-converted from SQL to Stata daily, producing 20 preliminary data sets. We then use Stata to quality-control the data and to prepare a single research data set, which the data analyst can augment as required by calling specialized programs that access the additional data sets. Our philosophy is that most of the quality control, programming, and data set preparation should be built into the data set creation process rather than left to the data user. For example, data quality checks and the complex preparation of items such as costs and hospital and mortality codes are programmed into the data set creation process, and relevant additional data sets are created automatically to reflect such new data. The basic data set consists of the research and control variables needed for most analyses. With simple commands such as -getwork- and -getcosts-, preprocessed work and cost data, for example, are merged with the basic data set. Global macros identify file locations, database versions, and variable sets, which makes updating and sharing simple.
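The core design is a small basic data set that analysts extend on demand by merging preprocessed auxiliary sets, with file locations and versions named in one central place. The authors' Stata programs are not reproduced on this page, so what follows is only a minimal Python sketch of that pattern under stated assumptions. The names `GLOBALS`, `merge_aux`, `getwork`, and `getcosts`, the file names, and the `id`/`wave` merge keys are all hypothetical. An in-memory `store` stands in for reading Stata files from disk. The merge imitates the default behavior of Stata's `merge 1:1` with using-only observations dropped: values already in the basic set are kept when a variable name overlaps.

```python
"""Hypothetical sketch of the 'basic data set + on-demand merge' pattern.

Assumptions (not from the paper): each record is a dict keyed by a patient
identifier 'id' and an observation 'wave'; auxiliary data sets are already
preprocessed and held in memory instead of in Stata .dta files.
"""

# Analogue of Stata global macros: one place that names the database
# version and file locations, so a new release means editing one entry.
GLOBALS = {
    "DB_VERSION": "v1",            # hypothetical release label
    "WORK_FILE": "work_v1.dta",    # hypothetical file names
    "COST_FILE": "costs_v1.dta",
}


def merge_aux(base, aux, keys=("id", "wave")):
    """Left-join `aux` onto `base` on `keys`, similar to Stata's
    `merge 1:1 id wave using ...` followed by `drop if _merge == 2`.

    Values already present in `base` win when a variable name overlaps.
    Sets '_merge' for this merge (3 = matched, 1 = base only),
    overwriting any earlier value. Raises ValueError if `aux` has
    duplicate keys, since a 1:1 merge requires unique keys.
    """
    index = {}
    for row in aux:
        k = tuple(row[key] for key in keys)
        if k in index:
            raise ValueError(f"duplicate key in auxiliary data: {k}")
        index[k] = row

    out = []
    for row in base:
        merged = dict(row)  # copy so the basic data set is not modified
        match = index.get(tuple(row[key] for key in keys))
        if match is not None:
            for name, value in match.items():
                if name not in merged:  # keep basic-set values on overlap
                    merged[name] = value
            merged["_merge"] = 3
        else:
            merged["_merge"] = 1
        out.append(merged)
    return out


def getwork(base, store):
    """Hypothetical analogue of -getwork-: merge the preprocessed work data
    stored under the file name held in GLOBALS["WORK_FILE"]."""
    return merge_aux(base, store[GLOBALS["WORK_FILE"]])


def getcosts(base, store):
    """Hypothetical analogue of -getcosts-: merge the preprocessed cost data
    stored under the file name held in GLOBALS["COST_FILE"]."""
    return merge_aux(base, store[GLOBALS["COST_FILE"]])


if __name__ == "__main__":
    base = [{"id": 1, "wave": 1, "age": 54}, {"id": 2, "wave": 1, "age": 61}]
    store = {
        GLOBALS["WORK_FILE"]: [{"id": 1, "wave": 1, "employed": 1}],
        GLOBALS["COST_FILE"]: [{"id": 2, "wave": 1, "cost": 1200.0}],
    }
    analysis = getcosts(getwork(base, store), store)
    for row in analysis:
        print(row)
```

Keeping the file names in one dictionary mirrors the role the abstract gives to global macros: pointing every getter at a new release means changing only the entries in `GLOBALS`, while analysis code that calls `getwork` and `getcosts` stays the same.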
Date: 2003-01-08
Persistent link: https://EconPapers.repec.org/RePEc:boc:asug03:11
Bibliographic data for the series are maintained by Christopher F Baum.