Creating self-validating datasets

Rising, Bill

Bill Rising (StataCorp)

United Kingdom Stata Users' Group Meetings 2007 from Stata Users Group

Abstract: One of Stata’s great strengths is its data management abilities. When either building or sharing datasets, some of the most time-consuming activities are validating the data and writing documentation for the data. Much of this effort could be avoided if datasets were self-contained, i.e., if they could validate themselves. I will show how to achieve this goal within Stata. I will demonstrate a package of commands for attaching validation rules to the variables themselves, via characteristics, along with commands for running error checks and marking suspicious observations in the dataset. The validation system is flexible enough that simple checks continue to work even if variable names change or if the data are reshaped, and it is rich enough that validation may depend on other variables in the dataset. Since the validation is at the variable level, the self-validation also works if variables are recombined with data from other datasets. With these tools, Stata’s datasets can become truly self-contained.
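
Illustration (not part of the talk): the idea of storing a rule with the variable itself, rather than in a separate script keyed by variable name, can be sketched in a few lines. The sketch below is in Python rather than Stata, and every name in it (`Variable`, `flag_suspicious`, the example columns and ranges) is hypothetical; it does not reproduce the package presented in the talk or Stata's characteristics mechanism, and it covers only simple single-variable checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Variable:
    """A column of values that carries its own validation rule (hypothetical structure)."""
    values: List[float]
    rule: Callable[[float], bool]  # returns True when a value is acceptable


def flag_suspicious(dataset: Dict[str, Variable]) -> Dict[str, List[int]]:
    """For each variable, list the row indices whose values fail that variable's own rule."""
    return {
        name: [i for i, value in enumerate(var.values) if not var.rule(value)]
        for name, var in dataset.items()
    }


# Made-up example data: one implausible age and one implausible height.
dataset = {
    "age": Variable([34, -2, 51], rule=lambda v: 0 <= v <= 120),
    "height_cm": Variable([172, 168, 15], rule=lambda v: 50 <= v <= 250),
}

before_rename = flag_suspicious(dataset)

# Renaming the variable does not disturb the check, because the rule
# is stored on the variable and never refers to the variable's name.
dataset["age_years"] = dataset.pop("age")
after_rename = flag_suspicious(dataset)

print(before_rename)
print(after_rename)
```

Because each rule sees only the value it is given, the same check keeps working after a rename or after the variable is moved into another dataset; rules that depend on other variables would need additional machinery that this sketch omits.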

Date: 2007-09-14

Downloads:
http://repec.org/usug2007/ckvarTalk.beamer.pdf presentation slides (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:boc:usug07:18
