Krishnan Bhaskaran and
Hannah Green
Krishnan Bhaskaran: MRC Clinical Trials Unit, London
Hannah Green: MRC Clinical Trials Unit, London
Abstract:
We introduce the assertk command, beginning with its motivation and a comparison with the built-in assert command. We then show examples demonstrating the options that can be used to customize the output and to perform more complex checks. assertk is a simple utility that makes checking data consistency and reporting on data quality easy.

The built-in Stata command assert checks each observation against a specified condition and halts do-files and ado-files when the condition is not satisfied. For example:

    . assert age_entry < .
    2 contradictions in 149 observations
    assertion is false
    end of do-file
    r(9);

Thus assert is a useful tool for checking important assumptions about the data you are about to process: your do-file simply will not continue if the data fail these checks. The principle of the assert command also lends itself to consistency checking, i.e., running a suite of checks on a dataset to identify potential errors, which is an important part of data cleaning. In this application, however, halting the do-file is a hindrance, and assert gives no detailed output showing which observations failed the check.

In assertk, a condition is specified and each observation is checked against it. If any observations fail the check, the irregularities are listed (the output can be customized through various options) and the do-file continues. For example:

    . assertk age_ent < ., mess(Age at entry is missing) vars(id age_ent)
    Age at entry is missing (1 obs)
    id      age_ent
    38048   .
    40352   .

Thus a suite of checks can be programmed easily, one line per check, producing a meaningful log of data errors for use by data managers and statisticians.
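To make the contrast with assert concrete, here is a minimal Python sketch of the non-halting behaviour described above, written outside Stata: each record is checked against a condition, failures are reported with a message, a count and selected variables, and execution continues to the next check. The function name check, the list-of-dicts data layout, the use of None for Stata's missing value (.), and the toy records are all hypothetical, chosen for illustration; this is not how assertk itself is implemented.

```python
from typing import Callable, Iterable, List, Sequence

# One observation: variable name -> value; None stands in for Stata's "." (assumption).
Record = dict


def check(data: Sequence[Record], condition: Callable[[Record], bool],
          message: str, variables: Iterable[str]) -> List[Record]:
    """Hypothetical assertk-style check: list failing records, never halt."""
    variables = list(variables)
    failures = [row for row in data if not condition(row)]
    if failures:
        print(f"{message} ({len(failures)} obs)")
        print("  ".join(variables))
        for row in failures:
            # Display missing values with a dot, as Stata does.
            print("  ".join("." if row.get(v) is None else str(row.get(v))
                            for v in variables))
    return failures


# Toy data, invented for this sketch.
data = [
    {"id": 1, "age_ent": None},
    {"id": 2, "age_ent": 54.2},
    {"id": 3, "age_ent": None},
]

# Rough analogue of Stata's `age_ent < .` (value is non-missing).
missing = check(data, lambda r: r["age_ent"] is not None,
                "Age at entry is missing", ["id", "age_ent"])

# A second check still runs even though the first one found problems.
negative = check(data, lambda r: r["age_ent"] is None or r["age_ent"] >= 0,
                 "Age at entry is negative", ["id", "age_ent"])
```

Returning the failing records, rather than raising an exception as assert effectively does, is what lets a suite of one-line checks run to completion and leave a full log of problems.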
Date: 2006-09-18
Series: United Kingdom Stata Users' Group Meetings 2006, Stata Users Group. Series data maintained by Christopher F Baum.