dqrep: Facilitating harmonized data-quality assessments with Stata
Carsten Oliver Schmidt,
Stephan Struckmann and
Birgit Schauer
Additional contact information
Carsten Oliver Schmidt: University Medicine Greifswald
Stephan Struckmann: University Medicine Greifswald
Birgit Schauer: University Medicine Greifswald
2023 Stata Conference from Stata Users Group
Abstract:
Transparent data-quality reporting is a key element of reproducible research. Transparency ranges from stating the assumptions that underlie each data-quality check to harmonized reporting that facilitates comparison of results within and across studies. However, such reporting is far from common. To the best of our knowledge, no existing routine was capable of triggering, from a single command call, a series of structured reports on multiple datasets with potentially unknown errors in order to grade and compare data-quality issues. Therefore, the dqrep Stata package was developed. dqrep triggers a set of more than 60 newly developed Stata ado-files to compute a customizable range of quality checks. These checks cover descriptive overviews, missing values, rule violations, outliers, time trends, and observer and device effects. Underlying assumptions are read from easily modifiable spreadsheets. All results are then integrated into PDF and docx files, as well as into result summary files that facilitate postprocessing, for example, to create benchmarks. It is shown how a single command call is used to control the data-quality pipeline in a large-scale cohort study and how this may contribute to FAIR research.
Date: 2023-07-29
Downloads:
http://repec.org/usug2023/US23_Schmidt.zip
Persistent link: https://EconPapers.repec.org/RePEc:boc:usug23:18