codefinder: optimising Stata for the analysis of large, routinely collected healthcare data

Jonathan Batty and Marlous Hall
Additional contact information
Jonathan Batty: University of Leeds
Marlous Hall: University of Leeds

UK Stata Conference 2024 from Stata Users Group

Abstract: Routinely collected healthcare data (including electronic healthcare records and administrative data) are increasingly available at whole-population scale and may span decades of data collection. These data may be analysed as part of clinical, pharmacoepidemiologic and health services research, producing insights that improve future clinical care. However, the analysis of healthcare data on this scale presents a number of distinct challenges. These include the storage of diagnosis, medication and procedure codes in several discordant coding systems (including ICD-9, ICD-10, SNOMED CT and Read codes) and the inherently relational nature of the data (each patient has multiple clinical contacts, during which multiple codes may be recorded). Pre-processing and analysing these data using optimised methods has a number of benefits, including reduced computational requirements, analytic time, carbon footprint and cost. We will focus on one of the main issues faced by the healthcare data analyst: how to most efficiently collapse multiple, disparate diagnosis codes (stored as strings across a number of variables) into a discrete disease entity, using a pre-defined code list. A number of approaches (including the use of Boolean logic, the inlist() function, string functions and regular expressions) will be sequentially benchmarked in a large, real-world healthcare dataset (n = 192 million hospitalisation episodes during a 12-year period; approximately 1 terabyte of data). The time and space complexity of each approach, in addition to its carbon footprint, will be reported. The most efficient strategy has been implemented in our newly developed Stata command, codefinder, which will be discussed.
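
The core task described in the abstract, flagging a disease entity when any of several string diagnosis fields matches a code list, can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python rather than Stata. It is not the codefinder implementation and says nothing about which approach the authors found fastest. The field names (diag1 to diag3), the example episodes and the two-prefix code list are all hypothetical, chosen only to show how the four families of approach named above (Boolean logic, list membership in the spirit of inlist(), string functions and regular expressions) can express the same rule.

```python
import re

# Hypothetical code list for one disease entity, given as ICD-10 prefixes.
# Illustrative only; not taken from the talk.
CODE_PREFIXES = ("I21", "I22")

# Hypothetical diagnosis fields; real extracts may hold many more per episode.
DIAG_FIELDS = ("diag1", "diag2", "diag3")

# Hypothetical hospitalisation episodes with string-coded diagnoses.
episodes = [
    {"id": 1, "diag1": "I210", "diag2": "E119", "diag3": ""},
    {"id": 2, "diag1": "J189", "diag2": "", "diag3": ""},
    {"id": 3, "diag1": "N179", "diag2": "I229", "diag3": "I10"},
]

def flag_boolean(ep):
    """Chained Boolean comparisons on the first three characters."""
    return any(ep[f][:3] == "I21" or ep[f][:3] == "I22" for f in DIAG_FIELDS)

def flag_membership(ep):
    """Set membership on the first three characters (similar in spirit to inlist())."""
    codes = set(CODE_PREFIXES)
    return any(ep[f][:3] in codes for f in DIAG_FIELDS)

def flag_prefix(ep):
    """String function: str.startswith accepts a tuple of prefixes."""
    return any(ep[f].startswith(CODE_PREFIXES) for f in DIAG_FIELDS)

# Regular expression anchored at the start of the code, compiled once.
PATTERN = re.compile(r"^(?:I21|I22)")

def flag_regex(ep):
    """Regular-expression match against each diagnosis field."""
    return any(PATTERN.match(ep[f]) is not None for f in DIAG_FIELDS)

# Collapse the diagnosis fields into one indicator per episode.
flags = {ep["id"]: flag_prefix(ep) for ep in episodes}
```

All four functions encode the same rule and differ only in how the match is expressed, which is the kind of difference the benchmarking described in the abstract is designed to measure.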

Date: 2024-09-16
New Economics Papers: this item is included in nep-hea

Downloads: (external link)
http://repec.org/lsug2024/UK24_Batty2.pptx


Persistent link: https://EconPapers.repec.org/RePEc:boc:lsug24:21

Bibliographic data for series maintained by Christopher F Baum.

 
Page updated 2025-03-19
Handle: RePEc:boc:lsug24:21