EconPapers    
Economics at your fingertips  
 

ETL: From the German Health Data Lab data formats to the OMOP Common Data Model

Melissa Finster, Maxim Moinat and Elham Taghizadeh

PLOS ONE, 2025, vol. 20, issue 1, 1-21

Abstract: Objective: The German Health Data Lab is going to provide access to German statutory health insurance claims data ranging from 2009 to the present for research purposes. Due to evolving data formats within the German Health Data Lab, there is a need to standardize this data into a Common Data Model to facilitate collaborative health research and minimize the need for researchers to adapt to multiple data formats. For this purpose we selected transforming the data to the Observational Medical Outcomes Partnership Common Data Model. Methods: We developed an Extract, Transform, and Load (ETL) pipeline for two distinct German Health Data Lab data formats: Format 1 (2009-2016) and Format 3 (2019 onwards). Due to the identical format structure of Format 1 and Format 2 (2017 -2018), the ETL pipeline of Format 1 can be applied on Format 2 as well. Our ETL process, supported by Observational Health Data Sciences and Informatics tools, includes specification development, SQL skeleton creation, and concept mapping. We detail the process characteristics and present a quality assessment that includes field coverage and concept mapping accuracy using example data. Results: For Format 1, we achieved a field coverage of 92.7%. The Data Quality Dashboard showed 100.0% conformance and 80.6% completeness, although plausibility checks were disabled. The mapping coverage for the Condition domain was low at 18.3% due to invalid codes and missing mappings in the provided example data. For Format 3, the field coverage was 86.2%, with Data Quality Dashboard reporting 99.3% conformance and 75.9% completeness. The Procedure domain had very low mapping coverage (2.2%) due to the use of mocked data and unmapped local concepts The Condition domain results with 99.8% of unique codes mapped. The absence of real data limits the comprehensive assessment of quality. Conclusion: The ETL process effectively transforms the data with high field coverage and conformance. It simplifies data utilization for German Health Data Lab users and enhances the use of OHDSI analysis tools. This initiative represents a significant step towards facilitating cross-border research in Europe by providing publicly available, standardized ETL processes (https://github.com/FraunhoferMEVIS/ETLfromHDLtoOMOP) and evaluations of their performance.

Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0311511 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 11511&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0311511

DOI: 10.1371/journal.pone.0311511

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().

 
Page updated 2025-05-06
Handle: RePEc:plo:pone00:0311511