Leveraging LLMs for Unstructured Claims Data Analysis

Robert Lieberthal, Richard Tran, Vietbao Phan, Jawand Singh and Elizabeth Sottung
Additional contact information
Richard Tran: MDSight, LLC
Vietbao Phan: Thomas Jefferson University
Jawand Singh: Lieberthal and Associates, LLC
Elizabeth Sottung: Thomas Jefferson University

Papers from arXiv.org

Abstract: Actuaries rely primarily on structured numerical data for reserving and ratemaking, while valuable predictive information in unstructured text, including medical records, adjuster notes, and call transcripts, remains largely unused. Manual processing of these documents is time-consuming, inconsistent across reviewers, and does not scale. We present a proof-of-concept framework that uses large language models (LLMs) to extract structured actuarial variables from unstructured claims data. We implement a two-stage processing architecture that separates document-level extraction (Stage 1) from claim-level synthesis (Stage 2). A modular four-script Python pipeline processes synthetic FHIR-based claims data and real claims documents, extracting 36 actuarial variables across reserving, ratemaking, and claims management categories. We validate 14 core variables using two independent clinical expert reviewers scoring 20 synthetic claims on a five-point Likert rubric, achieving mean scores above 4.0 and a weighted kappa of 0.53. Integration with chain ladder reserving demonstrates practical actuarial value: severity-segmented analysis reduced reserve estimation error from 6.5% to 4.0%. The open-source implementation includes audit trails and confidence scoring, providing a replicable foundation for LLM-based actuarial variable extraction in property-casualty insurance.

Date: 2026-06

Downloads: (external link)
http://arxiv.org/pdf/2606.06089 Latest version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2606.06089

Page updated 2026-06-11
Handle: RePEc:arx:papers:2606.06089