EconPapers    
Economics at your fingertips  
 

Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?

Ver\'onica B\"acker-Peral, Vitaly Meursault and Christopher Severen

Papers from arXiv.org

Abstract: Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing centered on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. This pipeline is 100 times less expensive than outsourcing, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold standard data with an $R^2$ of 98.6%. Analyses of growth and persistence in vehicle adoption are statistically indistinguishable whether using LLM or gold standard data. LLM-based digitization unlocks complex historical tables, enabling new economic analyses and broader researcher participation.

Date: 2025-05
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2505.11599 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2505.11599

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().

 
Page updated 2025-06-14
Handle: RePEc:arx:papers:2505.11599