EconPapers

Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?

Verónica Bäcker-Peral, Vitaly Meursault and Christopher Severen

Papers from arXiv.org

Abstract: Multimodal LLMs could bring a watershed change to the digitization of historical tables by enabling low-cost processing that centers on domain expertise rather than technical skill. We develop and rigorously assess an LLM-based pipeline on a new panel of historical county-level vehicle registration tables from early 20th-century U.S. state reports. Evaluated against human-transcribed gold standard data, the pipeline achieves an exact cell match rate of 95.4% at roughly one-fiftieth the cost of traditional outsourcing. The pipeline performs well at extracting table structure, reducing critical parsing errors from 61.4% to 0.35%, and at numerical transcription, exactly matching 96.7% of linked cells with a mean absolute percentage error of 0.7%. For category alignment, the pipeline performs on par with humans. We also assess pipeline performance in situ with two case studies that use common regression models to analyze the growth and persistence of historical vehicle adoption. For all eight models tested, the sign and significance of effects are identical whether using LLM-generated or gold standard data, and the coefficient of interest is statistically indistinguishable between the two in six of the eight models.
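The abstract reports two transcription metrics: an exact cell match rate and a mean absolute percentage error (MAPE) on linked cells. The Python sketch below shows one plausible way to compute them against a gold standard. It assumes cells are keyed by hypothetical (county, year) pairs, that a gold cell missing from the LLM output counts as a mismatch, and that "linked" means present in both sources. The paper's exact definitions may differ, and all function names and data here are illustrative.

```python
from typing import Dict, List, Tuple

# Assumed layout: cells keyed by a hypothetical (county, year) pair,
# with the transcribed value stored as a string.
Cells = Dict[Tuple[str, int], str]


def exact_match_rate(llm: Cells, gold: Cells) -> float:
    """Share of gold-standard cells reproduced character-for-character.

    Assumption: a gold cell absent from the LLM output counts as a mismatch.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    matches = sum(1 for key, value in gold.items() if llm.get(key) == value)
    return matches / len(gold)


def linked_keys(llm: Cells, gold: Cells) -> List[Tuple[str, int]]:
    """Cells present in both sources (assumed meaning of 'linked')."""
    return [key for key in gold if key in llm]


def linked_match_rate(llm: Cells, gold: Cells) -> float:
    """Exact-match rate restricted to linked cells."""
    keys = linked_keys(llm, gold)
    if not keys:
        raise ValueError("no linked cells")
    return sum(llm[k] == gold[k] for k in keys) / len(keys)


def mape_linked(llm: Cells, gold: Cells) -> float:
    """MAPE (in percent) over linked cells whose gold value is nonzero."""
    pairs = [(float(llm[k]), float(gold[k])) for k in linked_keys(llm, gold)]
    pairs = [(x, g) for x, g in pairs if g != 0]
    if not pairs:
        raise ValueError("no linked cells with nonzero gold values")
    return 100.0 * sum(abs(x - g) / abs(g) for x, g in pairs) / len(pairs)


# Toy data: hypothetical counties and registration counts, not from the paper.
gold = {("Adams", 1915): "120", ("Adams", 1916): "150",
        ("Brown", 1915): "80", ("Brown", 1916): "95"}
llm = {("Adams", 1915): "120", ("Adams", 1916): "160",
       ("Brown", 1915): "80"}

print(exact_match_rate(llm, gold))   # 0.5
print(linked_match_rate(llm, gold))  # 2 of 3 linked cells match
print(mape_linked(llm, gold))        # about 2.22
```

Separating the overall match rate from the linked-cell metrics mirrors the abstract's distinction between 95.4% of all cells and 96.7% of linked cells: structural misses lower the first number but not the second.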

Date: 2025-05, Revised 2026-07
New Economics Papers: this item is included in nep-gro and nep-his
Citations: 2 (EconPapers)

Downloads: https://arxiv.org/pdf/2505.11599 (latest version, PDF)

Related works:
Working Paper: Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables (2025)

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2505.11599

Bibliographic data for series maintained by arXiv administrators.

 
Page updated 2026-07-03
Handle: RePEc:arx:papers:2505.11599