Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?
Veronica Backer-Peral,
Vitaly Meursault and
Christopher Severen
Additional contact information
Vitaly Meursault: https://www.philadelphiafed.org/our-people/meursault-vitaly
No 25-28, Working Papers from Federal Reserve Bank of Philadelphia
Abstract:
Multimodal LLMs offer a watershed change for the digitization of historical tables, enabling low-cost processing centered on domain expertise rather than technical skills. We rigorously validate an LLM-based pipeline on a new panel of historical county-level vehicle registrations. This pipeline is estimated to be 100 times less expensive than outsourcing options, reduces critical parsing errors from 40% to 0.3%, and matches human-validated gold standard data with an R² of 98.6%. Analyses of growth and persistence in vehicle adoption are statistically indistinguishable whether using LLM or gold standard data. LLM-based digitization unlocks complex historical tables, enabling new economic analyses and broader researcher participation.
Keywords: OCR; Layout Parsing; Entity Linking; Multimodal LLM; Vehicle Adoption
JEL-codes: C80 N32 N72 R40
Pages: 33
Date: 2025-09-30
Downloads: (external link)
https://www.philadelphiafed.org/-/media/FRBP/Asset ... ers/2025/wp25-28.pdf (application/pdf)
Related works:
Working Paper: Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables? (2025) 
Persistent link: https://EconPapers.repec.org/RePEc:fip:fedpwp:101850
Ordering information: This working paper can be ordered from
DOI: 10.21799/frbp.wp.2025.28
Bibliographic data for series maintained by Beth Paul.