
Textizing Statistical Tables using OCR at Scale

Yutaka Arimoto

Economic Review, 2022, vol. 73, issue 1, 15-28

Abstract: This study describes the requirements and methods for textizing statistical tables using OCR (optical character recognition) at scale. A major challenge in textizing statistical tables with OCR is analyzing the table layout with high accuracy. I develop a Python toolkit, ocrstats, which supports the task by providing batch processing, automation of routine processes, use of external OCR, and table layout analysis with practical accuracy. In addition, I explain practical tips learned from textizing the Japan Imperial Statistical Yearbook using ocrstats.

JEL-codes: Y1
Date: 2022

Downloads: (external link)
https://hermes-ir.lib.hit-u.ac.jp/hermes/ir/re/72558/keizaikenkyu07301015.pdf

Persistent link: https://EconPapers.repec.org/RePEc:hit:ecorev:v:73:y:2022:i:1:p:15-28

DOI: 10.15057/72558

Economic Review is a series from Hitotsubashi University.
Bibliographic data for series maintained by Digital Resources Section, Hitotsubashi University Library.

 
Page updated 2025-03-19
Handle: RePEc:hit:ecorev:v:73:y:2022:i:1:p:15-28