Mining Chinese Historical Sources At Scale: A Machine Learning-Approach to Qing State Capacity
Wolfgang Keller, Carol Shiue and Sen Yan
No 19517, CEPR Discussion Papers from Centre for Economic Policy Research
Abstract:
Primary historical sources are often bypassed in favor of secondary sources because of the high human cost of accessing and extracting primary information, especially in lower-resource settings. We propose a supervised machine-learning approach to the natural language processing of Chinese historical data. An application to identifying different forms of social unrest in the Veritable Records of the Qing Dynasty shows that the approach dramatically cuts the cost of using primary source data while being free from human bias, reproducible, and flexible enough to address particular questions. External evidence on triggers of unrest also suggests that the computer-based approach is no less successful than human researchers at identifying social unrest.
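The abstract does not specify which classifier the authors use. Purely as a hedged illustration of what "supervised" text classification means here, the following is a minimal sketch of one generic baseline: a multinomial naive Bayes classifier over overlapping character bigrams, a choice that avoids having to segment classical Chinese into words. The helper `char_bigrams`, the class `NaiveBayesClassifier`, the labels "unrest" and "other", and the toy training strings are all hypothetical and invented for this example; none of them come from the paper or from the Veritable Records.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Split a string into overlapping two-character tokens.

    Character n-grams need no word segmentation, which is hard for
    classical Chinese. A one-character string falls back to unigrams.
    """
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])] or chars


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()              # documents per label
        self.token_counts = defaultdict(Counter)   # bigram counts per label
        self.vocab = set()                         # all bigrams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = char_bigrams(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text) under the naive Bayes model."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)  # log prior
        total_tokens = sum(self.token_counts[label].values())
        denom = total_tokens + self.alpha * len(self.vocab)
        for tok in char_bigrams(text):
            if tok in self.vocab:  # skip bigrams never seen in training
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda lab: self.log_posterior(text, lab))


# Invented toy passages with hand-assigned labels (NOT real source data).
train_texts = [
    "賊匪聚眾搶掠村莊",  # bandits gather a crowd and plunder villages
    "亂民圍城焚燒衙署",  # rioters besiege a city and burn government offices
    "雨澤霑足麥收豐稔",  # ample rainfall, abundant wheat harvest
    "奏報各屬糧價平減",  # memorial reports stable, lower grain prices
]
train_labels = ["unrest", "unrest", "other", "other"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("匪徒聚眾搶掠"))
print(clf.predict("今歲麥收豐稔"))
```

In a supervised setup of this general kind, researchers hand-label a sample of passages, fit the model on that sample, and then apply it to the full corpus, so the remaining human cost is concentrated in the labeled training set rather than in reading every record.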
Keywords: Natural language processing
JEL-codes: C8, N45
Date: 2024-09
Downloads: https://cepr.org/publications/DP19517 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:cpr:ceprdp:19517
Ordering information: This working paper can be ordered from https://cepr.org/publications/DP19517
More papers in CEPR Discussion Papers from Centre for Economic Policy Research, 33 Great Sutton Street, London EC1V 0DX, UK.
Bibliographic data for series maintained by CEPR.