A Cross-verified Database of Notable People, 3500BC-2018AD
Etienne Wasmer,
Morgane Laouenan,
Palaash Bhargava,
Jean Benoit Eymeoud and
Guillaume Plique
Authors registered in the RePEc Author Service: Olivier Gergaud
No 15852, CEPR Discussion Papers from C.E.P.R. Discussion Papers
Abstract:
We add to the literature on notable individuals (famous, prominent, distinguished) by first collecting a massive amount of data from various editions of Wikipedia and Wikidata, applying deduplication techniques, and then using these partially overlapping sources to cross-verify each piece of retrieved information. This strategy results in a cross-verified database of 2.2 million individuals, including a third who are not present in the English edition of Wikipedia. An extension to 4.7 million entries is currently not recommended given the inaccuracy of the information and discrepancies between Wikidata and other sources. A non-negligible fraction of newly added individuals were collected from non-English editions of Wikipedia. We adopt a social science approach: data collection is driven by specific social questions on gender, economic and cultural development, and quantitative exploration of cultural trends, which we document in this paper. A sample of 100,000 individuals is available at http://medialab.github.io/bhht-datascape, together with the most recent version of this paper.
Keywords: Notable individuals; Creative class; Urban economics; Economic history
JEL-codes: N01 N9 R00
Date: 2021-02
New Economics Papers: this item is included in nep-evo and nep-his
Downloads: (external link)
https://cepr.org/publications/DP15852 (application/pdf)
CEPR Discussion Papers are free to download for our researchers, subscribers and members. If you fall into one of these categories but have trouble downloading our papers, please contact us at subscribers@cepr.org
Related works:
Working Paper: A cross-verified database of notable people, 3500BC-2018AD (2022) 
Persistent link: https://EconPapers.repec.org/RePEc:cpr:ceprdp:15852
Ordering information: This working paper can be ordered from
https://cepr.org/publications/DP15852
More papers in CEPR Discussion Papers from C.E.P.R. Discussion Papers Centre for Economic Policy Research, 33 Great Sutton Street, London EC1V 0DX.