CatMapper: A user-friendly tool for integrating data across complex categories
Daniel Hruschka, Robert Bischoff, Matthew Peeples, I-Han Hsiao and Mohamed Sarwat
No n6rty, SocArXiv from Center for Open Science
Abstract:
We introduce CatMapper (catmapper.org), a set of user-friendly, web-based tools designed to help researchers overcome a common bottleneck in comparative research: integrating data across diverse datasets by complex categories (e.g., ethnicities, languages, religions, archaeological artifact types) that are often encoded very differently from dataset to dataset. We illustrate CatMapper's planned architecture and capabilities with the SocioMap tool (catmapper.org/sociomap), which focuses on four inter-related domains: ethnicities (>9,000), religions (>1,000), districts (>200,000), and languages, language families and dialects (>25,000). Categories in these diverse domains share commonalities that make them challenging to work with, including large numbers of categories at multiple nested scales that can also change through time. To assist users in merging data by these categories, SocioMap will include four core functions: (1) explore contextual information about specific categories, (2) translate new sets of categories from existing datasets and published studies, (3) integrate novel combinations of datasets for researchers’ custom analysis needs, including automatically generated syntax (e.g., R, Stata) to merge datasets of interest, and (4) share merging templates for public re-use and open science. We outline current progress on the development of CatMapper/SocioMap, plans for future development, and potential expansion to other domains, such as artifact types in archaeology and material goods used in asset-based wealth indices.
Date: 2022-01-30
Downloads: (external link)
https://osf.io/download/61f581b5102f9203e7b17ffd/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:n6rty
DOI: 10.31219/osf.io/n6rty