An Online Structured Political Event Dataset based on CAMEO Ontology
Sayeed Salam,
Patrick Brandt,
Vito D'Orazio,
Jennifer Holmes,
Javiar Osorio and
Latifur Khan
No vrt4a, SocArXiv from Center for Open Science
Abstract:
Political activities and interactions between different global entities are becoming a growing field for data-intensive computing, with a wide scope of research opportunities for both social science and computer science researchers. This research needs to be carried out at both local (limited to a particular region) and global scales, often divided into time periods. It is also useful to have the most recently updated dataset for relevant analysis. For these purposes, we need timestamped, geolocated, structured information about political interactions. Keeping this in mind, we develop a dataset that complies with the Conflict and Mediation Event Observation (CAMEO) ontology, inspired by the "who-did-what-to-whom" format. We use a distributed framework for data collection and processing that works in real time with Apache Kafka and Spark in order to process a global collection of news data in different languages (e.g., Spanish, Arabic) and generate structured event data in real time. We also provide an API for easy access to the data. In this paper, we describe how the data is represented, collected, and processed; how we generate the most up-to-date dataset with dynamic ontology extension; how to access the data; and possible analytical problems that can be addressed by building a model on the dataset.
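To make the "who-did-what-to-whom" representation concrete, the following is a minimal sketch of a timestamped, geolocated event record and a simple temporal filter. It is not the paper's schema or API: the `Event` class, its field names, the `filter_events` helper, and the sample actor codes and coordinates are all hypothetical and chosen for illustration only. The three root-level labels are taken from the CAMEO codebook.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# A small subset of CAMEO root-level event categories (two-digit codes).
CAMEO_ROOT_LABELS = {"04": "Consult", "14": "Protest", "19": "Fight"}


@dataclass(frozen=True)
class Event:
    """Hypothetical who-did-what-to-whom record; field names are assumptions."""
    source: str        # who: actor initiating the event
    cameo_code: str    # what: CAMEO event code
    target: str        # whom: actor receiving the event
    event_date: date   # timestamp of the event
    latitude: float    # geolocation
    longitude: float

    @property
    def root_code(self) -> str:
        # The first two digits of a CAMEO code give its root category.
        return self.cameo_code[:2]


def filter_events(
    events: Iterable[Event],
    start: date,
    end: date,
    root_code: Optional[str] = None,
) -> List[Event]:
    """Keep events inside [start, end], optionally restricted to one root code."""
    return [
        e
        for e in events
        if start <= e.event_date <= end
        and (root_code is None or e.root_code == root_code)
    ]


# Illustrative records only; these are not drawn from the actual dataset.
SAMPLE_EVENTS = [
    Event("MEXOPP", "141", "MEXGOV", date(2020, 3, 1), 19.43, -99.13),
    Event("COLGOV", "040", "COLREB", date(2020, 3, 5), 4.71, -74.07),
    Event("SYRREB", "190", "SYRMIL", date(2019, 12, 20), 36.20, 37.13),
]

protests = filter_events(
    SAMPLE_EVENTS, date(2020, 1, 1), date(2020, 3, 31), root_code="14"
)
for e in protests:
    print(e.event_date, e.source, CAMEO_ROOT_LABELS[e.root_code], e.target)
```

Slicing by date range and root category in this way mirrors the local, global, and temporal divisions of analysis that the abstract describes; a regional filter could be added by comparing the latitude and longitude fields against a bounding box.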
Date: 2020-03-20
Download:
https://osf.io/download/5e722a270cd06c046c001ec7/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:vrt4a
DOI: 10.31219/osf.io/vrt4a