A French Corpus for Event Detection on Twitter

Béatrice Mazoyer, Julia Cagé, Nicolas Hervé and Céline Hudelot
Additional contact information
Béatrice Mazoyer: médialab, Sciences Po
Céline Hudelot: MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay

Post-Print from HAL

Abstract: We present Event2018, a corpus annotated for event detection tasks, consisting of 38 million tweets in French (retweets excluded) including more than 130,000 tweets manually annotated by three annotators as related or unrelated to a given event. The 257 events were selected both from press articles and from subjects trending on Twitter during the annotation period (July to August 2018). In total, more than 95,000 tweets were annotated as related to one of the selected events. We also provide the titles and URLs of 15,500 news articles automatically detected as related to these events. In addition to this corpus, we detail the results of our event detection experiments on both this dataset and another publicly available dataset of tweets in English. We ran extensive tests with different types of text embeddings and a standard Topic Detection and Tracking algorithm, and detail our evaluation method. We show that tf-idf vectors allow the best performance for this task on both corpora. These results are intended to serve as a baseline for researchers wishing to test their own event detection systems on our corpus.
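
As a rough illustration of the kind of baseline the abstract describes (tf-idf vectors fed to a threshold-based clustering in the spirit of Topic Detection and Tracking), a minimal sketch follows. It is not the authors' implementation: the function names `tfidf_vectors`, `cosine` and `first_story_clusters`, the smoothed idf formula, the threshold of 0.3 and the toy tweets are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised tf-idf vector (as a dict) per tokenised document."""
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term count times (1 + ln(n / df)).
        vec = {t: c * (1.0 + math.log(n / df[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def first_story_clusters(vectors, threshold=0.3):
    """Assign each document to the cluster of its most similar predecessor,
    or open a new cluster when no predecessor reaches the threshold."""
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = 0.0, None
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_label)
            next_label += 1
    return labels


# Toy, hand-written tweets (hypothetical, not taken from the corpus).
tweets = [
    "incendie notre dame paris".split(),
    "incendie cathedrale notre dame".split(),
    "finale coupe du monde france".split(),
    "france gagne finale coupe du monde".split(),
]
labels = first_story_clusters(tfidf_vectors(tweets))
print(labels)
```

Each tweet is compared only with earlier tweets, so the procedure processes a stream in arrival order. A realistic system would also restrict comparisons to a sliding time window, which this sketch omits for brevity.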

Keywords: Natural Language Processing; Twitter; Topic Detection and Tracking; Event detection; Dataset; French
Date: 2020
Note: View the original document on HAL open archive server: https://sciencespo.hal.science/hal-03947820

Published in Twelfth Language Resources and Evaluation Conference, 2020, Marseille, France. pp.6220-6227

Downloads: (external link)
https://sciencespo.hal.science/hal-03947820/document (application/pdf)

Related works:
Working Paper: A French Corpus for Event Detection on Twitter (2020)

Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-03947820

Bibliographic data for series maintained by CCSD.

 
Page updated 2025-03-31
Handle: RePEc:hal:journl:hal-03947820