Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project

Gattepaille, Lucie M.; Vidlin, Sara Hedfors; Bergvall, Tomas; Pierce, Carrie E.; Ellenius, Johan

Lucie M. Gattepaille, Sara Hedfors Vidlin, Tomas Bergvall, Carrie E. Pierce and Johan Ellenius
Additional contact information
Lucie M. Gattepaille: Uppsala Monitoring Centre
Sara Hedfors Vidlin: Uppsala Monitoring Centre
Tomas Bergvall: Uppsala Monitoring Centre
Carrie E. Pierce: Uppsala Monitoring Centre
Johan Ellenius: Uppsala Monitoring Centre

Drug Safety, 2020, vol. 43, issue 8, No 11, 797-808

Abstract:
Introduction: A large number of studies on systems to detect and sometimes normalize adverse events (AEs) in social media have been published, but evidence of their practical utility is scarce. This raises the question of the transferability of such systems to new settings.
Objectives: The aims of this study were to develop an AE recognition system, prospectively evaluate its performance on an external benchmark dataset and identify potential factors influencing the transferability of AE recognition systems.
Methods: A pipeline based on dictionary lookups and logistic regression classifiers was developed using a proprietary dataset of 196,533 Tweets manually annotated for AE relations. The system was then prospectively evaluated on the publicly available WEB-RADR reference dataset, exploring different aspects affecting transferability.
Results: Our system achieved 0.53 precision, 0.52 recall and 0.52 F1-score on the development test set; however, when applied to the WEB-RADR reference dataset, system performance dropped to 0.38 precision, 0.20 recall and 0.26 F1-score. Similarly, a previously published method for automatically detecting adverse event posts reported 0.5 precision, 0.92 recall and 0.65 F1-score on another dataset, while its performance on the WEB-RADR reference dataset was reduced to 0.37 precision, 0.63 recall and 0.46 F1-score. We identified four potential factors leading to poor transferability: overfitting, selection bias, label bias and prevalence.
Conclusion: We warn the community about a potentially large discrepancy between the expected performance of automated AE recognition systems based on published results and the actual observed performance on independent data. This study highlights the difficulty of implementing an all-purpose system for automatic adverse event recognition in Twitter, which could explain the lack of such systems in practical pharmacovigilance settings. Our recommendation is to use independent benchmark datasets, such as the WEB-RADR reference dataset, to investigate the transferability of adverse event recognition systems and ultimately enforce rigorous comparisons across studies on the task.
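
The F1-scores quoted in the abstract are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the rounded precision and recall figures in the abstract, so that the size of the drop between datasets can be checked. The function name `f1_score` and the dictionary labels are illustrative choices, not taken from the paper, and because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as quoted in the abstract.
# The keys are descriptive labels chosen for this sketch.
reported = {
    "authors' system, development test set": (0.53, 0.52, 0.52),
    "authors' system, WEB-RADR reference": (0.38, 0.20, 0.26),
    "published method, its original dataset": (0.50, 0.92, 0.65),
    "published method, WEB-RADR reference": (0.37, 0.63, 0.46),
}

for label, (p, r, f1) in reported.items():
    print(f"{label}: recomputed F1 = {f1_score(p, r):.3f}, reported F1 = {f1:.2f}")
```

Because recall fell much more than precision for the authors' system (0.52 to 0.20), the harmonic mean is pulled toward the lower value, which is why its F1-score halves even though precision declines only moderately.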

Date: 2020
Downloads: (external link)
http://link.springer.com/10.1007/s40264-020-00942-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:drugsa:v:43:y:2020:i:8:d:10.1007_s40264-020-00942-3

Ordering information: This journal article can be ordered from
http://www.springer.com/adis/journal/40264

DOI: 10.1007/s40264-020-00942-3

Drug Safety is currently edited by Nitin Joshi

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.