Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora
Alexander Sboev,
Roman Rybka,
Anton Selivanov,
Ivan Moloshnikov,
Artem Gryaznov,
Alexander Naumov,
Sanna Sboeva,
Gleb Rylkov and
Soyora Zakirova
Affiliation (all authors): Complex of NBICS Technology, National Research Center “Kurchatov Institute”, Academic Kurchatov sq., 123182 Moscow, Russia
Mathematics, 2023, vol. 11, issue 2, 1-23
Abstract:
Extracting significant information from Internet sources is an important pharmacovigilance task because of the need for post-clinical drug monitoring. This research considers the task of end-to-end recognition of pharmaceutically significant named entities and their relations in natural-language texts. Here, “end-to-end” means that both tasks are performed within a single process on “raw” text without annotation. The study is based on the current version of the Russian Drug Review Corpus, a dataset of 3800 review texts from the Russian segment of the Internet. Currently, this is the only Russian-language corpus suitable for research of this type. We estimated the accuracy of recognizing pharmaceutically significant entities and their relations with two approaches based on neural-network language models. The first approach solves the named-entity recognition and relation extraction tasks one after the other (the sequential approach). The second solves both tasks simultaneously with a single neural network (the joint approach). The study compares both approaches and includes hyperparameter selection to maximize the resulting accuracy. Both approaches are shown to solve the target task at the same level of accuracy: a macro-averaged F1-score of 52–53%, which is the current level of accuracy for “end-to-end” tasks in the Russian language. Additionally, the paper presents results of the joint approach on the English open datasets ADE and DDI, together with hyperparameter selection for modern domain-specific language models. The achieved accuracies of 84.2% (ADE) and 73.3% (DDI) are comparable to or better than other published results for these datasets.
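The results above are reported as a macro-averaged F1-score on the end-to-end task. As a rough illustration of what such a score measures, the sketch below computes a per-relation-type F1 and averages the types with equal weight. It assumes a strict matching criterion (both entity spans, both entity types and the relation type must all match); the abstract does not state the paper's exact evaluation protocol. The `Relation` record, the `macro_f1` function and the labels `Drug`, `Effect`, `causes` and `interacts` are hypothetical and are not taken from the paper or the corpus.

```python
from typing import Iterable, NamedTuple


class Relation(NamedTuple):
    """Hypothetical record for one extracted relation between two entity mentions.

    Spans are (start, end) character offsets. Under the assumed strict
    criterion, a prediction is correct only if every field equals a gold relation.
    """
    head_span: tuple[int, int]
    head_type: str
    tail_span: tuple[int, int]
    tail_type: str
    rel_type: str


def macro_f1(gold: Iterable[Relation], pred: Iterable[Relation]) -> float:
    """Average of per-relation-type F1 scores, each type weighted equally."""
    gold_set, pred_set = set(gold), set(pred)
    # Assumed label set: every relation type seen in either gold or predictions.
    types = sorted({r.rel_type for r in gold_set | pred_set})
    scores = []
    for t in types:
        g = {r for r in gold_set if r.rel_type == t}
        p = {r for r in pred_set if r.rel_type == t}
        tp = len(g & p)  # exact matches only
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: one correct relation, one with an entity boundary off by one.
gold = [
    Relation((0, 7), "Drug", (20, 28), "Effect", "causes"),
    Relation((30, 37), "Drug", (40, 50), "Drug", "interacts"),
]
pred = [
    Relation((0, 7), "Drug", (20, 28), "Effect", "causes"),
    Relation((30, 37), "Drug", (41, 50), "Drug", "interacts"),
]

print(macro_f1(gold, pred))  # prints 0.5
```

Because errors in entity boundaries make the whole relation count as wrong, an end-to-end score of this kind is usually well below the separate entity-recognition score; macro averaging also gives rare relation types the same weight as frequent ones.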
Keywords: Russian Drug Review Corpus; deep learning; language models; named-entity recognition; relation extraction; joint model; natural language processing; pharmacovigilance; DDI; ADE
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/2/354/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/2/354/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:2:p:354-:d:1030251
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.