Similar Text Fragments Extraction for Identifying Common Wikipedia Communities
Svitlana Petrasova, Nina Khairova, Włodzimierz Lewoniewski, Orken Mamyrbayev and Kuralay Mukhsina
Additional contact information
Svitlana Petrasova: Department of Intelligent Computer Systems, National Technical University “Kharkiv Polytechnic Institute”, 61002 Kharkiv, Ukraine
Nina Khairova: Department of Intelligent Computer Systems, National Technical University “Kharkiv Polytechnic Institute”, 61002 Kharkiv, Ukraine
Włodzimierz Lewoniewski: Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland
Orken Mamyrbayev: Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
Kuralay Mukhsina: Department of Informatics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
Data, 2018, vol. 3, issue 4, 1-9
Abstract:
Extracting similar text fragments from weakly formalized data is a task of natural language processing and intelligent data analysis, and it is used to automatically identify connected fields of knowledge. To find such common communities in Wikipedia, we propose adding a stage that uses a logical-algebraic model to extract similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of the words in each collocation. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can indicate the key common, up-to-date Wikipedia communities.
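The counting step the abstract describes (grouping collocations by synonymy, then measuring how often each synonymous collocation recurs across articles) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' logical-algebraic model: the `SYNONYM_GROUPS` table is a hypothetical toy stand-in for WordNet synsets, the (modifier, head) pairs are assumed to have already been extracted (for example, adjective-noun pairs linked by an `amod` dependency from the Stanford parser), and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy synonym table standing in for WordNet synsets:
# words that share a group label are treated as synonyms.
SYNONYM_GROUPS = {
    "big": "SIZE_LARGE", "large": "SIZE_LARGE",
    "car": "VEHICLE_CAR", "automobile": "VEHICLE_CAR", "auto": "VEHICLE_CAR",
    "quick": "SPEED_FAST", "fast": "SPEED_FAST", "rapid": "SPEED_FAST",
    "growth": "INCREASE", "increase": "INCREASE",
}


def canonical(word: str) -> str:
    """Map a word to its synonym-group label, or to itself (lowercased) if unknown."""
    w = word.lower()
    return SYNONYM_GROUPS.get(w, w)


def synonymous_collocation_frequencies(articles):
    """Count synonymous (modifier, head) collocations across articles.

    `articles` maps an article title to a list of (modifier, head) pairs,
    assumed to be pre-extracted by a POS tagger and dependency parser.
    Returns total counts per canonical collocation and the set of
    articles in which each canonical collocation occurs.
    """
    counts = Counter()
    sources = defaultdict(set)
    for title, pairs in articles.items():
        for modifier, head in pairs:
            key = (canonical(modifier), canonical(head))
            counts[key] += 1
            sources[key].add(title)
    return counts, sources


def shared_collocations(articles, min_articles=2):
    """Return canonical collocations found in at least `min_articles`
    distinct articles, most frequent first (ties broken alphabetically)."""
    counts, sources = synonymous_collocation_frequencies(articles)
    shared = [(key, n) for key, n in counts.items() if len(sources[key]) >= min_articles]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical articles from two different Wikipedia portals.
    articles = {
        "Economy article": [("rapid", "growth"), ("big", "car"), ("economic", "policy")],
        "Transport article": [("large", "automobile"), ("quick", "increase"), ("big", "car")],
    }
    for key, n in shared_collocations(articles):
        print(key, n)
```

Here "rapid growth" and "quick increase" collapse to the same canonical collocation, so it is counted as shared between the two articles even though no surface string matches. Requiring a collocation to occur in several distinct articles (`min_articles`) is one simple, assumed way to separate fragments that link information spaces from fragments specific to a single article.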
Keywords: information extraction; short text fragment similarity; Wikipedia communities; NLP
JEL-codes: C8 C80 C81 C82 C83
Date: 2018
Downloads:
https://www.mdpi.com/2306-5729/3/4/66/pdf (application/pdf)
https://www.mdpi.com/2306-5729/3/4/66/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:3:y:2018:i:4:p:66-:d:190245
Data is currently edited by Ms. Cecilia Yang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.