
Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

Svitlana Petrasova, Nina Khairova, Włodzimierz Lewoniewski, Orken Mamyrbayev and Kuralay Mukhsina
Additional contact information
Svitlana Petrasova: Department of Intelligent Computer Systems, National Technical University “Kharkiv Polytechnic Institute”, 61002 Kharkiv, Ukraine
Nina Khairova: Department of Intelligent Computer Systems, National Technical University “Kharkiv Polytechnic Institute”, 61002 Kharkiv, Ukraine
Włodzimierz Lewoniewski: Department of Information Systems, Poznan University of Economics and Business, 61-875 Poznan, Poland
Orken Mamyrbayev: Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
Kuralay Mukhsina: Department of Informatics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan

Data, 2018, vol. 3, issue 4, 1-9

Abstract: Extracting similar text fragments from weakly formalized data is a task in natural language processing and intelligent data analysis, and it is used to solve the problem of automatically identifying connected knowledge fields. To search for such common communities in Wikipedia, we propose to use, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of high-frequency synonymous collocations can provide an indication of key common up-to-date Wikipedia communities.
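
The idea summarized in the abstract (match collocations whose words agree in grammatical characteristics and are synonyms, then count the matches shared between articles) can be illustrated with a minimal sketch. This is not the authors' logical-algebraic model: the synonym sets below are a hypothetical hand-written stand-in for WordNet synsets, the part-of-speech tags are supplied by hand rather than produced by the Stanford tagger or parser, and the names `TOY_SYNSETS`, `synonymous`, `similar`, and `shared_collocations` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy synonym sets standing in for WordNet synsets.
TOY_SYNSETS = [
    {"big", "large"},
    {"car", "automobile"},
    {"quick", "fast"},
]


def synonymous(a, b):
    """Return True if the words are identical or share a toy synonym set."""
    if a == b:
        return True
    return any(a in s and b in s for s in TOY_SYNSETS)


def similar(c1, c2):
    """Compare two collocations given as ((word, pos), (word, pos)).

    They count as similar when the part-of-speech tags match position by
    position and the words are pairwise synonymous.
    """
    (w1, p1), (v1, q1) = c1
    (w2, p2), (v2, q2) = c2
    return p1 == p2 and q1 == q2 and synonymous(w1, w2) and synonymous(v1, v2)


def shared_collocations(articles):
    """Count similar collocation pairs for every pair of articles."""
    counts = Counter()
    for (name_a, colls_a), (name_b, colls_b) in combinations(articles.items(), 2):
        for ca in colls_a:
            for cb in colls_b:
                if similar(ca, cb):
                    counts[(name_a, name_b)] += 1
    return counts


# Hand-tagged toy data; a real pipeline would obtain tags from a tagger/parser.
articles = {
    "A": [(("big", "ADJ"), ("car", "NOUN")), (("quick", "ADJ"), ("train", "NOUN"))],
    "B": [(("large", "ADJ"), ("automobile", "NOUN"))],
    "C": [(("fast", "ADJ"), ("train", "NOUN")), (("big", "NOUN"), ("car", "NOUN"))],
}
result = shared_collocations(articles)
```

In this toy data, article pairs with a higher count share more synonymous collocations, which is the kind of signal the abstract describes as indicating a common community. The last collocation in article "C" shows why the grammatical check matters: its words match those in "A", but the tag pattern differs, so it is not counted.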

Keywords: information extraction; short text fragment similarity; Wikipedia communities; NLP
JEL-codes: C8 C80 C81 C82 C83
Date: 2018
Downloads: (external link)
https://www.mdpi.com/2306-5729/3/4/66/pdf (application/pdf)
https://www.mdpi.com/2306-5729/3/4/66/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:3:y:2018:i:4:p:66-:d:190245

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:3:y:2018:i:4:p:66-:d:190245