EconPapers    
Economics at your fingertips  
 

Establishing Semantic Similarity of the Cluster Documents and Extracting Key Entities in the Problem of the Semantic Analysis of News Texts

Anastasia Soloshenko, Yulia Orlova, Vladimir Rozaliev and Alla Zaboleeva-Zotova

Modern Applied Science, 2015, vol. 9, issue 5, 246

Abstract: This paper is dedicated to the problem of establishing semantic similarity for the documents of the news cluster and extracting key entities from the article's text. The existing methods and algorithms for fuzzy duplicate detection texts are briefly reviewed and analysed, such as TF-IDF and its modifications, Long Sent, Megashingles and Log Shingles, and Lex Rand. The shingles algorithm essence and its main stages are described in detail. Several options of the parallel implementation for the shingles algorithm are presented- for multiprocessor heterogeneous computing systems using CUDA and Open CL and for distributed computing systems using Google App Engine. The parameters of the algorithm (operation time, acceleration) applied to the problem of the semantic analysis for news texts are assessed. In addition, the methods and algorithms for extracting key phrases from the news text are reviewed- graph methods, in particular TextRank, building horizontal visibility graphs, the Viterbi algorithm, types of Markov random fields method, as well as a comprehensive context-sensitive algorithm for news text analysis (a combination of statistical algorithms for extracting key words and algorithms for forming semantic coherence of the text blocks). These methods are analysed from the standpoint of applicability to the news articles analysis. Particular attention is paid to the peculiarities of the news text structure. Although the thematic classification and selection of key entities in text documents are powerful text processing tools, these stages of analysis cannot give a complete picture of the news piece semantics. The paper presents a methodology and a comprehensive analysis of news text, based on a combination of semantic analysis and subsequent text abstracting submitting it in a compressed format - so-called mind map.

Date: 2015
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://ccsenet.org/journal/index.php/mas/article/download/47313/25549 (application/pdf)
https://ccsenet.org/journal/index.php/mas/article/view/47313 (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:ibn:masjnl:v:9:y:2015:i:5:p:246

Access Statistics for this article

More articles in Modern Applied Science from Canadian Center of Science and Education Contact information at EDIRC.
Bibliographic data for series maintained by Canadian Center of Science and Education ().

 
Page updated 2025-03-19
Handle: RePEc:ibn:masjnl:v:9:y:2015:i:5:p:246