Finding story chains in newswire articles using random walks
Xianshu Zhu and
Tim Oates
Additional contact information
Xianshu Zhu: University of Maryland, Baltimore County
Tim Oates: University of Maryland, Baltimore County
Information Systems Frontiers, 2014, vol. 16, issue 5, No 2, 753-769
Abstract:
Massive amounts of information about news events are published on the Internet every day in online newspapers, blogs, and social network messages. While search engines like Google help retrieve information using keywords, the large volumes of unstructured search results returned by search engines make it hard to track the evolution of an event. A story chain is composed of a set of news articles that reveal hidden relationships among different events. Traditional keyword-based search engines provide limited support for finding story chains. In this paper, we propose a random walk based algorithm to find story chains. When breaking news happens, many media outlets report the same event. We have two pruning mechanisms in the algorithm to automatically exclude redundant articles from the story chain and to ensure efficiency of the algorithm. We further explore how named entities and word relevance can help find relevant news articles and improve algorithm efficiency by creating a co-clustering based correlation graph. Experimental results show that our proposed algorithm can generate coherent story chains without redundancy. The efficiency of the algorithm is significantly improved on the correlation graph.
Keywords: Information overload; Random walk; Named entities
Date: 2014
Citations: 3 (tracked in EconPapers)
Downloads: http://link.springer.com/10.1007/s10796-013-9420-2 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:infosf:v:16:y:2014:i:5:d:10.1007_s10796-013-9420-2
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10796
DOI: 10.1007/s10796-013-9420-2
Information Systems Frontiers is currently edited by Ram Ramesh and Raghav Rao
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.