Document summarisation based on sentence ranking using vector space model
Namita Gupta, P.C. Saxena and J.P. Gupta
International Journal of Data Mining, Modelling and Management, 2013, vol. 5, issue 4, 380-406
Abstract:
The WWW is a repository of a large collection of information available in the form of unstructured documents, so identifying documents of interest in such a huge pool is very challenging. Text summarisation is used in information retrieval to search documents in less time. Documents are ranked based on the summary or abstract provided by their authors, which is not always possible as not all documents come with an abstract or summary. Also, when different summarisation tools are used to summarise a document, not all the topics covered within the document are reflected in its summary. In this paper, we propose a method to automate text document summarisation based on term frequency within the document at two levels - paragraph and sentence. To summarise the document, the similarity between the paragraphs and the sentences within each paragraph is considered using the vector space model. Evaluation of the proposed system on the standard reference corpus from DUC-2002 using the ROUGE package indicates average recall, average precision and average F-measure comparable to existing summarisation tools - Copernic, SweSum, Extractor, MSWord AutoSummariser, Intelligent, Brevity, Pertinence - taking the DUC-2002 (100 words) human summary as the baseline summary.
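As a rough illustration of the kind of approach the abstract describes, the sketch below ranks sentences with term-frequency vectors and cosine similarity, then scores a summary with a simplified unigram-overlap measure in the spirit of ROUGE-1. It is not the authors' implementation: the function names, the tokeniser, the sentence splitter and the scoring rule (paragraph-to-document similarity multiplied by sentence-to-paragraph similarity) are all assumptions made for this example, and the overlap measure is not the ROUGE package itself.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarise(paragraphs, max_sentences=2):
    """Return the top-scoring sentences in their original document order.

    Assumed scoring rule (not taken from the paper):
    score = sim(paragraph, document) * sim(sentence, paragraph).
    """
    doc_vec = Counter(tokenize(" ".join(paragraphs)))
    scored = []
    position = 0
    for para in paragraphs:
        para_vec = Counter(tokenize(para))
        para_score = cosine(para_vec, doc_vec)
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            if not sent:
                continue
            sent_vec = Counter(tokenize(sent))
            score = para_score * cosine(sent_vec, para_vec)
            scored.append((score, position, sent))
            position += 1
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:max_sentences]
    return [sent for _, _, sent in sorted(top, key=lambda s: s[1])]


def unigram_overlap(candidate, reference):
    """Simplified ROUGE-1-style recall, precision and F-measure."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure
```

Calling `summarise(paragraphs, max_sentences=3)` on a list of paragraph strings yields an extract summary, which can then be compared against a human reference with `unigram_overlap(" ".join(summary), reference)`.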
Keywords: extract summary; information retrieval; recall-oriented understudy for gisting evaluation; ROUGE tool; text summarisation; vector space model; VSM; document summarisation; sentence ranking; unstructured documents; term frequency; paragraphs; sentences.
Date: 2013
Downloads: (external link)
http://www.inderscience.com/link.php?id=57680 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:5:y:2013:i:4:p:380-406
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.