Document summarisation based on sentence ranking using vector space model

Namita Gupta, P.C. Saxena and J.P. Gupta

International Journal of Data Mining, Modelling and Management, 2013, vol. 5, issue 4, 380-406

Abstract: The World Wide Web is a repository of a large collection of information available in the form of unstructured documents, so identifying documents of interest in such a huge pool is very challenging. Text summarisation is used in information retrieval to reduce the time needed to search documents. Documents are often ranked on the basis of the summary or abstract provided by their authors, which is not always possible because not all documents come with one. Moreover, when different summarisation tools are used to summarise a document, not all the topics it covers are reflected in the summary. In this paper, we propose a method to automate text document summarisation based on term frequency within the document at two levels: paragraph and sentence. To summarise the document, the similarity between the paragraphs and between the sentences within a paragraph is considered using the vector space model. Evaluation of the proposed system on the standard DUC-2002 reference corpus using the ROUGE package indicates average recall, average precision and average F-measure comparable to those of existing summarisation tools (Copernic, SweSum, Extractor, MSWord AutoSummariser, Intelligent, Brevity and Pertinence), taking the DUC-2002 100-word human summaries as the baseline.
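
The abstract gives only the outline of the method, so the following is a minimal sketch of sentence ranking with a term-frequency vector space model, not the authors' algorithm. The scoring rule (a paragraph's cosine similarity to the whole document multiplied by a sentence's cosine similarity to its paragraph), the naive sentence splitter, and the names `tokenize`, `cosine` and `rank_sentences` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(paragraphs, top_n=2):
    """Return the top_n highest-scoring sentences in document order.

    Assumed scoring (hypothetical, not taken from the paper):
    score = sim(paragraph, document) * sim(sentence, paragraph).
    """
    doc_vec = Counter(tokenize(" ".join(paragraphs)))
    scored = []
    position = 0
    for para in paragraphs:
        para_vec = Counter(tokenize(para))
        para_score = cosine(para_vec, doc_vec)
        # Naive split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", para.strip()):
            if not sentence:
                continue
            sent_vec = Counter(tokenize(sentence))
            score = para_score * cosine(sent_vec, para_vec)
            scored.append((score, position, sentence))
            position += 1
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:top_n]
    # Restore document order so the extract reads naturally.
    return [s for _, _, s in sorted(best, key=lambda item: item[1])]
```

Calling `rank_sentences(paragraphs, top_n=k)` on a list of paragraph strings yields a k-sentence extract. A fuller system would likely add stop-word removal, stemming and a length limit such as the 100-word budget used in DUC-2002.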

Keywords: extract summary; information retrieval; recall-oriented understudy for gisting evaluation; ROUGE tool; text summarisation; vector space model; VSM; document summarisation; sentence ranking; unstructured documents; term frequency; paragraphs; sentences.
Date: 2013

Downloads: (external link)
http://www.inderscience.com/link.php?id=57680 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:5:y:2013:i:4:p:380-406

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.

 
Page updated 2025-03-19
Handle: RePEc:ids:ijdmmm:v:5:y:2013:i:4:p:380-406