Supporting Text Retrieval by Typographical Term Weighting

Lars Werner and Stefan Böttcher
Additional contact information
Lars Werner: University of Paderborn, Germany
Stefan Böttcher: University of Paderborn, Germany

International Journal of Intelligent Information Technologies (IJIIT), 2007, vol. 3, issue 2, 1-16

Abstract: Text documents stored in information systems usually contain more information than the mere concatenation of words, i.e., they also contain typographic information. Because conventional text retrieval methods evaluate only word frequency, they miss the information provided by typography, e.g., regarding the importance of certain terms. To overcome this weakness, we present an approach that uses the typographical information of text documents, and we show how this improves the efficiency of text retrieval methods. Our approach weights typographic information in addition to term frequencies in order to separate the relevant information in text documents from the noise. We have evaluated our approach on the basis of automated text classification algorithms. The results show that our weighting approach achieves very competitive classification results using at most 30% of the terms used by conventional approaches, which makes our approach significantly more efficient.
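
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the style names, the weight values in STYLE_WEIGHTS, the helper functions, and the sample document are all hypothetical, and the 0.3 fraction only mirrors the "at most 30% of the terms" figure quoted above. Each term occurrence contributes a weight that depends on its typographic style, and only the highest-scoring terms are kept.

```python
from collections import defaultdict

# Hypothetical weights per typographic style (assumed for illustration;
# the abstract does not state the values used in the paper).
STYLE_WEIGHTS = {"plain": 1.0, "italic": 1.5, "bold": 2.0, "heading": 3.0}


def weighted_term_scores(tokens):
    """Sum a style-dependent weight for every occurrence of a term.

    `tokens` is an iterable of (term, style) pairs. Unknown styles
    fall back to the plain weight of 1.0.
    """
    scores = defaultdict(float)
    for term, style in tokens:
        scores[term.lower()] += STYLE_WEIGHTS.get(style, 1.0)
    return dict(scores)


def select_top_terms(scores, fraction=0.3):
    """Keep the highest-scoring terms, at most `fraction` of the vocabulary.

    At least one term is always kept. Ties are broken alphabetically.
    """
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:keep]]


# Hypothetical document: each token carries the style it was set in.
doc = [
    ("Retrieval", "heading"),
    ("the", "plain"),
    ("retrieval", "plain"),
    ("of", "plain"),
    ("typography", "bold"),
    ("noise", "plain"),
    ("weighting", "italic"),
    ("text", "plain"),
]

scores = weighted_term_scores(doc)
top_terms = select_top_terms(scores)
print(top_terms)
```

With these assumed weights, a term that appears once in a heading outranks a term that appears twice in plain text, which is the effect a pure frequency count would miss.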

Date: 2007

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 4018/jiit.2007040101 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jiit00:v:3:y:2007:i:2:p:1-16

International Journal of Intelligent Information Technologies (IJIIT) is currently edited by Vijayan Sugumaran

Page updated 2025-03-19
Handle: RePEc:igg:jiit00:v:3:y:2007:i:2:p:1-16