Adapting measures of clumping strength to assess term‐term similarity

Abraham Bookstein, Vladimir Kulyukin, Timo Raita and John Nicholson

Journal of the American Society for Information Science and Technology, 2003, vol. 54, issue 7, 611-620

Abstract: Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated on a large text corpus built from an on‐line encyclopedia.

Date: 2003

Downloads: (external link)
https://doi.org/10.1002/asi.10249

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:54:y:2003:i:7:p:611-620

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:54:y:2003:i:7:p:611-620