
A model for word clustering

James A. Thom and Justin Zobel

Journal of the American Society for Information Science, 1992, vol. 43, issue 9, 616-627

Abstract: It is common to model the distribution of words in text by measures such as the Poisson approximation. However, these measures ignore effects such as clustering: our analysis of document collections demonstrates that the Poisson approximation can significantly overestimate the probability that a document contains a word. Based on our analysis, we propose a new model for distribution of words in text, and show how this model can be used to estimate the probability that a document contains a word and the number of distinct words in a document. © 1992 John Wiley & Sons, Inc.
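The overestimate described in the abstract can be illustrated with a small sketch. It uses the textbook Poisson estimate, 1 - exp(-f/N), for a word with f total occurrences spread over N documents of roughly equal length, and compares it with the observed fraction of documents containing the word. This is not the authors' proposed model, which the abstract does not specify. The toy collection and the function names `poisson_containment` and `observed_containment` are assumptions made up for illustration.

```python
import math


def poisson_containment(total_occurrences, num_documents):
    """Textbook Poisson estimate of P(document contains the word).

    Assumes occurrences fall independently and documents have similar
    length, so the expected count per document is f / N.
    """
    rate = total_occurrences / num_documents
    return 1.0 - math.exp(-rate)


def observed_containment(documents, word):
    """Fraction of documents that actually contain the word."""
    containing = sum(1 for doc in documents if word in doc)
    return containing / len(documents)


# Hypothetical toy collection: "zebra" occurs 6 times, clustered in 2 of 6 documents.
documents = [
    ["zebra", "zebra", "zebra", "stripe", "zebra"],
    ["index", "query", "term", "text", "word"],
    ["zebra", "zebra", "herd", "plain", "grass"],
    ["file", "disk", "page", "block", "cache"],
    ["model", "count", "rate", "mean", "fit"],
    ["river", "stone", "bank", "reed", "fish"],
]

word = "zebra"
total = sum(doc.count(word) for doc in documents)

poisson_estimate = poisson_containment(total, len(documents))
observed = observed_containment(documents, word)

print(f"Poisson estimate: {poisson_estimate:.3f}")
print(f"Observed fraction: {observed:.3f}")
```

Because the occurrences cluster in two documents, the independence assumption behind the Poisson estimate spreads them too evenly, so the estimate exceeds the observed fraction in this toy case.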

Date: 1992

Downloads: (external link)
https://doi.org/10.1002/(SICI)1097-4571(199210)43:93.0.CO;2-A



Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:43:y:1992:i:9:p:616-627

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571


More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamest:v:43:y:1992:i:9:p:616-627