On the law of Zipf‐Mandelbrot for multi‐word phrases
L. Egghe
Journal of the American Society for Information Science, 1999, vol. 50, issue 3, 233-241
Abstract:
This article studies the probabilities of occurrence of multi‐word (m‐word) phrases (m = 2, 3, …) in relation to the probabilities of occurrence of the single words. It is well known that, for single words, the law of Zipf (a power law) is valid. We prove that for m‐word phrases (m ≥ 2) this is no longer the case, and we give two independent proofs of this. We further show that if the resulting distribution is approximated by Zipf's law, the exponents $\beta_m$ of the fitted power law form a strictly decreasing sequence $(\beta_m)_{m \in \mathbb{N}}$. This explains the experimental findings of Smith and Devine (1985), Hilberg (1988), and Meyer (1989a,b). Our results should be compared with a heuristic finding of Rousseau, who states that the law of Zipf‐Mandelbrot is valid for multi‐word phrases; he, however, uses other, less classical, assumptions than we do.
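The following is a minimal numerical sketch of the decreasing-exponent effect, not the paper's own derivation. It assumes (the abstract does not spell this out) that an m-word phrase's probability is the product of the probabilities of its words, that single words follow Zipf's law p(r) ∝ r^(-β), and that the vocabulary has at least as many words as the ranks examined. Under these assumptions a phrase's probability depends only on the product k = r_1·…·r_m of its word ranks, and each k is shared by d_m(k) phrases (the m-fold divisor function). The ranked phrase distribution is therefore a staircase rather than a straight line on log-log axes. The exponent β_m is estimated here by an ordinary least-squares fit of log-probability on log-rank over the top ranks. That fitting procedure and all function names are hypothetical illustrative choices.

```python
import math


def multiplicities(m, k_max):
    """d[k] = number of ordered m-tuples of word ranks whose product is k.

    Assumption: phrase probability = product of Zipf word probabilities, so
    phrases with equal rank product k are tied. Built by repeated Dirichlet
    convolution with the all-ones sequence: d_1(k) = 1 and
    d_m(j) = sum of d_{m-1}(i) over all divisors i of j.
    """
    d = [0] + [1] * k_max  # d_1; index 0 is unused
    for _ in range(m - 1):
        nxt = [0] * (k_max + 1)
        for i in range(1, k_max + 1):
            for j in range(i, k_max + 1, i):  # every multiple j of i
                nxt[j] += d[i]
        d = nxt
    return d


def ranked_log_probs(m, beta, n_ranks):
    """Log-probabilities, up to an additive constant, of the n_ranks most
    probable m-word phrases, in rank order (ties broken arbitrarily)."""
    d = multiplicities(m, n_ranks)  # d_m(k) >= 1, so k <= n_ranks suffices
    out = []
    for k in range(1, n_ranks + 1):
        out.extend([-beta * math.log(k)] * d[k])
        if len(out) >= n_ranks:
            break
    return out[:n_ranks]


def fitted_exponent(log_p):
    """Least-squares slope of log p against log rank, returned as a positive exponent."""
    xs = [math.log(r) for r in range(1, len(log_p) + 1)]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(log_p) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, log_p))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    beta, n_ranks = 1.0, 10_000  # hypothetical parameters for illustration
    for m in (1, 2, 3):
        print(m, round(fitted_exponent(ranked_log_probs(m, beta, n_ranks)), 3))
```

With β = 1 and the top 10,000 ranks, the m = 1 fit recovers β exactly. The fitted exponents for m = 2 and m = 3 come out smaller and decrease with m, which is qualitatively in line with the strictly decreasing sequence described above. The normalising constant of the word distribution is dropped because it only shifts the intercept of the log-log fit, not its slope.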
Date: 1999
Downloads: https://doi.org/10.1002/(SICI)1097-4571(1999)50:33.0.CO;2-8
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:50:y:1999:i:3:p:233-241
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.