Zero-inflated beta distribution applied to word frequency and lexical dispersion in corpus linguistics

Brent Burch and Jesse Egbert

Journal of Applied Statistics, 2020, vol. 47, issue 2, 337-353

Abstract: Corpus linguistics is the study of language as expressed in a body of texts or documents. The relative frequency of a word within a text and the dispersion of the word across the collection of texts provide information about the word's prominence and diffusion, respectively. In practice, people tend to use a relatively small number of the words in a language's inventory, so a large number of words in the lexicon are rarely employed. Because some texts may not contain the word under study at all, the zero-inflated beta distribution is used to model the relative frequency of a word in a text. In this paper, the expectations of a word's prominence and dispersion are defined under the zero-inflated beta model. Estimates of a word's prominence and dispersion are computed for words in the British National Corpus 1994 (BNC), a 100-million-word collection of written and spoken language covering a wide range of British English. The relationship between a word's prominence and dispersion is discussed, as well as measures that are functions of both prominence and dispersion.
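
As a rough illustration of the model named in the abstract, the sketch below computes the mean of a zero-inflated beta variable: with probability p_zero the relative frequency is exactly 0 (the text does not contain the word), and otherwise it follows a Beta(alpha, beta) distribution. The function names, the parameter names, and the simple plug-in summary are assumptions made for this example; they are not the prominence or dispersion estimators defined in the paper.

```python
from statistics import mean


def zib_mean(p_zero, alpha, beta):
    """Mean of a zero-inflated beta variable.

    With probability p_zero the value is exactly 0; otherwise it is drawn
    from Beta(alpha, beta), whose mean is alpha / (alpha + beta).
    """
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("p_zero must lie in [0, 1]")
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return (1.0 - p_zero) * alpha / (alpha + beta)


def summarize_relative_frequencies(rel_freqs):
    """Plug-in summary of one word's per-text relative frequencies.

    Returns (share of texts with frequency 0,
             mean relative frequency among texts containing the word,
             overall mean implied by the two-part decomposition).
    Illustrative only; not the estimator used in the paper.
    """
    if not rel_freqs:
        raise ValueError("rel_freqs must not be empty")
    positives = [r for r in rel_freqs if r > 0]
    p_zero_hat = 1.0 - len(positives) / len(rel_freqs)
    positive_mean = mean(positives) if positives else 0.0
    overall_mean = (1.0 - p_zero_hat) * positive_mean
    return p_zero_hat, positive_mean, overall_mean
```

For example, summarize_relative_frequencies([0.0, 0.0, 0.002, 0.004]) treats half of the texts as structural zeros and averages the remaining two relative frequencies, which mirrors the two-part structure of the zero-inflated beta model.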

Date: 2020

Downloads: (external link)
http://hdl.handle.net/10.1080/02664763.2019.1636941 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:japsta:v:47:y:2020:i:2:p:337-353

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/CJAS20

DOI: 10.1080/02664763.2019.1636941

Journal of Applied Statistics is currently edited by Robert Aykroyd

Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:japsta:v:47:y:2020:i:2:p:337-353