General type-token distribution

S. Hidaka

Biometrika, 2014, vol. 101, issue 4, 999-1002

Abstract: We consider the problem of estimating the number of types in a corpus using the number of types observed in a sample of tokens from that corpus. We derive exact and asymptotic distributions for the number of observed types, conditioned on the number of tokens and the latent type distribution. We use the asymptotic distributions to derive an estimator of the latent number of types and validate this estimator numerically.
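The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the type-token relationship it describes, not the author's estimator. It assumes tokens are drawn independently from a fixed type distribution, in which case the expected number of observed types in N tokens is the sum over types of 1 - (1 - p_i)^N. The inversion step further assumes, purely for illustration, that all types are equally likely. All function names here are hypothetical.

```python
import math


def prob_type_seen(p: float, n_tokens: int) -> float:
    """Probability that a type with token probability p appears at least once
    in n_tokens independent draws, i.e. 1 - (1 - p) ** n_tokens."""
    if n_tokens <= 0 or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p is very small.
    return -math.expm1(n_tokens * math.log1p(-p))


def expected_observed_types(probs, n_tokens: int) -> float:
    """Expected number of distinct types observed, given the latent type
    probabilities (assumption: i.i.d. sampling of tokens)."""
    return sum(prob_type_seen(p, n_tokens) for p in probs)


def expected_types_uniform(num_types: int, n_tokens: int) -> float:
    """Same expectation when all num_types types are equally likely
    (an illustrative assumption, not taken from the paper)."""
    return num_types * prob_type_seen(1.0 / num_types, n_tokens)


def estimate_num_types_uniform(observed_types: float, n_tokens: int,
                               max_types: int = 10**12) -> int:
    """Smallest number of types M whose expected observed-type count under
    the uniform assumption reaches observed_types (moment matching)."""
    if not 0 < observed_types < n_tokens:
        raise ValueError("need 0 < observed_types < n_tokens for a finite estimate")
    # The expectation never exceeds M, so M is at least the observed count.
    lo = max(1, math.ceil(observed_types))
    hi = lo
    # Grow the upper bound until the expectation reaches the observed count.
    while expected_types_uniform(hi, n_tokens) < observed_types:
        hi *= 2
        if hi > max_types:
            raise ValueError("no estimate found below max_types")
    # The expectation is increasing in M, so bisection finds the smallest M.
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_types_uniform(mid, n_tokens) >= observed_types:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Under these assumptions, feeding the expected count for a known number of types back into `estimate_num_types_uniform` recovers that number, which is a quick sanity check on the inversion. A non-uniform latent distribution would need a different inversion, which the abstract indicates the paper addresses through its asymptotic distributions.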

Date: 2014

Downloads: (external link)
http://hdl.handle.net/10.1093/biomet/asu035 (application/pdf)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:oup:biomet:v:101:y:2014:i:4:p:999-1002.

Ordering information: This journal article can be ordered from
https://academic.oup.com/journals

Biometrika is currently edited by Paul Fearnhead

Publisher: Biometrika Trust, Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK.
Bibliographic data for series maintained by Oxford University Press.

 
Page updated 2025-03-19
Handle: RePEc:oup:biomet:v:101:y:2014:i:4:p:999-1002.