EconPapers    
Economics at your fingertips  
 

Statistical inference for natural language processing algorithms with a demonstration using type 2 diabetes prediction from electronic health record notes

Brian L. Egleston, Tian Bai, Richard J. Bleicher, Stanford J. Taylor, Michael H. Lutz and Slobodan Vucetic

Biometrics, 2021, vol. 77, issue 3, 1089-1100

Abstract: The pointwise mutual information statistic (PMI), which measures how often two words occur together in a document corpus, is a cornerstone of recently proposed popular natural language processing algorithms such as word2vec. PMI and word2vec reveal semantic relationships between words and can be helpful in a range of applications such as document indexing, topic analysis, or document categorization. We use probability theory to demonstrate the relationship between PMI and word2vec. We use the theoretical results to demonstrate how the PMI can be modeled and estimated in a simple and straight forward manner. We further describe how one can obtain standard error estimates that account for within‐patient clustering that arises from patterns of repeated words within a patient's health record due to a unique health history. We then demonstrate the usefulness of PMI on the problem of predictive identification of disease from free text notes of electronic health records. Specifically, we use our methods to distinguish those with and without type 2 diabetes mellitus in electronic health record free text data using over 400 000 clinical notes from an academic medical center.

Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://doi.org/10.1111/biom.13338

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:biomet:v:77:y:2021:i:3:p:1089-1100

Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=0006-341X

Access Statistics for this article

More articles in Biometrics from The International Biometric Society
Bibliographic data for series maintained by Wiley Content Delivery ().

 
Page updated 2025-03-19
Handle: RePEc:bla:biomet:v:77:y:2021:i:3:p:1089-1100