Automatically Annotating Topics in Transcripts of Patient-Provider Interactions via Machine Learning
Byron C. Wallace,
M. Barton Laws,
Kevin Small,
Ira B. Wilson and
Thomas A. Trikalinos
Medical Decision Making, 2014, vol. 34, issue 4, 503-512
Abstract:
Background. Annotated patient-provider encounters can provide important insights into clinical communication, ultimately suggesting how it might be improved to effect better health outcomes. But annotating outpatient transcripts with Roter or General Medical Interaction Analysis System (GMIAS) codes is expensive, limiting the scope of such analyses. We propose automatically annotating transcripts of patient-provider interactions with topic codes via machine learning.

Methods. We use a conditional random field (CRF) to model utterance topic probabilities. The model accounts for the sequential structure of conversations and the words comprising utterances. We assess predictive performance via 10-fold cross-validation over GMIAS-annotated transcripts of 360 outpatient visits (>230,000 utterances). We then use automated in place of manual annotations to reproduce an analysis of 116 additional visits from a randomized trial that used GMIAS to assess the efficacy of an intervention aimed at improving communication around antiretroviral (ARV) adherence.

Results. With respect to 6 topic codes, the CRF achieved a mean pairwise kappa compared with human annotators of 0.49 (range: 0.47–0.53) and a mean overall accuracy of 0.64 (range: 0.62–0.66). In the RCT reanalysis, results using automated annotations agreed with those obtained using manual ones. According to the manual annotations, the median number of ARV-related utterances without and with the intervention was 49.5 versus 76, respectively (paired sign test, P = 0.07); with automated annotations, the respective numbers were 39 versus 55 (P = 0.04). While moderately accurate, the predicted annotations are far from perfect. Conversational topics are intermediate outcomes, and their utility is still being researched.

Conclusions. This foray into automated topic inference suggests that machine learning methods can classify utterances comprising patient-provider interactions into clinically relevant topics with reasonable accuracy.
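The abstract notes that the CRF exploits the sequential structure of a conversation, so an utterance's predicted topic depends on its neighbors as well as its own words. As a minimal sketch of that idea, the snippet below runs Viterbi (max-sum) decoding over a toy linear-chain model. The topic labels, transition scores, and emission scores are all hypothetical illustrations, not values or code from the paper.

```python
# Illustrative sketch only: a linear-chain model's Viterbi decoding step.
# Topic labels and all scores below are hypothetical, not from the paper.

TOPICS = ["biomedical", "logistics", "ARV-adherence"]  # assumed toy label set

# Toy transition scores favoring topic continuity between adjacent utterances.
TRANS = {(a, b): (0.8 if a == b else 0.1) for a in TOPICS for b in TOPICS}

def viterbi(emissions, topics=TOPICS, trans=TRANS):
    """Return the highest-scoring topic sequence for a list of per-utterance
    emission score dicts, via max-sum dynamic programming."""
    scores = {t: emissions[0][t] for t in topics}  # best score ending in t
    back = []  # backpointers, one dict per position after the first
    for em in emissions[1:]:
        new_scores, ptr = {}, {}
        for t in topics:
            prev = max(topics, key=lambda p: scores[p] + trans[(p, t)])
            new_scores[t] = scores[prev] + trans[(prev, t)] + em[t]
            ptr[t] = prev
        scores, back = new_scores, back + [ptr]
    best = max(topics, key=scores.get)  # trace back the best path
    path = [best]
    for ptr in reversed(back):
        best = ptr[best]
        path.append(best)
    return list(reversed(path))

# Three utterances: the middle one is ambiguous on its own, but the
# transition scores pull it toward its neighbors' topic.
utterances = [
    {"biomedical": 2.0, "logistics": 0.1, "ARV-adherence": 0.2},
    {"biomedical": 0.9, "logistics": 1.0, "ARV-adherence": 0.1},
    {"biomedical": 2.0, "logistics": 0.3, "ARV-adherence": 0.1},
]
print(viterbi(utterances))
```

Here the middle utterance's own emission scores slightly favor "logistics", but the sequential context flips it to "biomedical", which is exactly the kind of contextual correction a CRF provides over an utterance-by-utterance classifier.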
Keywords: machine learning; natural language processing; speech acts; patient-provider interaction; CRF; communication; informatics
Date: 2014
Downloads: https://journals.sagepub.com/doi/10.1177/0272989X13514777 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:medema:v:34:y:2014:i:4:p:503-512
DOI: 10.1177/0272989X13514777
More articles in Medical Decision Making
Bibliographic data for series maintained by SAGE Publications.