Assessing the accuracy and consistency of answers by ChatGPT to questions regarding carbon monoxide poisoning

Qiu, Jun; Zhou, Youlian

Assessing the accuracy and consistency of answers by ChatGPT to questions regarding carbon monoxide poisoning

Jun Qiu and Youlian Zhou

PLOS ONE, 2024, vol. 19, issue 11, 1-10

Abstract: Background: ChatGPT, developed by OpenAI, is an artificial intelligence software designed to generate text-based responses. The objective of this study is to evaluate the accuracy and consistency of ChatGPT’s responses to single-choice questions pertaining to carbon monoxide poisoning. This evaluation will contribute to our understanding of the reliability of ChatGPT-generated information in the medical field. Methods: The questions utilized in this study were selected from the "Medical Exam Assistant (Yi Kao Bang)" application and encompassed a range of topics related to carbon monoxide poisoning. A total of 44 single-choice questions were included in the study following a screening process. Each question was entered into ChatGPT ten times in Chinese, followed by a translation into English, where it was also entered ten times. The responses generated by ChatGPT were subjected to statistical analysis with the objective of assessing their accuracy and consistency in both languages. In this assessment process, the "Medical Exam Assistant (Yi Kao Bang)" reference responses were employed as benchmarks. The data analysis was conducted using the Python. Results: In approximately 50% of the cases, the responses generated by ChatGPT exhibited a high degree of consistency, whereas in approximately one-third of the cases, the responses exhibited unacceptable blurring of the answers. Meanwhile, the accuracy of these responses was less favorable, with an accuracy rate of 61.1% in Chinese and 57% in English. This indicates that ChatGPT could be enhanced with respect to both consistency and accuracy in responding to queries pertaining to carbon monoxide poisoning. Conclusions: It is currently evident that the consistency and accuracy of responses generated by ChatGPT regarding carbon monoxide poisoning is inadequate. Although it offers significant insights, it should not supersede the role of healthcare professionals in making clinical decisions.

Date: 2024
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0311937 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 11937&type=printable (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0311937

DOI: 10.1371/journal.pone.0311937

Access Statistics for this article

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().