Quantifying Similarities: Oncology Documents from Google Bard and ChatGPT
Muhammad Shumail Naveed
Additional contact information
Muhammad Shumail Naveed: Department of Computer Science & Information Technology, University of Baluchistan, Quetta, Pakistan
International Journal of Innovations in Science & Technology, 2023, vol. 5, issue 4, 773-786
Abstract:
Large language models hold considerable promise for text generation. Google Bard and ChatGPT, two prominent large language models from different research laboratories, have been the subject of various studies since their introduction. Although these studies explore many perspectives, none has specifically analyzed the similarity between texts generated by the two models within the same category. This study addresses that gap by comparing the document generation capabilities of Google Bard and ChatGPT, focusing on topic-wise comparable documents related to oncology. Fifty oncology-related documents generated by Google Bard were compared with the equivalent topic-wise documents produced by ChatGPT, using both cosine similarity and Jaccard similarity. The analysis employed statistical tests including the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the one-sample Wilcoxon signed-rank test. The findings revealed a significant level of resemblance between the documents generated by the two models: cosine similarity (mean = 0.66, std. dev. = 0.11, min = 0.23, max = 0.80) and Jaccard similarity (mean = 0.88, std. dev. = 0.06, min = 0.7, max = 1.0). This suggests a probable commonality in their training datasets or sources of oncology-related information. The study also posits that the observed similarity could be attributed to the probabilistic nature of language models and the potential for overfitting during training. These outcomes open a direction for further exploration in the domain of large language models.
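The two similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the documents were tokenized or weighted, so the choices here (lowercased word tokens, raw term-frequency vectors for cosine similarity, token sets for Jaccard similarity) are assumptions, and the function names and sample sentences are hypothetical rather than taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    This preprocessing is an assumption; the abstract does not describe it.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors of two texts."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def jaccard_similarity(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a = set(tokenize(text_a))
    set_b = set(tokenize(text_b))
    union = set_a | set_b
    if not union:
        return 0.0  # both texts empty
    return len(set_a & set_b) / len(union)


# Hypothetical sample sentences, not drawn from the study's documents.
doc_bard = "Chemotherapy uses drugs to destroy cancer cells"
doc_chatgpt = "Chemotherapy uses drugs to kill cancer cells"

cos = cosine_similarity(doc_bard, doc_chatgpt)
jac = jaccard_similarity(doc_bard, doc_chatgpt)
print(f"cosine = {cos:.2f}, jaccard = {jac:.2f}")
```

In a setup like the one described, each of the 50 topic-wise document pairs would yield one cosine score and one Jaccard score, and the reported means, standard deviations, minima, and maxima would summarize those 50 values. Other weighting schemes, such as TF-IDF, would give different cosine scores for the same pair.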
Keywords: Large Language Models; ChatGPT; Google Bard; Cosine Similarity; Jaccard Similarity
Date: 2023
Downloads:
https://journal.50sea.com/index.php/IJIST/article/view/602/1185 (application/pdf)
https://journal.50sea.com/index.php/IJIST/article/view/602 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:abq:ijist1:v:5:y:2023:i:4:p:773-786
International Journal of Innovations in Science & Technology is currently edited by Prof. Dr. Syed Amer Mahmood
More articles in International Journal of Innovations in Science & Technology from 50sea
Bibliographic data for series maintained by Iqra Nazeer.