EconPapers    
Stylometry can reveal artificial intelligence authorship, but humans struggle: A comparison of human and seven large language models in Japanese

Wataru Zaitsu, Mingzhe Jin, Shunichi Ishihara, Satoru Tsuge and Mitsuyuki Inaba

PLOS ONE, 2025, vol. 20, issue 10, 1-18

Abstract: The purpose of this study was to estimate the potential of stylometric analysis to detect artificial intelligence (AI) authorship (Study 1) and to examine humans' AI-detection abilities (Study 2). In Study 1, we compared 100 human-written public comments with 350 texts generated by seven large language models (LLMs): ChatGPT (GPT-4o and o1), Claude 3.5, Gemini, Microsoft Copilot, Llama 3.1, and Perplexity. Using multidimensional scaling (MDS), we visualized differences across three stylometric features (phrase patterns, part-of-speech bigrams, and unigrams of function words). In general, each stylometric feature could distinguish LLM-generated from human-written texts; in particular, the three features combined achieved perfect discrimination on the MDS dimensions. Interestingly, only Llama 3.1 exhibited characteristics distinct from the other six LLMs. A random forest classifier also achieved 99.8% accuracy. In Study 2, we conducted an online survey to assess Japanese participants' AI-detection abilities using the same LLM-generated and human-written texts as in Study 1. A total of 403 participants completed an "AI or human" judgment task and rated their confidence in each judgment; overall, human AI-detection ability proved limited. Moreover, in our materials the more advanced ChatGPT (o1), plausibly reflecting greater fluency and polish, misled participants into judging its texts as human-written more often than ChatGPT (GPT-4o) did, and it increased their confidence in those judgments. Furthermore, free-text comments from the survey suggested that participants relied primarily on superficial impressions of phraseology, expression, word endings, conjunctions, and punctuation marks. These findings have important implications for public policy, education, marketing, and other settings where the need for rapid and reliable detection of AI-generated content is growing.
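The classification step described in the abstract, in which a random forest separates LLM-generated from human-written texts using stylometric feature vectors, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: their features (phrase patterns, part-of-speech bigrams, and function-word unigrams extracted from Japanese text) are replaced here with synthetic frequency vectors, and the class sizes (100 human, 350 LLM) merely mirror the paper's corpus.

```python
# Hedged sketch of a stylometric random-forest classifier.
# Assumption: each text has been reduced to a 50-dimensional numeric
# feature vector; the two synthetic clusters below stand in for real
# stylometric features and are chosen only to be separable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 100 "human" and 350 "LLM" texts.
human = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
llm = rng.normal(loc=0.5, scale=1.0, size=(350, 50))
X = np.vstack([human, llm])
y = np.array([0] * 100 + [1] * 350)  # 0 = human, 1 = LLM

# Random forest with 5-fold cross-validation, analogous in spirit to
# the classifier evaluation reported in Study 1.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

On real data one would first tokenize and POS-tag the Japanese texts (e.g. with a morphological analyzer) and build the three feature tables before fitting; the classifier call itself would be unchanged.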

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0335369 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 35369&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0335369

DOI: 10.1371/journal.pone.0335369


More articles in PLOS ONE from Public Library of Science
Bibliographic data for this series maintained by plosone.

 
Page updated 2025-11-29
Handle: RePEc:plo:pone00:0335369