Stylometric comparisons of human versus AI-generated creative writing

O’Sullivan, James

James O’Sullivan (University College Cork)

Humanities and Social Sciences Communications, 2025, vol. 12, issue 1, 1-6

Abstract: This study employs stylometry to investigate whether the creative writing styles of humans and large language models (LLMs) such as GPT-3.5, GPT-4, and Llama 70b can be distinguished through quantitative analysis. A balanced dataset of short stories composed in response to predefined narrative prompts forms the basis of the analysis. Burrows’ Delta, a widely used metric in computational literary studies, is applied to measure stylistic similarity and difference across texts. By focusing on the distribution of the most frequent words, Burrows’ Delta allows for comparison that is largely independent of content and instead sensitive to latent stylistic fingerprints. The methodology combines this measure with clustering techniques, including hierarchical clustering and multidimensional scaling, to visualise relationships between texts and to test whether human and machine-generated stories cohere into distinct groups. The results reveal clear and consistent stylistic distinctions. Human-authored texts form broader, more heterogeneous clusters, reflecting the diversity of individual expression, writing ability, and interpretive engagement with the prompts. In contrast, LLM outputs, while fluent and coherent, display a higher degree of stylistic uniformity, clustering tightly by model. GPT-4 demonstrates greater internal consistency than GPT-3.5, suggesting refinement in the stylistic coherence of newer systems, yet both remain distinguishable from human writing. Llama 70b shows similar uniform clustering behaviour. Occasional overlaps occur, particularly between GPT-3.5 and human texts, but these are rare and insufficient to erase the broader distinction between categories. The findings indicate that, despite rapid advances in generative AI and its growing capacity to simulate creativity, LLMs retain detectable stylistic signatures that separate them from human authors.
The study contributes to debates about authenticity, authorship, and the scope of machine creativity by moving beyond subjective literary judgment towards quantitative evidence. Stylometric analysis confirms that LLM outputs remain statistically and stylistically identifiable as machine-generated.
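
For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of Burrows' Delta in its classic formulation: relative frequencies of the corpus-wide most frequent words are z-scored across texts, and the distance between two texts is the mean absolute difference of their z-scores. It is not the study's own code. The function names, the regular-expression tokeniser, the default cutoff of 100 words, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens (a simple, assumed tokenisation scheme)."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta_matrix(texts, n_mfw=100):
    """Pairwise Burrows' Delta for a dict of {label: text}.

    n_mfw is a hypothetical cutoff for the most-frequent-word list.
    Returns a dict mapping (label_a, label_b) to a distance.
    """
    labels = list(texts)
    tokens = {label: tokenize(texts[label]) for label in labels}

    # Relative frequency of every word within each text.
    freqs = {}
    for label in labels:
        counts = Counter(tokens[label])
        total = len(tokens[label]) or 1  # avoid division by zero
        freqs[label] = {w: n / total for w, n in counts.items()}

    # Most frequent words across the whole corpus.
    corpus_counts = Counter()
    for label in labels:
        corpus_counts.update(tokens[label])
    mfw = [w for w, _ in corpus_counts.most_common(n_mfw)]

    # z-score each word's relative frequency across texts.
    z = {label: [] for label in labels}
    for word in mfw:
        column = [freqs[label].get(word, 0.0) for label in labels]
        mu, sigma = mean(column), pstdev(column)
        for label, value in zip(labels, column):
            z[label].append((value - mu) / sigma if sigma > 0 else 0.0)

    # Delta is the mean absolute difference between z-score profiles.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a in labels
        for b in labels
    }
```

Calling `burrows_delta_matrix({"story_1": "...", "story_2": "..."})` yields a symmetric distance table with zeros on the diagonal. A table of this kind is the usual input to the hierarchical clustering and multidimensional scaling steps the abstract mentions, for example via SciPy's `scipy.cluster.hierarchy.linkage`, although the abstract does not say which tooling the study used.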

Date: 2025

Downloads:
http://link.springer.com/10.1057/s41599-025-05986-3 Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:12:y:2025:i:1:d:10.1057_s41599-025-05986-3

Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about

DOI: 10.1057/s41599-025-05986-3

More articles in Humanities and Social Sciences Communications from Palgrave Macmillan