Sentiment analysis of classical Chinese literature: An unsupervised deep learning model with BERT and graph attention networks
Xiaohan Yu and Jin Wang
PLOS ONE, 2025, vol. 20, issue 9, 1-23
Abstract:
Sentiment analysis has become a transformative technology in Natural Language Processing (NLP), social media analytics, and literary analysis because it can extract information from a wide range of texts. Advances in deep learning, particularly transformer models such as BERT and graph-based models such as graph attention networks (GATs), have accelerated progress in analyzing complex language structures. However, applying these technologies to classical Chinese literature remains difficult: its subtle syntax, semantics, and emotions are hard to capture with traditional methods. Existing methods either rely on strictly labeled data or use unsupervised learning that does not manage contextual dependencies effectively, so they are very limited in analyzing historical or philosophical texts that abound in metaphor and implicit sentiment. To address these limitations, this paper proposes an unsupervised deep learning framework that integrates BERT embeddings, sentiment lexicon enrichment, and GATs for sentiment analysis of classical Chinese literature. First, a BERT-based model extracts contextualised embeddings from raw text, providing a deep understanding of semantics. Second, the embeddings are enriched with sentiment-specific data from the NTUSD lexicon, injecting emotional information. Third, a graph-based formulation is developed in which words are represented as nodes and the relations between them are modelled with GATs, which update node features according to their significance in context. Finally, K-Means clustering performs unsupervised sentiment labelling to classify sentiment. The experimental results demonstrate the proposed model's effectiveness: an accuracy of 0.95, precision of 0.97, recall of 0.96, and F1-score of 0.91 across several runs. These results surpass those of traditional approaches, including SentiCNN, MLT-ML4, and BERT-LLSTM-DL, which achieve accuracy scores of 0.90 to 0.95. Additionally, a comparison with large-scale foundation models (such as ChatGPT-4o and DeepSeek R1) in zero-shot prompt-based classification further validates the domain-adapted advantage of the model in classical Chinese text processing. These results demonstrate that the proposed model significantly enhances the handling of the intricate linguistic features and cultural nuances in classical Chinese texts, providing a robust solution for sentiment analysis in low-resource domains.
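The four stages named in the abstract (contextual embeddings, lexicon enrichment, graph attention, K-Means) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Everything specific in it is an assumption made for illustration: random vectors stand in for BERT embeddings, `toy_lexicon` is a hypothetical four-entry stand-in for NTUSD, edges link adjacent tokens only, the attention layer is a single untrained GAT-style head, and clustering is done per token rather than per passage.

```python
import math
import random


def enrich(embeddings, polarities):
    """Append one lexicon polarity score to each token embedding."""
    return [vec + [pol] for vec, pol in zip(embeddings, polarities)]


def gat_layer(features, edges, weight, attn):
    """Single-head GAT-style layer: project, score neighbours, softmax, aggregate."""
    n, d_out = len(features), len(weight[0])
    # Linear projection of every node: h_i = x_i W
    proj = [[sum(x[k] * weight[k][c] for k in range(len(x))) for c in range(d_out)]
            for x in features]
    # Undirected neighbourhoods with self-loops
    neighbours = {i: {i} for i in range(n)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    out, alphas = [], []
    for i in range(n):
        nbrs = sorted(neighbours[i])
        scores = []
        for j in nbrs:
            e = sum(a * v for a, v in zip(attn, proj[i] + proj[j]))
            scores.append(e if e > 0 else 0.2 * e)  # LeakyReLU, slope 0.2
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [v / total for v in exps]
        alphas.append(dict(zip(nbrs, alpha)))
        # New node feature: attention-weighted sum of projected neighbours
        out.append([sum(a * proj[j][c] for a, j in zip(alpha, nbrs))
                    for c in range(d_out)])
    return out, alphas


def kmeans(points, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(42)
tokens = ["春", "風", "得", "意", "愁", "恨"]
# Assumption: random 8-dimensional vectors stand in for BERT embeddings.
embeddings = [[rng.gauss(0, 1) for _ in range(8)] for _ in tokens]
# Assumption: hypothetical polarity scores, not taken from NTUSD.
toy_lexicon = {"得": 1.0, "意": 1.0, "愁": -1.0, "恨": -1.0}
polarities = [toy_lexicon.get(t, 0.0) for t in tokens]

enriched = enrich(embeddings, polarities)                # 8 + 1 = 9 dimensions
edges = {(i, i + 1) for i in range(len(tokens) - 1)}     # adjacent tokens only
d_in, d_out = 9, 4
weight = [[rng.gauss(0, 0.5) for _ in range(d_out)] for _ in range(d_in)]  # untrained
attn = [rng.gauss(0, 0.5) for _ in range(2 * d_out)]                        # untrained

node_features, attention = gat_layer(enriched, edges, weight, attn)
labels = kmeans(node_features, k=2)
print(list(zip(tokens, labels)))
```

Because the weights here are random, the resulting clusters carry no sentiment meaning; the sketch only shows how data flows between the stages. K-Means cluster indices are arbitrary, so a real system would still need some rule, for example the mean lexicon polarity of each cluster, to decide which cluster is positive.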
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330919 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 30919&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0330919
DOI: 10.1371/journal.pone.0330919