Multi-Domain Controversial Text Detection Based on a Machine Learning and Deep Learning Stacked Ensemble

Liu, Jiadi; Liu, Zhuodong; Li, Qiaoqi; Kong, Weihao; Li, Xiangyu

Additional contact information
Jiadi Liu: School of Architecture and Design, Beijing Jiaotong University, Beijing 100044, China
Zhuodong Liu: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Qiaoqi Li: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Weihao Kong: School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Xiangyu Li: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Mathematics, 2025, vol. 13, issue 9, 1-25

Abstract: With the rapid proliferation of social media and online reviews, accurately identifying and classifying controversial texts has become a significant challenge in natural language processing. Traditional text-classification methods often suffer from critical limitations, such as feature sensitivity and inadequate generalization, which lead to notably suboptimal performance on diverse controversial content. To address these limitations, this paper proposes a novel controversial text-detection framework based on stacked ensemble learning to improve the accuracy and robustness of text classification. First, to account for the multidimensional complexity of textual features, we combine comprehensive feature engineering (word frequency, statistical metrics, sentiment analysis, and comment tree structure features) with advanced feature selection, in particular LassoNet, a neural network with feature sparsity, to address dimensionality challenges while improving model interpretability and computational efficiency. Second, we design a two-tier stacked ensemble architecture that combines the strengths of multiple machine learning algorithms, e.g., gradient-boosted decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost), with deep learning models, e.g., gated recurrent unit (GRU) and long short-term memory (LSTM), and uses a support vector machine (SVM) as the meta-learner. Furthermore, we systematically compare three hyperparameter optimization algorithms: the sparrow search algorithm (SSA), particle swarm optimization (PSO), and Bayesian optimization (BO). The experimental results indicate that SSA performs best at exploring high-dimensional parameter spaces. Extensive experiments across diverse topics and domains also indicate that the proposed method significantly outperforms state-of-the-art approaches.
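
To illustrate one of the hyperparameter optimizers the abstract compares, here is a minimal particle swarm optimization (PSO) sketch in Python. It is not the authors' implementation: the function names, the swarm coefficients, and the toy two-parameter objective (a hypothetical stand-in for a validation loss) are assumptions chosen for illustration. Note that the abstract reports SSA, not PSO, as the best performer in the paper's experiments.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles at random positions inside the box, with zero velocity.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_best_val.__getitem__)
    global_best, global_best_val = personal_best[g][:], personal_best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Move the particle and clamp it to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < personal_best_val[i]:
                personal_best[i], personal_best_val[i] = positions[i][:], val
                if val < global_best_val:
                    global_best, global_best_val = positions[i][:], val
    return global_best, global_best_val


def toy_validation_loss(params):
    # Hypothetical loss surface with its minimum at log10(lr) = -2, depth = 6.
    log_lr, depth = params
    return (log_lr + 2.0) ** 2 + (depth - 6.0) ** 2


best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(-5.0, 0.0), (1.0, 12.0)]
)
print(best_params, best_loss)
```

In a real tuning run the objective would train and validate a model for each candidate setting, so the particle count and iteration budget would likely need to be far smaller than in this toy example.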

Keywords: controversial text detection; machine learning; deep learning; ensemble learning; hyperparameter optimization; feature engineering
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/9/1529/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/9/1529/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:9:p:1529-:d:1650216

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.