EconPapers    
An Empirical Comparison of Machine Learning and Deep Learning Models for Automated Fake News Detection

Yexin Tian, Shuo Xu, Yuchen Cao, Zhongyan Wang and Zijing Wei
Additional contact information
Yexin Tian: College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
Shuo Xu: Computer Science & Engineering Department, University of California San Diego, La Jolla, CA 92093, USA
Yuchen Cao: Khoury College of Computer Science, Northeastern University, Seattle, WA 98109, USA
Zhongyan Wang: Center of Data Science, New York University, New York, NY 10011, USA
Zijing Wei: College of Liberal Arts & Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Mathematics, 2025, vol. 13, issue 13, 1-24

Abstract: Detecting fake news is a critical challenge in natural language processing (NLP), demanding solutions that balance accuracy, interpretability, and computational efficiency. Despite advances in NLP, systematic empirical benchmarks that directly compare classical and deep models across varying input richness, with careful attention to interpretability and computational tradeoffs, remain scarce. In this study, we systematically evaluate the mathematical foundations and empirical performance of five representative models for automated fake news classification: three classical machine learning algorithms (Logistic Regression, Random Forest, and Light Gradient Boosting Machine) and two state-of-the-art deep learning architectures (A Lite Bidirectional Encoder Representations from Transformers, ALBERT, and Gated Recurrent Units, GRUs). Leveraging the large-scale WELFake dataset, we conduct rigorous experiments under both headline-only and headline-plus-content input scenarios, providing a comprehensive assessment of each model's capability to capture linguistic, contextual, and semantic cues. We analyze each model's optimization framework, decision boundaries, and feature importance mechanisms, highlighting the empirical tradeoffs between representational capacity, generalization, and interpretability. Our results show that transformer-based models, especially ALBERT, achieve state-of-the-art performance (macro F1 up to 0.99) when given rich context, while classical ensembles remain viable for resource-constrained settings. These findings directly inform the practical deployment of fake news detection systems.
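To make the classical side of the comparison concrete, the sketch below shows a headline-only baseline in the spirit of the paper's Logistic Regression setup: TF-IDF features feeding a linear classifier. The toy headlines and labels are illustrative stand-ins, not WELFake data, and the feature settings are assumptions rather than the authors' actual configuration.

```python
# Minimal sketch of a headline-only classical baseline (TF-IDF + Logistic
# Regression). Toy data for illustration; the study itself uses WELFake.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1 = fake, 0 = real (hypothetical labels for this toy example)
headlines = [
    "Scientists confirm miracle cure hidden by doctors",
    "Central bank raises interest rates by 25 basis points",
    "Aliens endorse presidential candidate, insiders say",
    "Quarterly GDP growth revised up to 2.1 percent",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),  # sparse lexical features
    LogisticRegression(max_iter=1000),                       # linear decision boundary
)
clf.fit(headlines, labels)

# Predict on an unseen headline; output is a 0/1 label
pred = clf.predict(["Shocking miracle cure the government is hiding"])[0]
```

With richer input (headline plus article body), the same pipeline can be reused by concatenating the two fields, which is one simple way to realize the paper's headline-plus-content scenario for the classical models.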

Keywords: fake news detection; natural language processing; machine learning; deep learning; text classification; model interpretability
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/13/2086/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/13/2086/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:13:p:2086-:d:1686839


Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-06-26
Handle: RePEc:gam:jmathe:v:13:y:2025:i:13:p:2086-:d:1686839