Principal component conditional generative adversarial networks for imbalanced ECG classification enhancement

Tang, Chao

PLOS ONE, 2025, vol. 20, issue 8, 1-38

Abstract: With over a century of development, electrocardiogram (ECG) diagnostics has become the preferred tool for healthcare professionals in cardiovascular disease diagnosis and monitoring. As wearable devices and mobile monitoring technologies become widespread, ECG data are trending toward diversity and long-term collection, making traditional manual annotation methods inadequate for massive data analysis demands. This research addresses core challenges in ECG signal classification—extremely imbalanced data, significant individual physiological differences, and difficulties in long sequence fitting—by proposing a Principal Component Analysis-based Conditional Generative Adversarial Network (PCA-CGAN). Through in-depth analysis of ECG signal principal component distribution characteristics, we discovered that just a few principal components can explain over 90% of signal variance, revealing the inherent inefficiency and limitations of traditional complete waveform generation methods. Based on this theoretical foundation, we shift the data augmentation paradigm from generating surface waveforms to generating high information density principal component features, resolving waveform jitter and heterogeneity issues present in traditional methods. Simultaneously, we designed a two-stage conditional encoding-decoding architecture that builds category-independent feature spaces from early training stages, fundamentally breaking the feature space bias caused by the “Matthew effect” and effectively preventing majority classes from compressing minority class features during generation. Using the Transformer’s global attention mechanism, the model precisely captures key diagnostic features of various arrhythmias, maximizing inter-class differences while maintaining intra-class consistency. 
Experiments demonstrate that PCA-CGAN not only achieves, for the first time, stable convergence on a large-scale heterogeneous dataset comprising 43 patients, but also resolves the “dilution effect” problem in data augmentation, avoiding the asymmetric phenomenon where Precision increases while Recall decreases. After data augmentation, the ResNet model’s average F1 score improved significantly, with particularly strong performance on rare categories such as atrial premature beats, far surpassing traditional methods like SigCWGAN and TD-GAN. This research redefines the objectives and methods of ECG signal generation from the theoretical perspectives of information entropy and feature manifolds, providing a systematic solution to data imbalance problems in the medical field while establishing a theoretical foundation for the application of ECG-assisted diagnostic systems in real clinical environments.
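
The abstract's observation that a few principal components explain over 90% of signal variance can be illustrated with a minimal sketch. This is not the authors' code: the function name `components_for_variance`, the 0.90 threshold default, and the synthetic "beats" (three Gaussian bumps with random amplitudes plus noise) are assumptions made only for illustration, and no real ECG data is used.

```python
import numpy as np


def components_for_variance(X, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`.

    X has shape (n_beats, n_samples); each row is one beat.
    (Hypothetical helper, not taken from the paper.)
    """
    Xc = X - X.mean(axis=0)                  # center each time sample
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    ratio = np.cumsum(var) / var.sum()       # cumulative explained variance
    k = int(np.searchsorted(ratio, threshold)) + 1
    return k, ratio


# Synthetic stand-in for segmented beats (assumption: NOT real ECG data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
centers = np.array([0.3, 0.5, 0.7])
widths = np.array([0.02, 0.01, 0.04])
# Three fixed waveform shapes, one per row: shape (3, 200).
bumps = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
# Each beat mixes the shapes with random amplitudes, plus small noise.
amps = rng.normal(loc=[0.2, 1.0, 0.3], scale=[0.05, 0.2, 0.1], size=(500, 3))
beats = amps @ bumps + rng.normal(scale=0.005, size=(500, 200))

k, ratio = components_for_variance(beats)
print(f"{k} of {beats.shape[1]} components reach {ratio[k - 1]:.1%} of the variance")
```

Because the toy beats are built from only three waveform shapes, at most three components are needed here by construction. Whether real ECG recordings compress as well is the empirical claim the paper makes, and this sketch does not test it.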

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0330707 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 30707&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0330707

DOI: 10.1371/journal.pone.0330707
