Sparse Regularized Autoencoders-Based Radiomics Data Augmentation for Improved EGFR Mutation Prediction in NSCLC

Muhammad Asif Munir, Reehan Ali Shah, Urooj Waheed, Muhammad Aqeel Aslam, Zeeshan Rashid, Mohammed Aman, Muhammad I. Masud and Zeeshan Ahmad Arfeen
Additional contact information
Muhammad Asif Munir: Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
Reehan Ali Shah: Department of Computer Science, Shaheed Benazir Bhutto University, SBA (SBBU-SBA), Nawabshah 67450, Pakistan
Urooj Waheed: Department of Computer Science, DHA Suffa University, Karachi 75500, Pakistan
Muhammad Aqeel Aslam: Department of Electrical Engineering, GIFT University, Gujranwala 52250, Pakistan
Zeeshan Rashid: Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
Mohammed Aman: Department of Industrial Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia
Muhammad I. Masud: Department of Electrical Engineering, College of Engineering, University of Business and Technology, Jeddah 21361, Saudi Arabia
Zeeshan Ahmad Arfeen: Department of Electrical Engineering, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan

Future Internet, 2025, vol. 17, issue 11, 1-23

Abstract: Lung cancer (LC) remains a leading cause of cancer mortality worldwide, where accurate and early identification of gene mutations such as epidermal growth factor receptor (EGFR) is critical for precision treatment. However, machine learning-based radiomics approaches often face challenges due to the small and imbalanced nature of the datasets. This study proposes a comprehensive framework based on Generic Sparse Regularized Autoencoders with Kullback–Leibler divergence (GSRA-KL) to generate high-quality synthetic radiomics data and overcome these limitations. A systematic approach generated 63 synthetic radiomics datasets by tuning a novel kl_weight regularization hyperparameter across three hidden-layer sizes, optimized using Optuna for computational efficiency. A rigorous assessment was conducted to evaluate the impact of hyperparameter tuning across 63 synthetic datasets, with a focus on the EGFR gene mutation. This evaluation utilized resemblance-dimension scores (RDS), novel utility-dimension scores (UDS), and t-SNE visualizations to ensure the validation of data quality, revealing that GSRA-KL achieves excellent performance (RDS > 0.45, UDS > 0.7), especially when class distribution is balanced, while remaining competitive with the Tabular Variational Autoencoder (TVAE). Additionally, a comprehensive statistical correlation analysis demonstrated strong and significant monotonic relationships among resemblance-based performance metrics up to moderate scaling (≤1.0*), confirming the robustness and stability of inter-metric associations under varying configurations. Complementary computational cost evaluation further indicated that moderate kl_weight values yield an optimal balance between reconstruction accuracy and resource utilization, with Spearman correlations revealing improved reconstruction quality (MSE ρ = −0.78, p < 0.001) at reduced computational overhead.
The ablation-style analysis confirmed that including the KL divergence term meaningfully enhances the generative capacity of GSRA-KL over its baseline counterpart. Furthermore, the GSRA-KL framework achieved substantial improvements in computational efficiency compared to prior PSO-based optimization methods, resulting in reduced memory usage and training time. Overall, GSRA-KL represents an incremental yet practical advancement for augmenting small and imbalanced high-dimensional radiomics datasets, showing promise for improved mutation prediction and downstream precision oncology studies.

Keywords: autoencoders; data augmentation; EGFR mutation; NSCLC; radiomics; synthetic data generation
JEL-codes: O3
Date: 2025

Downloads: (external link)
https://www.mdpi.com/1999-5903/17/11/495/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/11/495/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:11:p:495-:d:1782043

Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-10-30
Handle: RePEc:gam:jftint:v:17:y:2025:i:11:p:495-:d:1782043