On the Synergy of Optimizers and Activation Functions: A CNN Benchmarking Study
Khuraman Aziz Sayın,
Necla Kırcalı Gürsoy,
Türkay Yolcu and
Arif Gürsoy
Additional contact information
Khuraman Aziz Sayın: Department of Mathematics, Ege University, Bornova, Izmir 35040, Türkiye
Necla Kırcalı Gürsoy: Department of Computer Programming, Ege University, Bornova, Izmir 35040, Türkiye
Türkay Yolcu: Department of Mathematics, Bradley University, Peoria, IL 61625, USA
Arif Gürsoy: Department of Mathematics, Ege University, Bornova, Izmir 35040, Türkiye
Mathematics, 2025, vol. 13, issue 13, 1-36
Abstract:
In this study, we present a comparative analysis of gradient-descent-based optimizers frequently used in Convolutional Neural Networks (CNNs), including SGD, mSGD, RMSprop, Adadelta, Nadam, Adamax, Adam, and the recent EVE optimizer. To explore the interaction between optimization strategies and activation functions, we systematically evaluate all combinations of these optimizers with four activation functions (ReLU, LeakyReLU, Tanh, and GELU) across three benchmark image classification datasets: CIFAR-10, Fashion-MNIST (F-MNIST), and Labeled Faces in the Wild (LFW). Each configuration was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, mean absolute error (MAE), and mean squared error (MSE). All experiments were performed using k-fold cross-validation to ensure statistical robustness. Additionally, two-way ANOVA was employed to validate the significance of differences across optimizer–activation combinations. This study highlights the importance of jointly selecting optimizers and activation functions to enhance training dynamics and generalization in CNNs. We also consider the role of critical hyperparameters, such as the learning rate and regularization methods, in influencing optimization stability. This work provides insights into the optimizer–activation interplay and offers practical guidance for improving architectural and hyperparameter configurations in CNN-based deep learning models.
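For concreteness, the evaluation protocol described in the abstract (the optimizer × activation grid, k-fold cross-validation, and the two-way ANOVA on the resulting scores) might look as follows in Keras. This is a minimal sketch under stated assumptions, not the authors' implementation: the CNN architecture, the 5,000-image CIFAR-10 subsample, 5 folds, 10 epochs, and the default learning rates are all placeholders, and the EVE optimizer is omitted because it is not bundled with Keras.

```python
# Minimal sketch of the benchmarking protocol in the abstract, not the
# authors' code. Assumptions: a small CNN, CIFAR-10 only, a 5,000-image
# subsample, 5 folds, 10 epochs, Keras default learning rates; EVE omitted.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

def build_cnn(activation, optimizer, input_shape=(32, 32, 3), n_classes=10):
    """Small CNN whose hidden layers all use the same activation function."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation=activation),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation=activation),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=activation),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Factories so every fold starts from freshly initialized optimizer state;
# mSGD is modeled as SGD with momentum.
optimizers = {
    "SGD":      lambda: tf.keras.optimizers.SGD(),
    "mSGD":     lambda: tf.keras.optimizers.SGD(momentum=0.9),
    "RMSprop":  lambda: tf.keras.optimizers.RMSprop(),
    "Adadelta": lambda: tf.keras.optimizers.Adadelta(),
    "Nadam":    lambda: tf.keras.optimizers.Nadam(),
    "Adamax":   lambda: tf.keras.optimizers.Adamax(),
    "Adam":     lambda: tf.keras.optimizers.Adam(),
}
activations = {"ReLU": tf.nn.relu, "LeakyReLU": tf.nn.leaky_relu,
               "Tanh": tf.nn.tanh, "GELU": tf.nn.gelu}

(x, y), _ = tf.keras.datasets.cifar10.load_data()
x, y = x[:5000].astype("float32") / 255.0, y[:5000].ravel()  # subsample for speed

# Evaluate every optimizer-activation pair with stratified k-fold CV.
records = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for opt_name, make_opt in optimizers.items():
    for act_name, act_fn in activations.items():
        for fold, (tr, va) in enumerate(skf.split(x, y)):
            model = build_cnn(act_fn, make_opt())
            model.fit(x[tr], y[tr], epochs=10, batch_size=128, verbose=0)
            _, acc = model.evaluate(x[va], y[va], verbose=0)
            records.append({"optimizer": opt_name, "activation": act_name,
                            "fold": fold, "accuracy": acc})
df = pd.DataFrame(records)

# Two-way ANOVA on fold-level accuracies: main effects of optimizer and
# activation plus their interaction, as in the abstract.
import statsmodels.api as sm
import statsmodels.formula.api as smf

anova = sm.stats.anova_lm(
    smf.ols("accuracy ~ C(optimizer) * C(activation)", data=df).fit(), typ=2)
print(anova)
```

Precision, recall, F1-score, MAE, and MSE would be collected from each fold's predictions in the same loop; the ANOVA table then reports F-statistics and p-values for the two main effects and their interaction.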
Keywords: Stochastic Gradient Descent; optimization; convolutional neural network; activation functions; image processing
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/13/2088/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/13/2088/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:13:p:2088-:d:1687106