
Inference and training efficiency in pruned multilayer perceptron networks

Amirhossein Douzandeh Zenoozi, Laura Erhan, Antonio Liotta and Lucia Cavallaro

PLOS Complex Systems, 2026, vol. 3, issue 3, 1-28

Abstract: This study explores how pruning strategies can improve the efficiency of deep neural networks (DNNs), which are widely used for tasks such as image processing and medical diagnosis. Although DNNs are powerful, they often contain weak connections that increase energy consumption during both training and inference. To address this, we compare two pruning approaches: global pruning, which applies to all layers of the network, and layer-wise pruning, which focuses on the hidden layers. These approaches are tested on two multilayer perceptron (MLP) models, small-scale and medium-scale, and are then extended to a VGG-16 model as a representative example of Convolutional Neural Networks (CNNs). We evaluate the impact of pruning on five datasets (MNIST, FashionMNIST, EMNIST, CIFAR-10, and OctMNIST) at two sparsity levels (50% and 80%). Our results show that, compared to the benchmark dense networks (0% sparsity), layer-wise pruning offers the best trade-offs, consistently reducing inference time and inference energy usage while maintaining accuracy. For example, training the small-scale model on the MNIST dataset at 50% sparsity led to a 33% reduction in both inference energy usage and inference time, with only a negligible 0.49% decrease in accuracy. Furthermore, we investigate training energy consumption, estimated CO2 emissions, and peak memory usage, all of which again favor the layer-wise approach over global pruning. Overall, our findings suggest that layer-wise pruning is a practical approach for designing energy-efficient neural networks, offering efficient trade-offs between performance and energy consumption.

Author summary: In this work, we explore how to make deep learning models more efficient by removing their weaker connections, through a method known as "pruning". These models are widely used in everyday applications, from medical tools to smart devices. However, the energy consumption of dense, unpruned models is often suboptimal: the large number of connections requires more energy to process inputs and compute the best output class. By reducing the number of connections, pruning strives to keep model performance close to that of the dense counterpart while using less energy. This makes pruning an effective way to improve energy efficiency, both during model training and at inference time. The reduced computational load yields a more energy-efficient model, which is especially beneficial for devices with limited power. We tested different pruning techniques and compared their performance across three deep learning architectures on five domain-specific datasets. One of our main findings is that a layer-wise pruning approach leads to significant efficiency gains at negligible accuracy losses.
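To illustrate the distinction between the two strategies compared in the abstract, the following Python sketch applies 50% unstructured magnitude pruning once globally across all layers and once to a hidden layer only. It is a minimal sketch, not the authors' implementation: the use of PyTorch's torch.nn.utils.prune module, the layer sizes, and the SmallMLP stand-in architecture are all illustrative assumptions.

# A minimal sketch (not the authors' code) contrasting global and
# layer-wise unstructured magnitude pruning with PyTorch's built-in
# torch.nn.utils.prune module. Layer sizes and the 50% sparsity level
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class SmallMLP(nn.Module):
    # Hypothetical stand-in for the paper's small-scale MLP.
    def __init__(self, in_features=784, hidden=128, num_classes=10):
        super().__init__()
        self.input_layer = nn.Linear(in_features, hidden)
        self.hidden_layer = nn.Linear(hidden, hidden)
        self.output_layer = nn.Linear(hidden, num_classes)

    def forward(self, x):
        x = torch.relu(self.input_layer(x))
        x = torch.relu(self.hidden_layer(x))
        return self.output_layer(x)

sparsity = 0.5  # the paper also evaluates 0.8

# Global pruning: rank weight magnitudes across ALL layers jointly and
# zero the smallest 50% network-wide.
model_g = SmallMLP()
prune.global_unstructured(
    [(m, "weight") for m in model_g.modules() if isinstance(m, nn.Linear)],
    pruning_method=prune.L1Unstructured,
    amount=sparsity,
)

# Layer-wise pruning: zero the smallest 50% of weights within the hidden
# layer only, leaving the input and output layers dense.
model_l = SmallMLP()
prune.l1_unstructured(model_l.hidden_layer, name="weight", amount=sparsity)

# prune.remove() folds the pruning mask permanently into the weight
# tensor, removing the reparameterization hooks.
prune.remove(model_l.hidden_layer, "weight")

Note the key behavioral difference this sketch exposes: global pruning may remove a different fraction of weights from each layer (only the network-wide total reaches 50%), whereas the layer-wise variant enforces 50% sparsity within the targeted hidden layer and leaves the input and output layers dense.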

Date: 2026

Downloads: (external link)
https://journals.plos.org/complexsystems/article?id=10.1371/journal.pcsy.0000095 (text/html)
https://journals.plos.org/complexsystems/article/f ... 00095&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pcsy00:0000095

DOI: 10.1371/journal.pcsy.0000095


More articles in PLOS Complex Systems from Public Library of Science
Bibliographic data for series maintained by complexsystem.

 
Handle: RePEc:plo:pcsy00:0000095