NICE: Noise Injection and Clamping Estimation for Neural Network Quantization

Chaim Baskin, Evgenii Zheltonozhkii, Tal Rozen, Natan Liss, Yoav Chai, Eli Schwartz, Raja Giryes, Alexander M. Bronstein and Avi Mendelson
Additional contact information
Chaim Baskin: Department of Computer Science, Technion, Haifa 3200003, Israel
Evgenii Zheltonozhkii: Department of Computer Science, Technion, Haifa 3200003, Israel
Tal Rozen: Department of Electrical Engineering, Technion, Haifa 3200003, Israel
Natan Liss: Department of Electrical Engineering, Technion, Haifa 3200003, Israel
Yoav Chai: School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 6997801, Israel
Eli Schwartz: School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 6997801, Israel
Raja Giryes: School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 6997801, Israel
Alexander M. Bronstein: Department of Computer Science, Technion, Haifa 3200003, Israel
Avi Mendelson: Department of Computer Science, Technion, Haifa 3200003, Israel

Mathematics, 2021, vol. 9, issue 17, 1-12

Abstract: Convolutional Neural Networks (CNNs) are widely used in many fields, including computer vision, speech recognition, and natural language processing. Although deep learning achieves groundbreaking performance in these domains, the networks involved are computationally demanding and far from real-time performance even on a GPU, which is itself not power efficient and therefore unsuitable for low-power systems such as mobile devices. To overcome this challenge, solutions have been proposed for quantizing the weights and activations of these networks, which significantly accelerates the runtime. Yet, this acceleration comes at the cost of a larger error unless spatial adjustments are carried out. The method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improves accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 using as few as 3-bit weights and activations. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications. The quantization code will become publicly available upon acceptance.
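The abstract's two ingredients, a learned clamp on the value range and noise injection during training, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, the symmetric quantization grid, and the `noise_frac` parameter are illustrative assumptions. The idea sketched here is that, at training time, a random subset of values bypasses the quantizer and instead receives uniform noise matching the quantization error, which keeps gradients informative.

```python
import numpy as np

def quantize_with_clamp(x, alpha, bits):
    """Clamp x to [-alpha, alpha] (alpha would be learned), then
    quantize uniformly to a symmetric grid of 2**(bits-1)-1 levels
    per side."""
    scale = (2 ** (bits - 1) - 1) / alpha
    clipped = np.clip(x, -alpha, alpha)
    return np.round(clipped * scale) / scale

def nice_forward(x, alpha, bits, noise_frac=0.05, rng=None):
    """Training-time forward pass: most values are quantized, but a
    random fraction (noise_frac, an assumed hyperparameter) is left
    continuous and perturbed with uniform noise whose magnitude
    matches the quantization step."""
    rng = np.random.default_rng(0) if rng is None else rng
    step = 1.0 / ((2 ** (bits - 1) - 1) / alpha)
    quantized = quantize_with_clamp(x, alpha, bits)
    noisy = np.clip(x, -alpha, alpha) + rng.uniform(-step / 2, step / 2, x.shape)
    mask = rng.random(x.shape) < noise_frac
    return np.where(mask, noisy, quantized)
```

At inference time only `quantize_with_clamp` would apply; the noise path exists solely to simulate quantization error during training while keeping part of the signal differentiable.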

Keywords: neural networks; low power; quantization; CNN architecture
JEL-codes: C
Date: 2021

Downloads:
https://www.mdpi.com/2227-7390/9/17/2144/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/17/2144/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:17:p:2144-:d:627901


Mathematics is currently edited by Ms. Emma He


Handle: RePEc:gam:jmathe:v:9:y:2021:i:17:p:2144-:d:627901