Make $$\ell _1$$ regularization effective in training sparse CNN
Juncai He,
Xiaodong Jia,
Jinchao Xu,
Lian Zhang and
Liang Zhao
Additional contact information
Juncai He: Pennsylvania State University
Xiaodong Jia: Pennsylvania State University
Jinchao Xu: Pennsylvania State University
Lian Zhang: Pennsylvania State University
Liang Zhao: Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Computational Optimization and Applications, 2020, vol. 77, issue 1, No 6, 163-182
Abstract:
Compressed sensing using $$\ell _1$$ regularization is among the most powerful and popular sparsification techniques in many applications, but why has it not been used to obtain sparse deep learning models such as convolutional neural networks (CNNs)? This paper aims to answer this question and to show how to make it work. Following Xiao (J Mach Learn Res 11(Oct):2543–2596, 2010), we first demonstrate that the commonly used stochastic gradient descent algorithm and its variants are not an appropriate match for $$\ell _1$$ regularization, and we then replace them with a different training algorithm based on a regularized dual averaging (RDA) method. The RDA method of Xiao (2010) was originally designed for convex problems, but with new theoretical insight and algorithmic modifications (proper initialization and adaptivity), we make it an effective match for $$\ell _1$$ regularization, achieving state-of-the-art sparsity for highly non-convex CNNs compared with other weight-pruning methods without compromising accuracy (e.g., 95% sparsity for ResNet-18 on CIFAR-10).
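The key mechanism behind the abstract's claim is that RDA averages all past gradients and applies soft-thresholding to the average, so weights are set exactly to zero rather than merely shrunk toward it (which is why plain SGD with an ℓ1 penalty rarely produces true sparsity). A minimal NumPy sketch of the closed-form ℓ1-RDA update from Xiao (2010), without the paper's initialization and adaptivity modifications, might look like this; the function and state names are illustrative, not from the paper:

```python
import numpy as np

def l1_rda_step(grad, state, lam=0.1, gamma=1.0):
    """One simple ℓ1-RDA update (after Xiao, 2010).

    Keeps a running average of all past gradients, then solves the
    RDA subproblem with an ℓ1 term in closed form: coordinates whose
    averaged gradient is small in magnitude are set exactly to zero.
    """
    state["t"] += 1
    t = state["t"]
    # running average of gradients: gbar_t = gbar_{t-1} + (g_t - gbar_{t-1}) / t
    state["gbar"] += (grad - state["gbar"]) / t
    gbar = state["gbar"]
    # closed-form minimizer with strongly convex term (gamma*sqrt(t)/2)*||w||^2:
    # zero where |gbar| <= lam, soft-thresholded and rescaled elsewhere
    return np.where(
        np.abs(gbar) <= lam,
        0.0,
        -(np.sqrt(t) / gamma) * (gbar - lam * np.sign(gbar)),
    )

# Usage: a small-magnitude averaged gradient yields an exactly-zero weight,
# while a large one yields a nonzero, shrunken weight.
state = {"t": 0, "gbar": np.zeros(2)}
w = l1_rda_step(np.array([0.05, 1.0]), state, lam=0.1, gamma=1.0)
```

Here `w[0]` is exactly `0.0` (its averaged gradient 0.05 falls below the threshold 0.1), while `w[1]` is nonzero; this hard zeroing is the sparsification effect the paper exploits for CNN weights.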
Keywords: Sparse optimization; $$\ell _1$$ regularization; Dual averaging; CNN
Date: 2020
Downloads: (external link)
http://link.springer.com/10.1007/s10589-020-00202-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:coopap:v:77:y:2020:i:1:d:10.1007_s10589-020-00202-1
Ordering information: This journal article can be ordered from
http://www.springer.com/math/journal/10589
DOI: 10.1007/s10589-020-00202-1
Computational Optimization and Applications is currently edited by William W. Hager