EconPapers    

Denoising in Representation Space via Data-Dependent Regularization for Better Representation

Muyi Chen (), Daling Wang, Shi Feng and Yifei Zhang
Additional contact information
Muyi Chen: School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Daling Wang: School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Shi Feng: School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
Yifei Zhang: School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China

Mathematics, 2023, vol. 11, issue 10, 1-33

Abstract: Despite the success of deep learning models, it remains challenging for over-parameterized models to learn good representations in small-sample-size settings. In this paper, motivated by previous work on out-of-distribution (OoD) generalization, we study the representation learning problem from an OoD perspective to identify the fundamental factors affecting representation quality. We formulate, for the first time, a notion of “out-of-feature subspace (OoFS) noise”, and we link the OoFS noise in the feature extractor to the OoD performance of the model by proving two theorems demonstrating that reducing OoFS noise in the feature extractor is beneficial for achieving better representations. Moreover, we identify two causes of OoFS noise and prove that the OoFS noise induced by random initialization can be filtered out via L2 regularization. Finally, we propose a novel data-dependent regularizer that acts on the weights of the fully connected layer to reduce noise in the representations, thus implicitly forcing the feature extractor, via back-propagation, to focus on informative features and to rely less on noise. Experiments on synthetic datasets show that our method can learn hard-to-learn features, can filter out noise effectively, and outperforms GD, AdaGrad, and KFAC. Furthermore, experiments on benchmark datasets show that our method achieves the best performance on three of the four tasks.
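The abstract describes penalizing the component of the fully connected layer's weights that lies outside the feature subspace spanned by the data's representations. The paper's exact regularizer is not reproduced on this page, so the following is only a minimal illustrative sketch of that idea: the feature subspace is estimated from the top singular directions of the (centered) representations, and the penalty is the squared norm of the weight component orthogonal to it. The function name `oofs_penalty` and the choice of `k` top directions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def oofs_penalty(W, Z, k):
    """Illustrative data-dependent penalty on FC weights.

    W: (d, c) fully connected layer weights.
    Z: (n, d) representations produced by the feature extractor.
    k: assumed dimension of the informative feature subspace.
    """
    # Estimate the top-k feature subspace from the centered representations.
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    V = Vt[:k].T                              # (d, k) orthonormal basis of the feature subspace
    P_perp = np.eye(Z.shape[1]) - V @ V.T     # projector onto its orthogonal complement
    # Penalize only the out-of-feature-subspace component of the weights.
    return np.sum((P_perp @ W) ** 2)
```

In this sketch, a weight vector lying inside the estimated feature subspace incurs (numerically) zero penalty, while any component orthogonal to it is penalized; adding such a term to the training loss would, through back-propagation, discourage the classifier from relying on directions the data does not span.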

Keywords: deep neural network; representation space; fully connected layer; feature extractor (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2023
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/11/10/2327/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/10/2327/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:10:p:2327-:d:1148505

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:11:y:2023:i:10:p:2327-:d:1148505