Improving Systematic Generalization of Linear Transformer Using Normalization Layers and Orthogonality Loss Function
Taewon Park and Hyun-Chul Kim
Additional contact information
Taewon Park: Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Republic of Korea
Hyun-Chul Kim: Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Republic of Korea
Mathematics, 2024, vol. 12, issue 21, 1-17
Abstract:
A Linear Transformer linearizes the attention mechanism of the vanilla Transformer architecture, significantly improving efficiency and achieving linear theoretical complexity with respect to sequence length. However, few studies have explored the capabilities of the Linear Transformer beyond its efficiency. In this work, we investigate the systematic generalization capability of the Linear Transformer, a crucial property for strong generalization to unseen data. Through preliminary experiments, we identify two major issues contributing to its unstable systematic generalization performance: (i) unconstrained norms of Queries and Keys, and (ii) high correlation among Values across the sequence. To address these issues, we propose two simple yet effective methods: normalization layers for Queries and Keys, and an orthogonality loss function applied to Values during training. In experiments, we demonstrate that applying these methods to the Linear Transformer significantly improves its stability and systematic generalization performance across several well-known tasks. Furthermore, our proposed methods outperform the vanilla Transformer on specific systematic generalization tasks, such as the sort-of-CLEVR and SCAN tasks.
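The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the type of normalization layer or the exact form of the loss, so the sketch assumes plain L2 normalization for Queries and Keys, an unnormalized causal linear attention with an identity feature map, and a loss defined as the mean squared cosine similarity between distinct Value vectors. The function names `l2_normalize`, `linear_attention`, and `orthogonality_loss` are hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to unit L2 norm (assumed stand-in for a normalization layer)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def linear_attention(queries, keys, values):
    """Causal linear attention: out_t = (sum over i <= t of v_i k_i^T) q_t.

    Assumption: identity feature map and no attention denominator, with
    Queries and Keys L2-normalized so their norms stay bounded.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    # Running sum of outer products v_i k_i^T, shape (d_v, d_k).
    state = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        q, k = l2_normalize(q), l2_normalize(k)
        for r in range(d_v):
            for c in range(d_k):
                state[r][c] += v[r] * k[c]
        # Read out from the accumulated state with the current query.
        outputs.append(
            [sum(state[r][c] * q[c] for c in range(d_k)) for r in range(d_v)]
        )
    return outputs


def orthogonality_loss(values):
    """Mean squared cosine similarity over all pairs of distinct Value vectors.

    Assumed form: zero when all Values are mutually orthogonal,
    close to one when they are all parallel.
    """
    unit = [l2_normalize(v) for v in values]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    print(linear_attention(keys, keys, values))
    print(orthogonality_loss(values))
```

In this sketch the cost per step depends only on the state size, not on the sequence length, which is where the linear complexity comes from. When the Keys are orthonormal, querying with a Key returns the Value stored with it; the orthogonality term would be added to the task loss during training with a weighting that is left unspecified here.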
Keywords: transformer; linear transformer; systematic generalization; normalization; orthogonality loss
JEL-codes: C
Date: 2024
Downloads:
https://www.mdpi.com/2227-7390/12/21/3390/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/21/3390/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:21:p:3390-:d:1509830
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.