Partial Transfer Learning from Patch Transformer to Variate-Based Linear Forecasting Model

Le Hoang Anh, Dang Thanh Vu, Seungmin Oh, Gwang-Hyun Yu, Nguyen Bui Ngoc Han, Hyoung-Gook Kim, Jin-Sul Kim and Jin-Young Kim
Additional contact information
Le Hoang Anh: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
Dang Thanh Vu: Research Center, AISeed Inc., Gwangju 61186, Republic of Korea
Seungmin Oh: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
Gwang-Hyun Yu: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
Nguyen Bui Ngoc Han: Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea
Hyoung-Gook Kim: Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea
Jin-Sul Kim: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
Jin-Young Kim: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea

Energies, 2024, vol. 17, issue 24, 1-18

Abstract: Transformer-based time series forecasting models use patch tokens to capture temporal patterns and variate tokens to learn dependencies across covariates. While patch tokens naturally lend themselves to self-supervised learning, variate tokens are better suited to linear forecasters because they help mitigate distribution drift. However, variate tokens preclude masked pretraining, since masking an entire series leaves nothing meaningful to reconstruct. To close this gap, we propose LSPatch-T (Long–Short Patch Transfer), a framework that transfers knowledge from short-length patch tokens into full-length variate tokens. A key design choice is that we selectively transfer only a portion of the Transformer encoder, preserving the linear design of the downstream model. Additionally, we introduce a robust frequency loss to maintain consistency across different temporal ranges. Experimental results show that our approach outperforms Transformer-based baselines (Transformer, Informer, Crossformer, Autoformer, PatchTST, iTransformer) on three public datasets (ETT, Exchange, Weather), a promising step toward generalizing time series forecasting models.
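To make the abstract's mechanisms concrete, the sketch below gives one minimal PyTorch rendering of them: patch tokenization (short, maskable tokens for pretraining), variate tokenization (each full-length series as one token for the downstream model), a partial copy of pretrained encoder layers, and an FFT-based frequency loss. Every name here (PatchEmbed, VariateEmbed, frequency_loss, transfer_encoder_layers, the layer count k) and the exact form of the loss are illustrative assumptions based only on the abstract, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class PatchEmbed(nn.Module):
        """Pretraining view: split each series into short patches and project
        each patch to a token, so that individual patches can be masked."""
        def __init__(self, patch_len: int, d_model: int):
            super().__init__()
            self.patch_len = patch_len
            self.proj = nn.Linear(patch_len, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n_vars, seq_len) -> (batch, n_vars * n_patches, d_model)
            b = x.shape[0]
            patches = x.unfold(2, self.patch_len, self.patch_len)
            return self.proj(patches).reshape(b, -1, self.proj.out_features)

    class VariateEmbed(nn.Module):
        """Downstream view: embed each full-length series as one variate token."""
        def __init__(self, seq_len: int, d_model: int):
            super().__init__()
            self.proj = nn.Linear(seq_len, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n_vars, seq_len) -> (batch, n_vars, d_model)
            return self.proj(x)

    def frequency_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """Penalize the magnitude of the spectral difference between prediction
        and target, computed with the real FFT along the time axis. This is one
        plausible frequency-domain loss, not necessarily the paper's."""
        return (torch.fft.rfft(pred, dim=-1) - torch.fft.rfft(target, dim=-1)).abs().mean()

    def transfer_encoder_layers(pretrained: nn.TransformerEncoder,
                                downstream: nn.TransformerEncoder, k: int) -> None:
        """Partial transfer: copy only the first k encoder layers from the
        patch-token model into the downstream forecaster's encoder."""
        for i in range(k):
            downstream.layers[i].load_state_dict(pretrained.layers[i].state_dict())

A minimal usage sketch, with an arbitrary choice of k:

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    pretrained = nn.TransformerEncoder(layer, num_layers=4)  # stands in for the patch-token encoder
    downstream = nn.TransformerEncoder(layer, num_layers=4)  # stands in for the downstream encoder
    transfer_encoder_layers(pretrained, downstream, k=2)     # copy only the first two layers

Copying only the first k layers leaves the remaining downstream layers to be trained from scratch, which matches the abstract's description of transferring a portion of the encoder rather than the whole stack.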

Keywords: multivariate time series forecasting; transfer learning; frequency analysis
JEL-codes: Q Q0 Q4 Q40 Q41 Q42 Q43 Q47 Q48 Q49
Date: 2024

Downloads: (external link)
https://www.mdpi.com/1996-1073/17/24/6452/pdf (application/pdf)
https://www.mdpi.com/1996-1073/17/24/6452/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jeners:v:17:y:2024:i:24:p:6452-:d:1549422

Energies is currently edited by Ms. Agatha Cao

More articles in Energies from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jeners:v:17:y:2024:i:24:p:6452-:d:1549422