Integrating Vision Transformer and Time–Frequency Analysis for Stock Volatility Prediction

Wooh, Myungjin; Cho, Poongjin

Myungjin Wooh and Poongjin Cho
Additional contact information
Myungjin Wooh: School of Computing, Gachon University, Seongnam 13120, Republic of Korea
Poongjin Cho: School of Computing, Gachon University, Seongnam 13120, Republic of Korea

Mathematics, 2025, vol. 13, issue 23, 1-35

Abstract: Financial market volatility prediction remains challenging because the data are nonlinear and non-stationary. Existing quantitative approaches struggle to capture the multi-scale information embedded in time series, and convolutional neural network (CNN)-based image approaches primarily emphasize local feature extraction, whereas Vision Transformers (ViTs) capture global dependencies more directly through self-attention. To address these limitations, we propose TF-ViTNet, a dual-path hybrid model that integrates time–frequency scalograms generated via the Continuous Wavelet Transform (CWT) with ViTs for volatility prediction. While time–frequency analysis has been widely adopted in prior studies, applying ViTs to CWT-based scalograms within a parallel architecture provides a new perspective for capturing global spatiotemporal structures in financial volatility. The model employs a parallel architecture in which a ViT pathway learns global spatiotemporal patterns from scalograms while a Long Short-Term Memory (LSTM) pathway captures temporal characteristics from technical indicators, with both streams integrated at the final stage for volatility prediction. Empirical analysis using NASDAQ and S&P 500 index data from 2010 to 2024 demonstrates that TF-ViTNet consistently outperforms both LSTM models that use numerical data alone and existing benchmarks. Within the parallel architecture, ViTs capture global patterns in scalograms more effectively than CNNs, achieving significant performance improvements, particularly for NASDAQ. The model maintains stable predictive power even during high-volatility regimes, demonstrating strong potential as a risk management tool. Data augmentation improves performance for the stable S&P 500 market but degrades results for the volatile NASDAQ market, emphasizing the need for market-specific augmentation strategies tailored to the underlying signal-to-noise characteristics.
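
Illustrative sketch (not from the article): the abstract describes turning a time series into a CWT scalogram that is then treated as an image. The snippet below is a minimal NumPy sketch of that first step only, under stated assumptions. The complex Morlet wavelet, the parameter `w0 = 6`, the scale grid, and the function names `morlet` and `cwt_scalogram` are all hypothetical choices made for illustration; the abstract does not say which wavelet, scales, or preprocessing the authors use, and the ViT and LSTM pathways are not reproduced here.

```python
import numpy as np

def morlet(t, scale, w0=6.0):
    """Complex Morlet wavelet at times t, dilated by `scale` (assumed wavelet choice)."""
    x = t / scale
    return np.pi ** -0.25 * np.exp(1j * w0 * x) * np.exp(-0.5 * x ** 2) / np.sqrt(scale)

def cwt_scalogram(signal, scales, w0=6.0):
    """Return the wavelet power |CWT|^2 with shape (len(scales), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    power = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet to about +/- 5 scales, keeping an odd-length
        # kernel that is never longer than the signal itself.
        half = min(int(np.ceil(5 * s)), (n - 1) // 2)
        t = np.arange(-half, half + 1)
        # Convolving with the reversed conjugate wavelet equals correlating
        # the signal with the wavelet, which is the discrete CWT sum.
        kernel = np.conj(morlet(t, s, w0))[::-1]
        coef = np.convolve(signal, kernel, mode="same")
        power[i] = np.abs(coef) ** 2
    return power

if __name__ == "__main__":
    # Synthetic stand-in for a return series; NOT data from the paper.
    rng = np.random.default_rng(0)
    returns = 0.01 * rng.standard_normal(256)
    scalogram = cwt_scalogram(returns, scales=[2, 4, 8, 16, 32])
    print(scalogram.shape)  # one row per scale, one column per time step
```

In a pipeline like the one the abstract outlines, an array of this kind would typically be rescaled and resized before being passed to an image model; those steps are omitted because the listing gives no details about them.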

Keywords: time–frequency analysis; vision transformer; scalogram; stock volatility
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/23/3787/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/23/3787/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:23:p:3787-:d:1802850

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.