TIC-FusionNet: A multimodal deep learning framework with temporal decomposition and attention-based fusion for time series forecasting
Liyu Chen and Xiangwei Fan
PLOS ONE, 2025, vol. 20, issue 10, 1-35
Abstract:
We propose TIC-FusionNet, a trend-aware multimodal deep learning framework for time series forecasting with integrated visual signal analysis, aimed at addressing the limitations of unimodal and short-range dependency models in noisy financial environments. The architecture combines Exponential Moving Average (EMA) decomposition for denoising and trend extraction, a lightweight Linear Transformer for efficient long-sequence temporal modeling, and a spatial–channel CNN with CBAM attention to capture morphological patterns from candlestick chart images. A gated fusion mechanism adaptively integrates numerical and visual modalities based on context relevance, enabling dynamic feature weighting under varying market conditions. We evaluate TIC-FusionNet on six real-world stock datasets covering major Chinese and U.S. companies (Amazon, Tesla, Apple, Kweichow Moutai, Ping An Insurance, and China Vanke), spanning diverse market sectors and volatility patterns. The model is compared against a broad range of baselines, including statistical models (ARIMA), classical machine learning methods (Random Forest, SVR), recurrent and convolutional neural networks (LSTM, TCN, CNN-only), and recent Transformer-based architectures (Informer, Autoformer, Crossformer, iTransformer). Experimental results demonstrate that TIC-FusionNet achieves consistently superior predictive accuracy and generalization, outperforming state-of-the-art baselines across all datasets. Extensive ablation studies verify the critical role of each architectural component, while attention-based interpretability analysis highlights the dominant technical indicators under different volatility regimes. These findings confirm the effectiveness of multimodal integration in capturing complementary temporal–visual cues and provide insight into model decision-making. The proposed framework offers a robust, scalable, and interpretable solution for multimodal temporal prediction tasks, with strong potential for deployment in intelligent forecasting, sensor fusion, and risk-aware decision-making systems.
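For reference, the EMA decomposition named in the abstract conventionally follows the standard exponential-smoothing recursion; the paper's smoothing factor is not stated here, so alpha below is a placeholder:

    s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}, \qquad s_1 = x_1, \qquad 0 < \alpha \le 1,

with the trend component taken as s_t and the denoised residual as x_t - s_t.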
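Below is a minimal PyTorch sketch of one common form of gated fusion, assuming a learned sigmoid gate over the concatenated modality embeddings; the class name GatedFusion, the 128-dimensional embeddings, and the gating design are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Adaptively weight two modality embeddings with a learned sigmoid gate.

    Illustrative sketch of a generic gated fusion; TIC-FusionNet's exact
    formulation may differ.
    """
    def __init__(self, dim: int):
        super().__init__()
        # Gate is computed from both modalities jointly
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_num: torch.Tensor, h_img: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per feature, how much to trust each modality
        g = torch.sigmoid(self.gate(torch.cat([h_num, h_img], dim=-1)))
        return g * h_num + (1.0 - g) * h_img

# Usage: fuse a batch of 128-d numerical and visual embeddings
fusion = GatedFusion(dim=128)
fused = fusion(torch.randn(8, 128), torch.randn(8, 128))  # shape (8, 128)

Under this formulation, the gate lets the model lean on numerical features in stable regimes and on chart-image features when morphological patterns are more informative, matching the abstract's description of context-dependent feature weighting.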
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0333379 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 33379&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0333379
DOI: 10.1371/journal.pone.0333379