DSTnet: Deformable Spatio-Temporal Convolutional Residual Network for Video Super-Resolution

Khan, Anusha; Sargano, Allah Bux; Habib, Zulfiqar

Anusha Khan, Allah Bux Sargano and Zulfiqar Habib
Additional contact information
Anusha Khan: Department of Computer Science, COMSATS University Islamabad, Lahore 54000, Pakistan
Allah Bux Sargano: Department of Computer Science, COMSATS University Islamabad, Lahore 54000, Pakistan
Zulfiqar Habib: Department of Computer Science, COMSATS University Islamabad, Lahore 54000, Pakistan

Mathematics, 2021, vol. 9, issue 22, 1-15

Abstract: Video super-resolution (VSR) aims to generate high-resolution (HR) video frames with plausible and temporally consistent details from their low-resolution (LR) counterparts and neighboring frames. The key challenge for VSR lies in effectively exploiting both the intra-frame spatial relations and the temporal dependency between consecutive frames. Many existing techniques use spatial and temporal information separately and compensate for motion via alignment. These methods cannot fully exploit the spatio-temporal information that significantly affects the quality of the resulting HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1)D streams. This decomposition can use the input video's spatial and temporal features simultaneously, without a separate motion estimation and compensation module. Furthermore, deformable convolution layers are used in the proposed model to enhance its motion awareness. Our contribution is twofold. First, the proposed approach can overcome the challenges in modeling complex motions by efficiently using spatio-temporal information. Second, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on the benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to state-of-the-art methods.
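
The effect of the (2+1)D decomposition on model size can be illustrated with a simple weight count. The sketch below is not taken from the paper: the function names, the channel width of 64, the 3 x 3 x 3 kernel, and the choice of an intermediate width equal to the input width are all assumptions made for illustration, and biases are ignored. It compares a full 3D convolution with a spatial 1 x k x k convolution followed by a temporal t x 1 x 1 convolution.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a bias-free 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def conv2plus1d_params(c_in, c_mid, c_out, kt, k):
    """Weight count of a (2+1)D factorization (illustrative helper).

    A 1 x k x k spatial convolution (c_in -> c_mid) is followed by a
    kt x 1 x 1 temporal convolution (c_mid -> c_out).
    """
    spatial = conv3d_params(c_in, c_mid, 1, k, k)
    temporal = conv3d_params(c_mid, c_out, kt, 1, 1)
    return spatial + temporal


# Hypothetical layer sizes; the abstract does not state DSTnet's actual widths.
channels = 64
full = conv3d_params(channels, channels, 3, 3, 3)
factored = conv2plus1d_params(channels, channels, channels, 3, 3)

print(full)      # 110592
print(factored)  # 49152
```

Under these assumed sizes the factorized layer needs fewer than half the weights of the full 3D layer. The saving depends on the intermediate width `c_mid`; a wider intermediate layer can remove it entirely, so the figures here should not be read as DSTnet's actual parameter budget.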

Keywords: video super-resolution; deformable convolution; 3D convolution; spatio-temporal; residual neural network; deep learning
JEL-codes: C
Date: 2021

Downloads: (external link)
https://www.mdpi.com/2227-7390/9/22/2873/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/22/2873/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:22:p:2873-:d:677315

Mathematics is currently edited by Ms. Emma He