Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec

Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Additional contact information
Woowoen Gwun: Department of Computer Science and Engineering, College of Software, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea
Kiho Choi: Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea
Gwang Hoon Park: Department of Computer Science and Engineering, College of Software, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea

Mathematics, 2024, vol. 12, issue 18, 1-24

Abstract: Over the past few years, there has been substantial research interest in applying Convolutional Neural Networks (CNNs) to post-filtering in video coding. Most existing work uses CNNs with various kernel sizes for post-filtering and concentrates on High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC), leaving these techniques largely unexplored for other standards such as AV1. Developed by the Alliance for Open Media, AV1 offers excellent compression efficiency, reducing bandwidth usage and improving video quality, which makes it highly attractive for modern streaming and media applications. This paper introduces an approach that extends beyond purely convolutional methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality and demonstrates the potential of self-attention mechanisms to advance post-filtering beyond the limitations of convolution-based methods. Experimental results show that the proposed network achieves average BD-rate reductions of 10.40% for the Luma component and of 19.22% and 16.52% for the Chroma components compared with the AV1 anchor. Visual quality assessments further validate the approach, showing substantial artifact reduction and detail enhancement in the filtered videos.
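
The listing includes no code, so the following is a minimal, hypothetical PyTorch sketch of the general idea described in the abstract: a residual CNN post-filter with a self-attention layer inserted between convolutional layers. The layer choices, channel counts, depth, and three-channel input are assumptions for illustration; the paper's actual architecture uses three distinct self-attention types that are not detailed in this abstract.

# Minimal sketch of a self-attention-augmented CNN post-filter.
# Illustrative only: layer choices, channel counts, and depth are
# assumptions, not the architecture proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Non-local-style spatial self-attention over CNN feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # (b, hw, c/8)
        k = self.key(x).flatten(2)                         # (b, c/8, hw)
        attn = F.softmax(q @ k / (c // 8) ** 0.5, dim=-1)  # (b, hw, hw)
        v = self.value(x).flatten(2).transpose(1, 2)       # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out

class AttentionPostFilter(nn.Module):
    """Residual post-filter: predicts a correction that is added back
    to the decoded frame, so the network only models coding artifacts."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            SpatialSelfAttention(channels),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, decoded):
        return decoded + self.tail(self.body(self.head(decoded)))

# Usage: filter a batch of decoded 32x32 patches.
model = AttentionPostFilter()
restored = model(torch.rand(2, 3, 32, 32))
print(restored.shape)  # torch.Size([2, 3, 32, 32])

The residual formulation is a common design choice in learned post-filtering: predicting only the artifact correction is easier than regenerating the whole frame, which is consistent with the artifact-reduction results reported above.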

Keywords: video compression; AV1; self-attention; CNN
JEL-codes: C
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/18/2874/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/18/2874/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:18:p:2874-:d:1478780

Handle: RePEc:gam:jmathe:v:12:y:2024:i:18:p:2874-:d:1478780