SEND: Semantic-Aware Deep Unfolded Network with Diffusion Prior for Multi-Modal Image Fusion and Object Detection

Zhang, Rong; Xiong, Mao-Yi; Huang, Jun-Jie

Additional contact information
Rong Zhang: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Mao-Yi Xiong: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Jun-Jie Huang: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

Mathematics, 2025, vol. 13, issue 16, 1-16

Abstract: Multi-modality image fusion (MIF) aims to integrate complementary information from diverse imaging modalities into a single comprehensive representation and serves as an essential processing step for downstream high-level computer vision tasks. Existing deep unfolding-based methods demonstrate promising results; however, they often rely on deterministic priors with limited generalization ability and are usually decoupled from the training of object detection. In this paper, we propose the Semantic-Aware Deep Unfolded Network with Diffusion Prior (SEND), a novel framework designed for transparent and effective multi-modality fusion and object detection. SEND consists of a Denoising Prior Guided Fusion Module and a Fusion Object Detection Module. Rather than using a traditional deterministic prior, the Denoising Prior Guided Fusion Module combines a diffusion prior with deep unfolding, leading to improved multi-modal fusion performance and generalization ability. It is derived from a model-based optimization formulation for multi-modal image fusion, which is unfolded into two cascaded blocks: a Diffusion Denoising Fusion Block that generates informative diffusion priors and a Data Consistency Enhancement Block that explicitly aggregates complementary features from both the diffusion priors and the input modalities. Additionally, SEND couples the Fusion Object Detection Module with the Denoising Prior Guided Fusion Module and optimizes them for the object detection task using a carefully designed two-stage training strategy. Experiments demonstrate that the proposed SEND method outperforms state-of-the-art methods, achieving superior fusion quality with improved efficiency and interpretability.
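
The two-block unfolding described in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so everything here is an assumption made for illustration: the quadratic objective, the penalty weight rho, the function names, and above all the prior step, which uses a plain 3x3 mean filter as a stand-in where SEND would use a diffusion-based denoiser. It shows only the general pattern of alternating a prior step with a data consistency step, not the authors' method.

```python
import numpy as np


def box_denoise(x):
    """Placeholder prior step: 3x3 mean filter (NOT a diffusion model)."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x, dtype=float)
    # Sum the nine shifted copies of the image, then average.
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def data_consistency(z, ir, vis, rho):
    """Closed-form minimizer over x of the ASSUMED objective
    ||x - ir||^2 + ||x - vis||^2 + rho * ||x - z||^2."""
    return (ir + vis + rho * z) / (2.0 + rho)


def unfolded_fusion(ir, vis, stages=3, rho=1.0):
    """Run a fixed number of unfolded stages: prior step, then data step."""
    x = 0.5 * (ir + vis)  # simple initialization: average of both inputs
    for _ in range(stages):
        z = box_denoise(x)                      # prior (denoising) block
        x = data_consistency(z, ir, vis, rho)   # data consistency block
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    infrared = rng.random((8, 8))   # hypothetical infrared image
    visible = rng.random((8, 8))    # hypothetical visible image
    fused = unfolded_fusion(infrared, visible)
    print(fused.shape)
```

In a learned deep unfolded network, each stage would typically have its own trainable parameters (for example a per-stage rho and a neural denoiser), and the number of stages is fixed in advance so the whole pipeline can be trained end to end.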

Keywords: multi-modality image fusion; deep unfolding; diffusion model; object detection
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/16/2584/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/16/2584/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:16:p:2584-:d:1723104

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.