
Dual-stage framework with soft-label distillation and spatial prompting for image-text retrieval

Ran Jin, Zhengang Li, Fang Deng, Yanhong Zhang, Min Luo, Tao Jin, Tengda Hou, Chenjie Du, Xiaozhe Gu and Jie Yuan

PLOS ONE, 2025, vol. 20, issue 10, 1-13

Abstract: Vision-language pre-training (VLP) methods have significantly advanced cross-modal tasks in recent years. However, image-text retrieval still faces two critical challenges: deficient inter-modal matching and deficient intra-modal fine-grained localization. Both impede retrieval accuracy. To address these challenges, we propose a novel dual-stage training framework. In the first stage, we employ Soft Label Distillation (SLD) to align the contrastive relationships between images and texts, mitigating the overfitting caused by hard labels. In the second stage, we introduce Spatial Text Prompt (STP), which incorporates spatial prompt information to enhance the model's visual grounding capabilities and thereby achieve more precise fine-grained alignment. Extensive experiments on standard datasets show that our method outperforms state-of-the-art approaches in image-text retrieval. The code and supplementary files can be found at https://github.com/Leon001211/DSSLP.
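
Illustrative sketch (not taken from the paper or its repository): the abstract does not specify how SLD is formulated, so the following shows only the general idea of replacing hard one-hot contrastive targets with softened ones. It assumes a teacher similarity matrix is available and blends its softmax with the one-hot label. The function names, the blending weight alpha, and the temperature are all hypothetical choices made for this example.

```python
import math


def log_softmax(row, temperature=1.0):
    """Numerically stable log-softmax of one row of similarity scores."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]


def soft_targets(teacher_row, index, alpha, temperature=1.0):
    """Blend the hard one-hot label with the teacher's softmax distribution.

    alpha = 1.0 gives pure hard labels; alpha = 0.0 gives pure teacher labels.
    (Assumed formulation, for illustration only.)
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_row, temperature)]
    return [
        alpha * (1.0 if j == index else 0.0) + (1.0 - alpha) * p
        for j, p in enumerate(teacher_probs)
    ]


def one_direction_loss(student_sim, teacher_sim, alpha, temperature=1.0):
    """Cross-entropy between soft targets and student predictions, row by row."""
    total = 0.0
    for i, (s_row, t_row) in enumerate(zip(student_sim, teacher_sim)):
        targets = soft_targets(t_row, i, alpha, temperature)
        log_probs = log_softmax(s_row, temperature)
        total -= sum(t * lp for t, lp in zip(targets, log_probs))
    return total / len(student_sim)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def soft_label_contrastive_loss(student_sim, teacher_sim, alpha=0.5, temperature=1.0):
    """Average of the image-to-text and text-to-image soft-label losses.

    Rows index images and columns index texts; entry (i, i) is the matched pair.
    """
    i2t = one_direction_loss(student_sim, teacher_sim, alpha, temperature)
    t2i = one_direction_loss(
        transpose(student_sim), transpose(teacher_sim), alpha, temperature
    )
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    # Toy 2x2 batch: similarity scores between two images and two captions.
    student = [[2.0, 0.5], [0.3, 1.5]]
    teacher = [[3.0, 1.0], [0.5, 2.5]]
    print(soft_label_contrastive_loss(student, teacher, alpha=0.5))
```

With alpha set to 1.0 the loss reduces to the usual hard-label contrastive cross-entropy, which makes the effect of the soft targets easy to compare in isolation.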

Date: 2025

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0333084 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 33084&type=printable (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0333084

DOI: 10.1371/journal.pone.0333084


More articles in PLOS ONE from Public Library of Science

 
Page updated 2025-10-11
Handle: RePEc:plo:pone00:0333084