UFM: Unified feature matching pre-training with multi-modal image assistants
Yide Di, Yun Liao, Hao Zhou, Kaijun Zhu, Qing Duan, Junhui Liu and Mingyu Lu
PLOS ONE, 2025, vol. 20, issue 3, 1-27
Abstract:
Image feature matching, a foundational task in computer vision, remains challenging for multimodal image applications and often necessitates intricate training on specific datasets. In this paper, we introduce a Unified Feature Matching pre-trained model (UFM) designed to address feature matching challenges across a wide spectrum of image modalities. We present Multimodal Image Assistant (MIA) transformers, finely tunable structures adept at handling diverse feature matching problems. UFM exhibits versatility in addressing feature matching tasks both within a single modality and across different modalities. Additionally, we propose a data augmentation algorithm and a staged pre-training strategy to effectively tackle challenges arising from sparse data in certain modalities and from imbalanced datasets across modalities. Experimental results demonstrate that UFM excels in generalization and performance across various feature matching tasks. The code will be released at: https://github.com/LiaoYun0x0/UFM.
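For readers unfamiliar with the mechanism such matchers build on, the sketch below shows a generic cross-attention step between feature tokens from two image modalities, written in PyTorch. It is a minimal illustration only, not the UFM/MIA architecture described in the paper; the module choice, dimensions, and dot-product similarity readout are assumptions made for brevity.

    import torch
    import torch.nn as nn

    class CrossAttentionMatcher(nn.Module):
        """Generic cross-attention between two feature sets.

        Illustrative sketch only: NOT the UFM/MIA architecture,
        just the standard mechanism cross-modal matchers build on.
        """

        def __init__(self, dim: int = 256, heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
            # feats_a: (B, N, dim) tokens from image A (e.g., visible)
            # feats_b: (B, M, dim) tokens from image B (e.g., infrared)
            # Queries from A attend to keys/values from B.
            fused, _ = self.attn(feats_a, feats_b, feats_b)
            # Dense similarity matrix; mutual nearest neighbours over
            # this matrix would typically be taken as matches.
            return torch.einsum("bnd,bmd->bnm", fused, feats_b)

    if __name__ == "__main__":
        a = torch.randn(1, 100, 256)  # placeholder feature tokens
        b = torch.randn(1, 120, 256)
        print(CrossAttentionMatcher()(a, b).shape)  # torch.Size([1, 100, 120])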
Date: 2025
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0319051 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 19051&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0319051
DOI: 10.1371/journal.pone.0319051