Optimizing document management and retrieval with multimodal transformers and knowledge graphs
Yali Chen, Bin Hu and Yajuan Liu
PLOS ONE, 2025, vol. 20, issue 6, 1-27
Abstract:
In the digital age, multimodal archival data is experiencing explosive growth, and how to efficiently and accurately retrieve information from it has become a key challenge. Traditional retrieval methods struggle to effectively handle multi-source heterogeneous multimodal data, leading to poor retrieval accuracy and efficiency. To address this issue, this paper proposes the MDKG-RL model, which organically integrates knowledge graph reasoning, deep reinforcement learning dynamic optimization, and multimodal Transformer architecture to achieve deep semantic understanding of multimodal data and intelligent optimization of retrieval strategies. The experiments, based on the ICDAR 2023 and AIDA Corpus datasets, show that MDKG-RL achieves a mean reciprocal rank (MRR) of 0.85, a normalized discounted cumulative gain (NDCG) of 0.88, and an entity linking accuracy of 92.4%. Compared to the baseline model, MRR improves by 13.3%, NDCG increases by 12.8%, and response time is reduced by 38.2%, significantly outperforming other comparison models. Ablation experiments also confirm the indispensability of each module. Visual analysis further demonstrates the model’s clear advantages in retrieval accuracy and efficiency, though error analysis reveals its shortcomings in handling long-tail entities and cross-modal ambiguity. The MDKG-RL model provides an innovative and effective solution for multimodal archival retrieval, not only improving retrieval performance but also laying the foundation for future research. In the future, model performance and generalization capabilities can be further enhanced by expanding data, optimizing strategies, and extending application scenarios, thereby promoting the development and application of multimodal retrieval technology in the fields of information management and knowledge discovery.
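The two ranking metrics reported in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of MRR and NDCG; it is not the authors' evaluation code. The abstract does not state the rank cutoff or the gain formulation, so the linear-gain DCG and the function names below are assumptions made for illustration only.

```python
import math


def mean_reciprocal_rank(ranked_relevance):
    """Average of 1/rank of the first relevant result over all queries.

    ranked_relevance: one list per query, holding 0/1 relevance flags
    in the order the system ranked the results (hypothetical input format).
    """
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)


def dcg(gains):
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Query 1: first relevant result at rank 2; query 2: at rank 1.
    print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))
    # Graded relevance in ranked order; a perfectly ordered list scores 1.0.
    print(ndcg([3, 2, 1]))
```

Both metrics lie between 0 and 1, and higher is better. Some evaluations use an exponential gain (2^rel - 1) in place of the linear gain shown here, which changes NDCG values for graded relevance labels.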
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0323966 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 23966&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0323966
DOI: 10.1371/journal.pone.0323966