RMPT: Reinforced Memory-Driven Pure Transformer for Automatic Chest X-Ray Report Generation

Qin, Caijie; Xiong, Yize; Chen, Weibin; Li, Yong

Caijie Qin, Yize Xiong, Weibin Chen and Yong Li
Additional contact information
Caijie Qin: Institute of Information Engineering, Sanming University, Sanming 365004, China
Yize Xiong: Institute of Information Engineering, Sanming University, Sanming 365004, China
Weibin Chen: Qingdao Nuocheng Chemicals Safty Technology Co., Ltd., Qingdao 266071, China
Yong Li: Institute of Information Engineering, Sanming University, Sanming 365004, China

Mathematics, 2025, vol. 13, issue 9, 1-14

Abstract: Automatic chest X-ray report generation, which aims to produce clinically precise descriptions from chest X-ray images, is attracting significant research attention because of its potential in clinical applications. Despite considerable recent progress, current models typically follow a CNN–Transformer framework, which limits the perceptual field during image feature extraction. To address this problem, we propose the Reinforced Memory-driven Pure Transformer (RMPT), a novel Transformer–Transformer model. RMPT employs the Swin Transformer to extract visual features from the input X-ray images; its larger perceptual field better models the relationships between different regions. Furthermore, we adopt a memory-driven Transformer (MemTrans) to model similar patterns across different reports, which helps the model generate long reports. Finally, we present a training approach based on Reinforcement Learning (RL) that steers the model to focus on challenging samples, improving its performance in both straightforward and complex cases. Experimental results on the IU X-ray dataset show that RMPT achieves superior performance on various Natural Language Generation (NLG) evaluation metrics. Ablation results further demonstrate that RMPT achieves a 10.5% overall performance improvement over the base model.
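
The abstract does not spell out the RL objective. As a rough illustration of how a report-level reward can drive training, the sketch below shows a generic self-critical policy-gradient loss, a common choice in image captioning and not necessarily the formulation used in RMPT. The function name, its parameters, and the idea of using an NLG metric as the reward are assumptions made for this example.

```python
from typing import Sequence


def self_critical_loss(
    sampled_log_probs: Sequence[float],
    sampled_reward: float,
    greedy_reward: float,
) -> float:
    """Generic REINFORCE loss with a greedy-decoding baseline.

    Hypothetical helper for illustration only; not taken from the paper.
    sampled_log_probs: log-probability of each token in a sampled report.
    sampled_reward:    score of the sampled report (assumed: an NLG metric).
    greedy_reward:     score of the greedy-decoded report, used as baseline.
    """
    # Advantage is positive when sampling beat the greedy decode.
    advantage = sampled_reward - greedy_reward
    # Minimizing this loss raises the likelihood of sampled reports that
    # scored above the baseline and lowers it for those that scored below.
    return -advantage * sum(sampled_log_probs)


if __name__ == "__main__":
    loss = self_critical_loss([-0.5, -1.0, -0.25], sampled_reward=0.4, greedy_reward=0.3)
    print(f"loss = {loss:.3f}")
```

In a real training loop the log-probabilities would come from the decoder and the loss would be averaged over a batch; this version uses plain floats only to keep the arithmetic visible.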

Keywords: chest X-ray report generation; transformer; image-to-text; reinforcement learning
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/9/1492/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/9/1492/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:9:p:1492-:d:1647007

Mathematics is currently edited by Ms. Emma He