Path-Wise Attention Memory Network for Visual Question Answering

Yingxin Xiang, Chengyuan Zhang (), Zhichao Han, Hao Yu, Jiaye Li and Lei Zhu ()
Additional contact information
Yingxin Xiang: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Chengyuan Zhang: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
Zhichao Han: College of Science and Technology, Xiangsihu College Guangxi University for Nationalities, Nanning 530008, China
Hao Yu: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
Jiaye Li: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
Lei Zhu: College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China

Mathematics, 2022, vol. 10, issue 18, 1-19

Abstract: Visual question answering (VQA) is regarded as a multi-modal fine-grained feature fusion task, which requires the construction of multi-level and omnidirectional relations between nodes. One main solution is the composite attention model, which is composed of co-attention (CA) and self-attention (SA). However, existing composite models only stack single attention blocks and lack path-wise historical memory and overall adjustment. We propose a path-wise attention memory network (PAM) to construct a more robust composite attention model. After each single-hop attention block (SA or CA), the cumulative importance of the nodes is used to calibrate the signal strength of the node features. Four memorized single-hop attention matrices are combined to obtain the path-wise co-attention matrix of the path-wise attention (PA) block; the PA block is therefore capable of synthesizing and strengthening the learning effect along the whole path. Moreover, we use guard gates of the target modality to check the source-modality values in CA, and conditioning gates of the other modality to guide the query and key of the current modality in SA. The proposed PAM helps construct robust multi-hop neighborhood relationships between vision and language and achieves excellent performance on both the VQA 2.0 and VQA-CP v2 datasets.
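
The PyTorch sketch below is not taken from the paper; it is a minimal illustration, under stated assumptions, of the path-wise idea described in the abstract: each single-hop attention block returns its attention matrix so it can be memorized, and a path-wise attention step fuses the memorized matrices into one path-wise matrix and calibrates node features by cumulative importance. All module names, shapes, and the mean-fusion rule are illustrative assumptions, and for simplicity the hops are plain self-attention over a single modality, whereas the paper alternates SA and CA between vision and language.

# Illustrative sketch only; not the paper's exact PAM formulation.
import torch
import torch.nn as nn


class SingleHopAttention(nn.Module):
    """One single-hop attention block that also returns (memorizes) its attention matrix."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query_nodes, key_nodes):
        # Scaled dot-product attention over nodes; attn has shape (batch, n_query, n_key).
        attn = torch.softmax(
            self.q(query_nodes) @ self.k(key_nodes).transpose(1, 2) * self.scale, dim=-1
        )
        out = attn @ self.v(key_nodes)
        return out, attn  # the matrix is handed back so a PA block can memorize it


class PathWiseAttention(nn.Module):
    """Fuses memorized single-hop matrices into one path-wise matrix (assumed fusion: simple mean)."""

    def forward(self, node_features, memorized_attn):
        # memorized_attn: list of (batch, n, n) matrices collected along the attention path.
        path_attn = torch.stack(memorized_attn, dim=0).mean(dim=0)
        # Cumulative node importance calibrates the signal strength of the node features.
        importance = torch.sigmoid(path_attn.sum(dim=1, keepdim=True))      # (batch, 1, n)
        return path_attn @ node_features * importance.transpose(1, 2)       # (batch, n, dim)


if __name__ == "__main__":
    dim, n = 64, 10
    hops = nn.ModuleList(SingleHopAttention(dim) for _ in range(4))
    pa = PathWiseAttention()
    x = torch.randn(2, n, dim)              # e.g. a batch of visual region features
    memory, h = [], x
    for hop in hops:                         # four memorized single-hop attention matrices
        h, attn = hop(h, h)
        memory.append(attn)
    out = pa(h, memory)
    print(out.shape)                         # torch.Size([2, 10, 64])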

Keywords: attention mechanism; path-wise attention; attention memory; memory network (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View complete reference list from CitEc

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/18/3244/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/18/3244/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:18:p:3244-:d:908669


Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3244-:d:908669