MLA-Net: Feature Pyramid Network with Multi-Level Local Attention for Object Detection
Xiaobao Yang,
Wentao Wang,
Junsheng Wu,
Chen Ding,
Sugang Ma and
Zhiqiang Hou
Author affiliations
Xiaobao Yang: Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
Wentao Wang: School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
Junsheng Wu: School of Software, Northwestern Polytechnical University, Xi’an 710072, China
Chen Ding: School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
Sugang Ma: School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
Zhiqiang Hou: School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an 710061, China
Mathematics, 2022, vol. 10, issue 24, 1-13
Abstract:
Feature pyramid networks and attention mechanisms are mainstream methods for improving the detection performance of many current models. However, when they are learned jointly, information association between multi-level features is lacking. This paper therefore proposes a feature pyramid with multi-level local attention, dubbed MLA-Net (Feature Pyramid Network with Multi-Level Local Attention for Object Detection), which aims to establish a correlation mechanism for multi-level local information. First, the original multi-level features are deformed and rectified by a local pixel-rectification module, and global semantic enhancement is achieved through a multi-level spatial-attention module. The original features are then further fused through a residual connection, achieving contextual feature fusion that strengthens the feature representation. Extensive ablation experiments on the MS COCO (Microsoft Common Objects in Context) dataset demonstrate the effectiveness of the proposed method, yielding a 0.5% improvement. An improvement of 1.2% was obtained on the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning, Visual Object Classes) dataset, reaching 81.8%, indicating that the proposed method is robust and competitive with other advanced detection models.
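The abstract outlines a two-step fusion scheme: multi-level features are first locally rectified, then combined across pyramid levels with spatial attention and a residual connection. Below is a minimal, hypothetical PyTorch sketch of such a scheme, not the authors' released code; the module names (LocalPixelRectification, MultiLevelSpatialAttention), the offset-based warping used for rectification, and the per-level softmax weighting are assumptions made purely for illustration.

```python
# Hypothetical sketch of an MLA-Net-style fusion block (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPixelRectification(nn.Module):
    """Warps a feature map with small learned per-pixel offsets (assumed interpretation
    of the 'local pixel-rectification module')."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2, kernel_size=3, padding=1)

    def forward(self, x):
        n, _, h, w = x.shape
        # Base sampling grid in [-1, 1] (x, y order expected by grid_sample)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        # Small learned offsets rectify local sampling positions
        offset = self.offset(x).permute(0, 2, 3, 1).tanh() * 0.1
        return F.grid_sample(x, grid + offset, align_corners=True)

class MultiLevelSpatialAttention(nn.Module):
    """Spatial attention computed jointly over pyramid levels, fused with a residual
    connection to the original finest-level feature (assumed design)."""
    def __init__(self, channels, num_levels):
        super().__init__()
        self.attn = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

    def forward(self, feats):
        # Resize all levels to the finest resolution and predict per-level spatial weights
        target = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                   for f in feats]
        weights = torch.softmax(self.attn(torch.cat(resized, dim=1)), dim=1)
        fused = sum(weights[:, i:i + 1] * resized[i] for i in range(len(feats)))
        # Residual connection keeps the original contextual features
        return feats[0] + fused

if __name__ == "__main__":
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]  # toy pyramid levels
    rect = LocalPixelRectification(256)
    attn = MultiLevelSpatialAttention(256, num_levels=3)
    out = attn([rect(f) for f in feats])
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```

The softmax over per-level weights and the residual skip are one plausible way to realize "multi-level spatial attention with residual fusion"; the paper's actual module structure may differ.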
Keywords: object detection; convolutional neural network; self-attention; feature pyramid network
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/24/4789/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/24/4789/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:24:p:4789-:d:1005441