SaMfENet: Self-Attention Based Multi-Scale Feature Fusion Coding and Edge Information Constraint Network for 6D Pose Estimation

Li, Zhuoxiao; Li, Xiaobing; Chen, Shihao; Du, Jialong; Li, Yong

Zhuoxiao Li, Xiaobing Li, Shihao Chen, Jialong Du and Yong Li
Additional contact information
Zhuoxiao Li: Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, School of Electrical Engineering, Guangxi University, Nanning 530004, China
Xiaobing Li: Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, School of Electrical Engineering, Guangxi University, Nanning 530004, China
Shihao Chen: Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, School of Electrical Engineering, Guangxi University, Nanning 530004, China
Jialong Du: Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, School of Electrical Engineering, Guangxi University, Nanning 530004, China
Yong Li: Guangxi Key Laboratory of Manufacturing System and Advanced Manufacturing Technology, School of Electrical Engineering, Guangxi University, Nanning 530004, China

Mathematics, 2022, vol. 10, issue 19, 1-19

Abstract: Accurate estimation of an object’s 6D pose is a crucial technology for robotic manipulators. The task becomes especially challenging when lighting conditions change or the object is occluded, because object information is then missing or corrupted. To estimate the 6D pose accurately, a self-attention-based multi-scale feature fusion coding and edge information constraint network is proposed, which estimates the 6D pose from RGB-D images. The proposed algorithm first introduces an edge reconstruction module into the pose estimation network, which increases the attention that the feature extraction network pays to edge features. Furthermore, a self-attention multi-scale point cloud feature extraction module, MSPNet, is proposed to extract geometric features from point clouds reconstructed from depth maps. Finally, a clustering feature encoding module, SE-NetVLAD, is proposed to encode multi-modal dense feature sequences into more expressive global features. The proposed method is evaluated on the LineMOD and YCB-Video datasets, and the experimental results show that its performance is close to that of current state-of-the-art methods.
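
The abstract mentions that point clouds are reconstructed from depth maps but does not say how. As a minimal sketch of that generic step, assuming a standard pinhole camera model, the following back-projects each valid depth pixel into a 3D point in the camera frame. The function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy values are illustrative assumptions and are not taken from the paper.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D camera-frame points.

    depth: list of rows, each a list of depth values (assumed to be in metres).
    fx, fy, cx, cy: pinhole intrinsics (hypothetical values in the demo below).
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v is the pixel row (image y)
        for u, z in enumerate(row):        # u is the pixel column (image x)
            if z <= 0:
                continue                   # no valid depth reading at this pixel
            x = (u - cx) * z / fx          # pinhole model: x = (u - cx) * z / fx
            y = (v - cy) * z / fy          # pinhole model: y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one missing reading; intrinsics are made up for illustration.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
cloud = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In practice the intrinsics would come from the calibration shipped with a dataset such as LineMOD or YCB-Video, and a vectorized array library would replace the explicit loops.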

Keywords: 6D pose estimation; multi-scale feature fusion; attention mechanism; edge information constraint
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/19/3671/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/19/3671/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:19:p:3671-:d:935720

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.