TSPconv-Net: Transformer and Sparse Convolution for 3D Instance Segmentation in Point Clouds

Ning, Xiaojuan; Liu, Yule; Ma, Yishu; Lu, Zhiwei; Jin, Haiyan; Shi, Zhenghao; Wang, Yinghui

TSPconv-Net: Transformer and Sparse Convolution for 3D Instance Segmentation in Point Clouds

Xiaojuan Ning (), Yule Liu, Yishu Ma, Zhiwei Lu, Haiyan Jin, Zhenghao Shi and Yinghui Wang
Additional contact information
Xiaojuan Ning: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Yule Liu: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Yishu Ma: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Zhiwei Lu: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Haiyan Jin: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Zhenghao Shi: Institute of Computer Science and Engineering, Xi’an University of Technology, No. 5 South of Jinhua Road, Xi’an 710048, China
Yinghui Wang: School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 of Lihu Road, Wuxi 214122, China

Mathematics, 2024, vol. 12, issue 18, 1-15

Abstract: Current deep learning approaches for indoor 3D instance segmentation often rely on multilayer perceptrons (MLPs) for feature extraction. However, MLPs struggle to effectively capture the complex spatial relationships inherent in 3D scene data. To address this issue, we propose a novel and efficient framework for 3D instance segmentation called TSPconv-Net. In contrast to existing methods that primarily depend on MLPs for feature extraction, our framework integrates a more robust feature extraction model comprising the offset-attention (OA) mechanism and submanifold sparse convolution (SSC). The proposed framework is an end-to-end network architecture. TSPconv-Net consists of a backbone network followed by a bounding box module. Specifically, the backbone network utilizes the OA mechanism to extract global features and employs SSC for local feature extraction. The bounding box module then conducts instance segmentation based on the extracted features. Experimental results demonstrate that our approach outperforms existing work on the S3DIS dataset while maintaining computational efficiency. TSPconv-Net achieves 68.6% mPrec, 52.5% mRec, and 60.1% mAP on the test set, surpassing 3D-BoNet by 3.0% mPrec, 5.4% mRec, and 2.6% mAP. Furthermore, it demonstrates high efficiency, completing computations in just 326 s.

Keywords: point clouds; indoor scenes; repetitive structures; instance segmentation; bounding boxes (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/18/2926/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/18/2926/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:18:p:2926-:d:1481921

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().