TQU-SLAM Benchmark Dataset for Comparative Study to Build Visual Odometry Based on Extracted Features from Feature Descriptors and Deep Learning
Thi-Hao Nguyen,
Le Van-Hung,
Huu-Son Do,
Trung-Hieu Te and
Phan Van-Nam
Additional contact information
Thi-Hao Nguyen: Faculty of Engineering Technology, Hung Vuong University, Viet Tri City 35100, Vietnam
Le Van-Hung: Faculty of Basic Science, Tan Trao University, Tuyen Quang City 22000, Vietnam
Huu-Son Do: Faculty of Basic Science, Tan Trao University, Tuyen Quang City 22000, Vietnam
Trung-Hieu Te: Faculty of Basic Science, Tan Trao University, Tuyen Quang City 22000, Vietnam
Phan Van-Nam: Faculty of Basic Science, Tan Trao University, Tuyen Quang City 22000, Vietnam
Future Internet, 2024, vol. 16, issue 5, 1-21
Abstract:
The problem of enriching data to train visual SLAM and visual odometry (VO) construction models with deep learning (DL) is a pressing one in computer vision. DL requires a large amount of data to train a model, and more data covering many different contexts and conditions yields a more accurate visual SLAM and VO model. In this paper, we introduce the TQU-SLAM benchmark dataset, which includes 160,631 RGB-D frame pairs. It was collected from the corridors of three interconnected buildings with a total length of about 230 m. The ground-truth data of the TQU-SLAM benchmark dataset were prepared manually, including 6-DOF camera poses, 3D point cloud data, intrinsic parameters, and the transformation matrix between the camera coordinate system and the real world. We also tested the TQU-SLAM benchmark dataset using the PySLAM framework with traditional features such as SHI_TOMASI, SIFT, SURF, ORB, ORB2, AKAZE, KAZE, and BRISK, as well as features extracted by DL models such as VGG, DPVO, and TartanVO. The camera pose estimation results are evaluated, and we show that the ORB2 features give the best results (Err_d = 5.74 mm), while the SHI_TOMASI feature achieves the best ratio of frames with detected keypoints (r_d = 98.97%). We also present and analyze the challenges that the TQU-SLAM benchmark dataset poses for building visual SLAM and VO systems.
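The abstract reports Err_d as a distance error between estimated and ground-truth camera poses. A minimal sketch of such a metric, assuming Err_d is the mean per-frame Euclidean distance between estimated and ground-truth camera positions (the authors' exact definition may differ; the function name and toy trajectory below are illustrative, not from the paper):

```python
import numpy as np

def mean_trajectory_error(est, gt):
    """Mean Euclidean distance between estimated and ground-truth
    camera positions, one row per frame (shape: N x 3).

    Assumed to correspond to the Err_d metric in the abstract; the
    definition used by the authors may differ in detail.
    """
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    assert est.shape == gt.shape, "trajectories must have matching shapes"
    # Per-frame position error, then average over all frames.
    return float(np.linalg.norm(est - gt, axis=1).mean())

# Toy example: estimated trajectory offset by 1 mm along x (positions in metres).
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
est = gt + np.array([0.001, 0.0, 0.0])
print(mean_trajectory_error(est, gt))  # 0.001 m, i.e. 1 mm
```

The companion ratio r_d from the abstract would then simply be the count of frames where the feature detector returned keypoints, divided by the total number of frames.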
Keywords: TQU-SLAM benchmark dataset; visual odometry; RGB-D images; 3D trajectory; feature descriptors; deep learning; feature-based extraction (search for similar items in EconPapers)
JEL-codes: O3 (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://www.mdpi.com/1999-5903/16/5/174/pdf (application/pdf)
https://www.mdpi.com/1999-5903/16/5/174/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:16:y:2024:i:5:p:174-:d:1396772
Access Statistics for this article
Future Internet is currently edited by Ms. Grace You
More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.