A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision

Tan, Jiahai; Gao, Ming; Duan, Tao; Gao, Xiaomei

Jiahai Tan, Ming Gao, Tao Duan and Xiaomei Gao
Additional contact information
Jiahai Tan: School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China
Ming Gao: School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China
Tao Duan: State Key Laboratory of Transient Optics and Photonics, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
Xiaomei Gao: Xi’an Mapping and Printing of China National Administration of Coal Geology, Xi’an 710199, China

Mathematics, 2023, vol. 11, issue 22, 1-19

Abstract: Depth estimation from a single image is a significant task. Although deep learning methods hold great promise in this area, they still face several challenges, including limited modeling of nonlocal dependencies, the lack of effective models for jointly optimizing loss functions, and difficulty in accurately estimating object edges. To further improve prediction accuracy, this research proposes a new network structure and training method for single-image depth estimation. A pseudo-depth network is first deployed to generate a single-image depth prior; by constructing connecting paths between multi-scale local features with the proposed up-mapping and jumping modules, the network can integrate representations and recover fine details. A deep network is also designed to capture and convey global context, using the Transformer Conv module and Unet Depth net to extract and refine global features. Together, the two networks provide meaningful coarse and fine features for predicting high-quality depth images from single RGB images. In addition, multiple joint losses are used to improve model training. A series of experiments confirms the efficacy of the method. According to experimental results on the NYU Depth V2 and KITTI depth estimation benchmarks, the proposed method outperforms the advanced method DPT by 10% and 3.3%, respectively, in root mean square error in log space (RMSE(log)), and by 1.7% and 1.6%, respectively, in squared relative difference (SRD).
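
The two error measures named in the abstract can be sketched as follows. This is a minimal illustration that assumes the definitions commonly used in monocular depth benchmarks (RMSE over natural-log depths, and squared error divided by ground-truth depth); the abstract does not spell out the paper's exact formulas, and the function names and flat-list inputs are hypothetical choices for this example.

```python
import math


def rmse_log(pred, gt):
    """RMSE between log-depths: sqrt(mean((ln p - ln g)^2)).

    Assumed definition (common benchmark convention), not taken from the paper.
    Inputs are equal-length sequences of strictly positive depths.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    squared = [(math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(squared) / len(squared))


def squared_relative_difference(pred, gt):
    """Squared relative difference: mean((p - g)^2 / g).

    Assumed definition (common benchmark convention), not taken from the paper.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    terms = [((p - g) ** 2) / g for p, g in zip(pred, gt)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy depths in metres; real evaluations run over all valid pixels.
    predicted = [2.0, 4.0]
    ground_truth = [1.0, 2.0]
    print(rmse_log(predicted, ground_truth))
    print(squared_relative_difference(predicted, ground_truth))
```

Both measures are lower-is-better. The abstract does not say how its percentage improvements over DPT were computed, so this sketch makes no attempt to reproduce them.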

Keywords: monocular depth estimation; pseudo-depth net; transformer; encoder–decoder
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/22/4645/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/22/4645/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:22:p:4645-:d:1279911

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.