Occlusion-Aware Unsupervised Learning of Monocular Depth, Optical Flow and Camera Pose with Geometric Constraints

Teng, Qianru; Chen, Yimin; Huang, Chen

Occlusion-Aware Unsupervised Learning of Monocular Depth, Optical Flow and Camera Pose with Geometric Constraints

Qianru Teng, Yimin Chen and Chen Huang
Additional contact information
Qianru Teng: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Yimin Chen: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Chen Huang: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

Future Internet, 2018, vol. 10, issue 10, 1-14

Abstract: We present an occlusion-aware unsupervised neural network for jointly learning three low-level vision tasks from monocular videos: depth, optical flow, and camera motion. The system consists of three different predicting sub-networks simultaneously coupled by combined loss terms and is capable of computing each task independently on test samples. Geometric constraints extracted from scene geometry which have traditionally been used in bundle adjustment or pose-graph optimization are formed as various self-supervisory signals during our end-to-end learning approach. Different from prior works, our image reconstruction loss also takes account of optical flow. Moreover, we impose novel 3D flow consistency constraints over the predictions of all the three tasks. By explicitly modeling occlusion and taking utilization of both 2D and 3D geometry relationships, abundant geometric constraints are formed over estimated outputs, enabling the system to capture both low-level representations and high-level cues to infer thinner scene structures. Empirical evaluation on the KITTI dataset demonstrates the effectiveness and improvement of our approach: (1) monocular depth estimation outperforms state-of-the-art unsupervised methods and is comparable to stereo supervised ones; (2) optical flow prediction ranks top among prior works and even beats supervised and traditional ones especially in non-occluded regions; (3) pose estimation outperforms established SLAM systems under comparable input settings with a reasonable margin.

Keywords: monocular depth; camera pose; optical flow; joint learning; occlusion-aware; scene geometry (search for similar items in EconPapers)
JEL-codes: O3 (search for similar items in EconPapers)
Date: 2018
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/1999-5903/10/10/92/pdf (application/pdf)
https://www.mdpi.com/1999-5903/10/10/92/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:10:y:2018:i:10:p:92-:d:171298

Access Statistics for this article

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().