Mono-ViM: A Self-Supervised Mamba Framework for Monocular Depth Estimation in Endoscopic Scenes

Chen, Shengli; Chen, Yuming; Xu, Xiaoang; Li, Jiahao; Ye, Ke; Chang, Tianzuo

Additional contact information
Shengli Chen: Jiangsu Product Quality Testing and Inspection Institute, Nanjing 210007, China
Yuming Chen: Jiangsu Product Quality Testing and Inspection Institute, Nanjing 210007, China
Xiaoang Xu: Jiangsu Product Quality Testing and Inspection Institute, Nanjing 210007, China
Jiahao Li: College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Ke Ye: College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Tianzuo Chang: College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Mathematics, 2025, vol. 13, issue 21, 1-17

Abstract: Self-supervised depth estimation recovers scene depth from monocular endoscopic images and thereby assists endoscopic navigation. However, existing monocular endoscopic depth estimation methods generally fail to capture the inherent continuity of depth in intestinal structures. To address this limitation, this work presents Mono-ViM, a CNN-Mamba hybrid architecture that improves depth estimation accuracy through a depth-first scanning mechanism. The framework comprises a Depth Local Visual Mamba module, which uses depth-first scanning to extract structural features, and a cross-query layer, which reframes depth estimation as a soft classification problem to improve robustness and uncertainty handling in complex endoscopic environments. On the SimCol and C3VD datasets, the method achieves an absolute relative error (Abs Rel) of 0.070 and 0.084, respectively, corresponding to error reductions of 16.7% and 19.4% compared to existing methods.
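
Illustrative note: the abstract reports accuracy as Abs Rel and describes depth estimation as a soft classification problem. The sketch below shows the standard definition of Abs Rel (mean of |prediction - ground truth| / ground truth over valid pixels) and one common way to turn per-bin class scores into a depth value (a softmax-weighted average of bin centers). It is a minimal sketch and not the paper's cross-query layer; the function names, the bin-center formulation, and the sample values are assumptions made for illustration only.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with positive ground-truth depth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def soft_depth(logits, bin_centers):
    """Depth as the softmax-weighted average of bin centers.

    Assumption: this is a generic soft-classification formulation,
    not the layer described in the paper.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum((e / total) * c for e, c in zip(exps, bin_centers))


def relative_reduction(baseline, ours):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(abs_rel([1.1, 1.8], [1.0, 2.0]))
    print(soft_depth([0.0, 0.0], [1.0, 3.0]))
    print(relative_reduction(0.10, 0.08))
```

Under this reading, an error reduction such as the 16.7% quoted in the abstract would be `relative_reduction(baseline, ours)` computed on Abs Rel; the abstract does not state the baseline values.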

Keywords: monocular depth estimation; mamba-based; depth-first scan; cross-query; lightweight
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3538/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3538/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3538-:d:1787296

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.