Real-Time Dense Monocular SLAM With Online Adapted Depth Prediction Network

IEEE TRANSACTIONS ON MULTIMEDIA (2019)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 21, Issue 2, Pages 470-483

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2018.2859034

Keywords

Monocular SLAM; dense mapping; convolutional neural network; fusion; online tuning

Categories

Computer Science, Information Systems; Computer Science, Software Engineering; Telecommunications

Funding

National Natural Science Foundation of China [61502188]
Wuhan Science and Technology Bureau [2017010201010111]
Program for HUST Academic Frontier Youth Team

Abstract

Considerable advances have been achieved over the past few years in estimating a depth map from a single image via convolutional neural networks (CNNs). Combining depth prediction from CNNs with conventional monocular simultaneous localization and mapping (SLAM) is promising for accurate and dense monocular reconstruction, in particular for addressing two long-standing challenges in conventional monocular SLAM: low map completeness and scale ambiguity. However, depth estimated by pretrained CNNs usually fails to achieve sufficient accuracy in environments whose type differs from the training data, which are common in certain applications such as obstacle avoidance for drones in unknown scenes. Additionally, inaccurate CNN depth prediction can yield large tracking errors in monocular SLAM. In this paper, we present a real-time dense monocular SLAM system that fuses direct monocular SLAM with an online-adapted depth prediction network, both to achieve accurate depth prediction for scenes that differ in type from the training data and to provide absolute scale information for tracking and mapping. Specifically, on one hand, the tracked pose (i.e., translation and rotation) from direct SLAM is used to select a small set of highly effective and reliable training images, which act as ground truth for tuning the depth prediction network on the fly toward better generalization to scenes of different types. A stage-wise stochastic gradient descent algorithm with a selective update strategy is introduced for efficient convergence of the tuning process. On the other hand, the dense map produced by the adapted network is applied to address the scale ambiguity of direct monocular SLAM, which in turn improves the accuracy of both tracking and overall reconstruction. With the assistance of both CPUs and GPUs, the system can achieve real-time performance with progressively improved reconstruction accuracy. Experimental results on public datasets and a live application to obstacle avoidance for drones demonstrate that our method outperforms state-of-the-art methods, with greater map completeness and accuracy and a smaller tracking error.
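
The abstract does not say how the network's dense depth is used to resolve scale ambiguity. As a rough illustration only, and not the authors' method, one common way to bring an up-to-scale monocular SLAM map to metric scale is to take the median ratio between predicted metric depths and SLAM depths at the same pixels. The function names `estimate_scale` and `apply_scale`, the `min_depth` threshold, and the sample values below are all hypothetical.

```python
from statistics import median


def estimate_scale(slam_depths, cnn_depths, min_depth=1e-6):
    """Estimate a single scale factor as the median of cnn/slam depth ratios.

    Assumption (not from the paper): both lists hold depths for the same
    pixels, SLAM depths are correct up to an unknown global scale, and
    CNN depths are approximately metric.
    """
    ratios = [
        c / s
        for s, c in zip(slam_depths, cnn_depths)
        if s > min_depth and c > min_depth  # skip invalid or missing depths
    ]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return median(ratios)


def apply_scale(slam_depths, scale):
    """Rescale up-to-scale SLAM depths by the estimated factor."""
    return [d * scale for d in slam_depths]


# Hypothetical sample values; the last pair has no valid SLAM depth.
slam = [0.5, 1.0, 1.5, 2.0, 0.0]
cnn = [1.0, 2.1, 2.9, 4.0, 3.0]
scale = estimate_scale(slam, cnn)
metric_depths = apply_scale(slam, scale)
```

The median is used here because it tolerates a minority of badly predicted pixels better than a mean would; a real system would likely also weight by depth uncertainty.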

mVIL-Fusion: Monocular Visual-Inertial-LiDAR Simultaneous Localization and Mapping in Challenging Environments

Yan Wang, Hongwei Ma

Summary: We propose mVIL-Fusion, a three-level multisensor fusion system that achieves robust state estimation and globally consistent mapping in perceptually degraded environments. Our system uses LiDAR depth-assisted visual-inertial odometry (VIO) as the frontend, with synchronous prediction and distortion correction functions. It also applies a novel double-sliding-window-based optimization to enhance state estimation accuracy and robustness. Loop closures and pose-only factor graph smoothing are used in the backend to generate a global map. The system has been validated on public datasets and self-collected sequences.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)