4.7 Article

Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 128, Issue 5, Pages 1239-1285

Publisher

SPRINGER
DOI: 10.1007/s11263-019-01188-y

Keywords

Semantic segmentation; Multimodal fusion; Scene understanding; Model adaptation; Deep learning

Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on the relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation.
In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10× fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on Cityscapes, Synthia, SUN RGB-D, ScanNet and Freiburg Forest benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance while simultaneously being efficient in terms of parameters and inference time as well as demonstrating substantial robustness in adverse perceptual conditions.
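The contrast the abstract draws between plain concatenation and adaptive fusion can be illustrated with a small gating sketch. This is not the authors' implementation: the function names, the per-location matrix products standing in for convolutions, the bottleneck width, and the random weights are all assumptions made for illustration. The idea shown is only that a gate computed from both modalities rescales each concatenated channel before the channels are mixed back down to a single representation.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def gated_fusion(feat_a, feat_b, w_reduce, w_expand, w_fuse):
    """Fuse two per-location feature vectors with a gate (illustrative only).

    feat_a, feat_b: feature vectors from two modality streams (hypothetical).
    w_reduce, w_expand: bottleneck weights that produce the gate.
    w_fuse: weights that mix the gated channels down to the output size.
    """
    stacked = feat_a + feat_b  # plain channel concatenation
    # Bottleneck with ReLU: compress the joint features.
    hidden = [max(0.0, h) for h in linear(w_reduce, stacked)]
    # Sigmoid gate: one weight in (0, 1) per concatenated channel.
    gate = [1.0 / (1.0 + math.exp(-g)) for g in linear(w_expand, hidden)]
    # Rescale each channel by its gate, then mix down.
    recalibrated = [g * s for g, s in zip(gate, stacked)]
    return linear(w_fuse, recalibrated), gate


def random_matrix(rows, cols, rng):
    """Random weights as a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels = 4  # channels per modality (assumed)
w_reduce = random_matrix(2, 2 * channels, rng)
w_expand = random_matrix(2 * channels, 2, rng)
w_fuse = random_matrix(channels, 2 * channels, rng)

rgb_feat = [rng.uniform(0.0, 1.0) for _ in range(channels)]
depth_feat = [rng.uniform(0.0, 1.0) for _ in range(channels)]

fused, gate = gated_fusion(rgb_feat, depth_feat, w_reduce, w_expand, w_fuse)
print("gate:", [round(g, 3) for g in gate])
print("fused:", [round(f, 3) for f in fused])
```

With learned rather than random weights, a gate of this kind could suppress channels from a modality that is unreliable at a given location, which a fixed concatenation cannot do.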

Recommended

Article Robotics

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Nicolai Dorka, Tim Welschehold, Joschka Boedecker, Wolfram Burgard

Summary: This letter proposes a method called Adaptively Calibrated Critics (ACC) to alleviate the bias of low variance temporal difference targets by using recent high variance but unbiased on-policy rollouts. ACC is applied to Truncated Quantile Critics algorithm to regulate the bias with a hyperparameter. ACC achieves state-of-the-art results on the OpenAI gym continuous control benchmark and demonstrates improved performance on various tasks from the Meta-World robot benchmark.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)

Article Computer Science, Artificial Intelligence

Neural Architecture Search for Dense Prediction Tasks in Computer Vision

Rohit Mohan, Thomas Elsken, Arber Zela, Jan Hendrik Metzen, Benedikt Staffler, Thomas Brox, Abhinav Valada, Frank Hutter

Summary: The success of deep learning has resulted in an increased demand for neural network architecture engineering. Neural architecture search (NAS) has emerged as a popular field for automatically designing neural network architectures in a data-driven manner. NAS has become more applicable to dense prediction tasks in computer vision, such as semantic segmentation or object detection, by incorporating weight sharing strategies. This manuscript provides an overview of NAS for dense prediction tasks, discussing the unique challenges and surveying approaches for addressing them to facilitate future research and application.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2023)

Article Robotics

PADLoC: LiDAR-Based Deep Loop Closure Detection and Registration Using Panoptic Attention

Jose Arce, Niclas Voedisch, Daniele Cattaneo, Wolfram Burgard, Abhinav Valada

Summary: In this work, a novel transformer-based head for point cloud matching and registration is proposed for loop closure detection and registration in LiDAR-based SLAM frameworks. The panoptic information is leveraged during training to improve the matching problem. Extensive evaluations demonstrate that PADLoC achieves state-of-the-art results on multiple real-world datasets.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)

Article Robotics

Uncertainty-Aware Panoptic Segmentation

Kshitij Sirohi, Sajad Marvi, Daniel Buescher, Wolfram Burgard

Summary: This paper introduces a novel task of uncertainty-aware panoptic segmentation, aiming to predict per-pixel semantic and instance segmentations with per-pixel uncertainty estimates. The authors define two novel metrics, uncertainty-aware Panoptic Quality (uPQ) and panoptic Expected Calibration Error (pECE), for quantitative analysis. They propose a top-down Evidential Panoptic Segmentation Network (EvPSNet) with a panoptic fusion module leveraging predicted uncertainties.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)

Article Robotics

N2M2: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments

Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Summary: Despite its importance, mobile manipulation remains a significant challenge due to the need for integration of end-effector trajectory generation and navigation skills. Existing methods struggle with controlling the large configuration space and navigating dynamic and unknown environments. In this work, we introduce a new approach called Neural Navigation for Mobile Manipulation (N2M2) that extends the decomposition of tasks in complex obstacle environments, enabling robots to perform a broader range of tasks in real-world settings. The approach demonstrates capabilities in extensive simulation and real-world experiments.

IEEE TRANSACTIONS ON ROBOTICS (2023)

Article Robotics

INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields

Julia Hindel, Nikhil Gosala, Kevin Bregler, Abhinav Valada

Summary: Perception datasets for agriculture are limited, hindering supervised learning, but self-supervised learning methods are not optimized for agricultural tasks. To address this, we propose Injected Noise Discriminator (INoD) that uses feature replacement and dataset discrimination for self-supervised learning. INoD enables the network to learn explicit representations of objects from one dataset while observing similar features from another, improving performance on downstream tasks. We also introduce the Fraunhofer Potato 2022 dataset for potato field object detection, demonstrating state-of-the-art performance of our INoD pretraining strategy.

IEEE ROBOTICS AND AUTOMATION LETTERS (2023)

Proceedings Paper Robotics

Learning Long-Horizon Robot Exploration Strategies for Multi-object Search in Continuous Action Spaces

Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Summary: Recent advances in vision-based navigation and exploration have made significant progress in photorealistic indoor environments. However, these methods face challenges in long-horizon tasks and generalizing to unseen environments. This study proposes a novel reinforcement learning approach that combines short-term and long-term reasoning in a single model, achieving exceptional performance in continuous action spaces. Extensive experiments demonstrate its ability to generalize to unseen apartment environments with limited data, as well as achieving zero-shot transfer in real-world office environments.

ROBOTICS RESEARCH, ISRR 2022 (2023)

Proceedings Paper Robotics

Continual SLAM: Beyond Lifelong Simultaneous Localization and Mapping Through Continual Learning

Niclas Voedisch, Daniele Cattaneo, Wolfram Burgard, Abhinav Valada

Summary: In this work, we propose continual SLAM, a novel task that extends the concept of lifelong SLAM from a single dynamically changing environment to sequential deployments in several drastically differing environments. To address this task, we introduce CL-SLAM, which leverages a dual-network architecture to adapt to new environments and retain knowledge from previously visited environments. We compare CL-SLAM to learning-based and classical SLAM methods, and demonstrate the advantages of leveraging online data.

ROBOTICS RESEARCH, ISRR 2022 (2023)

Proceedings Paper Automation & Control Systems

Realistic Real-Time Simulation of RGB and Depth Sensors for Dynamic Scenarios using Augmented Image Based Rendering

Johan Vertens, Wolfram Burgard

Summary: In this research, a real-time simulation method for synthesizing photorealistic RGB images and sensor-realistic depth maps is proposed. This method can include dynamic objects and improve the testing and validation of robotic perception systems. By using static samples and multimodal cues from CAD models, realistic images can be synthesized, which has been demonstrated on datasets recorded in different setups.

2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) (2022)

Proceedings Paper Automation & Control Systems

OpenDR: An Open Toolkit for Enabling High Performance, Low Footprint Deep Learning for Robotics

N. Passalis, S. Pedrazzi, R. Babuska, W. Burgard, D. Dias, F. Ferro, M. Gabbouj, O. Green, A. Iosifidis, E. Kayacan, J. Kober, O. Michel, N. Nikolaidis, P. Nousi, R. Pieters, M. Tzelepi, A. Valada, A. Tefas

Summary: Existing deep learning frameworks are not readily applicable to robotics due to the specific challenges in learning, reasoning, and embodiment. The high complexity and need for specialized hardware accelerators increase the effort and cost of employing deep learning models in robotics. Additionally, current deep learning methods lack active perception, limiting their ability to interact with the environment. This paper presents OpenDR, an open and modular deep learning toolkit for robotics, aiming to address these challenges.

2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Amodal Panoptic Segmentation

Rohit Mohan, Abhinav Valada

Summary: The article introduces the way humans perceive the world through amodal perception and proposes a new task, namely amodal panoptic segmentation. To facilitate research on this task, the article extends two existing datasets and proposes a new segmentation network. The experimental results demonstrate that this method achieves state-of-the-art performance on the benchmarks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Robust Object Detection Using Knowledge Graph Embeddings

Christopher Lang, Alexander Braun, Abhinav Valada

Summary: This article challenges the prevalence of the one-hot approach in closed-set object detection and demonstrates through experimental results that knowledge-based class representations are more semantically reliable.

PATTERN RECOGNITION, DAGM GCPR 2022 (2022)

Proceedings Paper Robotics

Vision-Based Autonomous UAV Navigation and Landing for Urban Search and Rescue

Mayank Mittal, Rohit Mohan, Wolfram Burgard, Abhinav Valada

Summary: This paper introduces a life-saving technology using unmanned aerial vehicles equipped with bioradars to identify survivors after natural disasters. The technology requires UAVs to autonomously navigate and land on debris piles. The paper proposes a new landing site detection algorithm and conducts experiments using a synthetic dataset and a simulation environment.

ROBOTICS RESEARCH: THE 19TH INTERNATIONAL SYMPOSIUM ISRR (2022)
