Article
Computer Science, Information Systems
Yifei Zhang, Olivier Morel, Ralph Seulin, Fabrice Meriaudeau, Desire Sidibe
Summary: The study introduces a novel central multimodal fusion framework for semantic image segmentation of road scenes, achieving significant performance improvement by combining joint low-level and high-level feature representations with statistical priors.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Review
Computer Science, Artificial Intelligence
Yifei Zhang, Desire Sidibe, Olivier Morel, Fabrice Meriaudeau
Summary: Recent advances in deep learning have shown excellent performance in scene understanding tasks, but in complex environments, multimodal fusion is necessary. Deep multimodal fusion significantly improves semantic image segmentation, with different fusion strategies such as early fusion, late fusion, and hybrid fusion.
IMAGE AND VISION COMPUTING
(2021)
Article
Geography, Physical
Ning Zhang, Francesco Nex, Norman Kerle, George Vosselman
Summary: This paper presents a novel cascade network for semantic segmentation in low-light indoor environments, using datasets of real and rendered images. The proposed method achieves high segmentation accuracy by decomposing low-light images and incorporating illumination-invariant features. The results also demonstrate the importance of semantic information in enhancing reflectance restoration and segmentation accuracy.
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
(2022)
Article
Engineering, Civil
Shuai Di, Qi Feng, Chun-Guang Li, Mei Zhang, Honggang Zhang, Semir Elezovikj, Chiu C. Tan, Haibin Ling
Summary: This paper proposes a method for semantic segmentation in rainy night scenes, using knowledge transfer between near scenes and daytime images. By adapting at the representation level and segmentation space level, the impact of domain shift is effectively reduced.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
(2021)
Article
Computer Science, Information Systems
Lin Guo, Guoliang Fan
Summary: This study proposes a new pipeline for instance-level object detection in indoor scenes that uses pixel-level labeling information, aiming to integrate semantic labeling and instance segmentation for comprehensive understanding. Instance segmentation is optimized by considering spatial fitness and relational context encoded by three graphical models, and experimental results show a significant improvement in small object segmentation.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Engineering, Electrical & Electronic
Fengmao Lv, Guosheng Lin, Peng Liu, Guowu Yang, Sinno Jialin Pan, Lixin Duan
Summary: This paper proposes a method to improve cross-domain segmentation performance using easily-collected image-level annotations, constructing domain adaptation curriculums to address performance issues resulting from using synthetic images. The approach demonstrates effectiveness through experiments on GTA5 -> Cityscapes and SYNTHIA -> Cityscapes settings, outperforming existing baselines.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
Review
Computer Science, Information Systems
Zhiyang Guo, Yingping Huang, Xing Hu, Hongjian Wei, Baigan Zhao
Summary: This paper provides a comprehensive survey of deep learning-based approaches for scene understanding in autonomous driving, categorizing them into four work streams and analyzing their characteristics, advantages, and disadvantages. It also summarizes benchmark datasets and evaluation criteria used in the research community, compares the performance of some latest works, and discusses future challenges in the research domain.
Article
Engineering, Electrical & Electronic
Yu Pei, Bin Sun, Shutao Li
Summary: The article introduces an efficient deep neural network called FSFnet for real-time semantic segmentation of road scenes, achieving precise segmentation results on the Cityscapes and CamVid datasets. The proposed network combines features at different levels or scales through a feature selective fusion module (FSFM) and aggregates context information using a multiscale context enhancement module.
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT
(2021)
Article
Computer Science, Artificial Intelligence
Parvin Razzaghi, Karim Abbasi, Mahmoud Shirazi, Shima Rashidi
Summary: MRI brain image analysis, especially in the context of multimodal medical image analysis, is a challenging task. This paper proposes a new multimodal deep transfer learning approach, which addresses the problem of different distribution between training and test sets by introducing a new multimodal feature encoder and adaptation technique. The experimental results demonstrate the superior performance of the proposed approach compared to other methods, particularly in brain tumor detection.
APPLIED SOFT COMPUTING
(2022)
Article
Computer Science, Artificial Intelligence
Liang Liao, Wenyi Chen, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh
Summary: In this study, a method is proposed to improve the adaptive model by exploiting the characteristics of foggy image sequences in driving scenes. The method uses spatial similarity and temporal correspondence to diffuse confident pseudo labels and introduces local spatial similarity loss and temporal contrastive loss to ensure feature similarity. Experimental results demonstrate that the proposed method outperforms other methods on natural foggy datasets.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
Article
Geochemistry & Geophysics
Sudipan Saha, Muhammad Shahzad, Lichao Mou, Qian Song, Xiao Xiang Zhu
Summary: Earth observation data has great potential to enhance our understanding of the planet. We propose a semantic segmentation method that learns to segment from a single scene without any annotations. The method samples smaller unlabeled patches from the scene and generates alternate views through transformations, which are then processed by a network with iterative weight refinement. Experimental results demonstrate the effectiveness of the proposed method.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)
Article
Construction & Building Technology
Vahid Zamani, Hosein Taghaddos, Yaghob Gholipour, Hamidreza Pourreza
Summary: This study proposes a vision-based approach for soil-included scene understanding and classification using semantic segmentation. Training deep learning models and validating them on annotated datasets demonstrates the remarkable performance of this approach.
AUTOMATION IN CONSTRUCTION
(2022)
Article
Computer Science, Artificial Intelligence
Wujie Zhou, Shaohua Dong, Jingsheng Lei, Lu Yu
Summary: This study proposes a multitask-aware network (MTANet) with hierarchical multimodal fusion to improve the segmentation accuracy of RGB-T urban scene understanding. The network uses a hierarchical fusion module and a high-level semantic module to enhance feature fusion and improve segmentation accuracy. Experimental results on two benchmark datasets demonstrate the improved performance of the proposed MTANet compared with state-of-the-art methods.
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES
(2023)
Article
Computer Science, Interdisciplinary Applications
Liu Yang, Hubo Cai
Summary: This paper proposes a weakly supervised segmentation approach that uses inexpensive image-level labels. The missing boundary information in image-level labels is compensated by BIM-extracted object information. The proposed method consists of three modules: detect initial object locations from image-level labels, extract object information from BIM as prior knowledge, and incorporate the prior knowledge into the network to enhance the detected object locations. Experimental results demonstrate the effectiveness of the proposed method in improving object detection using prior knowledge from BIM and outperforming existing weakly supervised methods.
JOURNAL OF COMPUTING IN CIVIL ENGINEERING
(2023)
Article
Geochemistry & Geophysics
Heng Zhou, Chunna Tian, Zhenxi Zhang, Qizheng Huo, Yongqiang Xie, Zhongbo Li
Summary: Semantic segmentation is crucial for autonomous vehicles. The proposed multispectral fusion transformer network (MFTNet) enhances the performance of RGB-T semantic segmentation by fusing the rich details of RGB images with the illumination robustness of thermal images. The MFT module and the optimization strategy enable precise segmentation.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2022)
Article
Robotics
Nicolai Dorka, Tim Welschehold, Joschka Boedecker, Wolfram Burgard
Summary: This letter proposes a method called Adaptively Calibrated Critics (ACC) to alleviate the bias of low-variance temporal difference targets by using recent high-variance but unbiased on-policy rollouts. ACC is applied to the Truncated Quantile Critics algorithm, which regulates the bias with a hyperparameter. ACC achieves state-of-the-art results on the OpenAI gym continuous control benchmark and demonstrates improved performance on various tasks from the Meta-World robot benchmark.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2023)
Article
Computer Science, Artificial Intelligence
Rohit Mohan, Thomas Elsken, Arber Zela, Jan Hendrik Metzen, Benedikt Staffler, Thomas Brox, Abhinav Valada, Frank Hutter
Summary: The success of deep learning has resulted in an increased demand for neural network architecture engineering. Neural architecture search (NAS) has emerged as a popular field for automatically designing neural network architectures in a data-driven manner. NAS has become more applicable to dense prediction tasks in computer vision, such as semantic segmentation or object detection, by incorporating weight sharing strategies. This manuscript provides an overview of NAS for dense prediction tasks, discussing the unique challenges and surveying approaches for addressing them to facilitate future research and application.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2023)
Article
Robotics
Jose Arce, Niclas Voedisch, Daniele Cattaneo, Wolfram Burgard, Abhinav Valada
Summary: This work proposes PADLoC, an approach to loop closure detection and registration in LiDAR-based SLAM frameworks that uses a novel transformer-based head for point cloud matching and registration. Panoptic information is leveraged during training to improve the matching problem. Extensive evaluations demonstrate that PADLoC achieves state-of-the-art results on multiple real-world datasets.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2023)
Article
Robotics
Kshitij Sirohi, Sajad Marvi, Daniel Buescher, Wolfram Burgard
Summary: This paper introduces a novel task of uncertainty-aware panoptic segmentation, aiming to predict per-pixel semantic and instance segmentations with per-pixel uncertainty estimates. The authors define two novel metrics, uncertainty-aware Panoptic Quality (uPQ) and panoptic Expected Calibration Error (pECE), for quantitative analysis. They propose a top-down Evidential Panoptic Segmentation Network (EvPSNet) with a panoptic fusion module leveraging predicted uncertainties.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2023)
Article
Robotics
Daniel Honerkamp, Tim Welschehold, Abhinav Valada
Summary: Despite its importance, mobile manipulation remains a significant challenge due to the need for integration of end-effector trajectory generation and navigation skills. Existing methods struggle with controlling the large configuration space and navigating dynamic and unknown environments. In this work, we introduce a new approach called Neural Navigation for Mobile Manipulation (N²M²) that extends the decomposition of tasks in complex obstacle environments, enabling robots to perform a broader range of tasks in real-world settings. The approach demonstrates capabilities in extensive simulation and real-world experiments.
IEEE TRANSACTIONS ON ROBOTICS
(2023)
Article
Robotics
Julia Hindel, Nikhil Gosala, Kevin Bregler, Abhinav Valada
Summary: Perception datasets for agriculture are limited, hindering supervised learning, but self-supervised learning methods are not optimized for agricultural tasks. To address this, we propose Injected Noise Discriminator (INoD) that uses feature replacement and dataset discrimination for self-supervised learning. INoD enables the network to learn explicit representations of objects from one dataset while observing similar features from another, improving performance on downstream tasks. We also introduce the Fraunhofer Potato 2022 dataset for potato field object detection, demonstrating state-of-the-art performance of our INoD pretraining strategy.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2023)
Proceedings Paper
Robotics
Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
Summary: Recent advances in vision-based navigation and exploration have made significant progress in photorealistic indoor environments. However, these methods face challenges in long-horizon tasks and generalizing to unseen environments. This study proposes a novel reinforcement learning approach that combines short-term and long-term reasoning in a single model, achieving exceptional performance in continuous action spaces. Extensive experiments demonstrate its ability to generalize to unseen apartment environments with limited data, as well as achieving zero-shot transfer in real-world office environments.
ROBOTICS RESEARCH, ISRR 2022
(2023)
Proceedings Paper
Robotics
Niclas Voedisch, Daniele Cattaneo, Wolfram Burgard, Abhinav Valada
Summary: In this work, we propose continual SLAM, a novel task that extends the concept of lifelong SLAM from a single dynamically changing environment to sequential deployments in several drastically differing environments. To address this task, we introduce CL-SLAM, which leverages a dual-network architecture to adapt to new environments and retain knowledge from previously visited environments. We compare CL-SLAM to learning-based and classical SLAM methods, and demonstrate the advantages of leveraging online data.
ROBOTICS RESEARCH, ISRR 2022
(2023)
Proceedings Paper
Automation & Control Systems
Johan Vertens, Wolfram Burgard
Summary: In this research, a real-time simulation method for synthesizing photorealistic RGB images and sensor-realistic depth maps is proposed. This method can include dynamic objects and improve the testing and validation of robotic perception systems. By using static samples and multimodal cues from CAD models, realistic images can be synthesized, which has been demonstrated on datasets recorded in different setups.
2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS)
(2022)
Proceedings Paper
Automation & Control Systems
N. Passalis, S. Pedrazzi, R. Babuska, W. Burgard, D. Dias, F. Ferro, M. Gabbouj, O. Green, A. Iosifidis, E. Kayacan, J. Kober, O. Michel, N. Nikolaidis, P. Nousi, R. Pieters, M. Tzelepi, A. Valada, A. Tefas
Summary: Existing deep learning frameworks are not readily applicable to robotics due to the specific challenges in learning, reasoning, and embodiment. The high complexity and need for specialized hardware accelerators increase the effort and cost of employing deep learning models in robotics. Additionally, current deep learning methods lack active perception, limiting their ability to interact with the environment. This paper presents OpenDR, an open and modular deep learning toolkit for robotics, aiming to address these challenges.
2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Rohit Mohan, Abhinav Valada
Summary: The article discusses how humans perceive the world through amodal perception and proposes a new task, amodal panoptic segmentation. To facilitate research on this task, the article extends two existing datasets and proposes a new segmentation network. The experimental results demonstrate that this method achieves state-of-the-art performance on the benchmarks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Christopher Lang, Alexander Braun, Abhinav Valada
Summary: This article challenges the prevalence of the one-hot approach in closed-set object detection and demonstrates through experimental results that knowledge-based class representations are more semantically reliable.
PATTERN RECOGNITION, DAGM GCPR 2022
(2022)
Proceedings Paper
Robotics
Mayank Mittal, Rohit Mohan, Wolfram Burgard, Abhinav Valada
Summary: This paper introduces a life-saving technology using unmanned aerial vehicles equipped with bioradars to identify survivors after natural disasters. The technology requires UAVs to autonomously navigate and land on debris piles. The paper proposes a new landing site detection algorithm and conducts experiments using a synthetic dataset and a simulation environment.
ROBOTICS RESEARCH: THE 19TH INTERNATIONAL SYMPOSIUM ISRR
(2022)