Article

Multimodal End-to-End Autonomous Driving

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TITS.2020.3013234

Keywords

Semantics; Task analysis; Laser radar; Autonomous vehicles; Cameras; Multimodal scene understanding; end-to-end autonomous driving; imitation learning

Funding

  1. Chinese Scholarship Council (CSC) [201808390010]
  2. Spanish (MINECO/AEI/FEDER, UE) [TIN2017-88709-R]
  3. Catalan AGAUR [2017FI-B1-00162]
  4. ICREA under the ICREA Academia Program

Abstract

A crucial component of an autonomous vehicle is the AI driver, and today there are different paradigms for its development. This paper focuses on end-to-end autonomous driving and analyzes whether combining multiple sensor modalities (RGB and depth) produces better AI drivers than relying on a single modality. The study shows that early-fusion multimodality outperforms single-modality driving.
A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) driver that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from raw sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e., using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid, and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single modality.
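
To make the early-fusion scheme concrete, the sketch below shows one common way to realize it; this is not the authors' implementation, and all layer sizes, module names, and the four-command set are illustrative assumptions. RGB and depth images are concatenated channel-wise into a single RGBD tensor before the first convolution, and a CIL-style network selects an output branch according to the high-level navigation command.

```python
# Minimal early-fusion RGBD sketch for a CIL-style driving network (illustrative only).
import torch
import torch.nn as nn

class EarlyFusionCIL(nn.Module):
    def __init__(self, num_commands=4):
        super().__init__()
        # Shared perception backbone over the fused RGBD input (3 + 1 = 4 channels).
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output branch per high-level navigation command (e.g. follow lane,
        # go straight, turn left, turn right), each predicting steer/throttle/brake.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))
            for _ in range(num_commands)
        ])

    def forward(self, rgb, depth, command):
        # Early fusion: stack modalities along the channel dimension.
        x = torch.cat([rgb, depth], dim=1)            # (B, 4, H, W)
        features = self.backbone(x)                   # (B, 128)
        outputs = torch.stack([b(features) for b in self.branches], dim=1)
        # Select the branch matching each sample's navigation command.
        idx = command.view(-1, 1, 1).expand(-1, 1, outputs.size(-1))
        return outputs.gather(1, idx).squeeze(1)      # (B, 3) control signals


# Example usage with random tensors standing in for camera and depth frames.
model = EarlyFusionCIL()
rgb = torch.rand(2, 3, 88, 200)
depth = torch.rand(2, 1, 88, 200)
command = torch.tensor([0, 2])                        # per-sample navigation command
controls = model(rgb, depth, command)                 # steer, throttle, brake
print(controls.shape)                                 # torch.Size([2, 3])
```

By contrast, the mid- and late-fusion schemes mentioned in the abstract would keep separate per-modality encoders and merge either their intermediate feature maps or their final predictions, respectively.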

