Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding

AEROSPACE (2023)

Volume 10, Issue 10

Publisher

MDPI

DOI: 10.3390/aerospace10100898

Keywords

air traffic control communications; automatic speech recognition and understanding; OpenSky Network; callsign recognition; ADS-B data

Automated Summary

This paper discusses the integration of artificial intelligence into air traffic control (ATC) communications to reduce the workload of air traffic controllers. It explores the lessons learned from the ATCO2 project, which developed a platform to collect, preprocess, and transcribe real-time ATC audio data. The paper reviews several techniques, including automatic speech recognition (ASR), natural language processing, English language identification, and contextual ASR biasing with surveillance data. The release of the ATCO2 corpora, along with the open-sourcing of its data, encourages research in the field and enables the development of ASR systems when little to no transcribed ATC audio data is available. The proposed ASR system trained with ATCO2 achieves a lower word error rate (WER) than a system trained with out-of-domain gold transcriptions, indicating its effectiveness.

Abstract

Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). Handling these voice communications requires a high level of awareness from ATCos and can be tedious and error-prone. Recent work aims to integrate artificial intelligence (AI) into ATC communications in order to lessen ATCos' workload. However, developing data-driven AI systems for understanding spoken ATC communications demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect, preprocess, and transcribe large amounts of ATC audio data from airspace in real time. This paper reviews (i) robust automatic speech recognition (ASR), (ii) natural language processing, (iii) English language identification, and (iv) contextual ASR biasing with surveillance data. The pipeline developed during the ATCO2 project, along with the open-sourcing of its data, encourages research in the ATC field, while the full corpus can be purchased through ELDA. The ATCO2 corpora are suitable for developing ASR systems when little to no transcribed ATC audio data is available. For instance, the proposed ASR system trained with ATCO2 reaches a word error rate (WER) as low as 17.9% on public ATC datasets, which is 6.6% absolute WER better than training with out-of-domain but gold transcriptions. Finally, the release of 5000 h of ASR-transcribed speech, covering more than 10 airports worldwide, is a step forward towards more robust automatic speech understanding systems for ATC communications.
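The WER figures quoted in the abstract can be made concrete with a short sketch. The code below is not taken from the paper or from the ATCO2 tooling. It is a minimal illustration of the standard word-level edit-distance definition of WER, and of what an "absolute" improvement means. The function names and the example transcripts are hypothetical, and the 24.5% baseline is only what the abstract's two numbers imply (17.9% + 6.6%), not a figure stated in the source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_gain(baseline_wer: float, system_wer: float) -> float:
    """Absolute improvement in percentage points (baseline minus system)."""
    return baseline_wer - system_wer


# Hypothetical ATC-style utterance; the hypothesis drops the word "flight".
reference = "lufthansa three two one descend flight level eight zero"
hypothesis = "lufthansa three two one descend level eight zero"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")

# Baseline implied by the abstract's numbers (an inference, not a quoted value).
print(f"Absolute gain: {absolute_wer_gain(24.5, 17.9):.1f} points")
```

An absolute gain is a difference in percentage points. The same change expressed relatively would be the gain divided by the baseline WER, which is a different and larger number here.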

Effects of Language Ontology on Transatlantic Automatic Speech Understanding Research Collaboration in the Air Traffic Management Domain

Shuo Chen, Hartmut Helmke, Robert M. M. Tarakan, Oliver Ohneiser, Hunter Kopald, Matthias Kleinert

Summary: As the use of Automatic Speech Recognition and Understanding (ASRU) in Air Traffic Management (ATM) is developed worldwide, the importance of Air Traffic Control (ATC) language ontologies in facilitating research collaboration becomes evident. This paper extends the topic by discussing the specific ways in which ontologies enable the sharing and collaboration of data, models, algorithms, metrics, and applications in the ATM domain. Additionally, a comparative analysis of word frequencies in ATC speech between the United States and Europe highlights the need for region-specific models due to differences in underlying corpus data.