Article
Neurosciences
Steven Brown, Ye Yuan, Michel Belyk
Summary: The frame/content theory of speech origins proposes that jaw oscillations provided an evolutionary scaffold for syllable structure. The human primary motor cortex shows overlapping representations of the larynx and jaw muscles, supporting a coupling between vocalization and jaw oscillation in the generation of syllables. This suggests that humans inherited voluntary control of jaw oscillations from ancestral species and added voluntary control of vocalization through the evolution of a new brain area near the jaw region of the motor cortex.
JOURNAL OF COMPARATIVE NEUROLOGY
(2021)
Article
Neurosciences
Hasini R. Weerathunge, Tiffany Voon, Monique Tardif, Dante Cilento, Cara E. Stepp
Summary: This paper examines the relationships between different subsystems involved in speech production to understand its underlying mechanisms. The results suggest that the laryngeal and articulatory speech production subsystems operate with differential auditory and somatosensory feedback control mechanisms. This indicates that current speech motor control models should consider decoupling the laryngeal and articulatory domains for a better understanding of speech motor control processes.
EXPERIMENTAL BRAIN RESEARCH
(2022)
Article
Neurosciences
Michel Belyk, Rachel Brown, Deryk S. Beal, Alard Roebroeck, Carolyn McGettigan, Stella Guldner, Sonja A. Kotz
Summary: Vocal flexibility, particularly in speaking and singing, is a key feature of the human species thanks to the neural pathways and coordination in the brain. Two larynx motor areas have been identified in the human brain, with their functional roles in speech motor control still under investigation. The integration of laryngeal and respiratory systems in these areas plays a crucial role in vocalization and may have implications for the evolution of speech.
Article
Automation & Control Systems
Ke Wang, Xi Liu, Chien-Ming Chen, Saru Kumari, Mohammad Shojafar, M. Shamim Hossain
Summary: This article studies the principles of voice-cloning attack technology and proposes a voice-clone attack method in preparation for building a targeted voice recognition system. The key challenge lies in synthesizing high-quality, personalized speech of the target speaker from small samples. A transductive voice transfer learning method is introduced to synthesize the target speaker's speech effectively.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(2021)
Article
Neurosciences
Anubhav Jain, Kian Abedinpour, Ozgur Polat, Mine Melodi Caliskan, Afsaneh Asaei, Franz M. J. Pfister, Urban M. Fietzek, Milos Cernak
Summary: This study utilized deep learning-based speech processing to analyze voice recordings of PD patients before and after medication, showcasing higher accuracy with p-CRNN (82.35%) compared to PAC (73.08%) in distinguishing ON and OFF motor states. Both methods provide insights into the distinctive components of PD patients' speech, including features amenable to dopaminergic treatment.
FRONTIERS IN HUMAN NEUROSCIENCE
(2021)
Article
Multidisciplinary Sciences
Abigail R. Bradshaw, Carolyn McGettigan
Summary: Research has found that when two speakers engage in synchronous speech, there is a certain level of convergence in their voice fundamental frequency, which may have a short-term fluency-enhancing effect in people who stutter. However, under visual synchronous speech conditions, there was no evidence of voice convergence, highlighting the importance of self- and other-speech feedback in speech production processes.
Article
Neurosciences
Konstantina Kilteni, Christian Houborg, Henrik Ehrsson
Summary: The brain compensates for intrinsic delays in sensory feedback by predicting the sensory consequences of movement through a forward model, attenuating the predicted sensations. However, even minimal temporal errors between prediction and feedback disrupt this predictive attenuation, producing perceptual and neural changes.
JOURNAL OF NEUROSCIENCE
(2023)
Article
Multidisciplinary Sciences
Tongjie Ouyang, Zhijun Yang, Huilong Xie, Tianlin Hu, Qingmei Liu
Summary: A model called Voice Style Unification Generative Adversarial Network (VSUGAN) is proposed to transfer voice style and improve the quality of recorded audio in different environments without retraining the network. The VSUGAN successfully reduces style differences and enhances the overall audio recording quality.
SCIENTIFIC REPORTS
(2021)
Article
Chemistry, Analytical
Jenifa Gnanamanickam, Yuvaraj Natarajan, K. R. Sri Preethaa
Summary: Speech recognition technology has become increasingly common, and speech enhancement algorithms can improve its accuracy, especially in the presence of background noise. Hybrid algorithms effectively reduce external noise and enhance speech recognition accuracy.
Article
Computer Science, Hardware & Architecture
Jianwei Qian, Haohua Du, Jiahui Hou, Linlin Chen, Taeho Jung, Xiangyang Li
Summary: A "Speech Sanitizer" is designed to perturb users' speech recordings for safe sharing with third parties, reducing the chance of voice identification while maintaining speech recognition accuracy. Experiments show the method to be efficient and effective, with an adjustable privacy level that can be traded off against speech recognition accuracy.
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING
(2021)
Article
Acoustics
Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Hiroshi Ishiguro
Summary: Emotional voice conversion aims to convert neutral voice to emotional voice while retaining linguistic information and speaker identity. The proposed Source-Filter-based Emotional VC model (SFEVC) effectively filters speaker-independent emotion cues from timbre and pitch features. A novel training strategy based on the 2D Valence-Arousal (VA) space further improves conversion quality. Experimental results show that the proposed SFEVC model outperforms baselines and achieves state-of-the-art performance in speaker-independent emotional VC.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
(2023)
Review
Neurosciences
Gary Weismer
Summary: This position paper presents a perspective on the debate about the role of oromotor nonverbal gestures in understanding speech motor control. The paper argues that a coherent rationale is needed for the use of oromotor nonverbal tasks. The contrasting predictions of two models of speech motor control, the Integrative Model and Task-Dependent Model, are discussed.
Article
Biology
Sheena Waters, Elise Kanber, Nadine Lavan, Michel Belyk, Daniel Carey, Valentina Cartei, Clare Lally, Marc Miquel, Carolyn McGettigan
Summary: The study found that highly trained singers demonstrated more accurate laryngeal modulation during speech imitation tasks, with stronger representation of vocal tract length in the right somatosensory cortex. This suggests a common neural basis for enhanced vocal control in speech and song.
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES
(2021)
Article
Psychology, Multidisciplinary
Sarah Cheung, Molly Babel
Summary: This study examines the self-voice benefit in early bilingual individuals. The results show that listeners are more accurate in recognizing minimal pairs produced in their own voice compared to those produced by others with similar degrees of acoustic contrast.
FRONTIERS IN PSYCHOLOGY
(2022)
Article
Audiology & Speech-Language Pathology
Carla Franco Hoffmann, Carla Aparecida Cielo
Summary: This study described the auditory-perceptual and acoustic characteristics of dysphonic schoolchildren aged 4-7 from private and public schools. Their vocal characteristics improved with age, while auditory-perceptual parameters were moderate in degree and maximum phonation time (MPT) values were reduced.