Journal
ACM TRANSACTIONS ON GRAPHICS
Volume 37, Issue 4, Pages: -
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3197517.3201292
Keywords
facial animation; neural networks
Funding
- NSERC
- NSF [CHS-1422441, CHS-1617333, IIS-1617917]
Abstract
We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycholinguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time solution for lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results through: cross-validation against ground-truth data; animator critique and edits; visual comparison with recent deep-learning lip-synchronization solutions; and demonstrations that our approach is resilient to diversity in speaker and language.
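The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch illustration of what such a three-stage LSTM pipeline might look like: one stage classifying audio frames into phonetic groups, one regressing facial-landmark motion (the abstract's proxy for speech style), and a final stage combining both into viseme motion curves. All layer sizes, feature dimensions, and the counts of phonetic groups, landmarks, and visemes below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a three-stage LSTM pipeline in the spirit of the
# abstract; dimensions and counts are illustrative assumptions only.
import torch
import torch.nn as nn

class ThreeStageLipSync(nn.Module):
    def __init__(self, audio_dim=65, n_phone_groups=20,
                 n_landmarks=76, n_visemes=29, hidden=256):
        super().__init__()
        # Stage 1: segment the audio stream into phonetic groups
        # (sufficient, per the abstract, for viseme construction).
        self.phoneme_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.phoneme_head = nn.Linear(hidden, n_phone_groups)
        # Stage 2: predict facial-landmark motion, which the abstract says
        # is strongly correlated with speech style (mumbling, shouting, ...).
        self.landmark_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.landmark_head = nn.Linear(hidden, n_landmarks)
        # Stage 3: combine both streams into viseme motion curves that
        # could drive a JALI or FACS-based face rig.
        self.viseme_lstm = nn.LSTM(n_phone_groups + n_landmarks, hidden,
                                   batch_first=True)
        self.viseme_head = nn.Linear(hidden, n_visemes)

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim), e.g. stacked MFCC frames.
        p, _ = self.phoneme_lstm(audio_feats)
        phone_logits = self.phoneme_head(p)        # phonetic-group scores
        l, _ = self.landmark_lstm(audio_feats)
        landmarks = self.landmark_head(l)          # landmark motion estimates
        combined = torch.cat([phone_logits.softmax(-1), landmarks], dim=-1)
        v, _ = self.viseme_lstm(combined)
        return self.viseme_head(v)                 # viseme activation curves

# Usage: one utterance of 100 audio-feature frames.
model = ThreeStageLipSync()
curves = model(torch.randn(1, 100, 65))   # -> (1, 100, 29) motion curves
```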