Journal
ACM TRANSACTIONS ON GRAPHICS
Volume 37, Issue 4, Pages: -
Publisher
ASSOC COMPUTING MACHINERY
DOI: 10.1145/3197517.3201292
Keywords
facial animation; neural networks
Funding
- NSERC
- NSF [CHS-1422441, CHS-1617333, IIS-1617917]
Abstract
We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycholinguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time solution for lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results through: cross-validation against ground-truth data; animator critique and edits; visual comparison with recent deep-learning lip-synchronization solutions; and demonstrations that our approach is resilient to diversity in speaker and language.
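The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch illustration of what such a three-stage LSTM pipeline might look like: one stage classifying audio frames into phonetic groups, one regressing facial-landmark motion (the abstract's proxy for speech style), and a final stage combining both into viseme motion curves. All layer sizes, feature dimensions, and the counts of phonetic groups, landmarks, and visemes below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a three-stage LSTM pipeline in the spirit of the
# abstract; dimensions and counts are illustrative assumptions only.
import torch
import torch.nn as nn

class ThreeStageLipSync(nn.Module):
    def __init__(self, audio_dim=65, n_phone_groups=20,
                 n_landmarks=76, n_visemes=29, hidden=256):
        super().__init__()
        # Stage 1: segment the audio stream into phonetic groups
        # (sufficient, per the abstract, for viseme construction).
        self.phoneme_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.phoneme_head = nn.Linear(hidden, n_phone_groups)
        # Stage 2: predict facial-landmark motion, which the abstract says
        # is strongly correlated with speech style (mumbling, shouting, ...).
        self.landmark_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.landmark_head = nn.Linear(hidden, n_landmarks)
        # Stage 3: combine both streams into viseme motion curves that
        # could drive a JALI or FACS-based face rig.
        self.viseme_lstm = nn.LSTM(n_phone_groups + n_landmarks, hidden,
                                   batch_first=True)
        self.viseme_head = nn.Linear(hidden, n_visemes)

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim), e.g. stacked MFCC frames.
        p, _ = self.phoneme_lstm(audio_feats)
        phone_logits = self.phoneme_head(p)        # phonetic-group scores
        l, _ = self.landmark_lstm(audio_feats)
        landmarks = self.landmark_head(l)          # landmark motion estimates
        combined = torch.cat([phone_logits.softmax(-1), landmarks], dim=-1)
        v, _ = self.viseme_lstm(combined)
        return self.viseme_head(v)                 # viseme activation curves

# Usage: one utterance of 100 audio-feature frames.
model = ThreeStageLipSync()
curves = model(torch.randn(1, 100, 65))   # -> (1, 100, 29) motion curves
```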