
A deep learning approach for generalized speech animation

Research output: Contribution to journal › Article

Open Access permissions: Open


Abstract

We introduce a simple and effective deep learning approach to automatically generate natural-looking speech animation that synchronizes to input speech. Our approach uses a sliding window predictor that learns arbitrary nonlinear mappings from phoneme label input sequences to mouth movements in a way that accurately captures natural motion and visual coarticulation effects. Our deep learning approach enjoys several attractive properties: it runs in real-time, requires minimal parameter tuning, generalizes well to novel input speech sequences, is easily edited to create stylized and emotional speech, and is compatible with existing animation retargeting approaches. One important focus of our work is to develop an effective approach for speech animation that can be easily integrated into existing production pipelines. We provide a detailed description of our end-to-end approach, including machine learning design decisions. Generalized speech animation results are demonstrated over a wide range of animation clips on a variety of characters and voices, including singing and foreign language input. Our approach can also generate on-demand speech animation in real-time from user speech input.
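The sliding window idea described in the abstract can be illustrated with a minimal Python sketch. It assumes frame-level phoneme labels encoded as one-hot vectors, a window of `kx` input frames mapped to `ky` output frames, and overlapping output windows averaged frame by frame. `PHONEMES`, `OPENNESS`, `make_toy_predictor`, and the window sizes are all hypothetical; the toy predictor stands in for a trained deep network and is not the authors' model or output parameterization.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical phoneme inventory and per-phoneme "mouth openness" targets,
# chosen only for illustration; a real system would use a full phoneme set
# and a learned, multi-dimensional mouth parameterization.
PHONEMES: List[str] = ["sil", "m", "aa", "p", "iy"]
INDEX: Dict[str, int] = {p: i for i, p in enumerate(PHONEMES)}
OPENNESS: Dict[str, float] = {"sil": 0.0, "m": 0.0, "p": 0.0, "aa": 1.0, "iy": 0.5}

# A predictor maps one flattened input window to ky output frames of dim values each.
Predictor = Callable[[List[float]], List[List[float]]]


def one_hot(label: str) -> List[float]:
    """Encode a single frame's phoneme label as a one-hot vector."""
    vec = [0.0] * len(PHONEMES)
    vec[INDEX[label]] = 1.0
    return vec


def input_window(frames: Sequence[str], center: int, kx: int) -> List[float]:
    """Concatenate one-hot vectors for the kx frames centred on `center`,
    padding with silence beyond either end of the sequence."""
    start = center - kx // 2
    feats: List[float] = []
    for t in range(start, start + kx):
        feats.extend(one_hot(frames[t] if 0 <= t < len(frames) else "sil"))
    return feats


def animate(frames: Sequence[str], predictor: Predictor,
            kx: int, ky: int, dim: int) -> List[List[float]]:
    """Slide the window one frame at a time and average the overlapping
    ky-frame predictions that land on each output frame."""
    n = len(frames)
    sums = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for center in range(n):
        window_out = predictor(input_window(frames, center, kx))
        for j, frame_vec in enumerate(window_out):
            t = center - ky // 2 + j  # output frame this prediction belongs to
            if 0 <= t < n:
                for d in range(dim):
                    sums[t][d] += frame_vec[d]
                counts[t] += 1
    # Every frame is covered at least by the window centred on it (assuming ky >= 1).
    return [[s / counts[t] for s in sums[t]] for t in range(n)]


def make_toy_predictor(ky: int) -> Predictor:
    """Hypothetical stand-in for a trained network: every output frame in the
    window gets the mean openness of all input frames (a crude context effect)."""
    n_ph = len(PHONEMES)

    def predict(feats: List[float]) -> List[List[float]]:
        labels = [PHONEMES[feats[i:i + n_ph].index(1.0)]
                  for i in range(0, len(feats), n_ph)]
        mean_open = sum(OPENNESS[lab] for lab in labels) / len(labels)
        return [[mean_open] for _ in range(ky)]

    return predict


if __name__ == "__main__":
    frames = ["sil", "m", "aa", "aa", "p", "iy", "sil"]
    curve = animate(frames, make_toy_predictor(ky=3), kx=5, ky=3, dim=1)
    for label, vec in zip(frames, curve):
        print(f"{label:>3}: {vec[0]:.2f}")
```

A trained predictor would give slightly different estimates for the same frame from neighbouring windows, and averaging them smooths the resulting curve. With the toy predictor, the effect of context shows up as openness spreading into the frames around each vowel.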

Details

Original language: English
Article number: 93
Number of pages: 11
Journal: ACM Transactions on Graphics
Volume: 36
Issue number: 4
State: Published - Jul 2017
Peer-reviewed: Yes



Related by author
  1. A Decision Tree Framework for Spatiotemporal Sequence Prediction

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  2. Joint Learning of Facial Expression and Head Pose from Speech

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  3. Predicting Head Pose in Dyadic Conversation

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  4. Predicting Head Pose from Speech with a Conditional Variational Autoencoder

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
