Nvidia’s Audio2Face Omniverse Kit uses deep learning to generate real-time facial animation from a single audio source. Audio2Face lets artists simplify 3D character facial animation, instantly generating facial expressions and reactions from voice-overs. It also lets users retarget the captured animation to any 3D human or human-esque face, whether realistic or stylized.
See below for our initial testing of Audio2Face running on the GODBOX On-Set workstation.
How does Audio2Face work?
Audio2Face comes preloaded with “Digital Mark”, a base 3D character used to preview the generated blend shapes and full facial animation. Animating Mark follows a simple, streamlined pipeline: upload an audio source (.WAV) into Audio2Face and instantly generate an adjustable facial animation. Behind the interface, the audio input is fed into a pre-trained deep neural network whose output drives the 3D vertices of the character mesh, producing real-time facial animation. Audio2Face also lets users adjust various post-processing parameters to dial in the right expressions and fidelity for the selected voice-over.
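To make the pipeline concrete, here is a minimal conceptual sketch in Python. It is not Audio2Face’s actual API or network: the window sizes, mesh size, and the “network” itself (a random linear projection standing in for the pre-trained deep neural network) are all illustrative assumptions. It only shows the shape of the idea: audio windows go in, per-vertex offsets come out, and adding those offsets to the neutral mesh yields one animated mesh per frame.

```python
import numpy as np

# Conceptual sketch of an audio-driven facial animation pipeline
# (NOT Nvidia's actual Audio2Face implementation or API).
# All sizes below are illustrative assumptions.

SAMPLE_RATE = 16_000   # assumed .WAV sample rate
WINDOW = 8_000         # 0.5 s of audio context per animation frame
FPS = 30               # animation frame rate
NUM_VERTICES = 1_000   # toy mesh size; real faces use far more vertices

# Stand-in "weights": a fixed random projection in place of a trained network.
rng = np.random.default_rng(0)
W = rng.normal(scale=1e-3, size=(WINDOW, NUM_VERTICES * 3))

def fake_network(window: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained deep neural network: maps one audio
    window to per-vertex XYZ offsets from the neutral pose."""
    return (window @ W).reshape(NUM_VERTICES, 3)

def animate(audio: np.ndarray, neutral_mesh: np.ndarray) -> np.ndarray:
    """Slide a window across the audio and emit one deformed mesh per frame."""
    hop = SAMPLE_RATE // FPS
    frames = []
    for start in range(0, len(audio) - WINDOW + 1, hop):
        offsets = fake_network(audio[start:start + WINDOW])
        frames.append(neutral_mesh + offsets)
    return np.stack(frames)

# Usage: one second of a synthetic tone standing in for a voice-over .WAV.
audio = np.sin(2 * np.pi * 220 * np.arange(SAMPLE_RATE) / SAMPLE_RATE)
neutral = np.zeros((NUM_VERTICES, 3))
meshes = animate(audio, neutral)
print(meshes.shape)  # (num_frames, NUM_VERTICES, 3)
```

In the real tool this per-frame mesh deformation is what you see driving Digital Mark’s face in the viewport, with the post-processing parameters adjusting the result on top.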
Additional & Future Features
Live Audio – Currently the audio input only supports pre-recorded files, but Nvidia plans to support live audio input from microphones in the future, making it possible to process any language in real time. Check out the tests below in English, French, Italian and Russian.
Emotion Control – In the future, Audio2Face will allow for more dynamic emotion control, with the neural network automatically manipulating the face, eyes, mouth, tongue, and head motion to better match the desired emotional range.