Written by: DigiTrends4U.com; Category: News; Published: 29 February 2024

Alibaba's EMO AI System Creates Realistic Talking and Singing Videos from Photos

Alibaba's Institute for Intelligent Computing has developed a new AI system called "EMO", short for "Emote Portrait Alive", that can animate a single portrait photo and generate realistic talking and singing videos.

Just in 👀

this is the most amazing audio2video I have ever seen.
It is called EMO: Emote Portrait Alive pic.twitter.com/3b1AQMzPYu
— Stelfie the Time Traveller (@StelfieTT) February 28, 2024

The system, described in a research paper published on arXiv, creates fluid and expressive facial movements and head poses that closely match the nuances of a provided audio track. This represents a major advance in audio-driven talking-head video generation, an area that has challenged AI researchers for years.

EMO uses a direct audio-to-video synthesis approach, bypassing the need for 3D models or facial landmarks.

The system employs a diffusion model and has been trained on a dataset of over 250 hours of talking-head videos. According to the researchers, EMO outperforms existing methods in video quality, identity preservation, and expressiveness.

It can also create singing videos with appropriate mouth shapes and facial expressions synchronized to the vocals.