Alibaba's EMO AI System Creates Realistic Talking and Singing Videos from Photos
Alibaba's Institute for Intelligent Computing has developed a new AI system called EMO, short for "Emote Portrait Alive," that animates a single portrait photo to generate realistic talking and singing videos.
Just in 👀
this is the most amazing audio2video I have ever seen.
It is called EMO: Emote Portrait Alive pic.twitter.com/3b1AQMzPYu
— Stelfie the Time Traveller (@StelfieTT) February 28, 2024
The system, described in a research paper published on arXiv, creates fluid, expressive facial movements and head poses that closely match the nuances of a provided audio track. This represents a major advance in audio-driven talking-head video generation, an area that has challenged AI researchers for years.
EMO uses a direct audio-to-video synthesis approach, bypassing the need for 3D models or facial landmarks.
The system employs a diffusion model trained on a dataset of over 250 hours of talking-head videos. According to the paper, EMO outperforms existing methods in video quality, identity preservation, and expressiveness.
It can also create singing videos with appropriate mouth shapes and facial expressions synchronized to the vocals.