Barack Obama is the Gold Standard for Fake Lip-Sync Videos.

“Fake news” was named the Collins Dictionary’s Word of the Year for 2017, and AI has just made it a lot more credible. The Montreal Institute for Learning Algorithms (MILA) recently unveiled ObamaNet, a photorealistic lip-sync neural network that can make anyone appear to be saying anything.

What is ObamaNet?

ObamaNet is made up of three trainable neural modules: a text-to-speech network based on Char2Wav; a time-delayed LSTM to generate mouth-keypoints synced to the audio; and a network based on Pix2Pix to generate video frames conditioned on the key points — which is emerging as a good general-purpose solution for image-to-image translation problems.

Using generative networks for images and videos is nothing new, but speech synthesis has been the subject of decades of research. ObamaNet combines the best features of both. MILA researchers claim that their system “can be trained on any set of close shot videos of a person speaking, along with the corresponding transcript.” The end result is a system that generates speech from arbitrary text and modifies it to look natural and realistic based on the mouth area of an existing video.”

According to the MILA researchers, they chose the former US President because “his videos are commonly used to benchmark lip-sync methods.” There is also more online video data for public figures like Obama. MILA extracted 17 hours of footage from Obama’s 300 weekly presidential addresses for the project.

Supasorn Suwajanakorn of the University of Washington’s Graphics and Imaging Laboratory wrote the paper Synthesizing Obama: Learning Lip Sync from Audio, which was published in July 2017. (GRAIL). Suwajanakorn’s results were quickly viewed by over 750,000 people after he uploaded them to YouTube.

GRAIL’s synthesised Obama video sparked quite a debate online. The model was so effective that many YouTube viewers were terrified, with one warning, “I see the first episode of Black Mirror has begun.” Others praised the videos, saying, “It’s better that the public is aware of such technology than that it is oblivious.” If we were ignorant, we would believe things without questioning whether they were tampered with.”

The project was dubbed “the future of fake news” by The Guardian. We’ve been told for a long time not to believe everything we read, but soon we’ll have to question everything we see and hear.”

MILA claims that their model differs from the GRAIL project in that it employs a neural network topped with a text-to-speech synthesizer rather than a traditional computer vision model. ObamaNet was developed in collaboration with Lyrebird.ai, a beta voice synthesis product that allows users to generate text-to-speech files in their own “digital voice” after providing just one minute of sample speech.

Conclusion

While applications are still limited, projects like Synthesizing Obama and ObamaNet offer a glimpse of the technology’s future potential, including the chilling prospect of just how realistic fake news may become.

You May Also Like

About the Author: Prak