A team out of the University of North Carolina at Chapel Hill are working on an AI that can add sound effects to video clips, and humans can’t tell the difference.
Today we get an answer thanks to the work of Yipin Zhou and pals at the University of North Carolina at Chapel Hill and a few buddies at Adobe Research. These guys have trained a machine-learning algorithm to generate realistic soundtracks for short video clips.
Indeed, the sounds are so realistic that they fool most humans into thinking they are real. You can take a test yourself here to see if you can tell the difference.
The team take the standard approach to machine learning. Algorithms are only ever as good as the data used to train them, so the first step is to create a large, high-quality annotated data set of video examples.
The team create this data set by selecting a subset of clips from a Google collection called Audioset, which consists of over two million 10-second clips from YouTube that all include audio events. These videos are divided into human-labeled categories focusing on things like dogs, chainsaws, helicopters, and so on
To train a machine, the team must have clips in which the sound source is clearly visible. So any video that contains audio from off-screen events is unsuitable. The team filters these out using crowdsourced workers from Amazon’s Mechanical Turk service to find clips in which the audio source is clearly visible and dominates the soundtrack.
Read More at Technology Review