Algorithm for Sound

Last month, we reported on an artificial intelligence (AI) system that used Turing learning to infer the rules governing patterns of movement in a robot swarm without prior input. Now a team of MIT researchers is applying a related machine-learning approach, deep learning, to teach a system to generate realistic sounds.

To develop the algorithm, the researchers fed approximately 1,000 videos containing an estimated 46,000 distinct sounds into an AI system, which analyzed the pitch, loudness and other qualities of the sounds that various objects made as they were hit, scraped and prodded. A drumstick was used throughout this phase of the experiment to provide a consistent means of producing the sounds.
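
For readers who want a concrete sense of what "analyzing pitch, loudness and other qualities" might involve, here is a minimal Python sketch using the open-source librosa audio library. The specific descriptors and parameters are illustrative assumptions, not the MIT team's actual feature representation:

```python
import librosa
import numpy as np

def describe_impact_sound(path, sr=22050):
    """Extract simple acoustic descriptors (duration, loudness, pitch)
    from one recorded drumstick hit. The feature choices here are
    illustrative, not the researchers' exact representation."""
    y, sr = librosa.load(path, sr=sr)

    # Loudness proxy: root-mean-square energy over short frames.
    rms = librosa.feature.rms(y=y)[0]

    # Pitch estimate via the YIN algorithm; summarize with the
    # median frame-level fundamental-frequency estimate.
    f0 = librosa.yin(y, fmin=50, fmax=2000, sr=sr)

    return {
        "duration_s": len(y) / sr,
        "mean_loudness": float(rms.mean()),
        "peak_loudness": float(rms.max()),
        "median_pitch_hz": float(np.median(f0)),
    }
```

Descriptors along these lines, computed for every hit in the training videos, would give the learning system a compact acoustic target to associate with what it sees on screen.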

The system was then presented with silent video clips of objects being struck to see whether it could synthesize an appropriate sound. Results showed that it could accurately reproduce a wide range of sounds, from low-pitched thuds to high-pitched clicks and from short taps to longer, slower waveforms. And when human subjects were asked to pick the real recording from pairs of real and synthesized sounds, they chose the algorithm's fake sound roughly twice as often as they chose fakes produced by a baseline algorithm.
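
The article does not spell out the model's internals, but one common way to map silent video to sound is to regress per-frame sound features from per-frame image features with a recurrent network. The PyTorch sketch below is a hedged illustration of that general idea; the class name, dimensions and training details are placeholders, not the researchers' published code:

```python
import torch
import torch.nn as nn

class SoundFromSilentVideo(nn.Module):
    """Illustrative model: per-frame image features -> LSTM -> predicted
    sound features (e.g., a loudness/spectral envelope per frame).
    Dimensions are placeholders, not the MIT team's configuration."""

    def __init__(self, frame_feat_dim=512, hidden_dim=256, sound_feat_dim=42):
        super().__init__()
        # In a real system, frame features would come from a pretrained CNN.
        self.rnn = nn.LSTM(frame_feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, sound_feat_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, frame_feat_dim)
        hidden, _ = self.rnn(frame_feats)
        return self.head(hidden)  # (batch, num_frames, sound_feat_dim)

# One training step: regress predicted sound features toward the
# features extracted from the real audio track of each training video.
model = SoundFromSilentVideo()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

frames = torch.randn(8, 30, 512)   # dummy batch: 8 clips, 30 frames each
targets = torch.randn(8, 30, 42)   # dummy sound features for those frames

optimizer.zero_grad()
loss = loss_fn(model(frames), targets)
loss.backward()
optimizer.step()
```

At test time the same network would run on a silent clip's frame features alone, and the predicted sound features would drive a synthesis step that turns them back into an audible waveform.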

This research represents an important first step toward integrating sight and sound to mimic the way humans learn about and interact with the world.

For information: Andrew Owens, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory (CSAIL), The Stata Center, Building 32, 32 Vassar Street, Cambridge, MA 02139; phone: 617-253-5851; fax: 617-258-8682; email: acowens@mit.edu; website: http://www.csail.mit.edu/