Deep neural network models of sound localization reveal how perception is adapted to real-world environments [video]
Date Posted:
January 27, 2022
Date Recorded:
January 27, 2022
CBMM Speaker(s):
Andrew Francl,
Josh McDermott
Description:
CBMM researchers and authors, MIT graduate student Andrew Francl and MIT Prof. Josh McDermott, discuss their latest research, published in Nature Human Behaviour.
[MUSIC PLAYING] ANDREW FRANCL: Hi, I'm Andrew Francl, a graduate student in the McDermott Lab in the Department of Brain and Cognitive Sciences at MIT. One of the key things the human auditory system does is allow you to tell where a sound is coming from in your environment, and this has been studied for a really long time.
JOSH MCDERMOTT: And the problem is interesting because a sound's location is not made explicit on the manifold of sensory receptors. You have these two ears, and each ear creates a map of frequency, but spatial location isn't clearly indicated in what happens in the ear. There are nonetheless cues, provided in particular by the fact that we have two ears, along with other things that give you information about where sounds occur in the world.
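To make the binaural-cue idea concrete, here is a toy sketch of one such cue, the interaural time difference, estimated by cross-correlating the two ear signals. This is an illustration only; the function name, inputs, and sampling rate are assumptions, not something from the video or the paper.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Toy interaural-time-difference estimate: the lag (in seconds) at
    which the left-ear signal best aligns with the right-ear signal.
    A positive value means the left signal is a delayed copy of the
    right, i.e. the sound reached the right ear first."""
    corr = np.correlate(left, right, mode="full")
    lag_samples = np.argmax(corr) - (len(right) - 1)
    return lag_samples / fs

# Example: delay one channel by 20 samples (~0.45 ms at 44.1 kHz)
fs = 44100
rng = np.random.default_rng(0)
right = rng.standard_normal(fs // 10)
left = np.concatenate([np.zeros(20), right[:-20]])  # left ear hears it later
print(estimate_itd(left, right, fs))  # ~ +0.00045 s
```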
ANDREW FRANCL: But when we get to a real situation, with background noise, echoes, and multiple sound sources in the environment, we really don't understand how the human brain does this.
JOSH MCDERMOTT: So we study human hearing. And our long-term goal is to build good predictive models of human hearing. And by that I mean, we'd like to end up with a computer program that takes sounds as input and then can make all the kinds of judgments that a person can make about sound.
Andrew Francl, who led this project, came up with a way to artificially generate an enormous data set of realistic binaural audio in which the sound source location was known. The computer program simulates what the world would do to the sound. In particular, it outputs the binaural audio that would enter the left and right ears of a person at a particular location in that world, relative to the sound source to be localized and to the noise sources. With that very large data set, we can then use machine learning methods to train a model to produce sound locations given binaural audio.
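As a rough illustration of this pipeline, the sketch below uses the open-source pyroomacoustics room simulator to render two-channel audio for a source at a known, randomly drawn azimuth. It is a minimal stand-in, not the paper's actual renderer: the real training data used head-related transfer functions, many source and listener configurations, and added noise sources, and every name and parameter here is illustrative.

```python
import numpy as np
import pyroomacoustics as pra

fs = 44100
rng = np.random.default_rng(0)

def render_binaural_example(source_signal):
    """Render one labeled training example: place a source at a random
    azimuth around a two-microphone 'head' in a random reverberant
    room, and return the stereo audio plus the ground-truth azimuth."""
    room_dim = rng.uniform([5.0, 5.0, 2.5], [10.0, 10.0, 4.0])  # room size (m)
    room = pra.ShoeBox(room_dim, fs=fs,
                       materials=pra.Material(0.3),  # partly absorptive walls -> echoes
                       max_order=10)                 # simulate reflections
    head = room_dim / 2                              # listener at room center
    azimuth = rng.uniform(0.0, 2 * np.pi)            # label to be predicted
    source_pos = head + 1.5 * np.array([np.cos(azimuth), np.sin(azimuth), 0.0])
    room.add_source(source_pos, signal=source_signal)
    ears = np.array([[0.0, -0.09, 0.0], [0.0, 0.09, 0.0]])  # two mics ~18 cm apart
    room.add_microphone_array((head + ears).T)
    room.simulate()
    return room.mic_array.signals, azimuth           # (2, T) audio, label

# One example; a training set would repeat this with many different sounds,
# then feed the (audio, azimuth) pairs to any standard supervised learner.
audio, label = render_binaural_example(rng.standard_normal(fs))
```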
ANDREW FRANCL: We thought the simulation was pretty high fidelity, but one of the most amazing things we found is that, after training in this simulated environment, the model actually generalized to real-world data. Once we had built a model that could localize sounds in real environments, the next question was: does it work the same way humans do? To answer that, we ran on the model a series of psychophysics experiments that had previously been run on humans in the prior literature. One of the really exciting moments for us was when we realized that, across all of these different human psychophysics experiments, the model replicated the same sorts of behaviors. Despite being trained just to localize sounds in naturalistic environments, the model reproduced human localization behavior across a really wide variety of results that it had not been fit to in any way.
JOSH MCDERMOTT: Because he had trained this model in a virtual training environment, we could do something really interesting: ask to what extent the behavioral phenotype of human spatial hearing depends on optimization for the problem of sound localization in natural conditions, because you can take this virtual world and do things that would not be possible in the real world. An example is training in an anechoic world, a world where there are no echoes, which is very different from what you and I experience in everyday life.
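In an image-source simulator like the one sketched above, that manipulation is a small parameter change. The snippet below is an assumed illustration of the idea, not the paper's actual configuration; the room size and absorption values are made up.

```python
import pyroomacoustics as pra

# Anechoic variant of the earlier sketch: with the image-source method,
# max_order=0 renders only the direct path, so the audio contains no
# reflections at all.
room = pra.ShoeBox([6.0, 5.0, 3.0], fs=44100,
                   materials=pra.Material(1.0),  # fully absorptive walls
                   max_order=0)                  # direct sound only, no echoes
```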
ANDREW FRANCL: After training these models, we compared them on the same set of human psychophysics experiments we had used for our original model. What we found is that when training deviated from the natural world, the final models deviated from human behavior. This suggests that human perception is really shaped and constrained by aspects of our environment.
JOSH MCDERMOTT: So I think there are two key contributions of the work. The first is that this is a step forward in the general research program of building a model of the auditory system, because we now have a system that localizes sounds the way humans do, and there are a lot of interesting applications that stem from that. But the second main contribution is that this illustrates what I think is a very powerful method for understanding why the human mind is the way that it is.
ANDREW FRANCL: And the way we do this is by optimizing a model under different environmental constraints and then looking at how that model behaves and comparing it to human behavior. We look for where the model aligns with human behavior and where it deviates, and in doing so we can learn something about how specific aspects of perception are adapted to the environment.
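A minimal sketch of that comparison logic, assuming that for each published experiment we have a vector of human results and the corresponding model results on the same stimuli (all numbers below are made up for illustration):

```python
import numpy as np

def human_model_similarity(human_results, model_results):
    """Pearson correlation between human and model result patterns
    across the conditions of one psychophysics experiment."""
    h = np.asarray(human_results, dtype=float)
    m = np.asarray(model_results, dtype=float)
    return np.corrcoef(h, m)[0, 1]

# Hypothetical mean localization errors across four conditions
human          = [2.1, 3.4, 6.8, 9.9]
natural_model  = [2.5, 3.1, 7.2, 10.4]  # trained in realistic rooms
anechoic_model = [4.8, 4.5, 5.1, 4.7]   # trained without echoes
print(human_model_similarity(human, natural_model))   # high: matches the human pattern
print(human_model_similarity(human, anechoic_model))  # low: deviates from humans
```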