On the Neural Machinery of Faces (1:03:18)
June 8, 2016
All Captioned Videos CBMM Summer Lecture Series
Winrich Freiwald, Professor of Neurosciences and Behavior at The Rockefeller University, discusses the connection between sociality and intelligence, the importance of face processing for social cognition, and the neural machinery underlying face recognition in the primate brain. Dr. Freiwald summarizes the stages of visual processing from the retina to primary visual cortex, and the history of the discovery of face selective cells in inferotemporal (IT) cortex. The talk then focuses on the behavior of neurons in the face processing network within IT cortex.
WINRICH FREIWALD: So I'll tell you a little bit about primate social cognition. On the background side, I studied biology primarily as a major in Germany. I studied in Göttingen and Tübingen. I did my PhD with Wolf Singer at the Max Planck Institute for Brain Research in Frankfurt. And this is where I was introduced to visual neuroscience.
So the question we were trying to tackle is, how is it that we perceive the world as a three-dimensional space composed of objects? So how is it that we don't have piecemeal information about individual attributes of the outside world, but actually group them together into functional units that we call objects in our perception? And this really was one of the major themes of my line of research.
Then about 10 years ago, a bit more, I got interested in faces, in part because I did a postdoc with Nancy Kanwisher. And then I teamed up with Doris Tsao, who at the time was a grad student with Marge Livingstone here in town. And so we were looking, and I'm going to tell you more about that because this has been the focus of our research for a long time, for areas and cells that are specialized in face processing. And I'll tell you a little bit about why we're interested in faces.
But originally one way it could be motivated in studying faces is that faces are just like one object category. So it's an example of a particularly well-defined class of objects that you can study and try to figure out how the brain is recognizing them.
But faces also have all these social dimensions. And so over time, over the last years, I've become more and more of a social neuroscientist. And so I'll also tell you a bit about those dimensions as we go along.
So we study this in primates, and there are reasons why we should all be interested in primates, because we are primates. And so if we're asking the question why we're here in an evolutionary sense, if you're asking why we are smart, then likely there are certain traces of what makes us smart in some other species, and these species could be primates.
AUDIENCE: What's the relationship between teacher and [INAUDIBLE].
WINRICH FREIWALD: Yes, so anyone that's a guest-- good question. So what do you actually see in this picture? So Laurel and Hardy, this is early TV. And so they have these social interactions between them where you might guess that he's not the smartest guy on the planet. But sometimes he gets happy about things, and then that makes him upset. So this is an example of a social interaction.
And so in some sense what I'm showing is just another picture, right? It's just a collection of pixels that vary in grayscale. If you put these pixels together, the images that we see in natural environments are actually just a tiny fraction of all the possible images you could compose from the pixels that are in this picture.
And then if you collapse them more and more, you see things that look like a face. And in this case, they are a face. But how do you actually do this? How do you recognize the face? And then things become more subtle that you can recognize, OK, he's smiling, you take it for granted.
But you also recognize interaction between the two. So if I ask you what do you see in this picture, the description would be pretty high level, right? Like one is smiling, the other one is upset. And then maybe you might have some background in the social history of these guys to understand what the interaction of the two is about.
Neither of them is known to be particularly intelligent. And so this is also the reason why I like to put this up and then combine this with a title that talks about intelligence or cognition because a lot of the things that really make us smart are things that we don't in everyday language associate with the word intelligence.
So if someone is really smart, what do we mean? He's maybe good at math, he might be incredible at memorizing things and do all kinds of things. But these are often things that computers are already better at than us. They have been better than us for a long time. Adding two large numbers, someone could be really smart doing this in his head, but any calculator can do this better. Does this really make us smart?
[INAUDIBLE] are sometimes difficult for us to realize that [INAUDIBLE] because we take them for granted. So the fact that you can recognize a face automatically, for example, that's actually an expression of intelligence that we all have in that old evolutionary heritage. It's a very difficult computational problem-- we are still doing it somewhat better than computers. And so I like this contrast between two people who, at least to older audiences, are very well known as not being the most intelligent people on the planet, and then talking about primate intelligence.
So there are other species that are known to be intelligent. Like the octopus: its genome has just been sequenced. And there are lots of indications in the genome that they have very highly developed nervous systems. Octopi are not very social animals, so it's possible to be smart and solve problems without being a social animal.
It's also possible to be highly social. This is just their biology. They are born by the thousands, and then few survive, which is very different from primates where most primates survive to have offspring themselves. Then for short periods of time, in their social interactions, even when it comes to sex, they're not necessarily very social. But they are very smart solving problems.
There are other animals that might be very social, like if you live in a big group of wildebeest. But that doesn't mean that you necessarily have to be very smart, so you can be either intelligent or social. But what really sets primates apart, maybe not from all of the animals, is that we are both.
So there is this peculiar intersection of sociality and intelligence, and there is a hypothesis that's been phrased several times, most clearly by Nick Humphrey in 1976, that it's not by coincidence, that there might be an evolutionary basis for why we are smart, and the reasons that the other primates are also smart. We live in a social environment with other smart individuals, and for us to be able to survive and to reproduce in this environment, we have to try to outsmart the others.
And so the idea is that there was this social intelligence arms race where the better you were at predicting the next move of someone else, the more successful you were in this environment. And through this positive feedback, then, primate intelligence might have evolved.
And so it's even things like telenovelas, for example. You might think that people watching telenovelas, this is not necessarily an expression of intellectual challenge. But there's a reason why lots of people are interested in watching sitcoms or telenovelas. In essence, they're actually very complicated. People naturally memorize, oh, who is this, and what is the social relationship of this person to another person.
And we do this automatically. And this is what we are really good at, and other animals are not really good at this but at other things. And so we really have as a species, but also as a larger group of related animals, we have the specialization that we are really smart in social contexts. And we are oftentimes not necessarily so smart in nonsocial contexts.
So here's just one example. OK, this doesn't play. Not so bad. I have this later, I can show it to you as well. So here's a little story to illustrate this. So this is a female baboon, Ahla. And so she lived in South West Africa where there was a habit among some farmers to use these baboons as herding animals for their goats. And so they used them instead of dogs, they used these baboons.
And so Ahla was one of these baboons. And so her behavior pattern-- it's maybe a little hard to see. So she's here, can you see her here? You can see the goats here. So she's there to drink, she would also engage in behaviors that baboons would normally not engage in, like licking salt, which is something that goats would do. And so she would blend in with the goats to some extent.
So here she is showing some grooming behavior on some goats. That's something that monkeys do, but goats, because of lack of dexterity or for other reasons, are not so good at. And here you can see her groom this goat further.
But what's really remarkable about Ahla is that she did something obsessively that the farmers actually did not want her to do, that the farmers would also themselves not have been able to do, and that she could not stop herself from doing. So what was that? So when the herds came back at night, the farmers would typically separate the yearlings from the older animals.
And Ahla wouldn't have that, so she would not rest until she paired all the little goats, the yearlings, with their mothers. And she knew exactly how to pair them. So she knew exactly which one of the little goats belonged to which one, who was the mother of whom.
And so you have to ask yourself, how is that possible. So how did she recognize, and then why did she do it? No one was giving her any benefit for this. If anything, the farmers did not want her to do it.
It just seemed like a natural behavior to understand her social environment, which in this case wasn't even her own species but was now her social environment. And she would cognitively, and then also by execution, organize the social environment in terms of who's the mother of whom, and then assume, of course, the kid belongs to its mother, and then put them together.
So it's one example of cognitive structures that we find in other primates. And the other story is about individual animals. Actually, one worked at the train station and would actually do some work to direct the trains in different directions. So that they can do pretty smart things. And here, I don't know if you can see the outline here, so here she's carrying one of these yearlings around.
So there is extensive behavioral evidence in primates that they have certain kinds of social knowledge. So they know the individuals around them. You might think this is trivial, but it's not clear for other animals that they know it. They can be social animals. Like for mice, as far as I know, it's still not known if a mouse recognizes other mice in its local social group individually, or just as another animal that it's affiliated with.
Primates do this. And they also differentiate the other individuals surrounding them by their social status, whether it's the age or the gender, and things like this. Primates also know interactions between individuals. So they understand that they can be grooming, that they can be mothering, they can be fighting.
It might not surprise you, but from a computational vision point-of-view, someone might actually teach you about this, this is very, very difficult to do. It's difficult to recognize a face or a body, but to recognize two faces, two bodies and interactions.
We get a lot of subtle signs from this. If two people embrace, for example, we get immediately, if it's just like a formal hug that they have to exchange on an occasion or if they are affectionate to each other, and the visual clues to this are minimal. But again, our brains are tuned to pick out this information because we are so invested into understanding our social environment.
And from these interactions, they can build up knowledge about the relationships between individuals. And this is what I was referring to before. Things like friendship, kinship, hierarchy, this is all the information that we have. And the way it's drawn here is to imply certain data structures. And right now we're really very far away in neuroscience from understanding how these data structures could be implemented in the brain.
Let's hope that this one plays here. This is a three-day-old macaque monkey. And I've been showing this video many times. And I see a few smiles on your faces, and that's because he is really adorable. And I can say this with authority because I've seen this video now a couple hundred times, and it still gives me this warm fuzzy feeling.
The reason I was showing this video originally was not for that, but it's a great demonstration of something I'm going to tell you about in a second. So the reason this video was shot is, again, this is only a three-day-old macaque monkey in the study. This was really a newborn one, so minimal visual experience. You see he is orienting to the face of the experimenter. And now he's going to be mimicking the facial movement of the experimenter. And he's really interested, he's going up like this and mimicking the facial movement.
So it's one of the examples, maybe one of the best examples, one of the cutest examples, for the fact that faces really are very, very powerful stimuli. And therefore, there seems to be something in the brain of this little critter that already makes him orient to the face. Again, there's no proof in this video; it could also have been a banana, and maybe he would have oriented the same way. But there are other experiments that suggest this, or even prove it, that there is something different about faces to us right after birth. And there's this mimicking behavior which these rhesus monkeys engage in for about two to three weeks after birth, and then it disappears.
In human babies, it's around for two to three months, then it disappears. So there also seems to be some automatic combination between face perception and facial movement related. And all of this seems to be inborn.
But what does it mean if you had a smile on your face when you were seeing that? So we have an emotional response, right? And so again this seems trivial to you, but it's not trivial. So Charles Darwin has a whole book about this, The Expression of the Emotions in Man and Animals. And when we smile, this is our way to express our emotional state.
So what's really happening here is I showed you a video. Again, this is only a collection of pixels of different color put together in a certain spatial temporal configuration. But there are certain spatial temporal configurations of pixels that will get into your emotional brain in a way that is at least very hard, if not impossible, for you to control.
So in a way what's happening is that this video is manipulating you in a way that you cannot control, right? It's not just that there is some information there that you can gather, think about, and then make of it what you want. It's doing something with you automatically in ways that are beyond your control. And this is very typical of faces, so faces are doing this in multiple ways.
So this is one way that we looked at. If I show you this picture of Charles Darwin, most of you are going to recognize him. What it means is that you activate your memory of a person who you know. Recognizing a face and knowing the person are two different things. You could describe this as an older man with a beard and whatnot, but now you know it's this person who you know. It's an interaction between your perception system and your memory, again in a way that you cannot control.
Then there's a phenomenon called gaze following. The best illustration I found for this is here. So you might realize that these people are wearing goggles with really extra large eyes. And that these goggles are made in a way that their gaze is going in this direction.
If you're analyzing yourself right now, likely you've been paying attention to this location up here, and then going backwards again because you figured out there's really nothing interesting up there. But then again, looking up there or paying attention up there and going back and forth.
So I can't have this on for too long, right, because there's an automatic process where your attention is going to be first drawn to the face, and then directed away by the gaze direction. This can be controlled, but it takes some time. So there's an automatic response that occurs automatically. And again, this is just a certain configuration of pixels that is doing that to you. And I apologize for it, but that's the way it is.
If there are automatic behaviors that are triggered by face perception, there are likely specialized circuits in the brain that are automatically activated when these different things are happening to us. And so faces are like a great way to get very deep into our emotional brain and into our cognitive brain, and so on and so forth, because all of these processes are triggered automatically. So it's going to be a very powerful inroad into not just perceptual systems as a model for object recognition, but also into these downstream emotional and cognitive systems.
So some examples of how good we are at recognizing faces. So if I show you a sequence here of different faces, this is a pretty fast presentation, but likely you're going to be able to make out that they are different identities, that maybe there are some people in there who you know.
Does anyone know who this is? Norbert Wiener was a professor at MIT, founder of Cybernetics. No? Very important, you should memorize that face.
So there's another task I like to do. So I'm going to show a couple of pictures of different objects, sometimes there's a face included, sometimes not. Shout it out when you think this is a face. Yeah, shout it out, OK.
AUDIENCE: Face. Can't see it.
WINRICH FREIWALD: I'm sorry.
WINRICH FREIWALD: So, yeah, there were lots of faces, yeah. So there's a paper that says that our face detection, so our ability to know whether there is a face and where the face is, it's actually working practically in parallel.
So if you have lots of green elements and there's one red item, it's not going to take you any time to figure out that there's a red item-- it pops out. You don't have to search for it. But if you have conjunctions of different features, typically you have to search one item after the other to figure out if this item is there or not.
But with faces, I cannot prove it, but the theory is that you are pretty much as fast in this array with fewer elements to figure out where the face was as in this array with many more elements to figure out where the face was. While when you didn't really see the face in the beginning, you would likely start to scan through the different images to convince yourself that really there is no face there. But you knew it pretty early on.
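The contrast between pop-out and item-by-item search described above can be put into a toy reaction-time model. This is only a minimal sketch under stated assumptions: the function names and the timing constants (`base_ms`, `per_item_ms`) are hypothetical and are not taken from the paper mentioned in the talk. It assumes that a parallel search costs one inspection step regardless of array size, and that a serial search stops on average halfway through the array when the target is present and checks every item when it is absent.

```python
def expected_inspection_steps(set_size, target_present, parallel):
    """Expected number of sequential inspection steps before responding."""
    if parallel:
        # Pop-out: all items are evaluated at once, so array size does not matter.
        return 1.0
    if target_present:
        # Serial, self-terminating search: on average the target is found halfway through.
        return (set_size + 1) / 2
    # Serial search with no target: every item has to be checked.
    return float(set_size)


def reaction_time_ms(set_size, target_present, parallel,
                     base_ms=400.0, per_item_ms=40.0):
    """Toy reaction time; base_ms and per_item_ms are hypothetical constants."""
    steps = expected_inspection_steps(set_size, target_present, parallel)
    return base_ms + per_item_ms * steps


if __name__ == "__main__":
    for size in (4, 32):
        print(size,
              reaction_time_ms(size, True, parallel=True),
              reaction_time_ms(size, True, parallel=False),
              reaction_time_ms(size, False, parallel=False))
```

In this sketch the parallel reaction time is the same for the small and the large array, which is the signature the talk attributes to face detection, while the serial reaction time grows with the number of items.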
So again, this is evidence that there might be something special about faces. And then also like this demonstration by Pawan Sinha, who's a professor here at MIT. Does any one of you recognize any of these people here?
The point here is that you were able to recognize these people who you knew. If you knew them, you recognized them, even though all these pictures are very blurry, right? So you can do this without any detailed information.
There's something about the gist of the face that you get. Sometimes it can be augmented by glasses. It's not totally fair, but you can recognize people based on this very reduced information already.
So the task that I have and other people have is to figure out how is the brain doing that. And so what I'd like to show you first is to acquaint you a little bit with some background about systems neuroscience, some basic concepts, just to get on the same page here. But at this point, I'd also just like to remind you how complicated the human brain is. So this is not quite to scale, it's bigger than it actually is. You know this because it has to fit into your skull.
But the human brain really is remarkably complicated. And so if you think about it, it might just be an impossible task to figure out how it's going to work. And some of what I'm going to show you is really going to demonstrate that there are individual cells in the brain that are actually doing things that we can understand, that make sense; that the brain is not just like a big hologram of different things put together in ways that we cannot possibly understand, but that there are understandable subunits that we can make sense of.
So do you know what action potentials are? Do you know something about the visual system? I'll walk you through. You know that there's something that's called the eye that's collecting the light. So it's actually the back of the eye, do you know what's in the back of the eye?
WINRICH FREIWALD: Is it part of the brain or not?
AUDIENCE: No. Yes.
WINRICH FREIWALD: It's part of the brain, it's a protrusion of the brain. What's called the optic nerve-- not really nerve, but it's connecting the eye to the rest of the brain.
Anyone knows what's here? The thalamus and then the visual part of the thalamus. Anyone? Actually says here, the lateral geniculate nucleus body.
Then from there, there's the projection to the back of your head, which is the primary visual cortex. And then there are some 30, 40-- we don't even know how many visual areas after that that are processing visual information further.
So in the back of the eye there's a thing called the photoreceptor. Do you know what it's doing? It might actually say this here. Do you know what it's doing, what this means? So it's converting one form of energy into another, right? And so it's converting the light energy in the outside world into the language of the brain.
So it doesn't start with action potentials right away, but there's a graded potential at the back of the cells, which is then passed on to other cells that are also showing graded potentials first. And then there's an output with action potentials.
So why do I emphasize this? Anyone know about the brain in a vat? It's actually connected to this point here, right? You can see something if someone hits you in the eye because that's when the photoreceptor is being activated in a way that's not quite appropriate. But it is activated when you see something, right?
And so at that moment there is a conversion into the language of the brain, and that's it. That's all the connection that there is to the outside world. And so you can look at the eye of someone else and you can say, OK, likely what's happening is that there is a conversion of light into this language of the brain. Therefore, that person sees something in the outside world. But for ourselves, we're not able to prove that.
And that's because of this conversion. There's more to light conversion, to brain activity. Who saw this movie? Good, so I don't have. OK. It's not an advertisement.
So what happens if you're now recording from the retina in the back of the eye is that you find that if there's some light, the cells will fire. But they will not always fire. So there are different kinds of cells in the back of the eye.
So if you just show a uniform spot of light, the cell might not fire at all. It might just stay spontaneously active, might not raise its firing rate. It could be the same.
So there are these rings here. And what's indicated here is something that's called the receptive field. So does anyone know what a receptive field is? So the receptive field in visual neuroscience is defined as the region of visual space at which a stimulus can exert an influence on a given cell.
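The definition just given, the region of visual space at which a stimulus can influence a cell, suggests a simple mapping procedure: probe every location with a spot and keep the locations where the firing rate departs from baseline. The following is a minimal sketch of that idea. The model cell, its baseline rate, and the location and radius of its field are all hypothetical stand-ins for a real recording, not data from the talk.

```python
BASELINE_RATE = 5.0  # hypothetical spontaneous firing rate, spikes per second


def model_cell_rate(spot_x, spot_y):
    """Hypothetical cell: a spot within 2 grid units of (10, 6) raises its rate."""
    distance_squared = (spot_x - 10) ** 2 + (spot_y - 6) ** 2
    if distance_squared <= 4:
        return BASELINE_RATE + 20.0
    return BASELINE_RATE


def map_receptive_field(cell, width, height):
    """Return every probed location where a spot changes the cell's rate."""
    return {
        (x, y)
        for x in range(width)
        for y in range(height)
        if cell(x, y) != BASELINE_RATE
    }


if __name__ == "__main__":
    field = map_receptive_field(model_cell_rate, width=20, height=12)
    print(sorted(field))
```

Note that the test is "differs from baseline", not "is above baseline", so the same procedure would also pick up locations where a stimulus suppresses the cell.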
And this can take forms that are not so trivial. This is exactly what we're going to be talking about in this entire session. But there are some cells that actually like light, and then only in a very small region of space, and ideally a bright region surrounded by a dark region.
And so this is indicated here. So if there's something bright in the center and something dark in the surround, then the cell will be active. And there's also the opposite kind of retinal ganglion cell that has the opposite receptive field. So I'm not going to go into details here, but the point is that there's already some computation happening where the cells are not just one-to-one reflecting what the outside world is like, but they are computing a local contrast.
What the cell is interested in is: my local region here, is it brighter than the surround? If yes, then I'm going to get active. And if it's a lot brighter, then I'm going to be getting active a lot. Or the reverse: is it darker than the surround or not?
So it's compressing the information. And it's thought that this is happening because all information has to pass through the optic nerve. And the optic nerve only has so many fibers, and therefore the amount of information that is passing through has to be limited. And so it makes sense to preprocess the information in the back of the eye and send it back to the brain.
And then what's happening there in the back of the brain? So this is really what David Hubel and Torsten Wiesel figured out and what they won the Nobel Prize for. They did their work initially--
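The local contrast computation described here can be sketched in a few lines. This is a deliberately crude illustration, not a model of a real retinal ganglion cell: it assumes a 3x3 luminance patch in which the middle pixel is the center and the eight neighbors are the surround, and it clips negative values to zero as a stand-in for a firing rate. The function names are made up for the example.

```python
def local_contrast(patch):
    """Center luminance minus mean surround luminance for a 3x3 patch."""
    center = patch[1][1]
    surround = [patch[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    return center - sum(surround) / len(surround)


def on_center_response(patch):
    """Active only when the center is brighter than the surround."""
    return max(0.0, local_contrast(patch))


def off_center_response(patch):
    """The opposite cell type: active only when the center is darker."""
    return max(0.0, -local_contrast(patch))


if __name__ == "__main__":
    uniform = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    bright_spot = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(on_center_response(uniform), on_center_response(bright_spot))
```

Under these assumptions a uniform field of light produces no response from either cell type, which matches the observation that a uniform spot may not raise the firing rate at all, and a brighter center produces a proportionally larger response.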
AUDIENCE: [INAUDIBLE] cats and the with the moving part? Is that [INAUDIBLE].
WINRICH FREIWALD: So the cats didn't figure it out, but David Hubel and Torsten Wiesel, they figured it out. Yeah, but it has to do with cats.
WINRICH FREIWALD: Yeah, yeah, yeah, yeah. So this is them here. So they had a projector. And what people knew was this organization of receptive fields in the retina, and they knew that at the next station, the lateral geniculate nucleus that I was pointing out before, the cells also had this organization. So when people were trying to investigate what's happening in the back of the brain, the next station where information is processed, they were trying the same stimuli. They thought small spots of light should be activating the cells there, and they were not successful in doing that.
And so, as you already indicated, what David and Torsten figured out with a slide projector that they could move around is that these cells actually liked oriented stimuli. And I'm going to show you a movie actually of this. But before, this is like a diagram, so they can map again a receptive field. It now has a bit of a different shape from what we've seen before.
And now, by the way, there is not a logical reason anymore why it should be at one particular location. Because you could imagine that any kind of photoreceptor could be wired here with any other photoreceptor input. But this is not what's happening. There are still spatially confined regions, individual fields that these cells are responding to. And most of the rest of the visual field, they're not going to respond to.
So this is one of the organizing principles there, that there is still a retinotopy there in this area. And it turns out that neighboring cells have similar locations of receptive fields. There are organized maps, and so this all makes sense.
But the main point here is that if you're moving a bar of light in this orientation over the cell, the cell will fire. If you change the orientation of this bar of light, the cell is not going to fire. What this means is that there is a new quality computed here that did not exist at the earlier stages.
If you have a cell that's only interested in whether a small spot of light is brighter than the surrounding environment, it doesn't care for orientation. It's going to fire the same way no matter which way the bar is oriented. But these cells are really going to care for that.
So I have a video which I hope is going to play. And I hope you're going to have the sound with it. It's going to illustrate how these experiments work.
So you can see the basics. What you don't see is there's an anesthetized cat, and they're recording from the back of its brain. It's looking onto the screen here where they can be projecting the stimuli with a slide projector. And you see Torsten here, he's holding a pen in his hand. And so with this he's going to mark the location of the receptive field.
So video quality, sorry, [INAUDIBLE].
WINRICH FREIWALD: So the video quality is not great. So reality is a little bit less blurred than it appears in this video. But you will see that there's this bar that's moving along. And the sound you're hearing is the amplified electrical potential that they're recording with a microelectrode that's placed next to one cell in the primary visual cortex of the cat.
And so what it means is that they can see what they're doing with the stimulus. And at the same time, they can listen to the response that the cell is generating to that stimulus. And you might hear them talk, and I don't know if you can hear what they're talking about, but it's not important.
So you can hear the response. This is a good. This is not super precise. So this cell is also direction selective. It's not just the orientation that matters, but it has to move this direction, not the opposite direction. It's another new quality that's computed in this part of the brain.
This cell likes a long bar more than a short one. So it works all the time, if you're doing-- yeah.
So I don't know if you've got the sense that this is really cool.
WINRICH FREIWALD: You could do this experiment, it works every time. It's a really fundamental [? finding. ?] I'll bring it back to this microphone here.
So if you look at the brain as a whole, again, there's no reason why-- well, there are reasons. But if you think about it, it could be different, right? It wouldn't necessarily have to be the case that there is something that we can understand like orientation selectivity, direction selectivity, that there are things like receptive fields that we can make sense of, that they're organized in a meaningful way.
So there are models, like how you go from here to here. So I mentioned, there's a new quality, and so what you want to explain is how can neural circuits generate this new quality of selectivity. And the model that Torsten and David came up with was intuitive, but it actually turned out to be correct.
So what they basically assumed is that you get an orientation selective cell in visual cortex by precisely wiring together these concentric receptive fields at locations that are aligned along one orientation or another. And there's been actually decades of research to prove that that model in essence is correct.
There are other things happening as well, this is not the only thing that's happening. But this precise wiring is behind what you could listen to in this video before. So there's a precise wiring behind this new quality.
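The wiring idea described above can be illustrated with a toy feedforward model: take several center-surround units whose receptive field centers lie along one line, sum their outputs, and apply a threshold. This is a minimal sketch, not the circuit itself. The 9x9 grid, the one-pixel center with an eight-pixel surround, the choice of a vertical line of subunits, and the threshold value are all assumptions made for the example.

```python
SIZE = 9  # hypothetical image size in pixels


def make_bar(orientation):
    """SIZE x SIZE image with a one-pixel-wide bright bar through the middle."""
    mid = SIZE // 2
    image = [[0.0] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        if orientation == "vertical":
            image[i][mid] = 1.0
        elif orientation == "horizontal":
            image[mid][i] = 1.0
        else:
            raise ValueError("orientation must be 'vertical' or 'horizontal'")
    return image


def on_center_unit(image, row, col):
    """Center-surround unit: center pixel minus mean of its 8 neighbors, clipped at 0."""
    surround = [
        image[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col)
    ]
    return max(0.0, image[row][col] - sum(surround) / len(surround))


def simple_cell(image, threshold=1.0):
    """Sum of on-center units aligned along the middle column, minus a threshold."""
    mid = SIZE // 2
    drive = sum(on_center_unit(image, row, mid) for row in range(1, SIZE - 1))
    return max(0.0, drive - threshold)


if __name__ == "__main__":
    for orientation in ("vertical", "horizontal"):
        bar = make_bar(orientation)
        print(orientation, on_center_unit(bar, SIZE // 2, SIZE // 2), simple_cell(bar))
```

In this sketch a single center-surround unit in the middle of the image responds identically to the vertical and the horizontal bar, while the summed cell responds only to the bar that runs along its line of subunits. The orientation preference comes entirely from which subunits are wired together, which is the point of the model.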
Soon after David and Torsten discovered the orientation selective cells, actually starting here at MIT, Charles Gross was doing recordings in a different part of the brain. And so this is a side view of the brain of the rhesus monkey I showed you pictures of before. Again, this is the eye, and early vision is what is happening here. And Charles was not recording in the back of the brain, but he was recording in this part here. It's the ventral portion of the temporal lobe. So there are four lobes of the cerebral cortex-- occipital, parietal, temporal, and frontal.
And he already knew that this ventral portion of the temporal lobe was essential for object recognition. So when lesions occur to this part of the brain, object recognition was impaired. And when he lowered the recording electrode in here, he could record action potentials. And then he would present more complicated stimuli like a drawing of a monkey face or a drawing of a hand or a scrambled version of the monkey face, and so on and so forth.
And what he found were neurons like this-- so during the period of time that this stimulus was presented, a stimulus of the monkey face, you can see that there are many action potentials fired, so the cell responded to a face. And then he had controls, there were controls, but here's a biological control, the hand, another biologically valid object and the cell did not respond to it.
And I don't know if I have the slide here with other controls. So here is a human face, again, a response. You take out the eyes, still a response, but a bit weaker. A pumpkin face, also a response, but a bit weaker. Just a number of oriented bars because all these pictures have oriented bars in them. Not a response. Monkey face, again, big response. Monkey without eyes, still a response, but not so good. Scrambled, no response, even though locally all the information here is the same as here. And then again the hand, there's no response.
So this is great. This is a face cell. And I actually did not realize it, but I don't know if Bob Desimone is going to give a talk here as well. He might actually be the person in the picture, not quite sure. He worked with Charles early on. And at a meeting we had last year, he was describing the history of face recognition in a most amazing way. He was describing how difficult it was to convince people that these cells actually existed.
So this paper is from 1981, but the first cells they found were actually from more than a decade before. The difficulty they had was it was very difficult at the time to document any of the activities. So here you see some traces. And you were nodding as I was explaining what's actually shown here, but this is only one instance of a presentation. There are not many of them, right?
And so 12 years earlier, 15 years earlier, it was even not possible to produce these kinds of graphs. So the way papers were written is that people were describing what they did, and then they were describing the response that they got.
So here comes Charles Gross, and he's describing, OK, I found a cell that's responding selectively to a face, and I call this a face cell. And he was reluctant to put this in because he knew that people would not be very receptive to it. And this turned out to be the case. It was very difficult to convince people that there are cells that are responding to individual faces.
And why is that? Well, I think behind it really is this intuition that if you have something as complicated as the human brain, what chance is there that this whole thing is wired up in a way that you can find one neuron at one location here in the brain that's going to respond selectively to a face. It's a very meaningful stimulus. It could be very, very different. And so it's not clear that things should be organized in this way.
So by now we found lots and lots of cells. And actually following Charles Gross, people have found many face selective cells. And so, little by little, this got more and more accepted. But initially, it was a very difficult concept for people to believe.
The other reason why it was difficult, at least for some people, to believe this story is that there were different notions around about how the brain should work. And one of you already mentioned the concept of sparseness. There are different kinds of sparseness, but one idea that's around is to say that you want to be efficient in the way that you're representing outside information. So for any given stimulus you only want a very few neurons to be active.
So when it comes to a face, you would just maybe want a couple of dozen cells to be active. Or, this is a thought experiment that Jerry Lettvin came up with, you might only have one cell active for one individual. So he actually coined the term grandmother neuron that maybe some of you have heard.
Has anyone heard this? One of you has heard of it? Or two of you, three, maybe. If I wait long enough, all of you will have heard, all right. Because I just mentioned it, right, so you all have heard of it.
So here's the idea. So he was making up this tall tale of a neurosurgeon, and he would perform a virtual surgery on one of his patients where he would take out a small set of neurons. And then afterwards this person was not able anymore to recognize his mother. And so from this and other stories like this the term grandmother neuron was coined.
And when I grew up in neuroscience, which is now a couple decades ago, that was the thing that everyone believed could not be true, that there would be just one cell and one cell only to represent one individual person that you knew. Instead, people talked about other concepts.
So Jerzy Konorski, he coined the term gnostic unit, but it's a similar idea that it would be one unit that's responding when you see your grandmother or some other person or some other thing that you know. And so this is almost similar to the neuron doctrine of perception, that there's a one-to-one correspondence between the activity of one cell and your percept of a person.
There were more relaxed ideas of Horace Barlow. He talked about the pontifical cell. He thought about the nervous system as organized like a hierarchy, a church hierarchy, for example, where there's only one pope at the top, but there are lots of lower level clerics reporting to the next higher level. But there are ever fewer people at the higher levels. So that the representation, if you think of this as an analogy to nerve cells that are active, will become more and more sparse at the top.
But at the same time, there were other concepts around, like Donald Hebb's cell assembly concept, or Karl Lashley's mass action concept. He viewed the brain as a hologram where all information would be completely distributed across the entire brain. And this of course is the complete opposite view to that of the grandmother neuron.
And so if you are in this environment where there's these vastly different views about how the brain should work, and now you want to report that there's one cell that you found that's responding selectively to a face, you might imagine that you might be met with quite a bit of skepticism.
So if you want to recognize objects, right, you want to be prepared to recognize any object because there's certain constraints on what might be out there, but they're not necessarily so tight that you could rule out lots of possibilities. There are always chances for things to be combined in ways that have never been combined before. You should have a system that should be able to recognize it.
And so, yes, it doesn't make sense to think that you could pre-wire one cell for every possible thing that you could possibly ever encounter because you couldn't have enough cells. And second, how would you even know how to wire it up in the first place, right?
So the extreme version which no one ever really advocated is clearly not the case. When it comes to ensemble codes, however, there is a broad range of ensemble codes that you could think about. So one particular feature of the ensemble code that Donald Hebb talked about was not just that there are lots of neurons encoding, in a combinatorial fashion, a certain stimulus, but also that these ensembles are formed through associative learning so that they actually are functionally interacting with each other. And so with the idea of an ensemble, there's also the notion of cohesiveness, that activity of some neurons in the ensemble could be triggering the activity of others. And this could lead to pattern completion, for example. And this is one line of research in neural networks that's been explored. It's also being explored again at the center here.
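Pattern completion through associatively learned connections can be illustrated with a small Hopfield-style network. This is one standard textbook formalization, swapped in here for illustration; the talk does not name a specific model. The network size, the number of stored patterns, and the function names `train_hebbian` and `recall` are assumptions of the sketch.

```python
import random

def train_hebbian(patterns):
    """Hebbian rule: strengthen the connection between co-active units."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, sweeps=5):
    """Update units one at a time; each unit follows its summed input."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            drive = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if drive >= 0 else -1
    return state

rng = random.Random(0)
n_units = 100
# Three random ensembles; each unit is either active (+1) or silent (-1).
patterns = [[rng.choice((-1, 1)) for _ in range(n_units)] for _ in range(3)]
weights = train_hebbian(patterns)

# Present a degraded version of the first ensemble: 10 of 100 units flipped.
cue = list(patterns[0])
for i in rng.sample(range(n_units), 10):
    cue[i] = -cue[i]

completed = recall(weights, cue)
```

Here the units that still carry the stored pattern drive the flipped units back through the learned connections, which is the sense in which activity of some neurons in the ensemble triggers activity of the others.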
But then there's the notion of sparseness, right, that there are certain stimuli that maybe you're going to be devoting lots of neurons to represent, and there may be others that you're only devoting a few to.
And faces, which is really the center of the work that I'm doing and my lab is doing, are a great model to actually address that. And the brief answer is this-- there is no single answer. And hopefully I'm going to get to this in the next 20 minutes, why there's no single answer. If I don't, then I can actually tell you afterwards.
Let's just think very briefly what a face is. So from a vision science perspective, one way to say it, this is a particular class of three-dimensional objects that have certain structural features to them on which you can have a certain texture. And that's basically what a face is.
In addition, there's also some dynamics, so we can say it's a four-dimensional stimulus. Not all combinations are possible. There's a certain configuration that's the same for all faces, otherwise we wouldn't call them faces.
You can debate, if there's a third eye, whether you would still call it a face, but normally it's only two eyes, for example. But then the features are differing, the texture is differing, and the spatial relationships between the features might be varying to some degree. But overall all the faces have the same structure.
And, yes, when it comes to texture, of course, textures can be very complicated. Hair can be very complicated, there can be a mustache. And as we're aging, there are interesting features in our faces emerging. And then what you see of the intrinsic properties of this face can vary a lot depending on the orientation of the face, illumination, and other things, like how close the face is or how far away it is.
And so the challenge for an object recognition system in general and for face recognition systems in particular is to be able to recognize faces under all these different circumstances. So what do you mean by facial recognition? There are very different qualities involved.
From an anatomical point of view, faces are really very complicated. So I mentioned there's a texture, which you might think of as the skin mostly. Or whenever the eye is [? sustained ?] to the outside view. But hidden below the skin, you know there are muscles, and there's bone below.
And our facial musculature is really, really very highly differentiated, and much more than in other animals. So I like this by comparison. So if you're a fish or a frog, you can do really cool things, like sit on a front porch and enjoy the day. But there are certain things that you cannot do because they require certain anatomical specializations that only mammals have.
So these two rats that are shown from the top, you can see that they're interacting with their faces. Rats have these amazing whiskers that are enhanced here by these dots so you can see them move. And so you can see how they're moving their whiskers-- this is slow motion-- back and forth to explore each other.
Now that's possible because there are muscles in mammals that are attaching directly to the skin. In this case, at the whiskers, so that they can move the whiskers around. And so the most high fidelity system with this facial musculature really is the primate facial musculature system with 23 facial muscles that can be used to express the emotions, like you did when I showed you the movie of the cute macaque monkeys. And there are similar expressions also in rhesus monkeys and other macaque monkeys.
So if I show you a picture of a face, there's a lot of information that you get from the face. There's mood, attention, species, race, gender, age, identity, familiarity, attractiveness, even things like trustworthiness you get from the face. Not claiming it's reliable, but you get something like a sense of trustworthiness from a face, and it's one important psychological quality that you get from a face.
And all of this you're getting very fast. You don't have to think about it, it's not an active effort. Therefore, again, you think it's not really intelligence. But believe me, it's a very, very complicated computational problem that your brain somehow solves all the time automatically. So this is really one of the aspects of what makes us really smart. And I emphasized before that some of these signals are not just there passively for you to get, but some of these signals are being actively sent and get under your skin.
So just to illustrate some of the computational problems here, so this is a scene from The Godfather. It's not here for a specific reason other than to illustrate the fact that we can detect faces, even when they're not looking at us, even when the illumination conditions are really difficult. We have to figure out where the faces are. And only if we know where they are can we then process more detailed information.
So for example if I'm showing these pictures here, those of you who have seen the first movie of The Godfather will maybe not have problems to realize that these two pictures are pictures of the same individual. And these two pictures are of the same individual. And we can do this despite the fact that these two pictures on a pixel-by-pixel basis are much more similar to each other.
Or also on the basis if you do a decomposition of different orientations, these two are much more similar to each other than these two here. Yet we know that these two belong together in some kind of representation more closely together because they belong to the same individual. And so the question is, how can the brain do that.
So there is about 1% of the population who are face blind and to whom the social world might look something like this. And 1% is really not rare. So when I talk to audiences a little bit bigger than this, oftentimes someone comes up afterwards and says, I've got difficulty recognizing faces.
So this face recognition problem typically is not a problem to detect faces, so people know that there's a face out there, but they cannot really tell one individual from the others. And here really it's the same individual. I hope that you noticed that.
So I told you about Charles Gross's finding of face cells. This is a summary of the locations where people, including Charles and afterwards, have found face-selective cells in the brain of the macaque monkey. And the conclusion of many people was that the face cells are really distributed all over this object-selective region of the brain.
And so when I first heard about face cells, I thought this was interesting. But I thought it was impossible to study these cells because it was difficult to find them, maybe 10 in 1,000 cells. And so it makes it practically impossible to figure out how they're processing facial information.
But then Nancy Kanwisher, who might be teaching here as well, so she's one of the faculty at MIT and one of the main people in the Center for Brains, Minds, and Machines. She was one of the first to use fMRI, functional magnetic resonance imaging, during presentation of visual stimuli. And when she was contrasting activation in the brain to faces with activation to non-face object control stimuli, she found one region in the brain. And subsequently people have found more than this one region that is selectively responding more to faces than to non-face objects.
Now so this signal that's being used here is an indirect signal of blood oxygenation and blood flow that is correlated with neural activity, but it's not a direct measure of neural activity. It's also a measure that's much coarser because it's at the scale of cubic millimeters, as opposed to single cells, which are much, much smaller.
And so it's not really clear, if you get a map like this that's indicating that this region is responding more to faces than not, what does it really mean? Does it mean that here in this region there are maybe 10% face-selective cells, but more than in the neighboring regions that didn't show up? Or does it mean that all the cells are face selective? So these things are not clear.
And so this is why Doris Tsao and I, now about 15 years ago, thought we should use fMRI in macaque monkeys because then we can combine the experiment that Charles Gross has done, where he recorded from individual cells, with the experiment that Nancy has done to localize different face areas. So the first question was, are there face areas in macaque monkeys as well. And the answer is yes. And they're very reliable across different individuals. This is a computer-inflated version of this brain here. So you can look into the [? gyri, ?] these folds, where otherwise you couldn't look inside.
This is the same contrast that I showed before. So these yellow and red regions are the ones that are responding significantly more to faces than to non-face stimuli. The blue ones have the opposite specialization.
So Doris and I discovered six regions here that are face selective. And we gave names to them based on their anatomical location. I will tell you probably most about these two areas, a little bit about this, and a little bit about this area here.
So the first question we had was about cells in these regions, are they face selective or not. And so what we did was we lowered [INAUDIBLE] electrode into this region that we knew from fMRI was overall responding more to faces than to non-face objects to see how the cells are responding, showing the exact same stimuli that we showed during the fMRI experiment. And I'm going to show you a video of the first cell we recorded from this area here. And so you get some intuition about how these cells are responding.
The quality of the video is not great, but I think you're going to get the idea. The clicks again are going to be the responses of the cell.
So these clicks are the responses of the cell.
This black square, by the way, is where the animal is looking at the moment. It didn't see that black square itself. So I hope that you can hear that every time there's a face, there's also a response. The cell however is not perfectly quiet when there's not a face. So sometimes there's a bit of a response to some other stimuli.
So we can quantify the response of the cells. So in this experiment we had 96 different stimuli. And so we could measure for a given cell how strongly it responds on average to each of the 96 different stimuli.
So we get a histogram. The way I'm going to organize this histogram always is going to be the faces on the left-hand side, and then the other stimuli from the left to the right. So there were 16 faces and then 80 non-face stimuli. And then we color-coded the responses. I'm going to use the same color code throughout. So red is for response enhancement, blue is for no response, response suppression.
And what is shown here, so this would give you a vector, right. It would give you a vector of 96 different responses of a given cell to these 96 different stimuli. What we're showing here is a matrix where we [AUDIO OUT] the vectors of all the cells we recorded from on top of each other. So the cell number goes from top to bottom, and the picture number from left to right. Again, red is for enhancement and blue is for suppression.
So if you are to describe this matrix, I think the most apparent feature is that here on the left-hand side there's about 90% of the cells that are responding more to the faces than to the non-face stimuli. There's a smaller portion you can see in the upper left corner that's selectively suppressed by faces.
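The layout of this matrix, one row per cell and one column per picture with the 16 faces first, can be mimicked with simulated numbers. Everything below is hypothetical: the firing rates are invented, and the contrast index and the 1/3 cutoff (a face response at least twice the non-face response) are common conventions chosen for illustration, not necessarily the measure used in these experiments.

```python
import random

N_FACES, N_NONFACES = 16, 80  # stimulus counts from the talk (96 total)

def face_selectivity(responses):
    """Contrast index between mean face and mean non-face response."""
    faces = responses[:N_FACES]
    others = responses[N_FACES:]
    f = sum(faces) / len(faces)
    o = sum(others) / len(others)
    return (f - o) / (f + o) if (f + o) > 0 else 0.0

rng = random.Random(1)

def simulate_cell(face_rate, other_rate):
    """Hypothetical cell: noisy mean rates (spikes/s) for the 96 stimuli."""
    rates = [face_rate] * N_FACES + [other_rate] * N_NONFACES
    return [max(0.0, r + rng.gauss(0, 2.0)) for r in rates]

# Rows are cells, columns are stimuli. Nine simulated face cells, one
# cell that responds equally to everything.
matrix = [simulate_cell(30.0, 5.0) for _ in range(9)] + [simulate_cell(5.0, 5.0)]

indices = [face_selectivity(row) for row in matrix]
n_selective = sum(1 for s in indices if s > 1 / 3)
```

An index near 1 means the cell responds almost only to faces, an index near 0 means it does not distinguish faces from other objects, and a negative index corresponds to a cell suppressed by faces.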
But maybe you can also see that there are some stripes running down here, some orange stripes. So if we average the activity of the entire population, this is the population response, [INAUDIBLE] to the faces. But then there are some objects in here that are not faces to which we get intermediate responses. And these objects are shown here-- it's clock faces, apples, pears, sliced tomatoes. So it's things that are also roundish like faces that are bilaterally symmetric or roughly bilaterally symmetric, and then there's some structure inside. So with these kinds of stimuli, we can fool these cells a little bit to respond, but they're not responding as much to them as to faces.
I really love this. So this is to illustrate this point, right. So sometimes we recognize faces even though we know that they are not there. And I love this because they've just been sliced through, and so you know they have reasons to scream and shout. But you know that they're not faces, but you still recognize them.
I just want to show you recordings from other face areas. So what I didn't tell you is that in this region the particular cell I showed you we only show [AUDIO OUT] faces. It turns out there was a small group of cells I did not actually talk about. They seemed not to be face-selective, but in reality they liked the profile view. So these cells are actually view-selective. So, yes, they are face-selective, but a given cell might not respond much to a front view of a face, but only to a left view or right view.
If you're going up one level, [AUDIO OUT] what's happening. This is just one cell, but it's typical for the entire population.
What is it doing?
AUDIENCE: Sides. [INAUDIBLE]
WINRICH FREIWALD: So it likes the profile view. And we don't test the backs as possible. We would like more, but we did like a three-dimensional thing on these cells. So this cell likes two profile views, the opposite ones.
And so Tommy Poggio developed some theory now that explains why this is happening. It's very counter-intuitive. If we go up one level higher, we find cells like this one. This is going to be a very hard one. But maybe one of you is going to figure out what it's doing. And I'm saying this based on past experience because it's really hard.
So this is a very sophisticated cell, it doesn't just like any face.
AUDIENCE: How it tells the face is friendly?
WINRICH FREIWALD: So one of the hypotheses frequently is that it's head orientation. And head orientation does play a role, but it's not the main factor. What we showed were 25 different individuals, only one of which the animal used. This is not the one who's in here.
Yes, there's a bigger response when it's a front view of the face than the side views. But in this whole video, there are [? two ?] action potentials fired to anything else but this one individual. There are two guys at the end, and it's responding to only one of them. The animal has never seen this person in real life. I'll play this again.
Some of you are still not convinced. It's always the same guy in different orientations.
AUDIENCE: You end with a girl.
WINRICH FREIWALD: OK, good. So we could end my presentation based on that question. So what would the answer be? So the experiment we're running is 25 different individuals and eight orientations. And if you now plot the responses to the 25 individuals against head orientation, the way it looks is you get a red stripe. So for one individual, across different head orientations, you get a response.
It's not completely the same across head orientation, but the main factor that matters is which identity it is. However, we only tried 25 samples, right? So there's a good possibility there's another person out there, likely one that looks similar to him, that the cell is also going to respond to.
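One way to put a number on "the main factor that matters is which identity it is" is to ask how much of a cell's response variance across the 25 by 8 grid is carried by identity and how much by head orientation. The sketch below does this for a made-up cell. The response values, the preferred identity, and the per-view gains are all assumptions, and comparing the variance of the marginal means is a simple illustrative analysis, not necessarily the one used in the lab.

```python
N_IDENTITIES, N_VIEWS = 25, 8  # design from the talk

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def identity_vs_view(resp):
    """resp[i][v] is the response to identity i at head orientation v.

    Returns the fraction of total variance explained by the identity
    means and by the view means (main effects in a balanced design).
    """
    n_id, n_view = len(resp), len(resp[0])
    id_means = [sum(row) / n_view for row in resp]
    view_means = [sum(resp[i][v] for i in range(n_id)) / n_id
                  for v in range(n_view)]
    total = variance([x for row in resp for x in row])
    return variance(id_means) / total, variance(view_means) / total

# Hypothetical cell: low baseline for everyone, strong response to one
# identity, somewhat modulated by head orientation (frontal view strongest).
PREFERRED = 7
VIEW_GAIN = [1.0, 0.8, 0.7, 0.6, 0.6, 0.7, 0.8, 0.6]
resp = [[2.0 + (30.0 * VIEW_GAIN[v] if i == PREFERRED else 0.0)
         for v in range(N_VIEWS)]
        for i in range(N_IDENTITIES)]

id_frac, view_frac = identity_vs_view(resp)
```

For a cell like the simulated one, plotting `resp` with identities along one axis and orientations along the other gives a single stripe, and nearly all of the variance sits in the identity term.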
It's also not a grandmother cell in the sense that this subject from which the cell was recorded has never seen this person in real life, so it's not an individual that is a represented person, like Charles Darwin I showed you in the beginning. What is the neural code here, right?
So what I didn't show you is that these areas are connected to each other. Early vision is starting here. There's a feed-forward wave of activity traveling through. This is what we all assume right now. And as this wave is traveling through, in only two steps information is transformed from a format that is picture-based, it's [? vertical ?] to head orientation, into a format [INAUDIBLE], identity-specific, that loses information, or at least transforms information, about aspects other than identity. And the codes that might occur at this level could be completely different.
What I should emphasize is that there are lots of cells in this region as well that are responding to all faces. There's a broad distribution of cells that are very, very identity selective. And in fact, there might be cells that are so identity selective that we've never seen them active because we never showed the one face that they would like to respond to.
There are other cells that are going to respond to every face, every orientation, every position, every size, lots of manipulations you could apply. Why are they in this area? We don't know.
So, what I didn't have time to show you, guys, is that we know that these areas are directly connected to each other. There's now beautiful tracing data that shows that. Every time you place a tracer into one of these areas, 90% of the cell bodies you find labeled are inside the other face areas. There are similar results from other experiments which we've done before.
So it's a small network that's connected to each other, indicated in the beginning and really working this out right now. There are specific connections from different face areas to other parts of the brain that suggest [INAUDIBLE] the fact that faces draw attention, that gaze is directing attention away. The fact that faces are eliciting emotional responses, they can elicit communicative responses, and the fact that faces can activate memory.
But we think that this is happening outside of this core system here. What I also didn't show you is that now that we have this great access to face cells, we can actually decompose face stimuli and so we can figure out something about how these cells are coding faces.
You could recognize Woody Allen, even though it was blurred. I told you, something to do with the gist of the face as an example of what's called holistic face processing. This term comes in different kinds of meaning. But what it means is that you don't just need local information. You can recognize a face based on the overall layout. Let's call this holistic for the time being.
But also some faces might really just have different certain features. There are differences in the noses, the eyes, and whatnot. And so you can recognize Ernie and Bert here in this way. And so this is one of the questions we're trying to solve. How are the parts represented and how is the whole represented?
So the way we approached this, Doris and me, is that we designed this 19-dimensional face space. These faces are cartoons, they consist of very, very simple geometric shapes. There are only ellipses here, there are lines here, there are triangles there. If you put them together in the right configuration, you see a face. Again, your face processing system is active, interpreting these simple geometric shapes in the right configuration as a face.
Now we could vary these stimuli according to 19 different dimensions. So we have things like face aspect ratio. So it goes from one extreme, Ernie, to another extreme, Bert. And we chose the extremes to go way beyond our natural exposure to faces. So we're never going to see a face as narrow as this one or one as squashed as this one in real life. Well, in Sesame Street they exist. But elsewhere you won't see them.
So then there are things like pupil size. There are no pupils here, very big pupils here. There are things like inter-eye distance. This is a little hard to see. [INAUDIBLE] cyclopean arrangement where the eyes are together. Here the eyes are straddling the outside of the face, and so on and so forth.
So now what we can do is we can randomly vary this cartoon stimulus along these 19 different dimensions. It looks a bit funny, like a cartoon character is trying to talk to you. What's behind it is really that at every moment in time, this is roughly nine times per second, we are randomly choosing for each of these 19 different dimensions a particular value, and we use this to compute the face and display it. That's all that's behind it.
So now we're recording neural activity at the same time. And now we can ask, does the response of the neuron depend on the stimulus that we're showing. So, more specifically, we can ask, does the activity of the neuron we're recording depend on variation along one of these dimensions, so whether the feature assembly is at the bottom or at the top of the face, independently of all the variations that are happening in the other 18 dimensions.
Then we can ask [INAUDIBLE], does the response depend on variation along this axis independently of the other 18 dimensions. And do this 19 times.
The result that we're getting for one cell is shown here. So for each of these 19 different dimensions that we constructed before, we can construct this tuning curve. Each of these features could take one of 11 possible values from one extreme to the other.
And so we can see, does the activity of the neuron vary as we are changing this feature value. And you can see here, it's very clear that there's variation here, there's variation here, variation here, variation here. Maybe here, but not statistically significant.
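The logic of this analysis, drawing a random value on every dimension for each frame and then averaging the response for each value of one dimension while ignoring the other 18, can be sketched with a simulated neuron. The 19 dimensions and 11 values per dimension come from the talk. The neuron itself (its baseline, the two tuned dimensions, the slopes, and the noise level), the number of frames, and the helper names are assumptions made for illustration.

```python
import random

N_DIMS, N_VALUES = 19, 11  # from the talk: 19 dimensions, 11 values each

rng = random.Random(2)

def random_face():
    """One cartoon frame: an independent random value on every dimension."""
    return [rng.randrange(N_VALUES) for _ in range(N_DIMS)]

# Hypothetical neuron with ramp-shaped tuning to two dimensions only.
WEIGHTS = {0: 2.0, 3: -1.5}  # dimension index -> slope (assumed values)

def firing_rate(face):
    """Baseline plus a linear ramp on each tuned dimension, plus noise."""
    drive = 20.0 + sum(w * (face[d] - 5) for d, w in WEIGHTS.items())
    return max(0.0, drive + rng.gauss(0, 3.0))

def tuning_curves(faces, rates):
    """Mean response per value of each dimension, averaged over the rest."""
    sums = [[0.0] * N_VALUES for _ in range(N_DIMS)]
    counts = [[0] * N_VALUES for _ in range(N_DIMS)]
    for face, r in zip(faces, rates):
        for d, v in enumerate(face):
            sums[d][v] += r
            counts[d][v] += 1
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]

faces = [random_face() for _ in range(20000)]
rates = [firing_rate(f) for f in faces]
curves = tuning_curves(faces, rates)

# Tuning depth per dimension: difference between best and worst value.
modulation = [max(c) - min(c) for c in curves]
```

Because every dimension is drawn independently, the variation in the other 18 dimensions averages out within each bin, so only the dimensions the simulated neuron actually depends on show a deep tuning curve, and the rest stay nearly flat.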
For this particular cell, we found that four parameters were important. So this cell likes Bert not Ernie. It likes the eyes in cyclopean fashion, not far apart. It's hard to see, it likes the eyes narrow together, not wide open. And it likes big pupils, not small ones.
And this is very typical for these neurons, that they are showing this ramp-shaped tuning curve. It's almost as if this cell would take a ruler and measure the feature directly, and then relay the feature value in a one-to-one fashion into its output firing rate. Usually more than one dimension matters, but not all 19. And so these are features of these cells.
And so this is one way that we now know that these cells at this level of the face areas are encoding individual features. It turns out that this does depend on the overall arrangement of the features. So after we figured out for a given cell what the feature was that it liked, for some of the cells we did another experiment. I'm going to show it to you here as a movie. And that is, that we would only vary now one of the features that we know the cell was tuned to.
The cell likes small eyes, and it's going to be very hard for you to see that. But what you're going to see is the eyes are varying in size. There's a response [? relation ?] of the cell. And if you analyze it, it turns out that it likes small eyes, not big ones.
And I'm going to show you this in the context of the entire face. And then I'm going to remove the rest of the face and put it back on, and you're going to hear what the effect is. [CLICKING SOUND]
This is an extreme case. The eyes are still varying, but the response is gone. [INAUDIBLE] cases the response is going to be reduced a lot. It's still there, and the tuning is still here, but it's weaker. And so the typical result here is that the cell is going to have tuning to a particular feature, but it's much stronger when the feature is correctly embedded in the context of the face than when the feature is presented in isolation. So there is a holistic effect if [INAUDIBLE] that is embedded in the whole face. And there's a correlate for this in human face perception.
So this is just one example of the things that we can do now to really analyze what is it that these cells are picking up. How is it that they become face-selective? We could then predict, are they going to respond to an animal face, for example.
And we can do this in quantitative ways. So this is a fully parameterized stimulus that can be described quantitatively. And so I think we are at a time where for the case of faces which arguably are the most complicated objects that you can think of, we can actually understand mechanistically as well what these cells are doing as [? we understand what ?] cells are doing. We're a little bit behind, but I think we're going to get this level of detailed mechanistic understanding.
And that's because primates are devoted to social information processing. Because they are devoted to social information processing, they developed a specialized system that's dealing with faces and faces only. Otherwise, faces would not be a good choice to work with.
It might be better to work with triangles and squares and circles, really simple objects, as the next step to go from orientation to something in object recognition. But because of this specialization, it turns out that this very complicated stimulus is actually a great model system to study object recognition.