The Neural Circuits for Face Recognition
Date Posted:
July 27, 2021
Date Recorded:
July 27, 2021
CBMM Speaker(s):
Winrich Freiwald, Brains, Minds and Machines Summer Course 2020
Description:
Winrich Freiwald, The Rockefeller University
The analysis of faces plays a key role in visual intelligence and social cognition. Winrich Freiwald explores the function of a specialized network of face processing regions in the primate brain, including a newly discovered region of the temporal pole (TP) that appears to be engaged in person recognition. Observations from fMRI and physiological studies have important implications for the representations of face features and computational processes underlying face recognition in the brain.
PRESENTER: It's a great pleasure to introduce Professor Winrich Freiwald from Rockefeller University, who is going to tell us about all the secrets about how facial recognition works at the neural circuit level. Winrich, all yours.
WINRICH FREIWALD: Well, thank you so much for having me. It's a great pleasure. I should also say it's pretty much my first Zoom lecture, and I will be struggling with the fact that I'm not seeing any of the audience's faces as I go through this. So what I will talk about today are the neural circuits of face recognition. I will focus on how we discovered the basics of these areas, I will touch upon how the findings we have in different parts of the system explain some of the face recognition abilities that we have as humans, and at the end I will talk about what this helped us learn about computational principles in the brain, in particular through computational modeling.
I could also, because it was announced, talk about interaction analysis, but because this talk is relatively brief and we want to have enough room for discussion, I thought I would focus on face recognition; if you have questions about perception beyond faces, we can also discuss that. I will try to take brief breaks throughout my talk where they come naturally, and if you want to ask questions throughout, that would be great, and I'll be happy to answer them as we go along.
So maybe first a general question: why should you guys be interested in face recognition? Why should you be interested in faces? After all, you signed up for a course that's supposedly on intelligence and its natural and artificial implementations, so why care about faces? There are several answers to this. Number one is that all primates are interested in faces. Marge probably just told you a version of that. You are primates; you should be interested in faces.
Second, it is a great model system. If we want to understand complex brains like ours or those of monkeys, they're very big and very complex in organization, and to make any progress to understand them we need model systems that are smaller versions of these entire brains. And as I will tell you, for faces we actually have such a model system that really makes it possible for us to gain the kind of mechanistic insights that we need to identify computational principles.
As social beings we rely a lot on faces. And I started out here with a picture of my two twin daughters, who were born in March this year. As you might be able to tell, they're identical twins. And so it's a bit of a joke of nature on me as a face recognition researcher that I now get to put my face recognition abilities to the test every day, because I have to tell them apart.
So why faces? Faces convey a lot of information that is essential for us to interact with our social environment. From just a brief glance at this face here, a fraction of a second, you will get information about the identity of the person, gender, age, race, species, familiarity, attractiveness, perceived trustworthiness-- not actual trustworthiness-- and the mood and attention of the person in the picture. And you get all of this at a glance, at a very, very fast pace, and it's really essential for you to make sense of and be able to interact with your social environment. As social and visual individuals, we rely a lot on this ability.
Imagine a world that would look like this. I took this from the fifth season of Curb Your Enthusiasm. This is how the world might look to about 1% of the general population, who suffer from a condition known as prosopagnosia, or face blindness. To them, faces look very, very similar. Or, put another way, faces look the way the faces of my two daughters look: very similar to each other, and very difficult to tell apart for someone with face blindness. They often suffer from social difficulties in their everyday lives.
So I would like to do a test with you guys, and you can see it here. I'm going to show you this picture for about 15 seconds. You will see that it is a collection of photographs of faces, and your task will be to say how many individuals are shown in these different pictures. I'll give you a bit of time to look at them and make up your mind. Normally I would take a vote at the end; I don't think we're going to do this today. But make up your mind about how many individuals you see in these pictures.
OK, so how many people did you see? Normally I would ask, did you see one person? Two, three, four, five, six or more? And the answer actually is two. So I'm highlighting here in green outlines the pictures of one individual, and the ones that are not outlined are the ones of the other individual. In a psychophysical study in which people were asked to answer this question, the average answer was about seven.
And you might guess what the difficulty is here. These are pictures taken with different illumination conditions, different facial expressions, so there's quite a bit of variation. Here there's also change in hairstyle now and then. Certain variations that could occur, like in size of the image or in head orientation, are actually minimal. Also there's a minimal difference in age, just because these two people happen to be the relatively young authors of this particular paper.
So this should tell you that not only if you're face blind, but even when you're typical, your ability to discriminate between different faces is actually not perfect, and it's particularly imperfect when you're tasked with telling apart faces that you don't know from everyday life. Eyewitnesses commonly make mistakes in identification, and this is actually the leading source of false convictions. This might give you a brief glimpse of why that is.
So when you think about face recognition, we can distinguish different abilities, and they correspond to different computations across stages. If you have a social scene like this one here from The Godfather, none of you will have difficulty recognizing that there are three faces in this image, even though the lighting conditions are really difficult, and even though the orientations and the distances of the faces are different. We call this process face detection: the ability to know that there is a face and where the face is, or that there are multiple faces and where they are.
But of course you get much more information from this image than just the fact that there are faces: you are able to discriminate faces. So here I put six pictures, and these two pictures are quite similar to each other in pixel space because they're taken from a similar direction, but they show two different individuals.
And so when we recognize an individual in an image, what we have to do is to put together representations for images that are different from each other by virtue of the hidden cause, and this hidden cause is the identity of the person in the image. And so this ability to discriminate not based on what meets your eye at this very moment but based on underlying causes, in this case identity, that is called face discrimination, so your ability to tell one face from another based on the identity that is shown there.
So what are the neural substrates of face recognition? Charles Gross was a pioneer. He passed away recently. He found cells-- and they were really astonishing at the time, actually so astonishing that many people did not believe him for many years-- that responded selectively to the sight of faces. Here's one example neuron. During the period of time indicated by the marker here, he would show this picture up here, and you can see that there are more action potentials fired by this neuron during the presentation of this line drawing of a monkey face.
He and other people found face-selective cells in this part of the brain, which I'm sure other people have talked about before: the inferior temporal cortex. And this part here is the superior temporal sulcus, which is opened up. All these red symbols indicate locations where Charles or David Perrett or other people found face-selective cells. So the conclusion people drew was that face cells are intermingled with cells of other selectivities. It sometimes took recording from thousands of cells to find a few that were face-selective. Other people never found a face-selective cell, and so it was very difficult to study them or even get to the mechanisms of face processing.
But the perception from compilations like this one here, which was from the early '90s, was that face processing, at least in the monkey brain, is completely intermingled with the processing of other objects. And that changed with Nancy Kanwisher's pioneering work a few years later, in the late '90s, when she found with fMRI indirect evidence-- indirect because the BOLD response that you measure in fMRI is a change in blood flow and oxygenation, which is an indirect reflection of neural activity-- for regions in the human temporal lobe, regions that might correspond to the monkey inferior temporal cortex, that respond more to faces than to non-face objects. She discovered this region here, which she called the fusiform face area based on its location and its specialization for faces.
So the question that arose is: if you have this picture from fMRI suggesting that there is a spatial specialization for faces, and you have this other picture from monkeys where it seems that faces are intermingled with other objects, which one is true? Was there a difference in organization between the monkey brain and the human brain? Or are we just looking at different techniques, such that the fMRI might be misleading because the way it represents data, with thresholded activation maps, might overemphasize spatial specialization?
So the question basically arises: what's actually going on in areas that fMRI identifies as face-selective, in the sense that they respond more to faces than to non-face objects? So Doris and I, almost 20 years ago, were asking this question: what's happening in fMRI face areas? And the way we wanted to address it was first to ask, do monkeys also have face-selective areas? And then to record from these areas-- basically combining the approaches of Kanwisher and Gross-- to try to see what's actually happening in fMRI-identified face areas.
So first we used fMRI to contrast activity when the monkeys saw faces-- shown at the top, human faces or monkey faces-- with activity when they saw objects. And this is the pattern that we got. It's a computationally unfolded map of the cortex. You can see that a lot is happening here in inferior temporal cortex and in the STS. The yellow regions are responding selectively more to faces than to objects, and the blue regions show the opposite response profile.
And we find a very repeatable pattern across animals of six areas, which we can name based on the anatomy, that reliably respond more strongly to faces than to non-face objects. They exist in both hemispheres, so there's a total of 12 areas on the temporal lobes. I should also say-- although I'm not going to talk about this today-- that there are also areas in prefrontal cortex that are face-selective as well. OK, so we have six areas that by fMRI respond more to faces than to non-face objects. We can now lower a recording electrode into these areas, record individual cells, and ask what their properties are.
And that was actually great fun when we did it. And hopefully, I can convey this fun and excitement that we felt to you almost 20 years later by showing the video that we took from this very first recording. So this is a recording of the first cell we recorded from this face area here. And I will show a video that shows-- at lower quality, but it shows what the monkey saw at the time-- and you will hear clicks every time the neuron we recorded from responded.
[VIDEO PLAYBACK]
WINRICH FREIWALD: Can you hear this? [INAUDIBLE]
[END PLAYBACK]
WINRICH FREIWALD: What the video basically shows is a random sequence of images of faces and non-face objects, the same stimuli we used in the fMRI experiment. And you hear clicks when the neuron is responding. What's pretty clear to everyone listening to and watching the video is that there are clicks whenever faces are shown. Sometimes there are also clicks when non-face objects are shown, but every time there's a face there's a response of the neuron. And so we were recording cell after cell. And after about six cells, we were looking at each other and saying, well, it seems that all the cells are face-selective.
And so we recorded more cells, of course. This is from our first paper on this, some 280 cells shown from top to bottom. The responses are plotted using the color map that Marge just mentioned she'd invented. So I'm always going to use warm, red colors for response enhancement and blue for response suppression. The way it is organized, cell number runs from top to bottom and picture number from left to right, with the 16 faces shown on the left-hand side. And you can see that the vast majority of cells are either selectively enhanced by faces or selectively suppressed by faces, and that there are only a few cells in between for which it is not so clear what they are responding to.
You might also see that there are some vertical stripes here of orange colors. These are stimuli that are not faces but that share some properties with faces-- like a clock face, for example, that is also roundish and has some internal structure and internal symmetry, just like a face. So: first, with fMRI we can localize face areas at reproducible locations. Second, whenever we record from these face areas, the vast majority of cells are face-selective. And third, by selectively inactivating these face areas-- something I will not talk about today-- we can actually interfere with even the most basic face processing task, which is face detection.
From these three sets of information we conclude that the middle face areas that I've talked about-- and, more generally, these areas-- are face processing modules. This is a very strong claim. The claim is that these areas are there for one purpose and one purpose only, and that is to process face information. Furthermore, we suspect that what the middle face areas-- the ones I was trying to show you the video of-- are doing is performing shape analysis, and thereby telling faces from non-face objects. The reason is that most of these cells have ROC curves with areas under the curve close to one, so they are very, very informative for telling faces from non-face objects.
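To make that last point concrete: the area under the ROC curve measures how well an ideal observer could tell faces from non-face objects from a single cell's spike counts. Here is a minimal sketch of that computation, assuming you have trial-by-trial spike counts for the two stimulus categories (the variable names and numbers below are made up for illustration):

```python
import numpy as np

def detection_auc(face_counts, object_counts):
    """Area under the ROC curve for telling faces from objects using one
    cell's spike counts: the probability that a random face trial yields a
    higher count than a random object trial (ties scored 0.5)."""
    f = np.asarray(face_counts, dtype=float)[:, None]
    o = np.asarray(object_counts, dtype=float)[None, :]
    return float(np.mean(f > o) + 0.5 * np.mean(f == o))

# Illustrative counts for a strongly face-selective cell (made-up numbers):
print(detection_auc([12, 15, 11, 18, 14], [2, 4, 1, 3, 5]))  # 1.0
```

An area of 1.0 means the two count distributions do not overlap at all, which is the sense in which these cells are very informative about face versus non-face.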
What we think this is also doing is forcing us to see faces in the outside world when we know that there are no faces there. I really love these peppers. They've just been sliced in half, so there are perfectly good reasons why the teeth seem to be falling out, why they're screaming and unhappy. And even though you know that these are peppers and not faces, you are forced to see them as faces. So we believe the face processing system is there to try to find evidence in the outside world that there's a face, to then process this information as a face, and it forces you to see faces even when they're not there.
On a pragmatic level, what does this discovery of face-selective areas packed with face-selective cells do for us? It gives us unprecedented access to a functionally homogeneous group of cells that are coding for one high-level object category, and we know with very high certainty what this category is. This is a huge advantage over other parts of the visual system, especially at high levels in inferior temporal cortex, where we don't have this information. And I think it greatly influences the approach that we can take, and also the depth of understanding of the computational principles of the system that we can reach.
So I just want to give you one story about the coding of faces where we have taken advantage of this and really gotten into the mechanisms of face recognition. This story is about a question that's been pertinent in the domain of face processing: how are the parts and the whole integrated? Why the whole? If you see a picture like this of a highly blurred face, you don't have a problem recognizing it as a face.
Those of you who have been around a bit longer might also have no problem recognizing this highly blurred face as Woody Allen. So in a sense you don't need a lot of detail to recognize a face. It can be sufficient for you to get the gist, either through blurring or through other kinds of manipulation. So there seems to be something holistic about the processing of faces, but of course you can also process individual facial features, focus on them, describe them, and use them to tell one face from another.
So we were asking the question: do the neurons give us any clue about how this processing might actually take place in the brain? What we did was take pictures of individuals and design highly simplified cartoon characters, like those below, that consisted of very simple geometric features and could be described by just nineteen numbers, like here.
One dimension was aspect ratio, which ranged from a very flat, horizontally elongated, wide face to a very narrow face-- like Ernie and Bert in Sesame Street. We had parameters like pupil size, with very big pupils at one end and no pupils at the other. And then we had parameters like inter-eye distance, so we had an almost cyclopean arrangement on one side and the eyes almost straddling the outside of the face on the other.
For these 19 different parameters we had 11 different values that we varied randomly. And so the stimulus we'd show to the cells would look something like this. It looks a bit like a cartoon character that's trying to talk to you, but really what's happening is that every 160 milliseconds we're updating these 19 different dimensions, choosing a random value from one of these 11, and this is what the pictures look like.
With that, we can ask questions like how these neurons encode different features. One way of doing this is to ask, for one of these dimensions at a time, whether the neurons change their firing rate as we change-- for example, here, the height of the feature assembly-- independent of all the other changes that are taking place. You can ask the same question for a second dimension, like the face width, or the pupil size, and so on and so forth.
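Since each dimension is drawn independently on every update, the simplest way to read out these tuning curves is to average the firing rate at each level of one dimension, which marginalizes out all the others. Here is a minimal sketch of that logic, with a synthetic cell standing in for real data (all names and numbers are illustrative, not the actual recordings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical session: 5000 cartoon stimuli, each defined by 19 feature
# dimensions drawn independently from 11 levels (0..10), as in the talk.
n_trials, n_dims, n_levels = 5000, 19, 11
stim = rng.integers(0, n_levels, size=(n_trials, n_dims))

# Synthetic cell: a ramp over dimension 3 (say, inter-eye distance) plus noise.
rate = 2.0 * stim[:, 3] + rng.normal(0.0, 3.0, n_trials)

# Independent dimensions mean that averaging the rate at each level of one
# dimension marginalizes out all the others: 19 tuning curves in one pass.
tuning = np.array([[rate[stim[:, d] == v].mean() for v in range(n_levels)]
                   for d in range(n_dims)])

print(tuning[3].round(1))  # ramps from ~0 to ~20: significantly tuned
print(tuning[0].round(1))  # flat around ~10: an untuned dimension
```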
So for the 19 different dimensions we get, for every cell, 19 different tuning curves, which I'm showing here for one example cell. This cell was significantly tuned to four out of these 19 dimensions, which are highlighted and shown enlarged here. So this cell didn't like Ernie, it liked Bert; it liked the eyes in a cyclopean arrangement, not far apart; it liked the eyes wide open, not closed; and it liked a big iris, not a small one.
So it's almost as if the cell took a ruler and measured four meaningful facial properties in the face, and relayed these properties in an almost one-to-one manner in its output firing rate. And this kind of very simple ramp-shaped coding can be very efficient and very beneficial for describing a stimulus space like that of faces. It's a little bit like-- I don't know if Marge talked about this-- coding in the domain of color.
So for faces you could measure a number of properties, let's say the 19 that I mentioned before. You can construct a face space, and for a given face like this one here that's called Jim, you can construct an antiface that's as far away from it as possible in face shape. So here it's the jaw width for example that's different, or features of the eyes and the nose. The mouth is fuller here than here.
And if you adapt to this stimulus and are then shown the neutral one, what you will see is something that looks a bit more like the opposite. This gives us a clue that there is something like a face space in our mental world of faces, and these very broad tuning curves, ranging from one extreme to the other, would be a way to actually implement this kind of face space in a very efficient manner.
So this is the encoding of the different parts. I would have liked to show you a video that explains the very strong effect that the overall face has, but [INAUDIBLE] I suppose I will describe it here. What happens in this video, and is typical for the cells in the middle face area, is the following. If you have a whole face but change only the eyes, not everything else, then in this particular cell you get a big response when the eyes are small, and a small response when the eyes are big. But when you present the eyes in isolation, without the outline of the face, the tuning is gone; the cell's not going to respond at all.
And this was very typical for the population. So here we have cells that are very sensitive to a particular detail of the face, a particular feature, which is the eyes, but this measurement actually requires the individual feature to be placed in the whole face. So we have a local measurement, and then an augmentation of this measurement by the placement of the feature in the whole face. You can think of it as a gain modulation; it turns out that a simple multiplication is a good description of this data.
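As a toy illustration of that multiplicative description (my sketch, not a fit to the recorded cells; the ramp and gain values are invented):

```python
def response(eye_size, whole_face_present):
    """Toy multiplicative description of the part-whole effect: a local
    eye-size ramp whose gain is set by whether the whole face is present."""
    local_tuning = 10.0 - eye_size              # prefers small eyes (a.u.)
    gain = 1.0 if whole_face_present else 0.0   # face context as multiplier
    return gain * local_tuning

print(response(2.0, True), response(8.0, True))    # 8.0 vs 2.0: tuned
print(response(2.0, False), response(8.0, False))  # 0.0 vs 0.0: tuning gone
```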
So what have we accomplished? I will make a brief pause here; I can see that some questions are coming in. What we know about the middle face patches is that they are highly face-selective. They implement a mechanism for face detection-- this is something I did not talk about, but would be happy to discuss. I did talk about the encoding of facial features through ramp-shaped tuning curves, and the encoding of holistic properties, or configurations, as well.
Just very briefly, I mentioned that this coding scheme could explain the psychophysics of a face space. And actually, other properties of the cells in the middle face patches can explain other psychophysical properties of face recognition, like the caricature effect: we get the biggest and the smallest responses for extreme faces that exaggerate the naturally occurring features, and therefore there's a better representation of caricatures than of the originals.
We found evidence for an inversion effect: the coding of features is impaired when you show them in an inverted face. And there's evidence for the part-whole effect, the effect I mentioned at the end-- that there's better coding of individual parts when they're part of a whole face. It's kind of counterintuitive, because you would think that the presence of other features might distract from your ability to process individual features, but it actually augments your ability to discriminate.
Guy's question: why select this particular set of 19 face dimensions? That's a very good question. We just picked them. At the time we wanted to focus on meaningful facial features. There were some that we did not pick, like ear size, because they would interact in a physical manner with other features like the hair. Also, we didn't have any teeth in there, even though teeth are very important cues, especially for macaque monkeys. So there's nothing specific about those 19 features. They are just 19 features that, based in part on the psychophysics, we knew were important for face processing; these were our choices.
From Ken: how can we be sure these are the dimensions the face system actually codes? Yeah, so the answer is: we can't. You can rotate the space, you can make up other dimensions, and that would be just as good. It's just helpful to talk about this with these meaningful dimensions. OK, I'm sorry if I missed some questions here; I keep on getting Marge's questions. I will close this window now and move on, but if you have questions on this part, I can also answer them later.
OK, so far I've talked about recordings we took from the middle face patches, which are located here. The reason was that these were the easiest for us to target in the beginning. I wanted to show you a video of some of the properties of these cells-- that they are very face-selective-- and we then targeted other face areas with the same analysis. I can't show you the videos, but I can describe in words what you would have seen.
When we record from this area here, which we call AL and which is further up the processing stream, we found that it's also packed with face-selective cells, but now the cells show a funny property that we did not see in the middle face patches. In the middle face patches, cells are very strongly tuned to head orientation, so a given cell might prefer the left profile, or the front view, or the right profile. When you go to AL, you find cells that are also profile-selective, but when they like the left profile they also like the right profile. So, in a weird manner, they are confused by mirror-symmetric views of faces.
And then if you go up to area AM, even further up in the face processing system, again almost all the cells are face-selective. But now you find cells that can be very identity-selective and very robust to changes in head orientation. And I will come back to this later in my talk because these three qualitative properties are very, very typical for these three populations of cells, and they give us some important clues about the computational system that the face processing system in the brain might be implementing.
OK, one interpretation is that we now understand this network of different areas-- and I didn't mention it before, but I'm mentioning it now in passing, that these face areas are also selectively interconnected to form a face processing network. And what we understand is that there's a transformation of information from one face area to the next to the next.
So there are two main transformations, from this area, to this area, to this area, that take us from a more image-level description of faces that is very sensitive to head orientation to a description that is more identity-selective and more robust to changes in head orientation. The first representation would be good for face detection, and the last one would be very good for face discrimination-- for telling identity apart across different images, across different head orientations or other kinds of transformations.
So there's a third level of face recognition that's often conflated with face discrimination, and here I come back to the study that I mentioned in the beginning. As I mentioned, the subjects in that study overestimated the number of individuals shown in the images: they said there were seven different individuals, but there were only two.
However, the authors did a very clever control experiment in which they showed the faces of two Dutch celebrities in pretty much the same way as they showed their own faces here. And they ran this with British subjects and with Dutch subjects. British subjects again said, on average, that there were about seven individuals in this arrangement of pictures. And Dutch subjects had no problem at all; they would always say it's two individuals. This is just one example showing that there's a fundamental distinction in face recognition between the discrimination of faces and the recognition of people that we know.
So if you have seen The Godfather recently, you might even remember the names of the different characters; you might know something about them aside from how their faces look, and that is information that is activated by the sight of faces. So there's this extra step that takes you from face discrimination to person recognition. Of course there are other ways to recognize a person, but faces are very important for doing it in everyday life. So we were interested to know: where does this happen? Does this happen in the visual system itself?
Marge emphasized the importance of learning. If you grow up with parents, siblings, and friends that you know very well, will this impact how you perceive faces, and how does it? Maybe the familiarity effect of person recognition is completely imbued in the face processing system that I showed you before. Alternatively, maybe there are other parts of the brain that are important for connecting the perception of faces, as in face discrimination, to the memories that you have stored about individuals, as in person recognition.
And so one of the graduate students in the lab, Sofia Landi, studied this. She ran the same contrast of faces versus objects in an fMRI experiment, but she replaced the faces with famous faces and the objects with famous objects. Our subjects are monkeys, so the face was not Charles Darwin's but that of a monkey very well known to the subject, and the object was a toy that was very well known to the monkey.
And what Sofia found was the same layout of face areas that we've seen before, but in addition, for this contrast of famous faces versus famous objects, she found two additional areas that we had not seen before. These are areas that we call TP and PR: PR is localized in perirhinal cortex and TP in the temporal pole. And this is an indirect clue that maybe these areas are selectively engaged in the process of person recognition. It's by no means proof, but we were very interested in these areas.
And so Sofia ran another experiment that doesn't require audio, so I'd like to run it by you. You can see now, left and right, two highly blurred images. If you are sufficiently far from your screen, you will easily see that both are highly blurred images. And the way the experiment worked was that we now slowly add detail to the image.
And as we do this, more and more of you will actually recognize the person in the image. At some moment in time it will become entirely clear to you that this is Barack Obama. I might have given this away now for some of you, but as I mentioned before, it usually happens even at a highly blurred level of the face. You don't need all the detail that is being added now. There will be this moment of recognition, and at this point you're sure it's Obama, and you don't need more information.
So let's do another example on the right-hand side. Same process: we slowly add more and more detail to the face. The features of the person will become more and more clear, and probably this moment of recognition is not going to happen for any of you, because you don't know this person on the right-hand side. This actually happens to be Obama's half brother, who's not nearly as famous as Obama. And the reason I'm showing this is that this is the basic logic of our experiment.
So the reasoning behind the experiment is as follows. If you have a generic face processing system, then as you add more and more detail to the face, you might expect the response of that system, or that brain area, to increase. It might increase roughly linearly, because more and more cells will be active; there's more and more information about the face. But if you have a familiar face recognition system, you might see a nonlinear surge, possibly around the time that you're actually recognizing the face. That basically means you don't recognize, don't recognize, don't recognize, now you recognize-- and then adding more and more detail only increases your certainty, or your recognition, by a small amount.
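Here is a sketch of what these two response profiles look like as curves, with an illustrative recognition threshold and slope for the surge (these numbers are my own stand-ins, not fits to the data):

```python
import numpy as np

detail = np.linspace(0.0, 1.0, 11)   # fraction of image detail revealed

# Generic face system: response grows roughly linearly with added detail.
linear = detail

# Familiar-face system: a sigmoidal surge around a recognition threshold
# (the threshold 0.4 and slope 20 are illustrative, not fitted values).
surge = 1.0 / (1.0 + np.exp(-20.0 * (detail - 0.4)))

for d, l, s in zip(detail, linear, surge):
    print(f"detail={d:.1f}  linear={l:.2f}  surge={s:.2f}")
# The surge stays near zero, jumps around detail ~0.4 (the moment of
# recognition), then saturates: further detail adds almost nothing.
```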
And so this experiment, very slowly adding more detail to a face over the course of half a minute, allowed us to test, for each of the face areas that Sofia had already identified, whether they show the linear increase or the non-linear increase. What Sofia found for the core face processing system that I've shown you now several times is this: there's an almost linear increase in activity for both familiar and unfamiliar faces. There's a bit of an advantage for familiar faces-- more response to familiar faces than to unfamiliar ones-- but it's a very linear increase.
But in the two areas that she discovered, TP and PR, there is this non-linear surge of activity, which occurs only for familiar faces. It doesn't occur for objects, which I'm not showing here, and it also doesn't occur for unfamiliar faces. In fact, area TP is so selective for familiar faces that it shows this non-linear surge only for familiar faces; there's no response to unfamiliar faces at all. So this is strong evidence that there are regions outside of the core face processing system that are critical for relaying face information into memory systems and allowing us to recognize people that we know very well.
In fact, Sofia has now recorded from these areas. Here's an average response, and you can see this big response here to familiar monkey faces, much bigger than to unfamiliar monkey faces, and much bigger than to any of the other stimuli that we showed. And we were most interested in how these familiar stimuli might be coded.
And we have some evidence that the coding principles are fundamentally different from the ones in the core face perception system, such that we have cells here like this one that are responding only to one particular individual, this particular monkey face, but not to familiar human faces, like my face here for example, or unfamiliar faces, or to objects. So we can now have a third component in our understanding of how the overall architecture is implementing face processing in the brain, that goes from face detection, to discrimination, to person recognition.
OK, so this question about stimulation might be for me; the answer is yes. If you artificially stimulate these neurons-- there are some beautiful videos of when this was done in epilepsy patients-- people then describe seeing faces where they know there shouldn't be faces. In one case it required the visual environment to be a bit face-like; in the other case it was not necessary. And you can actually interfere with face discrimination if you do that. You will aid face detection if a face is there, but you might interfere with face discrimination, depending on where and how you're stimulating.
Does the hippocampus play a role in face detection? That's a good question. It depends on the definition of detection. The way that I've conceptualized it here, I would differentiate face detection from face discrimination. In this sense, face detection is a very early, very generic process, and I do not think the hippocampus plays a role in it. However, I think that the hippocampus will at some level-- it would require more distinctions than I'm able to make right now-- aid in the process of knowing categories, and in particular in linking those categories to other categories.
Yes, so autism-- it's very interesting. Actually, Nancy, if she hasn't talked already, would be a good person to ask this question, because she did a meta-study of all published papers on autism and the level of holistic encoding. There was for a long time this idea that in autism holistic processing is impaired, that processing is more detail-oriented, and there's some evidence for it. But Nancy and a postdoc did a more thorough investigation and meta-analysis, and they found very little evidence for it.
Unfortunately, as in many other cases with autism, we find huge heterogeneity of different alterations in different people, and so it's very difficult right now to know what's going wrong. What I will say is that if there were an impairment in holistic processing, the mechanism that I showed you earlier, the integration of local features with global information, could actually explain it. There are theories of autism that posit an imbalance of excitation and inhibition, and if you work this through, it actually works out such that if you have over-excitation, the holistic effect would be smaller.
What are the possible causes of prosopagnosia? Yes, there are lots of possible causes. If we talk about developmental prosopagnosia, we now have some evidence that even early processes are impaired, such that receptive fields-- which I haven't talked about-- might be bigger in people with prosopagnosia than in people without, which might lead to a crowding-like effect. Most of the evidence really shows relatively small effects, like having smaller face areas or somewhat weaker connections between face areas.
And so I think it's a bit of an embarrassment that you can have an effect as strong as full-blown face blindness while most of our understanding of the cause is quantitative and partial; it's not really clear why the effect would be so strong. A question about whether the TP identity representation is idiosyncratic. Yes, I guess the question means whether it's idiosyncratic to the subject, and we have some evidence that it is: it depends on the experience of the subject, sometimes even very short-term experience, and that influences the representation in TP.
Yes, so there are psychophysics showing that some features are more important for the recognition of individuals than others. There are other features that are distracting. For example, if a person dyes their hair or has a different hairstyle-- something that can change very easily-- that will largely affect whether you're going to be able to recognize a face, even though you should be able to recognize the face just from structural information that cannot change.
Yes, so we have not tested race effects on faces so far. What we have tested is different species. Our monkeys have experience with both human faces and monkey faces, and of course the differences between species are bigger than between races. And we see a spatial segregation, even early in the face processing system, for representations of different species. So you can have a face area, and then a subpart of the face area for human faces, and another one for monkey faces.
Yeah, so to the last question: there is a lot of non-visual information, even lots of semantic information, that can greatly impact this-- it can lead you to misrecognize a person, or to recognize a person better. Galit Yovel at Tel Aviv University has been studying this a lot, and she sees big effects of semantic information on the recognition of familiar individuals. Yes, a question on saccades. The quick answer is no, and the longer answer would be yes, but I don't know. We haven't looked at it systematically, and it would be so interesting to do. I'm embarrassed that we haven't done this yet; anyone who wants to join the lab and measure this would be highly welcome.
Yeah, so PFC-- I did not show the fMRI data, and so far we have few recordings. There's one paper on a face area in the orbitofrontal cortex. And as you might gather, the indications are that at this point it's not just about the face as a face, but about the face as a rewarding or socially meaningful stimulus. That's a worthwhile discussion to have: when you talk about faces, what you mean by the word "face" can have very different meanings.
We've talked mostly about faces as a three-dimensional object category-- it could be four-dimensional, because faces also have dynamics-- but in any case as a visual object category. Then we talked about faces as signifiers of a particular person, so they become social. And faces can have other meanings as well. In orbitofrontal cortex this is pretty evident: it's not about specific visual features anymore, but about the meaning of the face and the social relevance that this particular face has to the subject.
What we really want to do is use this knowledge of the physiology and of the organization of the face processing system to learn something about the neural mechanisms of intelligence. That is the overall goal of my lab. A large part of my lab works on faces, but we work on other functions as well, some built on faces and others that are actually not face-specific. But we think that we can learn a lot from faces about intelligence. One reason why you might want to care about faces is that you can argue that the greatest successes these days in artificial intelligence are in the domain of vision, and again, faces are a great model system for that.
So what can we learn about computational principles from what I've told you about the face processing system so far? Again, we have a network of interconnected face areas, and in each area there's a qualitatively different representation of faces. Between this area here, this area here, and this area here, we have a transformation of information, in two major steps, between a picture-based representation that is very strongly dependent on head orientation and an identity-based representation, which does not depend so much on the current view but rather on who is shown in the image-- on the intrinsic physical properties of the face, relatively independently of the direction from which it is shown.
Compare this to earlier pictures of face processing, like the one that David Perrett compiled-- David was another pioneer in face recognition, incredibly creative, and he also compiled the figure that I showed you in the beginning. Based in part on Marr's principles, he came up with an idea about how transformations of face representations might occur in the face processing system, but it was largely speculative. Also, if you have a system like this, where the face cells are interspersed and widely distributed, you really don't know how many transformations you have.
And it's a bit similar with some deep network algorithms, like the HMAX networks that Tommy Poggio developed, where you stack one level of processing after another, after another, after another. The local mechanisms repeat themselves over and over again-- in this case, simple- and complex-cell transformations. And if you just stack enough of these layers on top of each other and have a clever enough learning algorithm, you'll be able to do some classification-- in this case, animals versus non-animals, or person A versus person B.
So the question is, can we actually build a computational system that replicates the properties found in the face processing system? And will this eventually lead us to understand the computational principles in the brain better? The first question we asked was about the qualitative picture: can we explain the fact that there are these three different main qualities? In particular, do the architectures that generate these properties have to be deep? And can we explain the occurrence of this mirror symmetry confusion at the middle level of processing?
I will admit I have a bit of an obsession with this last question, but I think there's a good reason for it. Think for a moment about what happens in the brain as you go from the middle face patches, where neurons are orientation-tuned-- I'm showing here different kinds of orientation tuning: neurons tuned to the left profile, left half profile, front, right half profile, right profile. If you had a deep network, an HMAX network or something like it, and you provided input from this level, ML, to the next step of processing, which is AL, how would you expect the tuning curves to look?
For most learning rules, if you trained this network with movies of faces, the next processing step would just have broader tuning curves-- broader head orientation tuning curves. The reason is that faces in natural life don't jump from left to right; by the nature of physics, they have to rotate through the front view. Most learning rules are temporal contiguity learning rules: cells that fire at the same time, or in close temporal proximity, wire together. So a neuron that is encoding the left profile and one that is encoding the left half profile will fire together, and therefore they are going to wire together at the next level, thereby generating a broader tuning curve.
But this is not what happens in the actual brain. In the actual brain, the tuning curves at the next level are not broader. They have twice the peaks you had at the earlier level, but in specific combinations of these peaks, such that you now have neurons that respond both to the left profile and the right profile, but not to the front view that is in between. What this means is that the system must be wired completely differently from what you would expect from most of the learning rules used to train these networks. And that is why this odd observation of mirror-symmetric coding at one level of processing could actually be a very important clue for understanding how the system is wired, and therefore which computational principles are implemented.
To pursue this, we collaborated with Joel Leibo, who was a grad student with Tommy Poggio at the time. What Joel and Tommy did was build a modular architecture that only processes faces and consists of just three levels of processing-- not a deep system. Level number one is basically a face detection unit; it's a face filter. It will let through information about faces from an incoming video, but not so much about non-face objects.
We posit that the computational goal of the system is identification, or discrimination between faces. And if we apply one particular learning rule-- Oja's rule, which is a regularized version of Hebbian learning-- we actually find mirror symmetry at level number two. It's a bit of a mystery, and I don't have time to explain why this is happening. I just want to emphasize that it only happens for this Hebbian learning rule and not for other learning rules.
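For reference, Oja's rule updates a weight vector by dw = eta * y * (x - y * w), where y = w·x; the subtractive term keeps the weights normalized, and the rule converges to the top principal component of the inputs. Here is a minimal sketch of why mirror-symmetric input statistics can produce mirror-symmetric responses in the learned unit (a cartoon of the geometry, not the model from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def oja_step(w, x, eta=0.005):
    """One step of Oja's rule, dw = eta * y * (x - y * w) with y = w.x:
    Hebbian growth with built-in weight normalization; w converges to the
    top principal component of the input distribution."""
    y = w @ x
    return w + eta * y * (x - y * w)

# Cartoon training set: every random 'view' pattern is presented along with
# its left-right mirror image, so the input statistics are flip-invariant
# (standing in for the bilateral symmetry of faces; not the actual stimuli).
patterns = rng.normal(size=(50, 8))
data = np.vstack([patterns, patterns[:, ::-1]])

w = rng.normal(size=8)
for _ in range(20000):
    w = oja_step(w, data[rng.integers(len(data))])

# Flip-invariant statistics force the learned weights to be symmetric or
# antisymmetric under flipping, so the unit responds with (approximately)
# equal strength to any pattern and its mirror image:
probe = rng.normal(size=8)
print(round(abs(w @ probe), 3), round(abs(w @ probe[::-1]), 3))
```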
So among the conclusions we can draw: yes, a simple feedforward processing mechanism can explain the main qualitative properties that we find in the brain, and we do not need depth of processing. We learn something about the large-scale organization from the physiology of the face processing system. We learn something about learning rules. And we learn something about the geometry of the stimulus-- I did not mention this, but for the mirror symmetry to occur, you need an object that is internally bilaterally symmetric. If it's not, the mirror symmetry at the middle level will not occur. So these factors have to conspire for us to get the main properties of the face processing system.
OK, in the interest of time, because there are only a few more minutes, I'm going to jump over the second study-- or rather, I'm just going to highlight the main points of the second study we did, where we looked more closely at the quantitative representations at these different levels of processing. In brief, we can quantify precisely, for each of these levels, how much they exhibit view specificity, mirror symmetry, and view invariance. And we can then compare this quantification from the brain with the quantification we can get for different deep neural networks, like VGG, or a network that we built-- and that I hope Josh, who we collaborated with on this, might tell you more about-- that we call the efficient inverse graphics network, EIG.
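One common way to quantify those three properties-- sketched here under my own assumptions, not the paper's exact analysis-- is to take population responses to the same identities at several head orientations and correlate the response patterns across views:

```python
import numpy as np

rng = np.random.default_rng(0)

def view_similarity(pop):
    """pop: (n_views, n_identities, n_cells) trial-averaged responses.
    Returns the view-by-view correlation matrix of the population's
    identity response patterns."""
    return np.corrcoef(pop.reshape(pop.shape[0], -1))

# Toy AM-like population: one response pattern per identity, reused at
# every view (plus noise), i.e. a view-invariant identity code.
identity_code = rng.normal(size=(1, 10, 100))
pop = np.repeat(identity_code, 5, axis=0) + 0.1 * rng.normal(size=(5, 10, 100))
print(view_similarity(pop).round(2))  # near 1 everywhere: view-invariant

# Reading the matrix for the three signatures discussed above:
# - ML-like, view-specific: high values only on the diagonal
# - AL-like, mirror-symmetric: extra off-diagonal peaks at mirror pairs,
#   e.g. (left profile, right profile)
# - AM-like, view-invariant: uniformly high, as in this toy example
```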
The basic logic here is that instead of what you normally do in a deep network-- using all these layers of processing to map an observed image onto a particular identity-- you instead map it onto the latent variables of a three-dimensional face, which you could use in turn to explain the incoming image through a computer graphics system. The main point is that you're just changing the computational goal, partly motivated by what we find in the physiology: learn not a particular identity but the latent variables of a 3D face model.
When we do this, we get a quantitative description of the different levels of processing that is very similar to the physiology we find in the brain-- in fact very closely, almost perfectly correlated-- while for a standard VGG feedforward network we find only a loose correlation and lots of qualitative properties that are different. So there are several conclusions we can draw. Yes, deep networks capture some of the properties, but they miss important other ones. The computational goal matters at the different levels of processing. And we propose that this analysis-by-synthesis approach might be fruitful for explaining information processing in inferior temporal cortex at large.
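Schematically, the change in computational goal looks like this (the functions below are hypothetical stand-ins, not the actual EIG code): the network is scored on how well the inferred latents let a graphics model reproduce the image, rather than on identity classification.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(image, weights):
    """Stand-in feedforward network: image -> latent variables
    (in EIG these would be shape, texture, and pose of a 3D face model)."""
    return np.tanh(weights @ image.ravel())

def renderer(latents):
    """Stand-in for the graphics engine that synthesizes an image from the
    latents; EIG proper uses a 3D face graphics model, not this placeholder."""
    return np.outer(latents, latents)[:16, :16]

def inverse_graphics_loss(image, weights):
    """Analysis by synthesis: the goal is to explain the incoming image,
    not to map it onto an identity label."""
    latents = encoder(image, weights)
    return np.mean((renderer(latents) - image) ** 2)

image = rng.random((16, 16))                  # toy 16x16 'face' image
weights = 0.01 * rng.normal(size=(32, 256))   # 32 hypothetical latents
print(inverse_graphics_loss(image, weights))
```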
I just want to give you one illustration-- and then I will end with this, and we can come back to questions-- of one implication that this has: an explanation for a very cool illusion called the hollow face illusion. I'll show you a video here; it may have some narration, not from me. You can see Albert Einstein's face now, and it's being rotated. So let's see what happens.
So it's Einstein's face, and it appears to be rotating to the left. But now you realize, no, that's not what's going on: you actually have a face mask. Now you're looking at the front of Einstein's face-- the front, the front, the front-- and now you should be looking at the back. But what happens is that the face kind of jumps forward and seems to rotate in the opposite direction. So your face processing system, we would conjecture, is forcing you-- let me play this again-- to interpret a hollow face as a regular face: a concave shape as a convex shape.
So this is Einstein's face from the front; everything is normal. And now you are looking into the hollow part of the mask, and Einstein's face jumps forward-- you can see it again, jumping and rotating in the opposite direction. The EIG system, because it proposes that this internal three-dimensional face model is a necessary interpretation of incoming visual information, would naturally explain this phenomenon.
So this was very fast. The main messages I want to relay to you are that there is a beautifully organized system in a complex brain that is dedicated to the processing of faces, and that different parts of the face processing system code faces in qualitatively different ways. We understand the organization. We can very efficiently record from many of these cells and really analyze the properties of how they code faces.
And I showed you, or pointed to, two brief examples of computational work that helps us uncover the computational principles implemented in the system. By extension, we believe they will be implemented in other parts of the brain as well. And that might also help us design new computational architectures. So thank you so much for your attention.
We still have some time for questions, and I'll do my best to answer as many as possible. [INAUDIBLE] is asking-- I did not comment on this; it's a very important point. When I say that there are two major transformations between the face areas, that is the observation that we have largely homogeneous populations of cells within face areas, and major transformations of a qualitative nature between the face areas. But it's also true that there are transformations inside a cortical region.
People are debating how many there are. There are not thousands of layers, but potentially there could be very many of them. There's also a question about what the local computations are, and I for one don't believe it's only convolutions; there are likely other operations as well. So one of the big jobs ahead of us is to understand these local cortical networks and what computations they implement.
In my lab we're very interested in this, so we've implemented calcium imaging of very large populations of neurons, so that we can really look at population codes in detail; we now have ways to differentiate different cell types, and we also work to reconstruct the connectivity of local cortical networks. With these efforts we are hoping to unravel these local computational principles as well. They're just going to be much more difficult to understand than the global ones, which are just so obvious when you record from a few cells.
I would love to answer the question about which areas PR and TP interact with. Again, this would make a great project for someone to do in the lab. I don't know the answer yet. I would bet that entorhinal cortex and hippocampus would be important, and I have certain ideas about other areas, but right now we don't know the answer.
What other learning rules did we consider? It was a list of about four different ones. Most importantly, we used the standard learning rules that are used in backpropagation as well. So: the standard learning rules, and none of them worked. You could consider more, but there's also a mathematical proof in the paper, and that's more important than the list of learning rules-- why Oja's rule works, and why some of the other rules don't.
Yes, the question from Laha about how we can be sure that EIG is true. This is ongoing work. The way this works in general is that we have some physiology, and we come up with a computational model that is a good explanation for this physiology. Like any good theory, it should now make predictions that are testable. And one nice thing about EIG-- it's a kind of explainable artificial intelligence-- is that there are very concrete, very meaningful functional attributions to the different face areas, and they are eminently testable. As you might gather, this is what we're working on right now.
Yes, most architectures are feed forward. I don't believe for a second that the system is entirely feed forward. In fact, we did find evidence for predictive coding in the system. And so we are interested in building recurrent neural network models now in different collaborations. Also EIG, even though it's a feed forward model, there's kind of like a natural extension that I was briefly alluding to because it can explain stimuli. So it could be used in a predictive manner to explain away incoming information. And so it will be very interesting to generalize this model to a recurrent model that is exerting feedback.
Anonymous attendee: it's just this one model that I mentioned, EIG. I sent out the paper. In the interest of time, because I'm one minute over now, which is probably OK, please take a look at the EIG paper; the answer will be in there. OK, a question by another anonymous attendee about the computational work on the mirror symmetry, and whether it emerges as a consequence of face detection.
It is possible, because in the computational model that we considered it is a consequence of the intrinsic bilateral symmetry of a face, and that could be helpful for detecting a face from the front. But you can also detect a face from the side, where you don't have any symmetry-- in fact, the asymmetry would help you detect the face. So I'm not sure that detection specifically would be the driving computational force here.
A question about innateness versus learning. I think Marge would be the person to answer that. It definitely has a lot of learning components to it. The question is whether there is something else about the places where the face areas are landing that makes it more likely for this part of cortex, once exposed to faces during learning, to become face-selective. If you haven't heard Nancy Kanwisher's talk, listen to her talk; she has a very interesting perspective on that.
OK, so the question about monocular depth cues came up, and I really think it was for Marge-- yeah, Marge answered that one. The uncanny valley: yes, that is very interesting and practically also very important. My hunch would be that the area I mentioned before, orbitofrontal cortex, would likely be making the big difference in telling you whether this is a beautiful face that you should like, or whether it's a creepy face, as a face in the uncanny valley might be.
Please feel free to contact me and I can clarify things. I know I went through the material really fast; my heart bleeds for the material I did not cover. But my mission today was to share a bit of the excitement of discovery that we had with this system, to focus on the overall organization that we know, as an example of a specialized system in a complex brain, and to allude to some of the computational principles that we can learn from it. And that's going to be the main road forward for us within CBMM.