How brain computations can inspire new paths in AI: Part 1
- All Captioned Videos
- Brains, Minds and Machines Summer Course 2021
GABRIEL KREIMAN: I organized this morning into three parts. The first one will be a very basic historical introduction to studies about visual cognition, visual processing, visual neural circuits. One of the exciting things about the summer, of course, is that there's a wide diversity of students. Some of you are really connoisseurs and probably would know a lot of the things I have to say in the introduction. For some of you maybe all of this is new. So this will be very introductory in nature at the beginning.
Then-- so I'm going to do a short break-- in the second part, I'll tell you about a couple of studies that we have recently conducted in our lab at the intersection of artificial intelligence, neuroscience, and cognitive science. And then the last part will be a little bit of a Hilbert-questions kind of section.
For those of you who don't know, Hilbert was a famous mathematician who basically came up with a list of open questions in the field. This is similar to what Tommy was doing at the end, talking about random questions. So this will be a sort of a disconnected series of discussions on topics that I hope will be controversial, provocative.
And again, the goal is mostly to stimulate discussions, to get people to disagree. You can disagree with everything I say from the very beginning. But especially the last part is mostly meant to be provocative, to sort of come up with several random ideas, which I think are probably interesting for potential projects, but also for discussions along the way.
OK. So again, please, please, please stop me, interrupt me, ask me questions along the way.
I'll start this-- probably all of you are quite intimately familiar with many of these major accomplishments in AI over the last decade or so, from very early successes in beating Garry Kasparov in chess, all the way to beating world champions in Go. We talked a little bit about self-driving cars. Tommy argued that there would be a Nobel Prize for AlphaFold, for the set of algorithms that can basically uncover the three-dimensional structure of proteins.
So this is really, really amazing and should not be minimized. We live in a different world now than we did a decade ago. Things are progressing incredibly fast and this is very exciting.
Despite all of these major advances, I think that the best is yet to come. I think that what we are going to see in the next decade or the next two decades is even way more exciting than everything you see on the screen right here.
I read a book by Francis Crick, famous for his discovery of the DNA double helix with Jim Watson. The book was called What Mad Pursuit, and in that book he tells the story of how one day, when he was a kid, he was reading some scientific article about physics, and he got very frustrated and started crying. He said, by the time I grow up, everything will be discovered already. There will be nothing left for me to do. So his mom went to him and said, well, don't worry, I'm sure there will be some interesting science left for you to do when you grow up, and so on.
So anyway, if you think that this is impressive, I think that there's way more for you to do. I hope you will be the ones driving the discoveries in the next decade, the next two decades. And I think that what's yet to come is even more and more exciting than what we have right now.
And part of that, I think, will come from neuroscience and cognitive science. So my view, also embraced by Tommy and many others in CBMM, is that it's very exciting to do top computer science, to develop algorithms and all kinds of hacks that can do amazing things. But I think that the true fundamental revolutions will really come from studying the brain.
So I am particularly excited about the notion that we can discover computations and the algorithms behind them by scrutinizing what's happening inside brains, and then use those ideas to come up with better AI. Of course, and again, I'm reiterating some things that Tommy said. Getting better AI is not the only reason why I'm excited about studying brains.
We also want to fix brains when they don't work. As many of you probably know, mental health is one of the most serious issues in our nation. So in terms of clinical considerations, it's also quite important to understand how brains work. And to me, most important of all, I think it's the most exciting and most profound question in science ever-- understanding who we are, understanding the nature of our thoughts. So curiosity-driven discovery, I think, is perhaps the most important reason.
But mostly today, I will talk about this direction, about how we can think about algorithms, about some of the failures of current algorithms. Again, there are many successes, but I emphasize some of the failures of current algorithms, and how by studying neurons and neural circuits and cognition, we can actually come up with better ideas and AI.
OK, so this is the plan for today. I start with a very basic historical introduction. Then I go on to give three examples of how neuroscience can inspire AI. And I will end up with Hilbert questions, random questions, and mostly discussion.
OK, so let me start from the very, very beginning. I will mostly focus on vision, visual recognition, and visual cognition. Part of the reason is that there has been a lot of work done in the field; we can stand on the shoulders of previous giants. And I think it's a perfect arena to understand intelligence.
For sure, understanding vision is not all of intelligence. There's all kinds of important questions in AI that have nothing to do with vision or that transcend vision in many ways. But I will focus on questions about visual processing today.
So to start from the very beginning, about 3,500 million years ago, evolution basically came up with the trick of capturing light in the form of energy. There were bacteria that could capture light and do photosynthesis many, many years ago. But that's not really what we mean by vision. The real revolution, in a way, came about 500 million years ago, in a period called the Cambrian explosion.
So these creatures here, which are called trilobites. I'm not sure about the pronunciation. As you know, I have a very thick accent. I mispronounce most words. I don't know how to pronounce that one, I think it's trilobites. Correct me if I'm wrong.
In any case, these are the first creatures where there's some fossil evidence of eyes. These structures that you see here are presumably the eyes of this creature. So this was the first time, basically, where animals could start to use light to convey and detect information.
And there is a scientist at Oxford, Andrew Parker, who came up with the idea of the light switch theory. He argues that being able to capture light and use it to detect information and understand what's happening in the environment gave rise to an explosion in the diversity of animal species on Earth-- that this diversity basically arose due to the evolution of the visual system.
Nothing travels faster than the speed of light. So all of a sudden, we could detect prey and predators. So this gave rise to an arms race, basically, in the evolution of animal species. So the emergence of eyes basically coincides with a major explosion of the diversity of animal species. Most of the species we know today basically can be traced back to this enormous diversity that started in the Cambrian period. And that coincides with the first evidence for using light to capture information.
So to study visual cognition, David Marr, one of the fathers of the field of visual processing, together with Tommy Poggio, came up with this idea of three levels of analysis, arguing that to understand vision-- and I would contend, to understand anything about cognition in general-- we need to understand the problems at three different levels. So they refer to the computational level, what the problem is. The algorithmic level, that is how animals solve it. And the implementation level. OK, and I roughly map this on to understanding behavior, understanding neurons, and understanding the algorithms as a way of bridging between neurons and behavior.
So throughout, I'll mostly try to talk about these three levels, about neurons and about circuits and then also about behavior, and at the same time about algorithms and implementation.
So the first thing to point out, and one of the first entry points into understanding vision is the notion that vision is a construct. We think, naively speaking, that vision is just a reflection of what's out there in the world. There's plenty of visual illusions that show that this is not the case, OK? So I will flash this very quickly. I'm sure many of you are familiar with them.
Probably here, you can see that one of the green circles seems to be larger than the other. I'll do this very quickly. Raise your hand if you think that the green circle on the right is larger. Even though you know I'm tricking you, the perceptual appearance persists. So this is pretty strong, right? Of course this is a trick, right, so they're both the same.
Here's another one. All of you are probably way too young to see this. This was done by Pawan Sinha at MIT. The gentleman here on the right was former President Bill Clinton. And when you look at this picture, most people think that the person on the left was Al Gore, who was the vice president at the time. It turns out that it's not. It's actually just a replica of Clinton's face copied and pasted. But because of the context, because of the different clothing, and the position of the two, most people think that it's Bill Clinton and Al Gore. OK, so if you do this, you can sort of see that it's actually exactly the same face.
This is another famous one many of you probably have seen. This is called the Margaret Thatcher illusion. If you look at these two faces upside down, they look more or less the same. But then when you turn them around 180 degrees, you think that they're completely different. So in any case, these are just three examples of many visual illusions, just to argue that what we see is a construct, it's an invention. Our brains are making up stories about what's out there. OK?
There are lots of things happening here that we don't see. Just as a very simple example, there's tons of infrared and ultraviolet radiation, and we just cannot sense it. So there's a reality out there that we don't see. And a lot of things that we think we see are just illusions, just things that our brains make up. OK?
So this is yet another example. Probably many of you are familiar with this. Just raise your hand quickly if you see white and gold for the color. OK. So all of you are right, everybody else is wrong. That's how I see it. Anyway, basically it's more or less half and half. What is it that the other people see? It's blue?
AUDIENCE: Black and blue.
GABRIEL KREIMAN: Black and blue, OK. So I cannot-- for the life of me, I cannot see black and blue. So black and blue, yes. So there's a lot of black and blue here, all of you are wrong, of course. So this is mostly-- so this is yet another illustration of the fact that vision and our perception is in the brain of the beholder.
In the brain, not the eye. The eye has nothing to do with it. I'm pretty sure that all of our eyes are very, very similar to each other. So something is happening in our cortex, in our brain, that dictates that some of you see it in one color and others see it in a different color. OK?
All right, so let's say that we want to figure out how a car works. Let's say you come from Mars and you figure out that there are interesting contraptions out there that are called cars. We want to figure out how they work.
So we can start with behavior. So we can understand that the car moves. It also makes sounds. We can-- it has different speeds. There are constraints in how it turns, OK. We can also look at how it works from the outside. So we can do things like measure the average temperature over five minutes and every three inches. Or we can get the frequency spectrum of the sounds from the motor and so on.
It's also very useful to use lesion studies. If there are no wheels, the car doesn't work. We can discover that if there is no gas, it doesn't work. But ultimately, to really understand what's happening, we need to open the hood. OK? We need to study each component.
And so I'll give you a very brief history of the heroic efforts over the last five decades or so to try to open up the hood and see what's happening inside the brain during visual processing. OK?
So just to start from the very beginning, light is reflected off objects. That reflected light reaches the eyes, where it is converted essentially into an electrical signal. That electrical signal travels all the way back to cortex. And cortex is where all the magic happens-- we're going to be mostly talking about what happens in cortex in the context of vision.
There's a lot of exciting processing happening already at the level of the retina. But largely our perception, as I argued earlier, is mostly dictated by what happens in cortex. This idea that light is reflected off objects and then the brain processes it is pretty obvious to all of you.
It was not always the case. It took humanity tens of thousands of years to arrive at this conclusion. Lucid scholars, many of the Greeks for example, thought that it was the other way around. They thought that the eyes were actually sending rays onto the objects. And there are many other theories that people had about how vision works.
But a few centuries ago, people converged on this idea that light is reflected on objects, it reaches the eyes, and that's how processing starts. So one of the main ways in which we can start to study brains is by looking at the unfortunate consequences of lesions onto the brain. So here are two examples of that. These are two pretty famous examples. Many of you may be familiar with them.
The first one here on the left is a person known as H.M. This person had epilepsy, had seizures. And because of these seizures, the treatment at the time was to surgically remove the hippocampus on both sides of the brain-- not just the hippocampus, but the hippocampus and surrounding structures. But mostly, it's a bilateral excision of the hippocampal structures. That worked well in terms of seizures; this person no longer had seizures.
This person could still see the world and recognize chairs and objects and cars and faces and so on. But then, very soon it was-- it became apparent that this person could not convert short-term memories into long-term memories. So maybe some of you have seen Finding Nemo, Dory has the same problem. People who have severe Alzheimer's may have the same problem. You can have a very nice conversation with them, and then five minutes later they forget absolutely everything.
So this gave rise to the whole study of hippocampus neurophysiology, of the whole study of long-term memories. The Nobel Prize in 2014 to John O'Keefe can ultimately be traced back to this clinical mistake, if you will, of removing the hippocampus on both sides of the brain.
Removing it from both sides, incidentally, is critical to give rise to these memory deficits. Even today there are many cases where neurosurgeons still do unilateral excisions, meaning that they remove the hippocampus on only one side. And by and large-- and we can debate about this-- people don't have memory problems when you remove only one hippocampus and the other one is intact.
The other example here is a famous book by a great neurologist and writer, Oliver Sacks. He wrote about many neurological patients; this particular case is about a man who literally mistook his wife for a hat. This is a person who had a very, very rare condition that mostly nobody knows outside the confines of psychology and neuroscience departments, known as prosopagnosia.
This is a condition where people cannot recognize faces. They can recognize chairs, and cars, and all kinds of other objects, but not faces. It's interesting to know that people may not even know that they have this condition. I once met a person with prosopagnosia and this was a professor at a prominent university doing very, very well in computer science. And he only learned about the fact that he had prosopagnosia when he was in his 30s, basically. He could recognize people by their gait, by how they walk, by their clothes, by their voice, in all sorts of ways. He didn't know that people used, actually, facial features to recognize others.
So studying lesions is one of the main entry points to actually begin to figure out under the hood what's happening inside the brain. And so fast forward a couple of decades. We now have a lot of information about the mesoscopic connectivity of different areas in visual cortex.
This is a famous diagram by Felleman and Van Essen. This diagram is based on the macaque monkey brain. We know much more about neuroanatomy in monkeys than in humans, because it's very, very hard to do real, serious neuroanatomy in humans.
So this diagram, at the very bottom, you have this RGC structure, that's the retinal ganglion cells. Information from there goes to another structure called LGN, the lateral geniculate nucleus. From there onto a structure called V1, and then on to many different brain areas. Each of these boxes corresponds to a different brain area that's involved in processing visual information.
At the very top of this hierarchy, Felleman and Van Essen put the hippocampus-- HC stands for the hippocampus-- and then, also, you have ER, the entorhinal cortex. These are not purely visual areas. So even though they are the pinnacle of this diagram, I think that they are not strictly visual areas.
But every other box here is a visual area. You can record the activity of neurons, and they respond to visual stimuli. If you make a lesion in one of these boxes, people have some sort of visual deficit. In particular, of course, if you don't have the retina you cannot see at all. If you don't have V1-- major lesions that destroy V1 also render people completely blind.
So we know that each one of these boxes is involved in visual processing. A huge chunk of the primate brain is devoted to processing visual information. We are visual creatures.
So just a couple of numbers to throw out: in the human brain, there are about 10 to the 11 neurons. Each neuron makes about 10 to the 4 connections, or synapses. So there are about 10 to the 15 connections in the human brain. This is in total. This is not just the visual system. This is the entire number of neurons.
So just by way of comparison, the population of Earth is on the order of 10 to the 10. And I don't know how to measure connections, but let's count how many contacts you have on Snapchat or Facebook, or any other proxy for human connectivity that you want to use. Let's say it's about 10 to the 12. Maybe there are some very popular people and maybe this is 10 to the 13. But in any case, each one of our brains has way more complexity, in terms of number of units and in terms of connectivity, than the entire network of human interactions on Earth. OK.
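Just to make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python; every number is an order-of-magnitude estimate from the talk, not a precise measurement:

```python
# Order-of-magnitude comparison; all counts are rough estimates.
neurons = 10**11                  # neurons in one human brain
synapses_per_neuron = 10**4       # connections made by each neuron
brain_connections = neurons * synapses_per_neuron
print(f"brain connections ~ 10^{len(str(brain_connections)) - 1}")  # ~10^15

people = 10**10                   # population of Earth, roughly
social_links = people * 100       # ~100 contacts per person gives 10^12
print(brain_connections // social_links)  # one brain: ~1000x more connections
```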
All right, so the connectivity that I mentioned is what I call mesoscopic connectivity-- one area connecting to another. One of the tremendously exciting things happening right now in neuroscience is that we are beginning to be able to elucidate the microscopic connectivity of neural circuits. That means knowing exactly which neuron connects to which other neuron, at the highest possible resolution.
And so we'll have a lecture in a couple of days by one of the world leaders in this effort, which is known as connectomics, Jeff Lichtman. He gives amazing talks, I think you're going to love that. I hope you'll be able to interact with him and ask him lots of questions.
So one of the things that Jeff and many others are doing is taking small pieces of neural tissue and then coming up with basically a complete diagram of connectivity in those circuits. We are still very, very far from having that kind of diagram for primate species, or even rodent species.
Interestingly, as a side comment, we have the entire connectome for a worm, the C. elegans worm. And despite having the connectome, it's still extremely exciting and very difficult to understand what the computations are. If people are interested, Dr. Changwon over there is an expert on this-- she's saying no, but she is.
Anyway, so having the connectome in that case-- 302 neurons-- does not immediately translate into understanding computation. There's still plenty to do. But I think this is an incredible road map, something we didn't have before, that will really jump-start a lot of detailed investigations of neural circuits and hopefully also help us understand computations.
So I want to take a quick tangent to talk about the fact that in humans, people are beginning to map neuroanatomical connections, but we don't have any kind of diagram of mesoscopic connectivity similar to the one we have in monkeys. So one of the things we did recently-- and I'll mention this very briefly-- is try to look at functional interactions between different parts of the human brain.
So we have been collaborating for many years now with neurosurgeons who implant electrodes inside the human brain, to try to understand different aspects of cognition in the human brain from the inside. These are people who have pharmacologically intractable epilepsy, similar to the patient H.M. that I mentioned earlier.
So before they try to remove the part of the brain that's responsible for the seizures, they try to map the different brain areas to try to find out where the epileptogenic focus is. So for that purpose, neurosurgeons put electrodes inside a human brain. And the patients typically stay for about one week in the hospital. And during this one week, we have the unique opportunity to examine and scrutinize activity in the human brain in awake patients while they eat and drink and do psychophysics and do cognitive tasks, and do all sorts of things. And this allows us to really begin to see what's happening under the hood inside the human brain.
In some cases-- and this is going to be a bit technical, I'm happy to talk more if people are interested-- depending on the type of electrode being used, we have been using what are called microwires. These are 40-micron-diameter, high-impedance electrodes, similar to the ones that are used in animal models to study neural activity. From these we can get action potentials. Action potentials are the fundamental currency, the fundamental way in which neurons communicate throughout cortex.
In other cases, we have what are called stereo EEG, or ECoG, types of electrodes, which are about two millimeters-- huge electrodes that have very low impedance. From those electrodes, we can get field potential activity, but not the activity of individual neurons.
So one of the things a talented student in the lab, Jerry Wang, did was to take data from many of these patients, collected over nearly a decade, and then look at the correlations in activity between different electrodes. I'm not going to go into the details. If people are interested, I would be happy to talk more about this. But this gave rise to what we call the functional interactions of the human brain. So this gives us an idea of which area of the human brain correlates with which other area of the human brain.
This is different from the Felleman and Van Essen diagram. That's anatomy; this is physiology. This is, at the physiological level, who talks to whom: which area of the human brain correlates with which other area of the human brain? Incidentally, Jerry was a student in the summer course, maybe four or five years ago, and he started working on this at that time.
Anyway, so I won't say anything more about this. If people are interested, I'm happy to discuss this in more detail. This is a way of trying to look at something similar to a Felleman and Van Essen diagram, but in the context of physiological measurements from humans.
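For a flavor of what this kind of functional-interaction analysis involves, here is a minimal sketch; the electrode signals below are simulated stand-ins, and the real analysis involves much more than a plain correlation matrix:

```python
import numpy as np

# Minimal sketch of a "functional interaction" analysis. Assumes we already
# have field-potential traces: one row per electrode, one column per sample.
# The signals here are simulated; real data would come from the recordings.
rng = np.random.default_rng(0)
signals = rng.standard_normal((5, 10_000))   # 5 electrodes, 10,000 samples
signals[1] += 0.7 * signals[0]               # make two electrodes covary

# Pairwise correlation matrix: entry (i, j) measures how strongly
# electrode i's activity correlates with electrode j's.
interaction = np.corrcoef(signals)
print(np.round(interaction, 2))              # note the high (0, 1) entry
```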
So just to go back in time now-- people have been thinking about brains for centuries, but the fundamental revolution in neurophysiology happened in about 1927, when Edgar Adrian figured out how to insert electrodes to record the activity of individual neurons, work for which he got the Nobel Prize.
Fast forward a few decades. This is a schematic picture by Hubel and Wiesel at Harvard Medical School. They also got a Nobel Prize, for using these electrodes to probe the activity of neurons in primary visual cortex.
So the ability to interrogate the activity of individual neurons gave rise to heroic investigations over the last five decades or so, where people poked electrodes into different parts of the brain-- mostly in cats and rodents and monkeys-- to try to figure out what neurons are doing, how neurons respond to different types of visual stimulation. So, at the risk of being unfair to people who worked on this for decades, let me give a very quick overview of seven decades of visual neurophysiology, summarized in one minute.
Stephen Kuffler started recording the activity of neurons in the retina. He recorded the activity of retinal ganglion cells. Together with other people, he was one of the first to describe what are called receptive fields: the notion that neurons in the retina tile the entire visual field, each responding to a localized part of it. So if I'm fixating-- I'm not moving my eyes, let's say I'm fixating on that projector over there-- there's a neuron that's in charge of describing what's happening here, another one here, another one here, et cetera. The entire set of neurons in the retina have receptive fields that tile the entire visual field.
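As a rough illustration of the receptive field idea, here is a sketch of the classic center-surround model (a difference of Gaussians); the parameters are illustrative, not fitted to any real retinal ganglion cell:

```python
import numpy as np

# Center-surround receptive field as a difference of Gaussians (a toy model).
size = 21
y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
center   = np.exp(-(x**2 + y**2) / (2 * 1.5**2))   # narrow excitatory center
surround = np.exp(-(x**2 + y**2) / (2 * 4.0**2))   # broad inhibitory surround
rf = center / center.sum() - surround / surround.sum()

# The model neuron's "response" to an image patch is just a dot product.
uniform = np.ones((size, size))                    # uniform illumination
spot = np.zeros((size, size)); spot[10, 10] = 1.0  # small spot at the center
print(rf.ravel() @ uniform.ravel())   # ~0: uniform light is ignored
print(rf.ravel() @ spot.ravel())      # > 0: a centered spot drives the cell
```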
Hubel and Wiesel, who started with Stephen Kuffler, decided to go into cortex-- the terra incognita of cortex-- and record the activity of neurons in primary visual cortex while presenting different stimuli, initially in anesthetized cats and later on moving to monkeys. We can discuss the philosophy of science, about whether science should be hypothesis-driven or not. I always like to quote David Hubel saying that the most sophisticated hypothesis he ever had in his entire life was this: "If I put an electrode in V1, something interesting will happen."
Lo and behold-- these were really heroic experiments. They would spend day and night working. These were terminal experiments-- at the end, the anesthetized animal would be sacrificed-- so once they had a neuron, they had to work continuously, basically around the clock. So they worked for days recording the activity of neurons. And mostly they were using the same kind of stimuli, light on or off, that Stephen Kuffler was using.
After one of these experiments-- in those days, they didn't have computers to present visual stimuli, so they had slides. Most of you are probably too young to know what a slide is. Basically, it's a small rectangular transparency that you put into a projector, and it projects onto a screen.
So they realized, mostly in a serendipitous manner-- and luck always favors those who are really working hard and who have very strong powers of observation-- that the neuron went crazy every time they put the slide in and out. It didn't matter very much what the content was. What really mattered was putting the slide into the projector and taking it out. And that's how they discovered that what the neuron was really responding to was, basically, the edge of the slide sweeping across the screen as it was inserted.
And that led to the discovery that there are neurons in primary visual cortex that are tuned to different orientations. So here, for example, if you have a bar that's horizontally oriented, the neuron essentially doesn't care. And if you have a bar with a vertical orientation, or this slanted orientation, the neuron goes crazy and starts firing. That was the discovery of orientation tuning, and they got a Nobel Prize for this work.
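A standard idealization of such an orientation-tuned simple cell is a Gabor filter. Here is a minimal sketch; the filter and bar stimuli are toy constructions, not the actual stimuli Hubel and Wiesel used:

```python
import numpy as np

# Orientation tuning with a Gabor filter (the usual V1 simple-cell idealization).
def gabor(theta, size=21, sigma=3.0, wavelength=6.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def bar(theta, size=21):                              # a bright oriented bar
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.abs(xr) < 1.5).astype(float)

cell = gabor(theta=0.0)                               # prefers vertical bars
for deg in (0, 45, 90):
    response = float(np.sum(cell * bar(np.radians(deg))))
    print(f"bar at {deg:2d} deg -> response {response:6.2f}")
# The response is largest when the bar matches the cell's preferred orientation.
```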
And that gave rise to a whole generation of people studying visual cortex. I think it's fair to say that everything we know about the visual system can ultimately be traced back to these [INAUDIBLE] investigations.
OK, so roughly speaking, people divide the visual system into two parts: the ventral or temporal pathway, also known as the "what pathway," mostly involved in visual object recognition; and the dorsal pathway, the "where pathway," mostly involved with where things are, with movement, with stereo information. If you put electrodes in the dorsal pathway-- for example, in a famous area called MT-- there are neurons that respond very vigorously to motion. And they can discriminate whether things are moving to the right or to the left, and all kinds of aspects of motion processing.
And then Bob Desimone, working with Charlie Gross, discovered that there are neurons that respond to very complex objects-- for example, neurons that respond to faces. And those are mostly located in this area called the inferior temporal cortex, here.
OK, so again, most of what we know about the visual system comes from studies in animals. I want to very quickly tell you about the human brain. Again, working with these patients with epilepsy, we can begin to interrogate what happens in the human brain when we present visual stimuli.
This is an example of an intracranial field potential recording. This is not individual neurons. This is a field potential recording in an area that we think is analogous to the macaque monkey inferior temporal cortex. We refer to this as the inferior temporal gyrus.
And here what you're seeing is the intracranial field potential from one of these electrodes in response to the presentation of a picture of a face. Here on the y-axis is the intracranial field potential in microvolts. On the x-axis, you have time. This is the onset time. This is 150 milliseconds. You can see that there are very strong and reliable responses upon presentation of these stimuli.
Here you have another example electrode here showing you every single individual trial. There's no massaging of any flavor here, this is raw data. You can see that in every single trial, there's a deflection in voltage happening roughly at about 150 milliseconds after stimulus onset. That correlates very strongly with the visual stimuli.
So by and large, what we are discovering in the human ventral visual cortex parallels what we understand from non-human animals, and basically reflects a very rapid, highly selective, and invariant set of responses to different stimuli.
I want to very quickly flash this, because people often ask me about this neuron. This is a neuron in an area of the brain that I would not call visual cortex. This is at the very, very top of that Felleman and Van Essen diagram. This neuron, for this particular example, is located in the amygdala in the human brain. We've also recorded from similar neurons in other areas, the entorhinal cortex and the hippocampus.
And this is a case where we record the activity of individual neurons. We were presenting multiple different pictures. And again, in a completely serendipitous fashion, we found this example neuron that responded in a pretty peculiar way. What you're seeing here, for those of you who are not used to these kind of diagrams, these dots here correspond to action potentials.
Each dot is a single action potential from one neuron. Each row corresponds to one trial. So for example, here, you have nine different trials when the subject was looking at this rabbit. OK?
And you can see that the neuron did not fire too much. Why did the neuron fire at all? Well, all neurons essentially have some rate of spontaneous activity. There's basically no such thing as a neuron that's completely silent. So even if a neuron doesn't care at all about the rabbit, there are still some spikes. And this is prevalent throughout cortex, in all kinds of species that people have studied.
And we can debate about what's happening there. Why is there spontaneous activity? Why do neurons fire all the time? It's quite expensive to fire. It costs energy. So we can debate about why that happens. That's a fascinating question in and of itself.
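To make the raster format concrete, here is a sketch that simulates this kind of plot in text form; the spike trains are made-up Poisson spiking with a low spontaneous rate plus a stronger rate after a pretend stimulus onset:

```python
import numpy as np

# Simulated spike raster: each row is one trial, each printed time one spike.
rng = np.random.default_rng(1)

def poisson_spike_times(rate_hz, duration_s, dt=0.001):
    # Bernoulli approximation to a Poisson process, binned at 1 ms.
    return np.nonzero(rng.random(int(duration_s / dt)) < rate_hz * dt)[0] * dt

for trial in range(9):                        # nine trials, as in the figure
    spont = poisson_spike_times(2.0, 0.3)     # ~2 Hz spontaneous, 0-300 ms
    evoked = 0.3 + poisson_spike_times(30.0, 0.5)  # ~30 Hz after "onset"
    print(f"trial {trial}:", np.round(np.concatenate([spont, evoked]), 3))
```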
In any case, so we present this picture nine times; the neuron doesn't seem to care. We present other pictures of faces; the neuron doesn't seem to care at all. And then, again, in a completely serendipitous fashion, we realized that the neuron fired quite vigorously to these three different pictures of Bill Clinton.
Here, there are 15 different pictures. In this particular experiment, we presented about 50-- five zero-- different pictures. What's quite remarkable here is that these three pictures are quite distinct from each other. One is a black-and-white drawing. Another one is a color photograph. In the other one-- you probably cannot see anything from your distance-- Bill Clinton is standing up; you can barely see his face; he's shaking hands with someone else. So despite the fact that these are completely different images, the neuron seems to somehow selectively respond to them.
This is work that we did a very long time ago. I still don't know why this neuron responds in this way. One of the limitations in these kinds of experiments is that we have a very limited time with each neuron. So we cannot exhaustively sample the visual stimulus space. And I will come back to this at the very end.
So this is a very serious methodological concern in all of the studies about vision. We present a random set of images, or an informed set of images, or a set of images that is guided by our intuitions, by previous work, et cetera. But at the end of the day, we don't have a full mathematical proof or an exhaustive sampling of the entire set of different types of images. So I don't know why this neuron responds in this way. And I'm happy to talk more about this.
OK. Yes?
AUDIENCE: [INAUDIBLE]?
GABRIEL KREIMAN: Indeed, indeed, yes. That's a great question. So I'm just highlighting one example neuron. There's another neuron that responded to Jennifer Aniston. There was another neuron that responded to the White House. And this gave rise to a whole industry-- there are lots of people who followed up on this type of work.
There are neurons that seem to be highly selective and highly specific. Not all neurons. There are other neurons that respond to all chairs, or to all faces, or that are much less picky compared to this one. But yes, it's not just the one neuron. There are tens, maybe hundreds by now, of such neurons that people have described. Still, a very small sample, and very anecdotal in a way.
But this is not the only one. There are many other neurons that do respond in a very selective way. Caspar?
AUDIENCE: Do you have a sense of how many neurons someone would need to screen before they could find one that responds to something in [INAUDIBLE] way?
GABRIEL KREIMAN: That's a great question. And I can refer you to a paper where people tried to estimate that. So what are the odds? If you're randomly listening to a neuron, what are the odds that you would find such a thing? So, a couple of things, and these are mostly conjectures. There's a study where people tried to estimate these numbers, and they said, well, the probability may be around 5% of finding such a neuron-- there's a lot of assumptions in these calculations.
So one thing that I should point out is, I think that most of these neurons do not seem to be in visual cortex. This neuron is in the amygdala. The textbook version of what the amygdala does is that it's involved in emotion processing. So it's conceivable that this neuron is not really selective for Bill Clinton.
Maybe this person loved Bill Clinton, they thought that he was funny. Or they hated him. Or maybe they thought that he was cute. Or maybe they were Democrats. Or there are all kinds of interpretations that transcend that visual processing.
The other point I want to make is that these kinds of very selective and invariant responses have mostly been described for famous things-- for famous landmarks and famous people-- much more so than for a new object, let's say. So the conjecture there is that there may be an over-representation that depends on the statistics of the stimuli that people are exposed to.
This recording was done in the year 2000, and this patient had been watching Bill Clinton on TV every day for the last four years. So it may be that the statistics of exposure to stimuli, to the world, influence how many neurons respond in this way. So it may be that if I meet a new person today, I will not have a representation across lots and lots of neurons, unless I get to know that person.
So there may be something special about your family, about your friends, about your wife, your kids, about Bill Clinton, about the White House, and there may be an over-representation of those stimuli. So not a very good answer, but those are a couple of points I wanted to make. [INAUDIBLE]?
AUDIENCE: Wouldn't it be also sample size? Because from what I remember from studies, these were patients mostly in LA, and they had limited time to record, so they chose images of famous people, because it would be more likely that they would find responsive [INAUDIBLE]. So there was-- that's why all of the neurons that have been found are either actors or former presidents, or [? such, ?] so--
GABRIEL KREIMAN: It's very, very plausible. Let me just tell you a couple more anecdotal points. I was actually the person running this experiment, and there was a bug in my code. I was presenting all kinds of images, and I wanted to present each image only once. There were some famous people, but there were a lot of other stimuli as well.
So in this case, I wasn't really trying to do that. But indeed, the work that followed up on this mostly had this conjecture-- that famous people would be more interesting-- and therefore there is a sampling bias there, I think. I think that's true. There was a disproportionate number of famous landmarks and famous pictures.
I want to come back to this, because I think that now, with the advent of computational models, we have much better ways of probing neuronal responses. And of course, you are very familiar with these, [INAUDIBLE] here also. So I want to come back to some of the more modern ways that we now have, I think, to try to probe neuronal responses.
So I think there was minimal bias in this particular neuron. In general, in most of what people did after this, yes, there was a very high prevalence of famous objects. In this case, I was presenting all sorts of stimuli-- yes, there were many famous people, but I was not really looking for that. But I still think that there may be some bias. I think that's a good point. Yes?
AUDIENCE: So has your process continued on to [INAUDIBLE] neurons? [INAUDIBLE] maybe it could be a face representative, for example?
GABRIEL KREIMAN: It could be a what?
AUDIENCE: A face representation?
GABRIEL KREIMAN: A face representation. So, this particular one, I think it's not about faces. I find it very hard to think that this is really about faces. There are other faces here that the neuron does not respond to.
AUDIENCE: I can't speak [INAUDIBLE] the state.
GABRIEL KREIMAN: The face?
AUDIENCE: State.
GABRIEL KREIMAN: State. Oh, the state like being happy or not, for example. Is that what--
AUDIENCE: Or like-- or the difference of, you know--
GABRIEL KREIMAN: Oh, the government, or democrats, or-- yeah, oh, yeah, absolutely. Yeah so-- so absolutely. So again, I don't want to claim that I understand what's going on here. I certainly don't.
So a lot of people ask, would these neurons also respond to the White House? Would this neuron respond to Hillary Clinton? I've been asked whether this neuron would respond to Monica Lewinsky. Some of you may know the story. I don't know the answer to all of these questions. I don't know.
So really, many of these experiments-- a lot of experiments in visual neurophysiology-- have been pretty random, I would say. We throw a lot of pictures, and we hope for the best. So it may very well be that this is connected to emotions. It may be connected to state, to politicians.
There are other politicians here. This is-- I don't know if you can see it from there. This is Kennedy. This is Washington.
But anyway, this is mostly speculation. I'd love to come back to this after I briefly mention, in the third part, what I think is a better way to try to probe neuronal responses than blindly showing pictures and hoping for the best, which is what most people have been doing, in a way.
All right, OK. So if we really want to understand-- what does it even mean to understand cortex or computations? So together with many of my colleagues, I fully embrace the notion that was put forward by Richard Feynman here and many others before him as well.
Richard Feynman was a Nobel laureate, a very famous physicist. He said, what I cannot create, I do not understand. By create, I interpret that to mean mathematics. I interpret that to mean rigorous theories and computational models. I think if we are going to understand anything about brains, we really need to do that in the language of science, which is the language of mathematics.
So I'll give you a very, very brief introduction to computational models, just to have everybody on the same page. Image recognition has to do with taking an input that's just a bunch of numbers, and then understanding what's in there. OK?
So this may not be obvious to people who have never thought about computer vision, but for all of you, I think this is a pretty figure, right? So you look at an image like that, and of course you see a flower. The input, what's actually happening, more or less, at the level of the retina, is that you have a bunch of numbers. That's what represents the image.
In particular, if we remove color for the moment, an image is just a matrix-- a two-dimensional matrix with numbers that represent the intensity of each pixel. To a very, very, very coarse, oversimplified approximation, we can think of the output of the retina-- the firing rate of the retinal ganglion cells-- as some sort of representation of the intensity at each location.
This is a very unfair way of describing the retina. There are many of my beloved colleagues that spend their lives working on understanding the detailed intricacy of the circuit in the retina. There's much more to the retina than this kind of representation.
But to a first approximation, many of us think of the input to the cortex as a bunch of numbers, OK? So you look at this, and you have to look at these numbers and say, that's a flower. And that's a pretty hard task. OK? That's what we mean by building computational models.
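Here is a minimal sketch of that starting point; the tiny matrix below is a made-up stand-in for a real grayscale image:

```python
import numpy as np

# An image is just a matrix of numbers: intensities at each pixel. A real image
# would come from a file, e.g. via Pillow:
#   np.asarray(Image.open("flower.jpg").convert("L"))
image = np.array([
    [ 12,  40,  43,  10],
    [ 38, 200, 210,  41],
    [ 35, 205, 198,  39],
    [ 11,  37,  42,  13],
], dtype=np.uint8)

print(image.shape, image.min(), image.max())
print(image)   # the recognition problem: go from these numbers to "flower"
```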
So some of you may have seen this. Again, I apologize-- I know some of these things are deja vu for many of you. Seymour Papert was a very famous computer scientist and cognitive scientist working at MIT. A very, very lucid guy. Many of us love to work with students; we have summer students and summer projects. MIT has had UROP projects and summer projects forever.
In the 1960s, he proposed, as a summer project, to essentially solve vision by creating a computer that can recognize images. This was the summary, the abstract for the project. He said, well, we'll get a bunch of smart MIT undergrads, and this summer we're going to solve the problem of vision.
This is the vision memo number 100. We still-- there's still memos at MIT. I think they became a little bit less fashionable these days, now that we have so many other ways of communicating science. In those days, those memos were one of the ways in which people communicated.
So this is the vision memo number 100. People thought that they could solve the problem in one summer. This was, like many of us, perhaps a bit overly ambitious.
But anyway, so I want to just put a couple of basic guidelines of what we mean by understanding. And this is an interesting point that maybe will lead to further discussions, and I hope it will. Not everybody agrees on what it means to understand brains, to understand computation. So I would like to put forward a couple of basic requirements, desiderata for a theory of visual processing.
The first one is that it has to be image computable. We need to be able to take an image, and by that I mean a matrix of numbers, and then do processing on that. This may be, again, obvious to many of you. There are many theories that are not image computable.
There are many-- in psychology for example, there are many people who have beautiful ideas about how the brain might work. They may be correct. They may be very, very good. But they're not image computable. You cannot really have an input and manipulate it and do things with it. So that's what I mean by image computable.
I personally think that it has to be based on biologically-plausible mechanisms. That is, I'm particularly interested in how the brain works.
This is also not universal. There are lots of people who are interested in building vision systems that just work. And that's very laudable. And I applaud those efforts as well. Google may care about having a vision system that can describe anything in the world, without caring for whether it connects to brains or not.
But I'm personally particularly interested in the notion that we want to build systems that are actually based on biologically-plausible mechanisms. So roughly speaking right now, I think that has to do with, at the very least, a neural network type of architecture. Neural network is not quite a replica of the brain.
We can debate about exactly what we mean by biologically-plausible. That's an ill-defined concept, and I don't want to get into too many details now. We can discuss what it means. But let's say, for now at least, it has to be something like a neural network.
These models should be able to account for behavior. That is, they should be able to describe our visual illusions, our visual perception-- how we recognize Bill Clinton, how we recognize chairs, how we get confused by circles that appear to be of different sizes, et cetera, et cetera.
I also think that they should account for neuronal responses. That is, ultimately, we want to be able to explain how neurons respond to different stimuli as well. And again, this is not universal. This is my own wish list, my own desiderata. There are lots of people out there who may not care about human behavior or about neuronal responses. I do.
But I think it's very important that this model should generalize. And I'll come back to this idea of generalization later on. I don't want to just build a model that works on the training set. Or a model that can only work with chairs of a certain type. I want models that are general, that can do everything that we do with our visual system.
And importantly for me as a scientist, it should be falsifiable. If a theory is not falsifiable, it's not a scientific theory. There's a lot of beautiful English literature. Jane Austen is not falsifiable. And that's fine. It's just Jane Austen, right? But if we want to build scientific theories, they need to be falsifiable. We need to be able to do experiments that will falsify theories.
Interestingly, if you ever read Karl Popper: you cannot prove theories to be right; you can ultimately only falsify them, OK? So this is, I think, crucial for any scientific account of visual processing.
And a lot of people think that it's, perhaps, embarrassing to falsify theories and to prove your models wrong. I personally think that's exciting. I think that's good. It's good if we have a model, and we can prove that it's wrong, I think that's scientific progress. That's good. That means we have to build better models.
A lot of people are afraid of that. You ask them about falsifiability-- how would you falsify your ideas?-- and a lot of people feel intimidated by that. And you hear lots of talks by people who never think about how they could falsify their ideas.
I think most of the models that we have right now are wrong. A famous person once said, "All models are wrong, some are useful." So I fully embrace that notion. I think we should build models. I think those are essential for understanding.
We should build models that are falsifiable. We should falsify them. That's part of our work. That's part of our job. And then create better ones.
In a way it's a recipe for scientists to have to have a job for a very long time. So you build something, then you prove it's wrong. And then you build another one, and so on.
OK, so just a quick comment about computational models. Some of you may be familiar with this very famous piece of art by Rene Magritte. For those of you who speak French, and my French is very bad, I apologize. It's even worse than my English. [SPEAKING FRENCH]. So this is not a pipe. This is a representation of a pipe.
And there was a lot of discussion in the art communities about what exactly this means. The idea is that you cannot smoke with that, right? That's a picture, right?
So similarly, the kinds of computational models that we build here are not a brain. This is not a brain-- ceci n'est pas un cerveau, to use the French word for brain.
This is a representation. This is an abstraction. This is ignoring a huge amount of stuff that is happening in the brain. In particular, every single one of our models-- even the state of the art, our best models, the algorithms that allow my phone here to detect and recognize my face, all the algorithms that you see that can do instance segmentation, or your favorite algorithms-- are really very, very far from neuroscience. They are very far from the mesoscopic connectivity of visual cortex by Felleman and Van Essen. And they're even farther away from the real connectome described here.
We can debate about what's the right level of abstraction for a computational model. But I think it's important to have models that are abstract. That is, we are not trying to replicate every single atom in the primate brain. We want to have models that fulfill the desiderata that I was alluding to earlier.
That doesn't mean that my models need to have ATP. ATP is the currency in biology for energy conversion. That doesn't mean that my model needs to have every single one of the different-- the hundred different types of interneurons in the brain.
So there will be some abstraction; there will be some mathematical rendering of what we mean by building computational models. And people disagree-- and I think this is a fruitful discussion-- on what we can abstract away and what we should never abstract away. What are the basic ingredients that we can't get rid of, and what are those that we can safely abstract away? OK?
OK, so one of the first models of visual processing came from Fukushima in the 1980s. And if you look at the picture from Fukushima and you squint a little bit, you can sort of see the basic genesis of the current deep convolutional networks that are so fashionable today and have been so successful in the last decade in a lot of different types of visual tasks.
This was directly inspired by the recordings of Hubel and Wiesel. What I didn't tell you about the Hubel and Wiesel discoveries is that they described basically two types of cells: one they called simple cells, the other complex cells. The simple cells detect these orientations, but they are very picky about the particular scale and position of the stimulus within the receptive field.
The complex cells, in contrast, had a higher degree of invariance. They were tolerant to the particular phase and position of the stimulus within the receptive field. So basically, Fukushima's model here has these S layers and C layers-- simple layers and complex layers-- doing a concatenation of filtering operations to gain selectivity, and then complex units to actually build invariance.
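Here is a minimal sketch of that S/C motif: an "S" stage that gains selectivity via template matching (a convolution), followed by a "C" stage that gains position tolerance via local max pooling. The sizes and the template are toy choices, not Fukushima's actual parameters:

```python
import numpy as np

def s_layer(image, template):
    # Simple ("S") stage: slide a template over the image; selectivity.
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * template)
    return out

def c_layer(s_map, pool=2):
    # Complex ("C") stage: local max pooling; tolerance to exact position.
    H, W = s_map.shape
    return np.array([[s_map[i:i + pool, j:j + pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

image = np.zeros((8, 8))
image[2:6, 3] = 1.0                           # a vertical bar
template = np.array([[-1.0, 2.0, -1.0]] * 3)  # vertical-bar template
print(c_layer(s_layer(image, template)))      # strong, position-tolerant response
```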
If we fast forward a couple of decades, Tommy Poggio, together with Max Riesenhuber, came up with the so-called HMAX model, which was, again, based on this idea of Hubel and Wiesel-- a hierarchical model that was also compared against behavioral metrics and neurophysiological metrics. And this gave rise to basically everything we know today in terms of basic models of computer vision. These basic biologically inspired computational models were the pioneers of a whole generation of deep neural networks.
Thomas Serre is a scientist who worked with Tommy Poggio as well, and he made major contributions to these types of models. He will tell you more about that in a couple of days.
So fast forward a few decades. There were a couple of things that changed dramatically in the last decade. One is that we now have an incredible amount of data with which to train computational models. Those data were not available in the days of Fukushima, or even in the days of Riesenhuber and Poggio's model. So now we can train models with many, many more images.
We have access to exciting and tremendous computational resources that were not available then. And then people started using algorithms-- especially an algorithm that many of you are very familiar with, stochastic gradient descent with back propagation-- where one can build these types of hierarchical models and start to fine-tune each of the weights in those networks.
By tuning the weights, with the additional computational resources and the enormous amount of data, performance basically exploded: there was a major improvement in the ability to do computer vision tasks and pattern recognition.
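Here is a minimal sketch of that training loop; a single linear model on made-up data stands in for a deep network on millions of labeled images, but the stochastic-gradient-descent logic is the same in spirit:

```python
import numpy as np

# Toy stochastic gradient descent: tune weights to reduce error on mini-batches.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 10))        # 256 toy "images", 10 features each
true_w = rng.standard_normal(10)          # the mapping we pretend to learn
y = X @ true_w                            # toy labels

w = np.zeros(10)                          # the weights we will tune
lr = 0.1                                  # learning rate
for step in range(200):
    batch = rng.integers(0, len(X), 32)   # "stochastic": a random mini-batch
    err = X[batch] @ w - y[batch]         # prediction error on the batch
    grad = X[batch].T @ err / len(batch)  # gradient of the squared error
    w -= lr * grad                        # nudge the weights downhill
print("distance to true weights:", np.linalg.norm(w - true_w))  # near 0
```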
So this is a rather outdated slide by now. This is a competition that's based on: I give you a bunch of images, and I want to know what the labels are. Those images include chairs and faces and dogs and cats and so on. One of the famous data sets is called ImageNet. Again, many of you are very, very familiar with this. And the idea is, how many of those images can you label correctly?
So in 2010, the error rate was about 28%. This is outdated by now. This is 2016. It went down to about 3%. Some people have claimed that at least for this particular data set, it's almost solved in a way. So it's at the level of human performance, in terms of how well people can discriminate images. I will argue that we're still very far, and I'll come back to that.
But in terms of this particular ImageNet data set, which has about a million images, there has been an incredible progression of performance. You can imagine this is way better now than in 2016. We have amazing algorithms thanks to a combination of these three different ingredients, on top of the basic ideas of people like Fukushima and Tommy Poggio. Yes?
AUDIENCE: [INAUDIBLE] I think the data, even [INAUDIBLE] absolutely would be back propagation. [INAUDIBLE]?
GABRIEL KREIMAN: Back propagation? Yeah, the paper on back propagation, I think it goes back to 1990 maybe-- what's that?
AUDIENCE: '85 or '88?
GABRIEL KREIMAN: Sorry, I cannot hear you very well.
AUDIENCE: [INAUDIBLE] '85 or '88.
GABRIEL KREIMAN: '85, '88. OK, all right. Yes, I agree. So the basic idea of back propagation is not from the 2010s. You're absolutely right. If I said that, I said it wrong. So back propagation has been around much, much earlier than 2010.
People-- interestingly, as soon as back propagation came up, I think the next month or the next week, there was an article by Francis Crick in Nature saying back propagation is not biologically plausible. And people have been fighting about the biological plausibility of learning in a back propagation way ever since.
But you're absolutely right. So back propagation, the timing is not 2010. It goes back much earlier. And again, sometimes it's about combining all of the right ingredients at the right time, with all the right tools. I think that that's the major transformation.
By the way, just to give credit: in 2012, Alex Krizhevsky and Geoff Hinton were the people who came up with AlexNet. This is known in the community as a major paradigmatic shift in terms of performance. You can see a pretty large drop.
For those of you who submit papers to computer vision conferences, sometimes people fight about improving things by 0.5%. And they get very excited. They publish a paper when the performance of one algorithm is 0.2% or 1% better than another. Going from 25 to 16-- that's a big deal, OK? So that's one thing.
But yes, you are right. Back propagation was not invented in the 2010s. It was way earlier. Yes?
AUDIENCE: [INAUDIBLE].
GABRIEL KREIMAN: Yes. Yes, so that's actually-- You predicted that perfectly. That's the end of the first part. And we'll take a 10-minute break and then come back for the next part.