Neural Representations of Categories

Date Posted:  August 19, 2020
Date Recorded:  August 18, 2020
CBMM Speaker(s):  Haim Sompolinsky
Associated CBMM Pages: Brains, Minds and Machines Summer Course 2020

PRESENTER: So welcome back, everyone. We are on the verge of having artificial intelligence systems that can conquer the world and that can do everything-- except for starting a presentation in PowerPoint. That's beyond-- that's beyond the capabilities of current-day AI.

In that case, it's a great pleasure to introduce Professor Haim Sompolinsky. Arguably one of the fathers of computational neuroscience, he's received numerous awards. I won't have time to list all of them because I really want you--

HAIM SOMPOLINSKY: Keep going.

PRESENTER: --to listen to him. So, Haim is jointly a professor at the Hebrew University in Jerusalem, as well as a professor of cognitive neuroscience at Harvard.

So, without further ado, Haim, it's all yours.

HAIM SOMPOLINSKY: OK. Very good. So let me start with this image. It's a photo that I took last year at [INAUDIBLE] in Woods Hole, quite by accident, as such a collage. But it's a beautiful sunset, and I just want to remind you that there is a beautiful world outside the virtual one. And I urge all of you to visit Woods Hole when, hopefully, it hosts you again in 2021.

What I'm going to discuss with you today is the question of how neural systems represent categories, or concepts, or objects. And, more specifically, how categories-- such as objects-- emerge from low-level physical stimuli. We have a pretty good idea of how early sensory systems represent the physical attributes of a stimulus-- an image, or a sound, or an odor. But we lack a good understanding of how the brain represents objects, and how the representation of an object-- whatever form it has-- emerges from this low-level physical encoding.

Now here's another set of photos-- images. I don't know if any of the students can tell us who is on the top left.

PRESENTER: Somebody posted Aristotle.

HAIM SOMPOLINSKY: Aristotle. Yeah. It is Aristotle. Very good. Then middle top-- Euclid. The bottom left--

PRESENTER: Somebody posted CF Gauss.

HAIM SOMPOLINSKY: Yes, Gauss. Very good. And the bottom right, another famous mathematician, Euler. So here you have the full list-- Aristotle, Euclid, Gauss, and Euler. You can see how many centuries or millennia separate them. Why am I showing you this and mentioning these famous thinkers and mathematicians?

And this is because it's very much related to what I'm going to talk about. The question of what is the essence of a concept has been debated by philosophers, mathematicians, and psychologists for thousands of years. In the classical theory-- Aristotle and Plato are among its founders-- a concept is an abstract entity, defined by some logical or metaphysical set of necessary and sufficient conditions.

So if you want the concept of a dog, then there is a list of conditions such that if and only if this set of conditions obtains, the image or the entity corresponds to a dog, and so on and so forth. But this view was discredited a long time ago, for reasons that I won't get into. Nobody was able to make sense of such sets of necessary and sufficient conditions.

And cognitive scientists sought different ideas about what concepts are. One famous theory from cognitive scientists in the '70s of the previous century is called prototype theory, which says that, basically, a concept, such as an object category, is represented by an idealized image of that object. Say it's a dog; then there is a prototype, an idealized image of a dog, which represents the entire concept.

This idealized prototype may not necessarily be one of the real dogs out there. It may be some sort of average over many dogs, representing the essence of the concept. There are competing theories, such as exemplar theory, which says that, basically, in the mind or in the brain we store not the prototype but a few typical, real examples of each one of these objects or concepts.

And again, when there is a stimulus, we compare it, not to a prototype, but to the closest example. And then, of course, there is a whole literature and empirical data about combinations of prototypes and exemplars, or something in between, generalizations and so on. But you can see that this is a long-standing, non-trivial question, conceptualizing concepts.

And what I'm going to discuss with you is something which I would call a geometric theory of concepts, or a manifold-geometric theory of concepts. It will have some relation to prototypes, as you will see later on, and some relation to exemplars, but nevertheless it is neither. It is an attempt to capture the brain's representation of concepts, at least at some stages, in geometric terms.

This is why I showed you these famous geometers: because the geometry that I'll talk about, as you'll see, has Euclidean distance as the basic sense of distance, and so on. But it will be in abstract, high-dimensional spaces, trying to capture the shape and geometry of manifolds. And Gauss and Euler, among others, are famous for having established some of the basic concepts of manifolds.

So before I go on, I just want to mention my collaborators on this work. It's a body of work spanning several years now. SueYeon Chung is now at Columbia University, Uri Cohen is at the Hebrew University, Dan Lee is a professor at Cornell Tech and director of the Samsung AI Center in New York. Ben Sorscher is in the lab of Surya Ganguli at Stanford. And most of the work, not all of it, but most of it, you can read in the references that I'm citing here.

I'll begin, as you'll see later on, with the theory-- this would be the Physical Review X paper-- and then I'll continue with the application of the theory to the visual system and to artificial deep convolutional networks, for which you can find more details in the Nature Communications paper published recently.

So I'll begin with an introduction to the neural representations of object manifolds, then I'll talk about some theoretical constructs of manifold classification and manifold geometry. Then I'll apply this to visual object manifolds in deep convolutional networks. And I'll end with applications to neural data. I probably won't have time to discuss few-shot learning, so that will have to wait, perhaps, for 2021.

So let me, again, recapitulate the challenge, the challenge of processing objects, whether it is recognition or classification or categorization. The challenge for the brain is to make those computations in a way that is invariant to the physical variability. At the top right you can see a set of faces of different persons, but each face comes with a different pose. At the bottom right, you see orientation or spatial location variability; or even a category like a plane or a car or a boat can come in a variety of types of cars or planes, and so on and so forth.

I'll talk in some detail later on about the ImageNet data set, but this is another example. I'm sure most of you have heard about it. It's a huge data set of about 1,000 categories, and each category has about 1,000 images. So again, you can see for yourself that there is rich variability in each category, not only in location or orientation, but in background and many other things that appear in the image, partial occlusion, and so on.

So how do we turn those individual physical stimuli, with all the richness of the physical information about each image, into a representation which basically keeps the identity of the object as the main variable? So here is an abstract notion of that. Imagine we have a population of neurons that responds to these images.

In this case x1, x2, x3 will be the activity levels, the responses, of neuron 1, neuron 2, neuron 3. In general, we'll be in high dimension, so this will be a 1,000- or 10,000-dimensional space. Each point in this space is a population response vector: the vector which represents the response of each one of the neurons to one stimulus.

So this is a point here, a vector in R^N for N neurons. And as you can see, there is a dog, a dog at a different scale, a dog in a different pose, and so on. Each one of these physically different images of a dog will induce slightly different responses, and therefore you will have different points in this representation corresponding to the variability in some physical parameters.

But the collection of all these responses will be what we call a manifold-- a manifold of a dog, in this case. Correspondingly, if you show the same population of neurons images of a cat, you'll again have many, many points, and the collection of all the points that represent the responses of this population to a cat will be another manifold.

Now, that means that this population of neurons as a whole is selective or sensitive not only to the fact that there is a dog versus a cat-- in this case plus 1 versus minus 1-- but also to the physical changes of a given stimulus. Now, if we want a representation which gets rid of these nuisance variables and only represents the identity of the object, then you can think hypothetically of a different representation, going from this one to the one on the right, where basically you have one point at the top, which represents the concept of a dog-- maybe an idealized dog, a prototype of a dog-- and then another point representing the cat. So this would be a kind of geometric visualization of the prototype approach.

Now, the question is, of course, is this what happens in the brain? Do we find such a representation-- a collection of neurons that are selective to the identity of the different objects but nevertheless are not sensitive to changes in the physical variables of the images, because everything is represented by a single point, a prototype?

Now, this may be the case. I'll just briefly flash an image from a paper, from a well-known series of works, including by Gabriel Kreiman, about concept cells in the brain-- cells that respond to famous people, even famous locations, or famous politicians, and so on-- and with an amazing invariance, in the sense that the same cell will be selective to Maradona, maybe even to a cartoon of Maradona, or maybe even to the sound of his name, not only the visual. So it is an amodal or multi-modal, highly invariant representation.

Whether these are concept cells or grandmother cells or, I would say, highly selective representations can be debated, but I just want to say that this type of highly selective and invariant response is found mostly in the hippocampus. In particular, the responses are multi-modal: a cell responding to Bill Clinton will respond to his image, and even the name of the person will elicit a response, and so on and so forth.

But what I'm interested in here is some intermediate stage-- not the hippocampal areas, which are related to many things, including memory and so on, but essentially the sensory pathway up to high-level visual cortex. In particular, I envision the pathway from the retina to primary visual cortex, and then to higher-order visual cortices, particularly what's known as the ventral stream, shown here.

And in those areas, it's fair to say, you don't see such an invariant, prototypical, grandmother-cell type of representation. Even at the top stage of the visual hierarchy, what's known as IT cortex, the neurons, although they are selective to different objects, as we'll see, are nevertheless also sensitive to the physical variability of the image, of the stimulus.

So the prototype metaphor is not a good model for these stages-- not even for the top stage of the sensory hierarchy, in IT. And it's an interesting question, which we can discuss at the end of my presentation if we have time, why that is so; but this is the fact, and I'm mentioning it here.

So instead of thinking about taking the object manifold from the first stages and compressing it to a point-- which doesn't happen in the sensory system-- we're talking about changing the geometry, and in particular about the notion of untangling of object manifolds, first introduced by Jim DiCarlo and David Cox and their collaborators and elaborated in our theory of manifolds. And basically, the picture is, more or less, in cartoonish fashion, what you see here.

In this graphical representation, in the input layer-- the pixel layer, or the retina-- you have those object manifolds, but they're highly nonlinear and very hard to separate. And then, as the information propagates through the visual hierarchy, those manifolds become more and more separated and untangled, in such a way that ultimately, when you come to IT cortex, they are sufficiently nicely separated that you can easily separate the manifolds as a whole, or decode the identity of the objects which are represented by those manifolds, as you see at the top, in IT cortex.

So this is basically a metaphor, and the question is, how do we take this metaphor and make it a testable theory that can explain, or define, or determine what variables or what geometric measures are relevant to the question of untangling the manifolds?

Now, one point which I want to mention and stress is that this type of visual illustration of manifolds and untangling can be misleading. First of all, it's a cartoon-- it's two dimensional. But more importantly, it is only showing two manifolds.

And in general, in high dimension, if you just have two manifolds, it will be easy to separate them by a hyperplane, even if you look at the pixel layer. We did this test: we took data of many, many images corresponding to one object, and images corresponding to another object, and we found that even in the pixel layer we can linearly separate them. So the challenge arises when there are many, many manifolds that the computation is supposed to support. There, the question of untangling is much more challenging and demanding.

Fortunately, the same question can be analyzed with respect to artificial networks. They face basically the same challenge: they have to transform the representation of the input into what we call the feature layer-- the top layer before the classification layer, before the softmax layer-- into a representation in which those manifolds are easily separable.

So let me just say briefly what we mean by a manifold. Here is, roughly, the mathematical definition. Dimension is always a source of confusion when we talk about manifolds, because there are several dimensions involved, and I would like to make sure that we are on the same page about this.

First of all, capital N. Capital N is the ambient dimension. If you have a visual area, or a layer in a deep network, consisting of a million neurons, then N would be a million; if it has 1,000 neurons, then N would be 1,000. Basically, it is the number of neurons that participate in the representation. So that's capital N.

Next, we have D. We consider, mathematically, manifolds-- like the manifold of the dog or the manifold of the cat-- to be of lower dimension, to reside in a linear subspace which is much smaller than N. And this is D. D is the dimension of the linear subspace spanning the manifold.

And altogether it is D plus 1, because there is also the center of the manifold, which is measured relative to the origin of the coordinate system, and that's another dimension. So all together, we have D plus 1 dimensions: the center, and the D dimensions-- the linear dimensions in which the variability relative to the center resides. So I think that's enough.

So here you see, again, the manifold. You have a center here, and in this case two dimensions of variability plus the center, which is another dimension. We also introduce the notion of little r: mathematically, we can take the same manifold, the same shape in the D dimensions, and expand it or compress it, and that is little r. It is a mathematical construct to study the effect of the size of the manifold.

I want to emphasize here, again, going back to this picture, that D is the linear dimension in which the manifold resides, but the shape of the manifold is given by whatever its geometry is. In other words, the manifold does not, in general, fill the entire D-dimensional hyperplane: it is constrained, it is a compact, bounded set, it has a well-defined perimeter, and so on and so forth.
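
As a compact way to write the setup just described (the notation here is mine, but consistent with the talk: a center, D directions of variability, and an overall scale r), each manifold can be written as

```latex
x^{\mu}(\vec{s}) \;=\; x_{0}^{\mu} \;+\; r \sum_{i=1}^{D} s_{i}\, u_{i}^{\mu},
\qquad \vec{s} \in \mathcal{S} \subset \mathbb{R}^{D},
```

where x_0^mu is the center of manifold mu, the vectors u_i^mu span the D-dimensional subspace of variability, and the compact set S gives the manifold its shape within that subspace.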

So now the question is, given these manifolds, how do we quantify the goodness of the representation? How do we define the class of computations that rely only on the object identity?

So, for instance, object recognition: you are shown an image and have to recognize it as a dog versus everything else-- that would be one example. Or animals versus non-animals, or certain types of dogs versus other types of dogs, and so on and so forth. Each classification, categorization, or recognition task may impose slightly different geometric requirements in order to be able to perform it.

So in order to come up with a more generic notion of object-based computation, we define it basically as a class of binary dichotomies. We take a set of objects, or manifolds, and we randomly label them either plus 1 or minus 1.

And we ask whether there is a linear separator that separates the plus 1's from the minus 1's. And the point is that the labeling is random. So by averaging over all possible labelings, we span a broad range of specific tasks, of computations, and by that we generate something more generic as a quantitative measure of the separability of manifolds.

So then the question is, how many manifolds can be so classified, and how does this depend on the ambient dimension, on the geometry, and so on and so forth? I won't go into the mathematics, but I just want to summarize what the theory tells us about what's called the capacity for linear separation of manifolds. The capacity is denoted here by alpha_c, the critical alpha, where alpha is the number of manifolds that we separate in a given task, in a given labeling, divided by the number of neurons.

And alpha_c then is the maximum number of manifolds per neuron that, with high probability, will be linearly separable, averaged over all possible random labelings. So it is, in some sense, a measure of information-- a measure of the object-identity information in the representation. And that, from a theoretical perspective, seems to be a very good measure of the separability of manifolds.
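
To make the procedure concrete, here is a minimal sketch (not the authors' released code; function names are illustrative) of how one could estimate this capacity empirically: fix N, draw random plus/minus-1 labels over the P manifolds, and check whether a linear classifier separates every point of every manifold according to its manifold's label.

```python
import numpy as np
from sklearn.svm import LinearSVC

def is_separable(manifolds, labels):
    """manifolds: list of (m_i, N) arrays of points; labels: +/-1 per manifold."""
    X = np.vstack(manifolds)
    y = np.concatenate([np.full(len(m), lab) for m, lab in zip(manifolds, labels)])
    clf = LinearSVC(C=1e6, max_iter=100000)   # hard-margin-like SVM as a separability test
    clf.fit(X, y)
    return clf.score(X, y) == 1.0             # perfect training accuracy => linearly separable

def fraction_separable(manifolds, n_dichotomies=50, seed=0):
    """Fraction of random +/-1 labelings of the P manifolds that are linearly separable."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_dichotomies):
        labels = rng.choice([-1, 1], size=len(manifolds))
        hits += is_separable(manifolds, labels)
    return hits / n_dichotomies

# Capacity: with N fixed (e.g. by subsampling 1,000 neurons), increase P until the
# fraction of separable dichotomies drops through 1/2; alpha_c is that P divided by N.
```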

And one of the basic things that this theory tells us is that in high ambient dimension N, the number of separable manifolds is linear in N. So the quantity which is interesting is the ratio between the number of manifolds and the number of neurons. And that turns out to be extremely important, not only from a theoretical perspective, but also when applying the theory, or those measures, to real data-- data from artificial networks, and even more so neural data. Because in all these cases P is not small but finite, and N is finite. When we compare different stages in the hierarchy, or when we compare an artificial versus a biological network, the ambient, embedding dimension may be different.

But once we know that the capacity is extensive-- in the sense that when N is large and P is large, only the ratio matters-- then we don't have to worry about it. For the sake of fairness, we can subsample and fix the subsampled N to be, let's say, 1,000-- fix it for all layers, and also for other networks, and also for neural data-- and then increase P until we find that the manifolds are no longer separable, and in that way measure alpha_c.

And the point is that alpha_c is then a measure of the system as a whole. We don't have to ask what would happen if we took all the neurons and used them, because the theory tells us that only the ratio matters. So that's an extremely important, attractive feature of this measure.

So now we can talk about limits, and again, these are well known or can be derived easily. First of all, what happens in the limit of prototypes, the limit in which each object is represented by just one point, the center of the manifold? Then linear separability under random labeling is the same as the linear separability of random points by a perceptron. And Cover's theorem and other results show that in high dimension the capacity is 2. So that's the upper limit: with our manifolds we cannot do better than 2, at least under random labeling.

There's another bound, a lower bound, for when there is really no manifold structure: if we have manifolds consisting of a finite number of points-- little m points each-- but there is actually no geometric structure, the points are random, and we just collected m points and called it a cat, and collected another m points and called it a dog.

Geometrically, they are not different from random points. In that case, you can easily derive another limit, which is 2 over m. So when m, the number of points per manifold, is large, or maybe even infinite, you get a very small capacity. So that would be the lower bound.

There's another interesting lower bound for when the manifold is actually continuous-- so little m is infinite-- but it lies in a low dimension D. In that case alpha still scales like 1 over D. So basically, we can say what is tangled versus untangled: on one side, the tangled regime, the capacity is low-- 2 over m, or of order 1 over D, with m large and D large, so very low capacity-- versus manifolds which are sufficiently untangled that the capacity is of order 1, whatever of order 1 means. That's the separation.
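
Collecting the limits just quoted in one place (m is the number of points per manifold, D the dimension of its subspace):

```latex
\alpha_c \;=\; \frac{P_{\max}}{N} \;\approx\;
\begin{cases}
2, & \text{points only (prototypes; the classical perceptron result)}\\[2pt]
2/m, & m \text{ unstructured (random) points per manifold}\\[2pt]
\mathcal{O}(1/D), & \text{continuous manifolds filling a } D\text{-dimensional subspace.}
\end{cases}
```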

And basically, the theory is telling us that which of these two regimes we are in-- the tangled regime or the untangled regime-- depends on the geometry of the manifold, and precisely on two parameters: the radius of the manifold and the dimensionality of the manifold. So here are R_M and D_M. I'll say in a minute what they mean, but there is a notion of manifold radius-- the size of the manifold-- and manifold dimension-- in how many directions the variability is spread. So that's R_M and D_M.

And basically, the theory says, once you know how to compute R_M and D_M, you can plug them into an analytic formula that we have-- I won't go into it-- for balls. So imagine that instead of the complicated manifolds, you have balls; however, these balls have the radius and dimension corresponding to the manifolds themselves, to the actual manifolds.

And what's important is that the division between tangled and untangled-- the regime where the manifolds are untangled and the capacity is of order 1-- is given by this combination: the radius times the square root of the dimension being less than 1. If the radius of the manifold times the square root of the dimension of the manifold is less than 1, then we can say the manifolds are untangled; the capacity will be high, of order 1.
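
Stated as a formula, the criterion described here is (with R_M and D_M the effective, anchor-point-based radius and dimension introduced below):

```latex
R_M \sqrt{D_M} \;\lesssim\; 1 \;\;\Rightarrow\;\; \alpha_c = \mathcal{O}(1) \quad (\text{untangled}),
\qquad
R_M \sqrt{D_M} \;\gg\; 1 \;\;\Rightarrow\;\; \alpha_c \ll 1 \quad (\text{tangled}).
```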

So that's the gist of the theory. But then the interesting question is, how do we measure the radius and dimension? Because the manifold may be quite complicated in shape. The manifold may be continuous, with an infinite number of points, or it may be just a set of points. What, really, are the radius and dimension?

We have the simple Euclidean notion of size, shown here: the radius would just be the overall size. For simple objects we have dimensionality-- here is a line, with variability spread in one dimension; here in two dimensions; here in three dimensions; and so on. But in general we have shape, which is more than size and dimensionality.

But if we have objects as weird as some of those shown here at the bottom, it's not obvious how to define the size, or the radius, and the dimensionality. So here-- I won't go into the mathematical definition, but I want to give you the qualitative essence of it. In blue here, on the top left, there is a manifold, and we want to separate this manifold, together with another manifold, from the two manifolds on the right by some hyperplane. And let's suppose we can do it, so there is some hyperplane separating them.

But then, as you see, there is a point on the blue manifold, the banana-shaped manifold, which is very close to-- which touches-- the hyperplane. That's an interesting point, a point which we call an anchor point.

Now, the point is that if we take the same collection of manifolds but just flip or relabel them, plus 1 versus minus 1, then, in general, there will be another point of the same manifold that now touches the plane. So this is another anchor point, and so on and so forth.

So by sampling the labelings of the manifolds, as well as the orientations of the surrounding manifolds, we are just moving the anchor point around the manifold. And that gives us a statistic-- it induces a statistics on the manifold, a measure, in the mathematical sense, on the manifold, which is defined by the collection of these anchor points.

So these anchor points are the collection of points which are important for defining the separability of a given manifold relative to this environment of other manifolds. That's basically the notion of anchor points. Here you see it more in the context of visual images. Imagine a manifold corresponding to a vase, being separated from a manifold corresponding to a head of cabbage.

And at the center of these manifolds there is the prototypical vase, shown here in the image, and the prototypical head of cabbage shown here. These would be the prototypes, the centers, the typical, idealized vase and head of cabbage. But now, if you look at the point on the vase manifold which is closest to the hyperplane separating them, you will find an image which is a vase, but not very different from a head of cabbage-- so this is the anchor point.

Now, if you look at the right part of that figure, you'll see the same vase manifold, now being separated from a birdhouse. Now the anchor point will be a vase, but with a figure of a bird on it-- this is why it is close to the separating plane. So you see that by taking one manifold and separating it from a collection of other manifolds, you obtain anchor points, which are similar to the notion of support vectors, for those of you who know them, but now in the context of manifolds. And these basically define the geometry.
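
Here is a toy, point-cloud version of the anchor-point idea (a simplification: in the full theory the anchor point comes out of the max-margin solution and need not coincide with a sample point): fit a separating hyperplane for one labeling, then take each manifold's point closest to that hyperplane; re-sampling labelings moves the anchor around and induces the statistics that define the effective radius and dimension.

```python
import numpy as np
from sklearn.svm import LinearSVC

def anchor_points(manifolds, labels):
    """One anchor point per manifold, for a single +/-1 labeling of the manifolds."""
    X = np.vstack(manifolds)
    y = np.concatenate([np.full(len(m), lab) for m, lab in zip(manifolds, labels)])
    clf = LinearSVC(C=1e6, max_iter=100000).fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    anchors = []
    for m in manifolds:
        dist = np.abs(m @ w + b) / np.linalg.norm(w)   # distance of each point to the hyperplane
        anchors.append(m[np.argmin(dist)])              # closest point plays the role of the anchor
    return anchors
```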

And incidentally, you can again see the relation to the prototypes and exemplars that I mentioned at the beginning of my introduction. In the manifold theory, the centers are important because the signal lies in the separation between the centers of the manifolds: if the manifolds had zero radius, they would collapse, be compressed, into their centers.

On the other hand, the centers themselves don't capture the geometry and are not sufficient to achieve full separability-- you cannot just ignore the geometry. So here, in this case, the set of anchor points is, if you want, a set of exemplars for each manifold that needs to be stored in order to define the separating planes between subsets of manifolds.

So I'll skip correlations for the sake of time, and I want to stop here-- I see there are questions-- before I go to the applications. This is the end of the theory part. If you have questions, I'll take them now.

PRESENTER: The top one is from Jim Batista. How do we know the neuron is looking for Diego Maradona and not for an Argentinean, or a football player, et cetera?

HAIM SOMPOLINSKY: Well, because if you look at that experiment, you'll see that they show the neuron, or the human, other stimuli of that kind-- other football players, and so on. But again, the details of that experiment I won't go into; it's marginal to what I am talking about.

PRESENTER: We have one from Surea. It seems that the number of manifolds is correlated with the number of objects one can recognize. When we learn new object categories, how does the manifold space change, and what happens to the linear separability between manifolds, since the number of neurons in a human is fixed?

HAIM SOMPOLINSKY: So that's a good question-- when we learn a new manifold, to what extent do we need to change the representation to accommodate it. But I want to clarify: the P, the number of manifolds that enters into the capacity, is not the total number of objects that we can recognize. It is the number of objects that can participate in a given task.

So if I want to separate some manifolds-- let's say cats and dogs from horses and cars-- that would be P equal to 4. Or not 4, but depending on how many cats I have, how many dogs I have, and so on. So P can be enormous. The number of manifolds that we learn and can do computations on can be enormous. But we are limited in any given computation.

In any given computation, any classification-- if we want the system to be able to classify a subset of objects against another subset of objects-- that is P. So that doesn't mean that at some point we cannot see a new object. P is how many objects can participate in one computation. In another computation, different objects can participate.

So we can have one computation separating male from female faces-- depending on how many faces we have, males and females, and so on-- and we can have cars versus something else, and so on and so forth. Each of these computations is limited by the number of objects that can participate in that computation.

Let me go on, and then we'll have a few minutes for further questions, unless there is a burning question of understanding. That issue of P was a good question, but I think this is important to clarify: it's not the maximum number of objects that we can experience; it's the maximum number of objects that can be linearly separated from each other in one computation, in one classification task. Good.

So let me just show you how we take this theory and apply it to ImageNet. Here is an example. We basically take what we call a point cloud: we take ImageNet, and we take some number of images per category as our manifold. Sometimes we take only the top 10% or top 5% of images; the details you can read in the paper.

But basically, a manifold here is a discrete set of points. And now, for instance for AlexNet, we take these images after training-- we don't train the network ourselves, we take it after it has been trained as a deep convolutional network-- and we pass these sets of images through the different layers, and we compute the capacity.

How do we do it? We take these images and construct the manifolds. Then we say, let's do a linear classification: randomly label them, find the linear classifier, and see up to how many manifolds we can do this, and so on. So this is the capacity. And you can see how the capacity increases from the pixel layer to the feature layer at the top.
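
A minimal sketch of that pipeline, assuming a pretrained torchvision AlexNet and the fraction_separable routine sketched earlier (layer indices, image tensors, and class groupings are placeholders; the paper's analysis uses the authors' manifold-capacity code rather than this brute-force check):

```python
import torch
import torchvision.models as models

# Pretrained AlexNet; older torchvision versions use pretrained=True instead of weights=...
model = models.alexnet(weights="IMAGENET1K_V1").eval()

def layer_activations(images, upto):
    """Pass a batch of images (shape [B, 3, 224, 224]) through the convolutional
    stack up to layer index `upto` and return flattened activations, one row per image."""
    x = images
    with torch.no_grad():
        for i, layer in enumerate(model.features):
            x = layer(x)
            if i == upto:
                break
    return x.flatten(start_dim=1).numpy()

# For each layer: build one point-cloud manifold per category from its images (the
# pixel layer is just the flattened images), subsample a fixed number of units
# (e.g. 1,000) for fairness across layers, and sweep P with fraction_separable()
# to locate the capacity alpha_c.
```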

You can also see here the bottom horizontal line, which is what you would get if all the points were shuffled, so that the manifolds are basically random. And you can see that the pixel layer is not far from that. It's about 0.02, essentially the 2-over-m lower bound here; the pixel layer is slightly better than that, but not much.

So that gives you a sense of how entangled the object representation is at the pixel layer: it basically has a capacity which is only slightly larger than it would be if the whole representation were completely structureless. Now, you can see how the increase in capacity is correlated, as the theory says, with a very dramatic decrease in the dimensionality of the manifolds and a decrease in the radius of the manifolds.

And the combination of those two-- shrinking the size and reducing the dimensionality-- is what is causing this increase in capacity. Again, we have made several tests of this; I won't go into them, and I'll skip that. You can see it's the same in ResNet-152: the radius decreases across the layers, with the skip connections and so on; the dimension decreases towards the end; and the capacity increases incrementally through the layers and then increases sharply towards the final layers.

Now, because I'm running out of time, I want to show you that we can use these measures on neural data as well. I'll give you just one or two examples. This one is about the face-processing areas. That's work by Doris Tsao and Winrich Freiwald, who discovered in IT cortex of the macaque several patches, or regions, that are selective to faces. You can see here in the example that neuron 1 is selective only to faces and doesn't respond to other things, and so are cells 2 and 3.

And furthermore, from anatomy and from other physiological characterizations, they came up with the idea that those different patches are organized in a hierarchy. So we took the neural data and the images, computed the capacity in the top face patch in IT cortex from the neural data, and compared it to what the capacity would be for the responses to the same images in a VGG network trained on faces-- VGGFace.

So you see the capacity starts low and then goes up. And, interestingly, the top layers give a capacity which is very similar to the capacity measured in IT cortex. And this is not an accident, because if you do the same comparison but with a network trained for objects-- just the standard VGG trained on generic, ImageNet-type object data, which includes some faces but also many other things-- you see that the capacity doesn't reach the capacity of IT cortex.

So it's quite interesting and highly nontrivial that you have this quantitative correspondence between the measure of capacity from the neural data and from the artificial VGGFace network. The same happens with other data. Here I'll just show you one more: this is a paper that will come out today or tomorrow as a preprint, work by Andreas Tolias and Emmanouil Froudarakis, which Uri Cohen has analyzed.

And this is interesting because it's mouse visual cortex, and in the mouse we know much less about hierarchies, about the notion of a hierarchy, and so on. This is calcium imaging of responses to several objects, which are also rotated or shifted. And if you do the same measurements of capacity, manifold radius, and manifold dimension, you can define a hierarchy using these measures. And interestingly enough, here V1-- unlike in macaque visual cortex-- is not the beginning of the hierarchy: there are actually several cortical areas which have lower capacity than V1 and also higher manifold dimension.

And then there are other areas, particularly the blue ones shown here, LM and AL, which have a higher capacity than V1 and a reduction in radius and dimension. So you can see that even in a hierarchy that is not fully known, we can use these measures to actually establish the streams of information flow, and the results are not necessarily straightforward.

It's not simply the case that V1 is the first stage and the other areas follow in a fixed order, and so on. So I just want to briefly talk about open questions. First of all, what I talked about here is measuring the concept representations of fully trained networks. But an interesting question-- and it's related to one question that I was asked earlier-- is what happens if you learn a new concept: how well can you learn a new concept? That's related to few-shot learning and to the role that geometry plays there, and that's ongoing work with Ben Sorscher. I don't have time to talk about it, but I think it's extremely interesting, and again, the notions of prototypes and exemplars appear there as well.

The effect of neural noise is an important question, because we all know that trial-to-trial variability is very substantial in all parts of the brain, including the cortex. This neural noise makes those manifolds fuzzy; it blurs the boundaries of the manifolds, and the impact of that on the ability to process objects is currently being studied by Uri and also by SueYeon Chung-- ongoing work.

Another question is related to the issue of why the brain is not using prototypes only. And one answer may be that if you just collapse an object representation to a prototype, you lose information about the variability. So ongoing work that Uri is conducting now asks what role manifold geometry plays in the ability to extract information about identity-preserving variables-- like the location of the object, its color, or the pose of a face, the orientation, and so on and so forth.

So we have in mind a trade-off between changes in manifold geometry that enable object-identity computations and the desire to also be able to easily decode other variables. And we know that IT cortex does have information about those other variables as well.

So two more points. One is computational. So far we have used deep convolutional networks with standard training-- SGD, cross-entropy cost functions, and so on. But our theory, perhaps, can suggest better ways of training those convolutional networks, using, in particular, the geometric measures themselves as a kind of target for supervised learning. That's something, again, we'd like to make progress on.

And finally, going back to Maradona and Argentina and football and so on: it would be a fascinating question how we go from the visual, untangled manifolds-- but still manifolds-- to something which is much sparser and multi-modal and connected to memory, et cetera-- the representations of concepts in the temporal lobe, particularly the hippocampus, as I showed before.

So all of these are ongoing works, and I hope I gave you at least a taste of how thinking about high-dimensional representations of concepts in the world, of objects and categories, in terms of geometry can give us not only qualitative but also quantitative measures to understand some features of neural representations-- which is quite orthogonal to the classical approach of taking a single neuron, mapping its receptive field, and so on, its properties.

It's statistical, it's population level, it's geometrical, and I think that's an interesting direction of research to pursue. Thank you.

PRESENTER: Great. Thank you very much, Haim, for that wonderful talk. We are a touch over time, but we can still field a few questions, if you'd like to.

HAIM SOMPOLINSKY: Yes.

PRESENTER: We have one from Autonas. He asks, can you comment on the possibility of sparse coding representation in manifolds?

HAIM SOMPOLINSKY: Yes. So I didn't have time to talk about that, but sparsity is a feature which already exists in what I call the feature layer. So part of the changes in the representation that occur from the input layers to the top layers has to do with sparsity. So I do think that sparsity is part of the micro-mechanism underlying the changes in the manifolds. And definitely, when you go to the MTL, it's extremely sparse.

So sparsity, I believe, is an important causal mechanism of some of the dramatic changes that I talked about.

PRESENTER: Great, thanks. And we have one here from Yee. Are prototypes necessarily defined through a visual image?

HAIM SOMPOLINSKY: No. Basically, all of what I said-- not only about prototypes-- applies to other modalities as well. It can be auditory; it can be other sensory modalities. A prototype, again-- you have to think about the representation in high dimension, about the collection of points in high dimension representing the different instantiations of an object, or a melody, or a word. Imagine a word as a category in the auditory system.

So again, you would get a manifold of all the types of sounds that represent that word. And again, the theory says that the separation between the centers of these manifolds is basically what I would call the signal-- these would be the prototypes. However, in order to separate the different words, you will have to take into account the geometry of the variability around them, and this has to do with the kind of measures that I mentioned.

But the concepts-- not only the prototype, but the concepts of manifolds, anchors of the manifold, radius and dimension-- are quite general. They can be applied to audition. They can also be applied to dynamics: dynamics itself can be thought of as a source of variability in the neural representation of a given category, simply because of the temporal evolution of the dynamics. So it's quite general; it is not restricted to vision.

However, most of the information that we have, both for artificial networks and for the brain, is about vision, so this is why we tend to focus on vision. But it's quite general.

PRESENTER: Great. And then one last quick question from Sophie Miller. Has the capacity of neurons been worked out for non-binary classification? How does the amount of information per neuron change with more categories?

HAIM SOMPOLINSKY: So I'm not sure I understand the question. A category here is an object. If you think about ImageNet, then the categories that I was talking about are the leaves, the 1,000 different categories there-- Persian cat, Arabian camel, or whatever. Those are the categories, the objects.

In that sense, the capacity does not depend on the number of objects. The capacity tells us how many categories can be labeled and separated-- the maximum number of categories that can be separated in a given computation.

Now, there are higher-level categories, semantic categories, like animals versus non-animals, and so on and so forth. And that's an interesting question. I didn't talk about it, but another piece of ongoing or future work is to discuss the hierarchical structure of the manifolds: what does it mean that those manifolds are themselves part of higher-level categories?

In that sense, this, I think, is a very interesting topic that we have not yet made sufficient progress on.

PRESENTER: Great. Thank you.

HAIM SOMPOLINSKY: Thank you, all. It was brief, I know, but I urge all of you to look at the papers. I'll put the slides on the website of the course. And feel free to write to me to ask more questions; I'll be happy to answer all of them. Enjoy the rest of the course.

PRESENTER: This is wonderful, Haim. Thank you very much.

HAIM SOMPOLINSKY: OK. Thank you.