How did you learn the natural numbers?
Date Posted:
December 10, 2020
Date Recorded:
August 13, 2020
CBMM Speaker(s):
Pietro Perona All Captioned Videos Brains, Minds and Machines Summer Course 2020
Description:
Pietro Perona, California Institute of Technology
Link to manuscript - https://arxiv.org/abs/2012.04132
PRESENTER: It's a great pleasure to introduce Professor Pietro Perona. He's a Professor of Electrical Engineering and Computation and Neural Systems at Caltech. And I am especially honored to have taken some of my initial steps, in addition to his class, introducing computer vision to the students at Caltech.
So without further ado, I don't want to steal any of his time. So Pietro, all yours.
PIETRO PERONA: Thank you. Thank you. OK. So I would like to talk about the natural numbers. And this is a video. I have some bees in my backyard. And sometimes I make videos because I want to see what they're doing, but I don't want to hang around the beehive for too long. Otherwise, they sting. They get really defensive. And so this is [INAUDIBLE] bee. [INAUDIBLE]. I left my cell phone on top of the beehive looking down on the entrance [INAUDIBLE].
And so it's very interesting to see what they're doing. You will see that some of them come home with pollen sacs attached to their legs. I just saw a couple of [INAUDIBLE].
In any case, something that your visual system is doing for you as you are watching this, you are unconsciously aware of how many bees are there on the screen at any given time. And so here is one, another one. And then soon it will be three or four and so on. So this is something that is somewhat effortless, and it feels natural. And not surprisingly, [INAUDIBLE] 0, 1, 2, 3 and so on are called the natural numbers. And today, we'll be asking the question of, how did the natural numbers pop into your head in the first place?
And so there are different stories about natural numbers, and archaeologists are very interested in figuring out, how early did humans start to count? And so here is a piece of a femur of a hyena that was found in a cave in France-- and this is about 50,000 years old-- in a site that was inhabited prevalently by Neanderthals. And archaeologists believe that these notches are there for a purpose. And the purpose was likely counting. And there's a whole story behind this bone.
And it's very funny to read the article, because the archaeologists assembled a group of modern humans and gave them flint stones and asked them to put notches into bones of goats and, I think, cows, to see what the notches would look like, and how regular they were, and so on.
And from that, they deduced that these markings were not casual, but they were there for a purpose-- maybe for counting. And somehow, the idea that you would count this way, and [INAUDIBLE] the markings of notches, goes back to the definition of natural numbers that mathematicians came up with about 100 years ago. And there is a story that is also interesting to read, of how men like Bertrand Russell and [INAUDIBLE], et cetera, debated how to give a proper foundation to the natural numbers.
And the idea of the natural numbers, the current definition that you learn in school, is that a number is an equivalence class of sets. So if you can take different sets and put their elements in one-to-one correspondence, then the sets have the same number of elements. And so this idea of putting things in correspondence is right smack on as an explanation of how this bone was used. And so probably somebody was running their fingernail over the notches as, I don't know, members of a group were entering the cave or something like that, to count. And so they put them in one-to-one correspondence with the notches.
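[Editor's illustration] The one-to-one correspondence idea can be sketched in a few lines of Python. This is only a minimal sketch, not something shown in the talk: the function name `same_number` and the example data are made up for illustration. The point is that two collections can be compared by pairing items off, without ever counting either one.

```python
def same_number(notches, visitors):
    """Compare two collections by pairing items off, without counting either one."""
    notch_iter = iter(notches)
    visitor_iter = iter(visitors)
    missing = object()  # sentinel marking "this collection has run out"
    while True:
        notch = next(notch_iter, missing)
        visitor = next(visitor_iter, missing)
        if notch is missing and visitor is missing:
            return True   # both ran out together: a one-to-one correspondence exists
        if notch is missing or visitor is missing:
            return False  # one collection has unpaired items left over


bone = ["|", "|", "|", "|"]          # hypothetical notches on the bone
group = ["Ada", "Bo", "Cy", "Di"]    # hypothetical people entering the cave
print(same_number(bone, group))           # True
print(same_number(bone, group + ["Ed"]))  # False
```

Nothing in `same_number` ever produces a number. It only reports whether the pairing comes out even, which is all the notched bone or the bulla would need to do.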
And in archaeology, there are other examples of this. This is what is called a bulla. It's a clay bubble that contains pebbles, and it's believed that it was a way of counting a certain number for accounting purposes. And so suppose you were a landowner and you had some sheep, and you were giving your sheep to a shepherd to bring into the mountains during the summer. What you would do is, as the sheep were going out of your enclosure, you would put into a fresh clay bowl a pebble per sheep. And then you would close the clay bowl and let it bake in the sun. And so you would have a record of how many sheep you had sent out to pasture.
And when they were coming back from the mountains in the fall, you would break the bubble and, again, put in one to one correspondence these pebbles with the sheep as they were coming back. OK?
So this is, again, part of the history of the natural numbers. And here, I collected a few objects that should be familiar. So on the left, you have one-to-one correspondence. In the middle, you have the equivalent of a rosary, where the faithful say prayers. And for each bead, you say another prayer. And so again, you run your thumb on the beads. And so you advance by one bead at a time. And so it's not really counting. It's, again, putting in one-to-one correspondence.
And on the right, you have an abacus, which is instead built exactly for counting. And it is a much more abstract representation of the numbers. I'm not going to go into that.
Anyway, so the debate has been, for a while, on how does the brain do this? And the linguists, in the '70s and '80s, were saying, well, it's-- the most abstract property of the brain is the language and the [INAUDIBLE]. So clearly, mathematics is an outgrowth of language. And instead, the perception people had a different intuition, which is that the numbers are more of a perceptual property. And so we'll address this later.
But amongst the types of investigations that people do-- and this is from a paper by Stanislas Dehaene and collaborator-- is exactly trying to dissect this question. And so here, he is looking at areas of the brain that light up in the MRI. So these are brains of mathematicians as the mathematicians are carrying out mathematical tasks.
And what you see in blue, on the left, is areas that are lighting up. And I will read it to you. Experiment 1, which is the one on top, says you have to distinguish between sentences like LP [INAUDIBLE], which is the mathematical, versus the Paris metro was built before the eighth one, which is a fact. [INAUDIBLE] mathematical fact.
And so the claim is that the blue areas are the ones that light up for mathematical statements, and the green ones are facts that light up [INAUDIBLE] mathematical. And experiment 2, it's more-- it's equations like a plus b times a minus b equals a squared minus b squared, versus rock and roll is a musical style characterized by a slow tempo.
And so again, you see that more or less the same areas are involved and so on. And so this is taken as evidence that mathematics is different. It stands aside apart from other facts, and it has very specific evidence of a brain [INAUDIBLE]. This is way beyond the natural numbers I want to talk about today, but it's just to give you a sense of the type of discussions that are ongoing on the subject. And so the type of evidence that is being marshaled for distinguishing between language and mathematics is a special thing.
OK. So now I want to go back to the basics. So if you look at this picture, again, it's immediate for you to say, oh, I can see there is one buffalo. And there are a few birds. And so this is interesting. So for the buffalo, we know it's exactly one. And for the birds, we're not sure if it's five, six, or seven somehow, right? If you look at the [INAUDIBLE]. Of course, you can count and then you know. But at a glance, you know that the birds are less than 10 and more than 5, but you don't know exactly how many. But the buffalo is just one.
And so psychologists have been aware of this fact, that for small numbers, you know in a second or less-- in fact, in a few [INAUDIBLE]-- how many. And for numbers bigger than four, you don't know quite for sure, but you have a sense of the amount. And so they call it numerosity, this idea of there being a sense of the quantity without knowing precisely what the number is. And it's [INAUDIBLE] to plus or minus 15%, give or take.
And subitization is the idea that, immediately, you have the number popping in your head. And so just to make things clear. So this is numerosity. If you look at these panels, you have a sense for how many bees there are, although you don't know exactly how many there are. And you can compare the two panels, and you know that there are more on the right hand side. OK?
And subitization, instead, is being able to-- the number four pops in your head immediately here. And so these are two things that people can do. And there is another thing that people can do, which is ordering the numbers. And so not only-- let me go back one slide. Not only here the number four pops in my head. But then I can also put these pictures in order and I know that 0 is less than 1, 1 is less than 2, and so on.
So there are lots of things that happen. And then the ultimate abstraction is natural numbers, where we have properties of sets. And so we can say that there are more pieces of pasta than tomatoes, and we see [INAUDIBLE] two pieces of pasta-- one tomato, four bottles of wine, et cetera. This is certainly something that you would have a child do with much gusto. Right?
OK. So this is a slide I borrowed from David Burr, who is a psychophysicist who has worked in this area. And he makes the point that, again, you have subitization on the left, where you can tell exactly the number of objects, and that has no error at all. Numerosity, which is in the middle, it's between 4 and, say, 100, 200, 300, something like that. [INAUDIBLE] you have an estimate up to about 10%, 15% error.
And then, when you go beyond that, you have textures. And then your accuracy goes down dramatically. And it's-- you estimate sort of a number of items [INAUDIBLE] visual field. But it's not good at all. OK?
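[Editor's illustration] The first two regimes described here can be mimicked with a small simulation. This is a rough sketch under stated assumptions, not a model from the talk: the cutoff at four items and the roughly 15% error come from the speaker, while the Gaussian noise shape and the names `perceived_count`, `SUBITIZING_LIMIT`, and `WEBER_FRACTION` are assumptions made for illustration. The texture regime is not modeled.

```python
import random

SUBITIZING_LIMIT = 4    # up to about four items, the count is known exactly
WEBER_FRACTION = 0.15   # roughly 15% error beyond that (Gaussian shape is an assumption)


def perceived_count(true_count, rng):
    """Return a simulated at-a-glance estimate of how many items are shown."""
    if true_count <= SUBITIZING_LIMIT:
        return true_count  # subitizing: exact, no error
    # numerosity: the noise grows in proportion to the true count
    noisy = rng.gauss(true_count, WEBER_FRACTION * true_count)
    return max(SUBITIZING_LIMIT + 1, round(noisy))


rng = random.Random(0)
for n in (1, 4, 7, 30, 100):
    print(n, [perceived_count(n, rng) for _ in range(5)])
```

Running it shows exact answers for 1 and 4 and scattered estimates for the larger counts, with the scatter widening as the count grows.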
So that's the situation. So this is not natural numbers, but it's a precursor of the natural numbers. And there is work in non-human animals. And so here you have experiments on chicks. And it turns out that this paper claims that chicks have a certain line of magnitude in their head. And if the numbers are smaller, they tend to go to the left. If the numbers are bigger, they tend to go to the right. So it's a very surprising observation which I-- I read the paper and I kept scratching my head and saying, no, this is not possible. But [INAUDIBLE] fairly well-- fairly well carried out. And so you're welcome to go and read it.
And I'm not here to review the literature, but there are quite a few papers on various species. And this is a slide I pilfered from-- I think also David Burr, although I'm not crediting. And so I'm sorry about that. But so there's a number of papers published on different species, on numeracy in different animals. And so you might wonder, why do animals need to know about numbers? And so there are situations that have ecological value, where you would imagine that animals would want to know about numbers. For example, you know about schooling in fish and also flocking in birds. And so if you're a bird, you have an advantage in being inside a bigger school or group.
And so you've seen those nature videos, where the fish-- the big school of fish has a predator going through it. They get separated and then the two schools of fish reunite into a big one. And that's because, if you're in the small school of fish, you want to be part of a big school of fish. And so there is an advantage for the fish to be able to quickly estimate where are there more fish? And to go there, right? So that's one case.
Another one, just to give you examples, is prides of lions in the savanna. And apparently, they can count how many calls are coming from a different rival pride and estimate the size of the other pride and decide if they have to get out of the way or to stay put because they are more and so on. OK?
So I'm not going to go beyond that in animals, but [INAUDIBLE] nice literature and it's ongoing. Stuff is happening, and people are discovering things. It's interesting.
Now the other question is, if animals have some sense of number, then it's likely to be innate for many of them. And how much is innate in humans? And here is [INAUDIBLE] work. And I'm wondering if she's a speaker in the summer school, and if she is, you can ask her more questions.
And here, what she's looking at is, if you take infants around 6 months old, can they tell if there are more dots in one panel or another? And as you know, it's not easy to do experiments on infants. And the way you do it is by showing them an object, a panel, a toy. And it turns out that, depending on age, they will-- but at six months, they will look longer. [INAUDIBLE] newer and [INAUDIBLE] unexpected.
And so the experiment [INAUDIBLE] habituating these infants to a pattern with about 30 dots versus a pattern of about 15 dots. I forget the exact numbers. And then you test them with patterns with more or fewer dots, and they seem to be paying more attention to patterns [INAUDIBLE] contained [INAUDIBLE] that is very different from the one they were shown before. And it has to be a factor of 2 in order to attract their attention.
And so there seems to be some-- not huge, but some level of innate ability to evaluate the number. And as you might imagine, these papers have lots of controls to rule out trivial aspects that have nothing to do with numbers. And so it's interesting to read through.
Now another piece of evidence that the number sense, or at least [INAUDIBLE], is built into a circuit in your brain comes from the work of David Burr and John Ross. And here, those of you who study vision, or perception in general, will know about the phenomenon of habituation.
And so something you might remember is the waterfall effect. If you stare at a waterfall for 30 seconds [INAUDIBLE] avert your gaze, and you look at the trees near the waterfall, or the rocks near the waterfall, they appear to go up. And it's a very strong effect which is very impressive. And it can amuse your friends when you go on a hike with this effect.
And so what's happening there-- people have different points of view. Some people say that your visual motion circuits are fatigued by looking at the water that goes down-- the ones that mark motion going down are fatigued, and so they will not oppose the ones that want to tell you that stuff is going up when you avert your gaze.
But most likely, it doesn't have to do with fatigue. It has to do with self tuning of a system. The system is always looking for a zero. And therefore, exposing it for a prolonged period to a certain stimulus will indeed offset the 0 of the circuit. So when you look at the stationary rocks, they will appear to move [INAUDIBLE].
So the same thing is happening with the evaluation of how many dots there are. And so the idea that Burr and Ross had is quite clever. So they make you stare at the top panel, where you can see dots on one side. And then they test you with two patterns containing dots at the bottom, and they ask you which one has more dots. OK?
And so the plot on the left shows you that, if you habituate at 30, which is where you see the arrow below-- I don't know if you see my pointer, but there is a little arrow at 30. Then-- sorry, I misspoke.
So if you habituate at 400 dots-- so you look at 400 dots. Then the other panel will appear to be more numerous than the panel-- so the panel in [INAUDIBLE] area of your visual field will appear to be more numerous. And so you see this shift of the psychometric curve as a result.
And this is another panel from Burr. And here you see again the way they do it. They first habituate on one side. They use white and black dots to avoid the phenomenon of just counting the amount of blackness or whiteness on the screen. And then they probe you here with a sequence of probe and test, and they tell you which one is bigger. And so again, the adaptation will cause a certain shift which they plot here in this curve.
So when you find adaptation, you think of a brain circuit that is dedicated to this job, and it's fairly well connected to the relevant perceptual circuitry. And so this is another line of evidence that shows that there is some hardware circuitry that specifically supports, at the perceptual level, the natural numbers, or at least some elements that are building blocks of the natural numbers.
And here is, again, an fMRI picture from the [INAUDIBLE]. And again, it's a comparison of a human brain and a macaque brain. And there are [INAUDIBLE] areas where you can see that there is dedication to calculations, and they are in roughly the same place in the brain. And so there is a whole discussion of how to find [INAUDIBLE] to pursue the studies in different species.
OK. So what have I told you so far? So first, I brought your attention back to natural numbers. And I've told you that there are a number of functions that are maybe earlier than natural numbers. So subitization, which I misspelled. Numerosity, comparison of quantities, et cetera. And some of these appear in many non-human species. And in humans and primates, we see dedicated brain areas. And there is evidence for some-- I shouldn't say just [INAUDIBLE] brain areas but dedicated [INAUDIBLE]. And there is evidence of an innate ability even in infants.
So the question I was wondering about is, are the numbers innate? And if they are not, how could you explain the way they come up? Because I have children. This is one of them a few years ago. And clearly, it's not that they need to go to school to learn to count and the natural numbers. It feels like it's something that they develop somewhat spontaneously.
And so the question is, what's the role of nature and what's the role of nurture? And so what I would like to propose to you today is a hypothesis of how this [INAUDIBLE] have no reason to know that hypothesis is correct. But I have a model, or a theory, that shows that, with very limited amounts of supervision-- or in fact, without supervision-- the natural numbers may arise in your brain when you are a child.
And so here is the idea. So inspired by this child on the beach playing with seashells, I thought, OK, so this is-- so my children learned manipulation very early. All children learn manipulation very early. You start seeing that, at six months, they are able to pick up objects. At between one and two, they start stacking blocks, et cetera. So they can pick up objects and put them in front of them. Or they can toss them away.
And so the idea is that, when they do that, they are able to selectively fixate on one object at a time. And they control their workspace. And so they have different operations they can carry out in order to modify the workspace that is in front of them.
And while they do it, they are watching what's happening to their little workspace. And they are taking note of the pictures they see.
And so the type of hypothesis I'm working on is that as you-- so you have this ability to pick and place objects. And you watch. And as you watch what you do, then your perception-- your perceptual system is being trained to recognize somehow patterns or invariants of the space. And that's what gives rise to the natural numbers.
And so I want to take you through the whole chain of reasoning. And so just to be clear, I'm thinking that this child has three basic operations available.
So one that I call take, which is to pick up an object in front and put it aside-- put it aside. One which I call put, which is to pick up an object from the surrounding space and bring it into the workspace. And then, another operation, which I call shake, which means just pass your hand over the objects and move them around. Or if you have the objects in a little bowl, just shake the bowl and see what happens when the objects move around. OK?
So there are three operations. And this is, of course, my own little world that I've built. And I hope you will follow me on this path.
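[Editor's illustration] The three operations can be written down as a toy workspace. This is a minimal sketch, assuming objects are reduced to 2D positions. The `Workspace` class and its method names are hypothetical and are not the speaker's code.

```python
import random


class Workspace:
    """Hypothetical toy model: a list of (x, y) object positions in front of the child."""

    def __init__(self, rng):
        self.rng = rng
        self.objects = []

    def _random_position(self):
        return (self.rng.random(), self.rng.random())

    def put(self):
        """Bring one object in from the surrounding space."""
        self.objects.append(self._random_position())

    def take(self):
        """Pick up one object and put it aside; impossible if the workspace is empty."""
        if not self.objects:
            return False
        self.objects.pop(self.rng.randrange(len(self.objects)))
        return True

    def shake(self):
        """Move every object around without adding or removing any."""
        self.objects = [self._random_position() for _ in self.objects]

    def is_empty(self):
        return not self.objects


ws = Workspace(random.Random(0))
ws.put()
ws.put()
ws.shake()
ws.take()
print(len(ws.objects))  # 1
```

The call to `len` at the end is for the reader only. In the hypothesis described in the talk, the learner has no such counter and only sees the workspace before and after each action.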
Now in order for things to work, I also make the hypothesis that the child has no idea how many objects there are here-- so the concept of number is not in the head of the child yet. And so the question of how many objects you have doesn't even make sense. It's not that they don't know how to count. It's also that they're not necessarily paying attention to the fact that there might be a number that is related to the objects. OK?
But they make one exception, which is zero. And so you feel that perceptually, an empty workspace, is salient. In the sense that, if you put your hand there, there is no object to be picked up. If you have a bowl with no object, and you capsize it, nothing falls off of it. And so it's very special.
And so I'm supposing that empty is-- [INAUDIBLE] primitive that they can do.
Now where do I want to go with all of this? And here's where I want to go. I would like to have-- to arrive at a point where perception is able to take these images that we have seen, and it's able to map them to abstract concepts, which I indicated with the letters B, C, A, D. And the letters are purposely not ordered in the sense that it doesn't need to be-- they don't need to have an order at the beginning.
But if it's two, then they should map to A. And if it's C to D, et cetera. And then these are abstract concepts which are meant to-- eventually will become natural numbers. And you can put them into the right order, because you have these put, take, and shake operations, which cause you to move or not between specific sets.
And so the idea is that you reach this abstraction. So how do we think about it? And so first, you start from images. And initially, the images are meaningless, or you have some objects or not, but that's about it. But ideally, perception is about developing representations in your brain. And these representations are useful for carrying out tasks. And so ideally, somehow perception will be able to develop a map or a representation or embedding where, eventually, pictures that contain the same number of objects will be mapped to the same place.
So I'm not telling you yet how this happens, but I'm saying I'm thinking that something like this should be happening. And once we have this happening, then the rest is fairly easy. Because what you would expect is that the brain is very good at clustering, at unsupervised learning. And so it's able to recognize that it has, in this case, four clusters. And [INAUDIBLE] in front of them. But as soon as a child will speak and will hear numbers from the parents, we learn different names depending on the language we use.
And now, given the fact that you know how to-- you have these put and take operations. You also learn how to move from one to the other of these sets. OK?
And so you get, you know, like beads of a chain. And so that's the beginning of the natural numbers, because not only do you have these abstractions, but also you have a means of putting them in order. So I said beads of a chain, but it's beads of a necklace somehow. It's rings of a chain or beads of a necklace.
OK. And so you get this abstraction, and then you end up with something that is like the natural number.
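[Editor's illustration] The step from unordered abstractions to an ordered chain can be sketched as follows. This is an illustrative sketch only. It assumes perfectly clean cluster labels and a recognizable "empty" cluster; the function `order_clusters`, the label `E` for the empty workspace, and the example transitions are all hypothetical. Only the letters B, C, A, D come from the talk.

```python
def order_clusters(transitions, empty_label):
    """Chain unordered cluster labels using only observed put and take transitions.

    transitions: iterable of (label_before, action, label_after) triples.
    empty_label: the cluster for the empty workspace, assumed recognizable.
    """
    successor = {}
    for before, action, after in transitions:
        if action == "put":
            successor[before] = after   # one more object
        elif action == "take":
            successor[after] = before   # taking undoes a put
        # "shake" stays inside the same cluster, so it carries no order information
    chain = [empty_label]
    while chain[-1] in successor and successor[chain[-1]] not in chain:
        chain.append(successor[chain[-1]])
    return chain


observed = [
    ("B", "put", "C"),
    ("E", "put", "B"),
    ("A", "take", "C"),
    ("C", "shake", "C"),
    ("A", "put", "D"),
]
print(order_clusters(observed, empty_label="E"))  # ['E', 'B', 'C', 'A', 'D']
```

The labels never need to be numbers. The order falls out of which cluster a put or a take leads to, starting from the empty workspace.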
So how-- and so the crucial point is, how do you develop the embedding in the first place without knowing anything about numbers? And so there is a little bit of magic there. You start off with no numbers, but there is some learning, and you end up with numbers. And so how do you learn it? So here is the idea.
So the embedding you want is one that maps pictures with two objects to the same place, give or take, and pictures with [INAUDIBLE] different place and so on. OK?
Now this sounds very much like things that go on with deep networks, where we can learn to classify images. And this is my field of expertise. And so that's why I use this analogy. And people distinguish between two phases when you train a deep network. One is developing features or embedding-- typically, the space is 1,000 dimensional, 2,000 dimensional.
And then these features allow you to classify the object that is there in the picture. And so, in this case, it's a mouse. And so you get more strength in the output for the unit that goes [INAUDIBLE].
So it's crucial to have good features if you want a simple classifier to detect a mouse. So how does this whole thing get trained? And so it gets trained in a supervised way. You have a number of [INAUDIBLE] images, where you associate with the image the name mouse. And that's the desired output. And you check what the network outputs, and you compare the output [INAUDIBLE] with the desired output, and then there is some learning stage, which is gradient descent, which modifies the parameters, both of the deep network and of the classifier.
And so at the same time, you train the embedding. And then you train the classifier to do the job. OK? So this is supervised learning. And so you would think that if you could train your-- the embedding we want, the one that will give rise to numbers, that you could train it with numbers.
But the problem is that you cannot do that, because you don't have [INAUDIBLE] in the first place. So that's not a good way to go.
OK. The thing we know-- what is it that we know? And so what we said is, the motor system is picking and placing objects. And so the motor system is aware that it's either doing a put operation, a take operation, or a shake operation. And so you could think that you could train your network with a classifier for put, take, and shake.
And so that's conceivable. However, unfortunately, this is not going to work out either. Because put, take and shake is not the property of a single image. And so how could you possibly take an image and know where it comes from? It's just an image. And so you cannot do this.
So how do we do it? And so there is a way to do it. And it's to use what people in computer vision call a Siamese network. And so you have-- and the idea is this one, you have a little bit of short-term memory. So imagine that your embedding-- perception is creating an embedding for the image before a certain action is made and after a certain action is made.
So this was a put action for example. So I mark it as take, which is wrong. It's a put action. And so you have these two representations. And now, the classifier can take the two representations as an input and decide which action was carried out.
So this is a Siamese network, because the two deep networks on the left are the [INAUDIBLE] just applied to two separate images at different times. And then the classifier is operating on those two representations. OK?
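[Editor's illustration] The Siamese arrangement can be shown as a forward pass in plain Python. This is a minimal sketch under heavy assumptions: a single linear layer with a ReLU stands in for the deep network, the weights are random and untrained, and every name and size here (`embed`, `classify_action`, `n_pixels`, `n_features`) is hypothetical. What it illustrates is the weight sharing: the same embedding weights process the image before and the image after the action, and the classifier sees both embeddings.

```python
import math
import random

ACTIONS = ("put", "take", "shake")


def make_matrix(rows, cols, rng):
    """Random weights; a real system would learn these by gradient descent."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def embed(image, weights):
    """Shared embedding network: one linear layer followed by a ReLU."""
    return [max(0.0, v) for v in matvec(weights, image)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_action(before, after, embed_weights, classifier_weights):
    """Siamese forward pass: the SAME embed_weights process both images."""
    pair = embed(before, embed_weights) + embed(after, embed_weights)
    probabilities = softmax(matvec(classifier_weights, pair))
    return dict(zip(ACTIONS, probabilities))


rng = random.Random(0)
n_pixels, n_features = 16, 4  # hypothetical toy sizes
embed_weights = make_matrix(n_features, n_pixels, rng)
classifier_weights = make_matrix(len(ACTIONS), 2 * n_features, rng)

before = [rng.random() for _ in range(n_pixels)]  # flattened image before the action
after = [rng.random() for _ in range(n_pixels)]   # flattened image after the action
print(classify_action(before, after, embed_weights, classifier_weights))
```

With untrained weights the three probabilities are meaningless. Training would adjust both weight matrices so that the action the motor system knows it performed gets the highest probability.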
So now we have an approach. And so the question is, will it work? And so I want to take you through some experiments that [INAUDIBLE], are carrying out these days to see if this works.
And so the type of displays for now is simpler. We don't use tomatoes. We use computer-generated bubbles, which speeds up our work. And so we decided that in order to-- and so we want to be invariant with respect to size and with respect to contrast and so on. And so we randomize size and contrast. And then we take these displays [INAUDIBLE] put and shake operations to see what happens.
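[Editor's illustration] A data generator for this kind of experiment might look roughly like the sketch below. It is an assumption-laden sketch, not the speaker's actual pipeline: the value ranges for size and contrast, the dictionary layout, and the names `random_bubble` and `sample_pair` are invented for illustration. The limit of five objects follows the training range mentioned later in the talk.

```python
import random

ACTIONS = ("put", "take", "shake")
MAX_OBJECTS = 5  # training panels in the talk contain from 0 to 5 objects


def random_bubble(rng):
    """One bubble with randomized position, size, and contrast (ranges are assumptions)."""
    return {
        "x": rng.random(),
        "y": rng.random(),
        "radius": rng.uniform(0.02, 0.10),
        "contrast": rng.uniform(0.2, 1.0),
    }


def sample_pair(rng):
    """Return (display_before, action, display_after) for one manipulation."""
    before = [random_bubble(rng) for _ in range(rng.randint(0, MAX_OBJECTS))]
    allowed = ["shake"]
    if len(before) < MAX_OBJECTS:
        allowed.append("put")
    if before:
        allowed.append("take")
    action = rng.choice(allowed)
    if action == "put":
        after = before + [random_bubble(rng)]
    elif action == "take":
        removed = rng.randrange(len(before))
        after = before[:removed] + before[removed + 1:]
    else:  # shake: same bubbles, new positions
        after = [dict(bubble, x=rng.random(), y=rng.random()) for bubble in before]
    return before, action, after


rng = random.Random(0)
for _ in range(3):
    before, action, after = sample_pair(rng)
    print(len(before), action, len(after))
```

Each sampled triple carries only the action as a label. The number of objects is never given to the learner, which is the point of the setup.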
And so we can train the network, the Siamese network, this way. And this is what we see. So once the network is trained, it can indeed classify take actions, shake actions, and put actions. So let me take you through these panels. On the vertical axis is the log error. And you see lots of colored curves. So don't panic. Each colored curve refers to a different data set that was used for testing the network.
And so we tried different statistics of the points and so on. So don't worry too much. Just pick your favorite color. Follow your favorite color.
So what you see is that, after some training, the network does about 2% error for take, shake, and put. And the error goes up a little bit as you go towards five objects. Now we trained the network with panels that contain from 0 to 5 objects. No more, no less. Well, difficult to have less, but not more than five.
But then we can test the network with panels containing more than five objects. And what you see is that, in the blue region, you have panels containing 6, 7, and 8 objects. You see the error rates go up, but they're not completely [INAUDIBLE]. So they'll be about 7%, 8% instead of 2%. OK?
Now is the representation that is being learned useful also for numbers? And so here, the representation is doing something that may be useful to the child but not terribly useful to us if we're [INAUDIBLE] interested in numbers. [INAUDIBLE] this representation will work [INAUDIBLE].
So to have an initial sanity check, we take the deep network. We don't touch it. The [INAUDIBLE] it was trained with the task of saying which action happened. And we take a representation produced by this deep network. And we train a classifier on the side that should classify how many objects are there in the display.
And again, this is not what we think goes on in the brain. It's just as an aside test to see what kind of representation have we produced? And later I'll show you also a graphical representation of the representation.
So here is what happens. So if you train from the representation to predict the number of objects, you find that, indeed, the representation now allows a fairly simple classifier to tell how many objects there are. And so the error rates are about 1%, as you see from 0 to 4 objects, 2% for five objects, and then they go up a bit for more objects.
So it looks like the representation is actually quite good [INAUDIBLE]. And so what happens now? And so now after we have produced this representation, what we can do is we can look at it by using [INAUDIBLE], which is a dimensionality reduction algorithm. And here, we paint the points corresponding to each image with a color that corresponds to how many objects there are.
And as you can see, 0 is a very tight cluster as you would expect. Because the image is always the same.
And then 1, 2, and 3 are separated. And 4 and 5 start merging a little bit. However, you can [INAUDIBLE] using any clustering algorithm you could recover these clusters quite well.
And there is one more piece of information which is, for each one of these points, we also know-- here, [INAUDIBLE] if you do a put, a take, or a shake operation. And so by using this additional information, the clustering becomes even easier. And so depending on your taste, you may want to do clustering purely based on the position of these points in the embedding space, or using also these topology operations.
And so this is another experiment where you see that here the points are really well separated in the embedding space. And so depending on how you train the network with more or fewer examples, you get bigger or smaller separations. OK.
So here, I'm reaching the end of my talk. So what we have now is exactly what we wanted. Namely, we have a perceptual process, which is represented by the green arrow, that will map images containing a certain number of objects into a cluster, or a group, or an abstraction, which is specific to the number of objects in that set. OK?
So it's not specific to the looks of the image. It's specific to the number of objects in the set.
And we do have a topology that tells us how different obstructions are concatenated, one with respect to the other. And then, if somebody tells us words associated to these abstractions, we can learn those words and link them to the abstractions.
OK. So just to wrap up, I have told you about various aspects of what people call the numbers sense. And we know that some aspects will be innate and many animals show some aspects of number sense. And what I was interested in, or what I'm interested in, is to see if we can explain how something more complex like the natural numbers can arise. And the [INAUDIBLE] I'm following is that the learner is an already skilled manipulator but doesn't have a concept of numbers.
And what we've seen is that, through watching one's own hands as one manipulates, one can learn a representation that is going to be a useful foundation for the natural numbers. OK? I'm done. Thank you very much. I'm looking forward to your questions.
PRESENTER: Great. Thank you very much, Pietro. We've got some great questions here. The first one is from Sabia. Does the knowledge of mathematical operation come automatically with numerosity judgment? Or do babies start with pattern matching of different numbers before learning how they are related to each other?
PIETRO PERONA: OK, Sabia. So this is a wonderful question, and I don't know the answer in detail because I'm not an expert of development. And there is a whole set of people, psychologists, who study not only development but also what goes wrong when a child can't handle numbers correctly. And so it's a very important field, of course, in education and development.
What I can tell you is that, in order to judge the relative number of dots, you do not need to have a concept of natural numbers. Because, again, fish and chicks and so on can do it. And so I view that, the judgment of which set has more points, as a much earlier and simpler manifestation of the number sense than the ability to count and the ability to add and subtract.
PRESENTER: Great, thanks. Our next one is from Ronald Alvarez. Are multiple fixations necessary for numerosity and subitizing? Or can we perform good estimations of number using our peripheral vision?
PIETRO PERONA: Yeah, so the experiments that I quoted by David Burr are [INAUDIBLE] in a fairly-- OK, here, I mean-- well, OK. So these are done very quickly. And so the subject fixates [INAUDIBLE] in the middle. And the research [INAUDIBLE] the subject is unable to-- unable to fixate, because the patterns [INAUDIBLE] shown very quickly.
So it's purely done with what you call-- it's probably the near periphery. I would say it's within 5 degrees of the center. But yeah, they're done quickly and without moving your eyes.
PRESENTER: Thanks. And the next one is from Sharon Chen. If a feedforward neural network is able to learn numerosity judgments and various operations, does that mean that numerosity is not innate?
PIETRO PERONA: OK, so Sharon-- OK, now I've found a way to also read these things. Right. So OK, so I suspect that, when people say numerosity in comparison, they have this term for a certain number of different abilities. Like when you see it in infants, you are accurate within-- so they are accurate within a factor of 2. And in adult humans, humans are accurate within 15%, 10%, something like that.
And therefore, at least in humans, you would expect that learning-- whether it's deliberate learning or just being trained by life-- plays a role in refining this ability.
So definitely, I would say that it's one of those things that improves with practice.
PRESENTER: Thanks. And the next one from Guy [INAUDIBLE], is there a sharp threshold between subitization and numerosity? In other words, is it a different mechanism or is it the same serial counting mechanism at different degrees of difficulty?
PIETRO PERONA: Right. So if you read the literature, people say that there are two separate mechanisms. One is getting what is called numerosity. And that operates at all scales, including the small number, so including 1, 2, 3. And it allows you to compare magnitudes, and to estimate, and so on.
And so you could say that you are accurate within 15%. Then of course once you [INAUDIBLE] 5, then you will make no mistakes, right? And so it could be that it's exactly the same mechanism.
Now it turns out that you can also name the number-- not only compare the magnitude of the numbers. And David Burr has an experiment in which he's able, psychophysically, to separate the two and to show that the sense of quantity [INAUDIBLE] is susceptible to adaptation while subitization [INAUDIBLE] saying exactly the number is not subject to adaptation. And so he believes that it's two separate mechanisms and one of the two is more primary, namely judging the overall quantity. And the other one, subitization, is less primary. And I don't know the details, but that's what he tells me.
PRESENTER: Great. Thanks. We've got another from Guy. Whatever embedding is learned, in the end, is it a visual feature? What does it look like?
PIETRO PERONA: OK. So I find it difficult, Guy, to answer your question because I don't know what you mean, what does it look like? What I showed you is a representation-- a decent representation of the embedding we obtained from our experiments. Of course, our experiments are computational experiments. So we don't know if that's what goes on in the brain. We just say that this is-- so we have a sufficient theory for how you might learn.
I should emphasize something that I didn't say in my talk. Namely that people can count and can estimate quantities also from other perceptual streams-- certainly from audition, for example, and from touch.
And so there are experiments where people are asked to tap with their finger a certain number, or they hear a certain number of bell rings and so on. And this interferes, or interacts with, the visual perception. So probably the representation that we talk about is multisensory. And we use vision to learn it, just to see-- just to keep it very simple. But I don't want to claim that it's only visual. I don't know if I'm answering your question or not.
PRESENTER: Thanks. We have a next question from Khwaja [INAUDIBLE]. A recent paper from Jay McClelland on how numerosity might arise shows that, using a generic deep RL agent trained in a virtual environment, rather than specifying mechanisms specialized for mathematical learning, their method provides an architecture in which aspects of the sense of number emerge from learning several different number-related tasks.
The LSTM memory cell intriguingly learns a way to store memory implementation of a memory operation that is not shown to the network. Instead, it emerges as it learns through anticipating the demonstrated action sequences.
Can this resolve the magic part? Are such mathematical operations just emergent properties resulting from multimodal interactions with the world?
PIETRO PERONA: Yeah. So yes, definitely. And now I'm blanking out on one aspect of Jay McClelland's paper, and so I'm not able to make a pointed [INAUDIBLE] exactly a place where we differ. But there is one question on which we are differing.
So there are other papers where people have attempted to see if the number sense can arise from visual learning. And the type of work that I've seen has to do with numbers basically estimating broadly the number of objects but not with the question that we are asking, which is the natural numbers. Namely, this thing having these abstract concepts of 1, 2, 3, 4 and in what relationships do they stand with respect to each other? So I hope that this is clear.
So again, some people have worked on the numerosity question. And we are more focused on the natural numbers question.
PRESENTER: Great. Thanks. A little clarification question, what did you mean by topology in this context? And somebody was following up that might help you out. I think he means in connectivity in the sense of a graph, not in the sense of a topological space. And the following question--
PIETRO PERONA: Yeah. So OK, so think of the subitization property in which you look at three tomatoes and you say, oh, three. Right? So you have recognized the number, and you can associate a unique tag to it. And whether I show you three tomatoes or three pieces of pasta or three children, you would say three. OK?
And maybe, at a later stage, you will be shown something and you say, one, et cetera. Now that doesn't prove that you have the natural numbers in your head yet. Because in order to have the natural numbers, you also need to know in which sequence do they fall? That 0 is followed by 1, 1 is followed by 2 and so on. So you need to have a sense of how to walk from one to the other.
And so that's what I call topology. So you have-- you start off with symbols in your head. And that's why I used random letters to denote them, to indicate that there are not necessarily any relationships among them initially.
But then, through the operations of take and put, you establish links among them. And you know how to go from one to the other. So it's a necklace of beads with a thread going through it, where you have the first bead, and then you've got the second bead, and so on. So that's what I meant by topology.
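Illustration (not part of the talk): a minimal sketch of the "necklace of beads" idea. It assumes each cluster carries an arbitrary letter as its tag, that every put moves from one cluster to the next, and that the cluster for the empty scene is known and serves as the first bead. The letters, `put_links`, and the function name are hypothetical.

```python
def order_from_put_links(put_links, start):
    """Follow 'put' links from the start symbol to recover the sequence."""
    successor = dict(put_links)
    chain = [start]
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in chain:  # guard against a cycle in noisy links
            break
        chain.append(nxt)
    return chain


# Unordered links "putting one object takes you from X to Y".
put_links = [("Q", "F"), ("K", "Q"), ("F", "B"), ("Z", "K")]

# "Z" is assumed to be the tag of the empty scene.
chain = order_from_put_links(put_links, start="Z")
print(chain)  # ['Z', 'K', 'Q', 'F', 'B']

# Number words can then be attached to the tags by position in the chain.
names = {symbol: position for position, symbol in enumerate(chain)}
print(names["Q"])  # 2
```

A take link would be the same thread walked in the opposite direction, so a single successor table is enough for this toy case.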
PRESENTER: Great. Thanks. The next one is from [INAUDIBLE]. How much prior information, or how many images, are needed to embed in order to form effective clusters? More is better?
PIETRO PERONA: Yes, yes. So as you might imagine, if you wish to come up with an abstraction through learning, given the way our visual learning algorithms work, you need a lot of training examples. Because think of the concept of one, and think of using tomatoes. If the tomato is in the top left of the image, that image is very different from what [INAUDIBLE] in the bottom right.
And if you will, an image where there are two tomatoes in the top left is more similar to an image with one tomato in the top left, than to an image of one tomato on the bottom right of the image.
So if you wish to achieve an abstraction, you've got to have a number of instances of the number 2 or the number 3 and so on with different configurations, so that the perceptual network can learn how to generalize.
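Illustration (not part of the talk): a minimal sketch of the tomato argument in raw pixel space. It assumes tiny 8x8 binary images in which each object is a 2x2 block of ones, and it compares images with the squared Euclidean distance. The image size, object shape, positions, and helper names are hypothetical choices made only for this example.

```python
def blank(size=8):
    """Return an empty size x size image as a list of rows."""
    return [[0] * size for _ in range(size)]


def place(img, row, col):
    """Return a copy of img with a 2x2 object whose top-left pixel is (row, col)."""
    out = [r[:] for r in img]
    for dr in (0, 1):
        for dc in (0, 1):
            out[row + dr][col + dc] = 1
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two images of equal size."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


one_top_left = place(blank(), 0, 0)
two_top_left = place(place(blank(), 0, 0), 0, 3)
one_bottom_right = place(blank(), 6, 6)

# Different count, similar layout: only the extra object differs.
d_different_count = sq_dist(two_top_left, one_top_left)
# Same count, different layout: both objects' pixels differ.
d_same_count = sq_dist(one_bottom_right, one_top_left)

print(d_different_count, d_same_count)  # 4 8
```

In this toy setting the image with a different count is the closer one in pixel space, which is why many configurations per count are needed before a learner can group images by number.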
And so what we see is that you need about 10,000 experiments, or 10,000 moves, in order to get there. So is 10,000 a lot or is it little? And so I was wondering about that.
So I was thinking, when a child-- whenever it's playing on the beach [INAUDIBLE], they might stay there for maybe 10 minutes to play. And so in 10 minutes, you can think that they do maybe 10 actions per minute or something like that. So [INAUDIBLE] 100. And so it would mean 100 play sessions.
Now is that reasonable or not? I think it's a bit high as a number. But again, the visual system is being trained simultaneously to do many things, including the pick and place operation. And so the model we have probably does not account for the number of examples you need to train the system.
And if we want a full account, we would have first to train the system to identify objects for pick and place operations. And after that has happened, we would move on to counting. I hope that's clear.
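Illustration (not part of the talk): the back-of-the-envelope estimate above, written out. The figures are the rough ones mentioned by the speaker (about 10,000 moves, about 10 minutes per play session, about 10 actions per minute); the variable names are hypothetical.

```python
moves_needed = 10_000       # rough number of moves needed for training
minutes_per_session = 10    # assumed length of one play session
actions_per_minute = 10     # assumed rate of put/take/shake actions

actions_per_session = minutes_per_session * actions_per_minute
sessions_needed = moves_needed // actions_per_session

print(actions_per_session)  # 100
print(sessions_needed)      # 100
```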
PRESENTER: Thanks. And I think this was referring to one of your slides. Why is the area different for one through five and six through eight objects?
PIETRO PERONA: Oh. OK, yes. Let me get there. So I said it but I said it a bit quickly. And so it may be-- OK. So you see that the panel is divided into a pink zone and a blue zone. Now the pink zone goes from 0 to 5 to [INAUDIBLE] x-axis. I don't know if you can see the numbers along the x-axis. And 0 to 5, those are the numbers of objects that were there in the training images.
And so let me show you the training images here. And here, you see 5, 4-- 5 and 4. I don't know why we don't have 2 and 3 and so on. But anyway, so up to five. And so the system is trained with up to five objects per image. But when we test it, we can test it with as many objects per image as we want. And so we decided to test also with six, seven, and eight, just to see what would happen. And unsurprisingly, these error rates go up, because these images are very different from the ones that we were using in training. And so the system has trouble generalizing as you would imagine.
PRESENTER: Great. Thanks. Let's sneak in one last question. This is from [INAUDIBLE]. A clarifying question for you. With the Siamese network, are you classifying the actions based on two consecutive frames or images? Do you use the features before the action classifier for t-SNE? If so, how does t-SNE cluster the numbers almost perfectly, as the consecutive frames and actions might differ? For example, there were four tomatoes before I take away one tomato, so the number becomes three now. So action is taken.
But we have-- or so action is take. But we can have two tomatoes, and I add one, making the total number to three. But here, the action is put.
PIETRO PERONA: Right. So that would be somehow, if you will-- so that's a great question. So that would be the surprising thing that we find in some ways, right? So that the idea is that initially, the network is purely trained to do the task of classifying take, shake, and add. And this is what these three panels show you. So [INAUDIBLE] take, shake, and add.
And now the representation clusters by number not by action. And so that's an emergent property. I cannot tell you more than that. It's a fact that we find, namely that when we look at this representation with [INAUDIBLE], it is-- it was an earlier experiment, where we were using fewer training examples. But this would be more or less the type of separation you'd get if you were using enough training examples.
You know, I cannot tell you more than that. It's just a fact that we observe. And that would be the counterintuitive finding of the study.
PRESENTER: Thank you very much, Pietro. That was fantastic. Very nice. Thank you.
PIETRO PERONA: Thank you.