Laura Schulz: The Origins of Inquiry: Inference and Exploration in Early Childhood
Date Posted:
June 6, 2014
Date Recorded:
June 6, 2014
CBMM Speaker(s):
Laura Schulz. Brains, Minds and Machines Summer Course 2014
Description:
Topics: Brief historical overview of key research that revolutionized the study of cognitive development; analogies between how scientists and children learn; overview of studies showing that (1) children's generalizations depend on how evidence is sampled (Gweon, Tenenbaum, Schulz, PNAS 2010), (2) children infer the relative probability of hypotheses and choose interventions most likely to achieve desired outcomes (Gweon, Schulz, Science 2011), and (3) children isolate variables to distinguish competing hypotheses (Cook, Goodman, Schulz, Cognition 2011); If children are so smart, why is learning so hard? Because of (1) limited information processing capabilities, (2) limited world knowledge, and (3) inductive biases that constrain learning (Schulz, Bonawitz, Griffiths, Developmental Psychology 2007; Bonawitz, Fischer, Schulz, J. Cognition & Development 2012; Bonawitz, VanSchijndel, Friel, Schulz, Cognitive Psychology 2012; Schulz, Goodman, Tenenbaum, Jenkins, Cognition 2008)
LAURA SCHULZ: Maybe I'll go ahead and get started, because I was already thinking I had until 12:30, and I only have until 12:00. So depending on what we do this afternoon, I may either save the last half hour for the beginning of the next talk or just skip through quickly.
So a couple days ago, I was presenting a kind of overview of ideas I think are important in cognitive development. Actually, can I stop one second? I have a really low echo here from the microphone. Somehow when I talked yesterday, I didn't hear myself projecting into a great big space. Is there a way to dampen it a little bit? Yeah, that seems more normal.
So I did a big overview of ideas in cognitive development. And today what I'm going to do is talk a little bit more about my research program in particular. So what I was doing in a kind of a whirlwind was telling you the backdrop against which, in some sense, my thesis program emerged-- what had been going on when I started graduate school. And one of those big revolutions was in our understanding of infant cognition: the idea that babies knew much more about the physical and social world than we had believed. So we talked about those drawbridge studies by Renee Baillargeon and Liz Spelke, which studied the infant object concept. There was a lot of work I didn't talk about. I skipped the part on number-- early numerical representation in babies, small exact number and large approximate number. And then also a lot of early social cognition, stuff about goals. I showed you those studies as well.
So in addition to these early abilities, which are plausibly innate in our species and other species-- they're very early emerging-- there was also a revolution in our understanding of early childhood. So it wasn't just for babies. When researchers began looking at preschoolers, it became clear that children also understood the world in ways that were much richer than Piaget and others had believed. Their understanding of both the psychological and the physical world was abstract, coherent, and causal; it supported prediction, explanation, and intervention; and it made reference to non-obvious properties in ways that seemed to justify referring to these understandings as naive theories.
So as I'm sure you've heard in various ways, and as I was trying to allude to at the beginning of the last talk, one of the real answers to how we think and how we learn is going to be the structure of the representations. It's not a bunch of isolated facts. It's going to make reference to structure. And the way we've talked about those structures in the developmental world has been as folk theories-- specific theories of, for instance, psychology. And this research forms the background to what follows.
But essentially, all of this research was really about the representation-- about what knowledge is like. What kind of knowledge do children have? Is it abstract? Is it coherent? Is it causal? Does it make reference to unobserved variables? It wasn't actually a research program about learning. So although this work is often talked about as the child as scientist, or the scientist in the crib, the reference to science is really about the structure of the representation. It's not about the acquisition of knowledge.
And again, this is how I started graduate school, with this metaphor of the child as scientist. And it's a problematic metaphor because, of course, science is this historically recent, culturally specific innovation that's practiced by a very tiny minority of our species. And as all of us know, it's very, very difficult. So it might seem like a very strange place to look for fundamental properties of cognition.
But science has this very strange property, which is that it gets the world right. And it sometimes gets the world right in startlingly accurate ways. So if you really want to understand the generation of new knowledge, one thing you might want to ask is: how is that possible? What kind of learning mechanisms could allow you to go from the stuff of the world and the stuff in your head to actually generating brand new ideas about the world? So it seems in that sense a very important place to look.
And so you might want to start by asking, well, what is it scientists do? And the answer to that question, of course, is that we both do and do not know, which is to say that both formally and informally, we can list a lot of things that scientists do. And here are some of them. And they will all be familiar to you, whether you're a neuroscientist, or a paleontologist, or a botanist. These are the kinds of things that arguably unite scientific inquiry across content domains.
If you did every single thing on this list, you still couldn't necessarily do science. Right? Science requires bringing these abilities to bear on a lot of rich, detailed, culturally acquired content knowledge, often with very specialized tools. So this is in no way sufficient for doing science. But arguably, if you had all that culturally transmitted rich knowledge and all of those technical skills, but you didn't do these kinds of things, you couldn't learn anything at all. So these are the kinds of epistemic practices that are arguably fundamental to inference and discovery. And what I want to suggest-- in fact, the heart of my research program has been to suggest-- is that they are fundamental to inference and discovery not just in science-- which our culture canalizes in this particular way-- but in early childhood. They're fundamental to learning. And that is because these are the only kinds of practices that we know of that can actually enable you to solve the hard problem of learning, which is the problem of how to go from sparse, noisy data to rich, abstract inferences relatively rapidly.
And so what I want to suggest is that all of these properties-- [INAUDIBLE] practices are actually present in children in the first few years of life. So that's what I'm going to talk about a little bit today.
I said a minute ago that we can characterize these abilities both formally and informally. And by "we" here, of course, I mean my computational modeling colleagues, not me. Which is to say, we can build reasonably good models of how you might maximize information gain in certain contexts, or how you might distinguish spurious associations from genuine causes. We can do this for any number of these. And much of the research that I do has been inspired by computational models of how you might do each of these things.
But with the greatest respect for my computational modeling colleagues, and as much as we want simple laws that are going to capture these kinds of practices-- things like Hebb's law or Bayes' law-- none of them really does justice to what children do. And that is because none of these principles explains all of these-- and there's nothing magic about all of these; this is a laundry list. We don't have some computational model that tells you all and everything that's necessary for inference, for exploration, for learning and discovery. We don't know the fundamental computational principles that unify processes of search and discovery. We cannot yet replace you with machine learning, even though we can be quite precise about any one of these in some individual set of cases. I think that remains a very hard problem of cognitive science that I hope you all contribute to.
But what we can do is say that that's the kind of model to which our theories should aspire, because there's good empirical data that this is the kind of learning in which humans-- including very, very young humans-- engage. So that's what I'm going to start out with today: just take a couple of these from the left and show you some experiments that lend support to this. But the whole list has been supported by research from many other labs besides my own.
That's part one of the talk. And I'm just going to shorthand it like this and actually talk about four studies, including one recent study from the lab. But the second part of the talk is going to be this: I just suggested that children are engaged in all of these epistemic practices that really support rich learning. And at the same time, we also have a strong intuition, as I said, that learning is hard. Learning is difficult. It's challenging. And there are lots of reasons that children and the rest of us are often not very good at it in particular ways. So I want to speak a little bit to both sides of that. Then I'm going to talk a little bit about what makes learning hard. And if I have time, at the very end or in the next half hour, I'm going to talk about this tension between rational learning and the costs of learning-- costs that not only characterize how we learn ourselves, but that we also understand when we are interpreting the behavior of other agents. And those joint understandings help us make sense of other people's behavior in some of the ways I think Rebecca was alluding to earlier. But I might not get to that for the second half of today, if at all. But we'll try to get there.
OK, so let me jump in, and just provide a little data to say why we think some of these practices that we may think of as scientific practices are actually quite basic to cognition. So I'll start with a simple one.
Science, of course, depends on drawing generalizations from small samples of data. So if I want to generalize from a small sample, there are lots of cues I can use. In particular, I can use feature similarity. You look like a person. If I open you up and I find that you have a spleen inside, I don't need to open up everyone else in the room. I can just say probably if they look a little bit like you, maybe they all have spleens too.
So children can do this too. If you have one rattle and it shakes, a very young baby will shake other things that look like rattles and expect that they're going to make noise. So feature generalization is very simple.
A much friendlier way: I can just establish linguistically that you have a certain label. You're a person. People have spleens. End of story. I can generalize from an object label, from a category, and use that as an inductive guide for my future generalizations. Children can do that too. If one [INAUDIBLE] lights up a toy, other [INAUDIBLE] light up the toy.
As scientists, we of course do this. So for instance, if I take a sample of Martian rocks that have a high concentration of silica, maybe all Martian rocks have a high concentration of silica. If I take some Pacific silver fir needles and I find out that they lay flat on the branch, I can say, OK, so do other Pacific fir needles. Right? This is a common practice.
But we can actually do something a little bit more sophisticated than this. Sometimes we know something about how our evidence is sampled that affects the kind of inferences we're willing to draw. And sometimes it lets us ask suspicious questions like this. Do all Martian rocks have high concentrations of silica, or just those that were easy to put in your bag because they were hanging out on the surface of Mars and they were right there? And maybe they were dusty from the surface. Do all Pacific silver fir needles lie flat on the branch, or just those that were low on the canopy that you can easily pull off the tree?
And if you think you cherry-picked your evidence, if you think it's a select sample, if you think it may not be representative of the population, you're going to constrain your generalizations accordingly-- in this case, to just those populations that are near the ground.
What we wanted to know was how far this kind of inference extended in development. When we're making generalizations, it depends on how we got the sample-- whether we think it was a random sampling process or a selective sampling process. And we wanted to know whether infants' generalizations would just depend on feature similarity or on labels in the way I just described, or whether they would also take the sampling process into account.
OK, so how are you going to do this with a baby? In this case, we showed babies a box with a transparent false front that looked kind of like that, and it had dog toys inside. The dog toys are blue and yellow balls, and in this box they are mostly blue. And as far as the baby can tell, it's not a false front. It looks like a full box of balls.
We reach in. We pull out a blue ball. We squeak it. We pull out a second blue ball and we squeak it. We pull out a third blue ball and we squeak it. And then we go ahead and hand the baby a yellow ball that has been disabled-- we've removed the squeaker. So now we can look not just at whether babies squeeze that ball, but how often they squeeze it. How strong their expectation is that that thing's going to squeak.
In this case, there's a lot of feature similarity. This is a dog toy and that's a dog toy. A lot of reason to think that the babies maybe should squeeze it. But also, there's nothing particularly suspicious about that sample. You could easily reach into a box just like this and randomly pull out three toys. And if those first three toys you happened to randomly pull out of a box squeak, probably all the balls in the box squeak. So we expect that in this case many children should try squeezing that ball and that they should squeeze often.
You can compare it with a case like this. We're going to pull exactly the same sample out of the box, but in this case, of course, it's very unlikely that you just randomly reached into the box and happened to get three blue balls. The motions are identical in both cases. They're consistent with either knowing what's in the box or randomly sampling.
But you probably didn't just randomly reach into a box of mostly yellow balls and pull out three blue balls that squeak. That sample is unlikely to have been drawn from the whole box, and much more likely to have been sampled selectively. And if children are sensitive to that, then they may constrain their generalizations [INAUDIBLE] to the population you sampled. In this case, fewer children should try squeezing that ball, and the children should squeeze less often.
So let me show you what that looks like. I'm going to show you a couple babies. And they're actually all seeing three squeaky balls and they're all getting a yellow ball. So they're going to do a lot of the same things, because they're all playing with the same toy. We actually attached that handle so they could do something other than squeeze it if they wanted to. And I want to show you what it looks like. And though they're getting the same toy, only in this case do we really expect the kids to squeeze persistently.
They have a mean age of 15 months.
So I'm going to show you just the data-- we put in the mean numbers. We also looked at the number of children who squeezed at all, and it shows exactly the same pattern. And what we find, in fact, is that the children squeezed much more persistently in the case where the sample arguably represented the population than when the sample was selective.
So this shows the children are sensitive to something about the relationship between the sample and the population, but what is it? It could be that, in this case, the children are just unwilling to generalize: they're willing to generalize from a majority object to the minority, but they're not willing to generalize from a minority object to the majority. So they might not be sensitive to the sampling process even if they're sensitive to some aspect of the relationship between the sample and the population.
So to check this, we replicated those conditions. We again pulled three blue balls from the mostly yellow box. A pretty unlikely sample. And we compared it to a condition like this, where we pulled just one blue ball from the mostly yellow box.
Now, the children are going to see less squeaking. Right? This is just a single blue ball. But the important point is that this is not an improbable sample. You could randomly reach into the box, pull a ball, and if it squeaks, maybe everything in that box squeaks. So even though the children see less squeaking over here, we predict in this case, if they really are sensitive to the sampling process, they themselves should squeeze more often.
So we did this in two different ways. We squeezed one blue ball once, and then we also did a version where we squeezed one blue ball three times. And we got the same results in both cases. In fact, what you see is that when the sample looks like it could have been randomly drawn from the box, the children tend to generalize the property and squeeze the yellow ball more often. But when it looks like an improbable sample, we replicated the effect: they were unlikely to generalize the property. So importantly, the kids are not just imitating how often they saw the ball squeak in the observed pattern of evidence. They will sometimes squeeze more, even though they themselves have seen squeezing less.
Finally, of course, we fit this to a computational model, which is a little bit tricky with babies. We did the one, two, and three data point versions. You can see that there is a graded progression in how often the children are willing to squeeze, which is sensitive both to the amount of evidence they get and to the relationship between the sample and the population. And this is actually consistent with two different possibilities. One is that the children are jointly inferring the sampling process and the extension of the object properties. And the other possibility is that they just assume that people are engaged in strong sampling. But it's not consistent with the possibility that they just flat-out assume random sampling across the board.
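The logic of that comparison can be sketched in a few lines. This is a minimal illustrative sketch, not the published model from Gweon, Tenenbaum, and Schulz (PNAS 2010): it assumes just two hypotheses (all balls squeak vs. only blue balls squeak), a flat 0.5 prior, and a hard weak/strong sampling distinction; the function name and parameter choices are mine.

```python
# Minimal sketch of sampling-sensitive generalization (assumed setup).
# Hypotheses: h_all = "all balls squeak" vs. h_blue = "only blue balls
# squeak"; the demonstrator draws n blue balls and each one squeaks.
def posterior_all_squeak(p_blue, n, sampling="strong", prior=0.5):
    if sampling == "weak":
        # Random draws from the box: seeing blue balls is equally likely
        # under both hypotheses, so the data are uninformative.
        like_all = like_blue = p_blue ** n
    else:
        # Strong sampling: the demonstrator draws only from squeaky balls.
        like_all = p_blue ** n   # squeaky set = the whole box
        like_blue = 1.0          # squeaky set = exactly the blue balls
    num = prior * like_all
    return num / (num + (1 - prior) * like_blue)

# Three blue squeaks from a mostly yellow box (25% blue) look selective,
# so the property should not extend to the yellow ball:
low = posterior_all_squeak(0.25, 3)   # ~0.015
# The same sample from a mostly blue box (75% blue) is unremarkable:
high = posterior_all_squeak(0.75, 3)  # ~0.30
```

Under weak (random) sampling the posterior stays at the prior regardless of the box, which is why the children's graded behavior rules out a flat assumption of random sampling across the board.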
There's one other prediction you might want to make, which is: what happens if they see the same evidence, but you specify behaviorally what the sampling process is? So suppose you take that box and shake it, and three blue balls fall out. Well, that is an extremely improbable sample, but we're at MIT. We can have a hidden compartment such that three blue balls fall out, even out of a mostly yellow box. And if they do, then in this case, kids should say, well, I don't know-- it's an improbable sample, but if you can just shake the box and three squeaky balls fall out, probably all the balls in the box squeak. And in that case, the kids themselves should squeeze more often. And in fact, that's what we found. There's a sharp distinction between this condition and this condition over here. Is that clear?
So 15-month-olds' generalizations take into account more than perceptual similarity, more than category membership. They are sensitive to both the amount of evidence they observe and the process by which the evidence is sampled.
That's generalization of category. Right? Here's a property. How far should I extend the property? But a lot of what we want to do in learning about this world is make causal inferences. We want to figure out the best explanation for the event that we observed.
And to look at this in children, we gave them what I think is maybe the most fundamental problem of confounded evidence that all of us face, which is that we are in the world. And this may not seem like a problem to you, but like most problems, it is only a problem when things go wrong. And when things go wrong, you may not know: is it me? Is it something I did? Or is it something out there in the world that needs to be changed?
Right? You walk into your friend's house, you go to the bathroom, you flip on the switch, and nothing happens. Well, if you think that you've flipped the wrong switch, you should ask your friend for help. If you think something is wrong with the world, you should change the world, or at least the light bulb out there.
So being able to draw this distinction is an important ability. And what we're going to do here is give the kids a toy that they can't make work, and we're going to ask whether they can distinguish these two kinds of attributions. If they think the problem is that they themselves can't make it work, they should go ask for help-- they should approach another agent. But if they think that the world is at fault and something there is the problem, then they should go ahead and reach for another toy.
This may seem like a fairly crude cut of the world, but it's an important one. It's been instantiated in artificial intelligence and machine learning as the difference between the agent state and the environmental state, and in the education literature as issues of locus of control: did I not study hard enough, or was the test too hard? Or was the test unfair? So you can make these different attributions. And of course, once you know this, there's still a lot you don't know. You don't know what you did wrong or what exactly is wrong with the world. But it helps carve up the hypothesis space.
So what we're going to ask in this study is whether 16-month-olds can use minimal statistical data to make appropriate causal attributions, and then choose to seek help or explore, based on these different inferences.
So let me show you how we did this. We give the kids some toys. I'm just going to show the second experiment we ran. It's an experiment that involved two experimenters. And in one condition, each experimenter succeeded and failed once. So one experimenter pushed the toy and it worked, and then tried again and it failed. And the second experimenter tried the same thing and failed and succeeded. And this suggests that something is probably wrong with the toy. Right? Maybe there's a weak circuit, or a missing battery. Who knows what? But that's not a very good toy. You should probably, if you have a choice, reach out and change your choice. Get another toy.
The other condition was quite similar, except we varied the distribution of outcomes across the experimenters. So one experimenter always succeeded and the second experimenter always failed. This suggests that the second experimenter is incompetent. Right? So if some people can make it go and other people can't, then when you yourself can't make it go, it slightly shifts the weight of evidence to suggest maybe I can't do it-- maybe somebody else can. And if kids are sensitive to this shift in attribution-- [INAUDIBLE] probably me-- they should change the agent.
OK. So of course, this is very sparse data-- just a couple of trials. It's not the world of big data; we're in the world of tiny data. That's the world babies and children live in and learn from: very small data. And again, both of these outcomes are possible. What is shifting is the inductive bias-- the probability that one attribution is more likely than the other. We wanted to know if kids are sensitive to this. So again, I'll show you a little bit of what it looked like.
So this is what we saw across subjects: by [INAUDIBLE] distribution, the tendency to reach for help or to reach for another object.
I, like everyone, am guilty of bringing up the cutest babies for the talks, and the most demonstrative. But in the case of this study, it's actually a requirement of science that the senior author looks at every single set of data. So I looked at every one of the tapes. And in this experiment, you could've pulled any of these babies. They are extremely clear. They are handing you the toy. They are reaching for help. It is a very distinctive pattern of evidence across these children.
So this suggests that by 16 months, children are tracking statistical dependencies between agents, objects, and outcomes. They can use minimal data to make attributions about the cause of a failed goal-directed action. And quite nicely, this particular setup left them with two different strategies, right? They could seek help from others if [INAUDIBLE], or they could go ahead and explore the world on their own.
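One way to see why the two evidence patterns pull in different directions is a toy Bayesian comparison. This is my own illustrative sketch, not the analysis from Gweon and Schulz (Science 2011): under "the toy is at fault," one success rate is shared by both experimenters; under "the agent matters," each experimenter gets her own rate; both use a uniform Beta(1,1) prior so the marginal likelihoods have a simple closed form.

```python
from math import factorial

def marg_bernoulli(s, f):
    # Marginal likelihood of s successes and f failures under a
    # uniform Beta(1,1) prior on the success rate: s! * f! / (s+f+1)!
    return factorial(s) * factorial(f) / factorial(s + f + 1)

def likelihood(outcomes_by_agent, hypothesis):
    # outcomes_by_agent: one list of 0/1 outcomes per experimenter.
    if hypothesis == "toy":
        # One shared rate: pool every press together.
        s = sum(sum(a) for a in outcomes_by_agent)
        f = sum(len(a) - sum(a) for a in outcomes_by_agent)
        return marg_bernoulli(s, f)
    # "agent": an independent rate per experimenter.
    like = 1.0
    for a in outcomes_by_agent:
        like *= marg_bernoulli(sum(a), len(a) - sum(a))
    return like

mixed = [[1, 0], [0, 1]]       # each experimenter succeeds and fails once
consistent = [[1, 1], [0, 0]]  # one always succeeds, one always fails
```

Here `likelihood(mixed, "toy")` exceeds `likelihood(mixed, "agent")`, favoring a flaky toy (so: reach for another toy), while `likelihood(consistent, "agent")` exceeds `likelihood(consistent, "toy")`, favoring agent differences (so: hand the toy to a more competent agent).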
I want to shift a little bit and bump us up an age. I'm going to take you now to older children-- preschoolers. Because we want to ask not just whether children can learn from evidence, including small amounts of evidence, but whether they can selectively seek out and find that evidence in cases where the evidence is ambiguous. So do they actually design and carry out the kinds of interventions you might need to learn about the world?
Here's the setup for this study. How many of you played with little snap-together beads when you were in preschool? Remember those big, plastic beads? You could snap them together and make chains. Raise your hands. Yes. You guys know what I'm talking about? OK. [INAUDIBLE].
They're each uniquely colored. But what we did with these beads is we put them one at a time on a toy, and the toy lit up and made music. In one condition, every single bead activated the toy. In the other condition, only some of the beads did. The kids had no way of knowing until they tried which beads did and which beads didn't. But the important thing is that all that varied between conditions was the base rate of effective beads. Here, every bead works. Here, only some of the beads work.
What we did next is we showed the children two pairs of beads. One pair was a stuck pair. We'd epoxied it together. We couldn't pull it apart. We handed it to the kids; they couldn't pull it apart. We all knew that these could not come apart. The other pair was an ordinary snap-together pair of beads. The kids were allowed to snap the pair apart and together as long as they wanted, so they could tell that it was a separable pair.
Then we put both pairs back together. And we put them on the toy one at a time as a pair. And this pair made the toy go, and this pair made the toy go. And then we walked away. What were we thinking? We were thinking, in principle, of course, the evidence is confounded in all of these cases. Maybe just this bead made it go. Maybe just this bead made it go. Maybe they both did. For each pair.
But if you had just learned that every bead makes this toy go, there's not a lot of ambiguity. There's not a lot of possibility of information gain. You should probably assume that every one of these beads makes the toy go. And if that's the case, you should try these pairs of beads indiscriminately.
But in the some-beads condition, there really is the possibility of information gain. Only some beads work, and you know that. So maybe it's only this bead, or maybe it's only this bead, or maybe it's both of them. And the same with this pair over here.
But only one of these bead pairs has the right affordances to learn something. Only one of them lets you isolate variables. Only the separable pair of beads can you pull apart and put individually, one at a time, on the toy. If kids are sensitive to the possibility of information gain, and they are sensitive to which pair has the right affordances to let you isolate variables and learn, then they should selectively play with the separable pair, just in the some-beads condition.
And that is, in fact, exactly what we found. In the all-beads condition, only one child ever snapped apart the beads and put them on the toy. In the some-beads condition, half the children did that, and they did the intervention exhaustively: they put each bead, one at a time, on the toy.
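The intuition that only the separable pair in the some-beads condition is worth testing can be put in expected-information-gain terms. The sketch below is my own illustration, not the model from Cook, Goodman, and Schulz (2011); the function name and the deterministic-activation assumption are mine. It conditions on the pair having activated the toy, then asks how much testing one bead alone reduces uncertainty about which bead is responsible.

```python
from math import log2

def eig_single_bead_test(p_works):
    # Hypotheses (a, b): whether bead A and bead B each activate the toy,
    # conditioned on the pair having activated it (at least one works).
    weights = {(1, 0): p_works * (1 - p_works),
               (0, 1): (1 - p_works) * p_works,
               (1, 1): p_works * p_works}
    z = sum(weights.values())
    prior = {h: w / z for h, w in weights.items()}
    entropy = lambda d: -sum(q * log2(q) for q in d.values() if q > 0)

    gain = entropy(prior)
    for outcome in (0, 1):  # put bead A alone on the toy: lights up or not
        post = {h: q for h, q in prior.items() if h[0] == outcome}
        p_outcome = sum(post.values())
        if p_outcome > 0:
            post = {h: q / p_outcome for h, q in post.items()}
            gain -= p_outcome * entropy(post)
    return gain

# All-beads condition: every bead works, so there is nothing to learn.
eig_all_beads = eig_single_bead_test(1.0)    # 0 bits
# Some-beads condition: testing one bead alone is genuinely informative.
eig_some_beads = eig_single_bead_test(0.5)   # about 0.92 bits
```

The stuck pair can only be tested jointly, and a joint test always lights the toy under every surviving hypothesis, so its expected information gain is zero in both conditions; only the separable pair in the some-beads condition offers anything to learn.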
But my graduate student at the time said, you know, they're also doing something kind of interesting with the stuck pair. And I said, what can they do with the stuck pair? It's stuck. There's nothing you can do. And she said, well, let's try it again-- let's just test the stuck pair of beads. And so we did it again. We assigned the kids to the all-beads condition or the [INAUDIBLE] condition, but this time the kids only saw the stuck pair placed on the toy. The stuck pair made the toy go. And I want to let you watch what one of the kids did here.
So she's going to show them the individual beads. We'll skip through a little bit. Now she's going to hand the child this pair. That's what we did, and that's what the child's doing. And she plays.
And now she does something we have never done, and that she's never seen before. But [INAUDIBLE] contact causality is not a bad way to try to isolate your variable and determine which bead makes the toy go.
So that is a kind of intervention that had not occurred, for instance, to the PI on this study. But it makes a lot of sense, consistent with an intuitive theory of contact causality, which as we know 4-year-olds have, about how things are likely to work.
And again, when we ran this experiment, you almost never saw that behavior in the all-beads condition, and again about half the kids designed this intervention and tried it out in the some-beads condition.
So this suggests that preschoolers can use information about the base rate of candidate causes to distinguish the relative ambiguity of evidence. And they can not only select but design potentially informative interventions that isolate those variables.
Now I want to talk about some new work in the lab on a kind of similar theme. One way that interventions can be uninformative is because they fail to isolate variables. You know that they're confounded. But there's another way interventions can be uninformative, because you just can't tell their outcomes apart. So let me give an example.
Suppose you go to your doctor, and the doctor says, well, look, you may have disease A, or it might be that you have disease B. I'm going to give you this blood test: we're going to put this Z element and a tissue sample in a vial and shake it up. And if it's disease A, the vial's going to turn red, and if it's disease B, the vial's going to turn maroon. OK. And it's up to the lab tech to tell you which disease you have. One is incurable, and with the other one you're going to live forever and be just fine.
You may not be totally delighted by this test, because you can tell these outcomes are going to be really hard to tell apart. You might be much happier if the doctor said, by the way, there's another kind of assay we can run, the X assay, and if it's disease A it's going to turn red and if it's B it's going to turn blue. OK? That seems like, on the whole, a much better intervention. The evidence is much more distinguishable. You're going to be able to make better causal attributions.
So if children are sensitive to the probability of information gain, they should prefer, all else being equal, interventions that generate really distinctive patterns of evidence. What does it mean to be distinctive? It means you can tell the outcomes apart. Right? Which means that, in the sense we've been talking about in our lab-- and this, by the way, is work by Max Siegel, who's pretending to not [INAUDIBLE] right now, but he's over there in the corner-- this is an intuitive psychophysics. Right? Children have to be aware of their own ability to discriminate stimuli to be able to tell whether the evidence is going to be ambiguous or not in these cases, and to make the relevant judgments.
We're not going to put children's tissues in vials, so what we're going to do instead is this. Here's the very simplest version of a test like this. We're going to show kids a really-- you can't tell here, it looks gray-- but this is the coolest pencil you've ever seen. It's a hologram pencil. It's shiny. If you were five, you would want it. The kids get that pencil if they can find it. And they have a choice between this hologram pencil and this really boring pencil.
And one of these two is going to go into this box here. Either the hologram pencil or the boring pencil is going to go into this one. And in this box, it's either the hologram pencil or this beanbag. That's what's going to go in this box. And you know what you're going to do? You're going to shake each box. And you know what you're going to hear? A pencil. No, sorry. You're going to hear [MAKES THUMPING NOISE].
And now the question is, which box should you open? If kids are sensitive to where the evidence is more discriminable, they should open this box here, with either the pencil or the beanbag, and not this box here, where the evidence is going to be hard to discriminate.
That's a relatively simple test of the concept. Here's a harder one. Here's a case where you really want these beautiful colored marbles here. Right? You don't want the boring white marbles. In this box, there are going to be either eight beautiful colored marbles or two really boring white marbles: one of those vials is going to get poured into the box. And this box gets either the eight beautiful colored marbles or six super boring white marbles. Which box should you open if you want the colored marbles? If kids are sensitive to the discriminability of evidence, they should prefer this box.
Let me show you a little bit of how this works. This is work by Rachel Magid and again by Max here. I'll let it run through a little bit.
She's going to repeat all of that, and then she's going to ask the child which box she picks. OK? All right.
So-- go ahead.
AUDIENCE: So if I didn't listen at all, [INAUDIBLE]. I think the inference would be the same.
LAURA SCHULZ: Exactly. And so one thing you want to know is can they only do it once they hear the evidence, or can they do it predictively. Can they do it even without hearing it? Do they recognize in advance which kind of evidence is going to be more discriminable?
We ran that experiment too. So you give them a choice before you shake the box, and you say, which one do you want to open? But in order to answer a question like that, you have to in some sense ask, well, what are eight marbles going to sound like? What are two marbles going to sound like? What are eight marbles going to sound like? What are six marbles going to sound like? And know that one of those contrasts is more discriminable than the other. Or, what's the difference between a pencil and a beanbag, versus a pencil and a pencil? You have to know in advance which of those is going to be more discriminable, and choose before you do the intervention. So we ran it both this way, where they get the evidence, and predictively, before they get the evidence.
AUDIENCE: So the beanbag and the pencil I get, but the marbles-- the question of color, the wrong color is-- if I was just to draw at random, I would go for the one with fewer of the white ones, not because it would be less discriminable, but because just like random chance I would get more-- I'd be more likely to get colored ones.
LAURA SCHULZ: The marbles go in as a whole vial. They don't go in separately. Right? So either this entire tube goes in, or this entire tube goes in. Right? So they're fixed. They're [INAUDIBLE]. And what you'll see is kids are very, very good at this. And they are, importantly, good at this anticipatorily as well as in a [INAUDIBLE].
AUDIENCE: So in these cases, though, once you've shaken both boxes, maybe it's easier to discriminate if there's two white marbles. Is it always--
LAURA SCHULZ: You always hear the same thing in both boxes. You always hear the desirable thing. The actual thing kids hear is the same. So really importantly, they have to in some sense be thinking counter-factually. You know, would I have been able to hear a difference if it were this versus that? They're always hearing the same thing. They always hear the pencil. They always hear the eight marbles.
AUDIENCE: So when you describe an experiment with [INAUDIBLE] and she had to find them, as opposed to being what she preferred, you can imagine that if she wasn't told specifically to find the marbles that you might want more marbles. And then if you choose the one on the right then that will always give you more marbles.
LAURA SCHULZ: For this reason, we didn't want to-- if you start introducing the kids' preferences, some kids are going to prefer beanbags. Some prefer something else. So we wanted to fix for them what the constraints of the task were. Otherwise, we'd have to interpret what they were doing and why.
AUDIENCE: Can they describe why they chose the box?
LAURA SCHULZ: We have some kids who--
[INTERPOSING VOICES]
LAURA SCHULZ: Yeah, they're actually quite good at telling you and explaining what the evidence is and what it should be. And again, I feel like this is one of those things that seems like a simple task, right? [INAUDIBLE]. Yeah, of course. But now-- you all are the right people to ask this. OK? Now get a machine to do it. Right? What do you have to do to solve a problem like that? What kinds of information need to be available online, rapidly, such that you can recognize that in this causal context of shaking, with this kind of psychophysical evidence and this kind of simulation of the physics of the world, and your perceptual properties, you can make that kind of discrimination effortlessly at age four, or three, even? We have three-year-olds do it. That's turned out to be a hard problem. Deceptively hard. But I think if you try to actually make it work, you will see what kind of sophistication has to go into even relatively simple tasks like this.
OK. So, preschoolers. They have an intuitive psychophysics. They know when evidence is more or less distinguishable. And they prefer interventions that generate distinctive patterns of evidence.
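One way to make this logic concrete is as a comparison of expected information gain. This is a sketch under simplified assumptions that are mine, not the study's: two equally likely hypotheses per box, and a deterministic mapping from contents to sounds.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_info_gain(prior, sound_given_hypothesis):
    """Expected reduction in uncertainty from hearing the box shaken.

    prior: dict hypothesis -> probability
    sound_given_hypothesis: dict hypothesis -> dict sound -> probability
    """
    h_prior = entropy(prior)
    # Marginal probability of each possible sound.
    sounds = {}
    for h, p in prior.items():
        for s, ps in sound_given_hypothesis[h].items():
            sounds[s] = sounds.get(s, 0.0) + p * ps
    # Expected posterior entropy, averaged over possible sounds.
    h_post = 0.0
    for s, ps in sounds.items():
        posterior = {h: prior[h] * sound_given_hypothesis[h].get(s, 0.0) / ps
                     for h in prior}
        h_post += ps * entropy(posterior)
    return h_prior - h_post

prior = {"hologram": 0.5, "other": 0.5}

# Box 1: hologram pencil vs. boring pencil -- both just sound like a pencil.
box1 = {"hologram": {"thump": 1.0}, "other": {"thump": 1.0}}

# Box 2: hologram pencil vs. beanbag -- the two contents sound different.
box2 = {"hologram": {"thump": 1.0}, "other": {"soft": 1.0}}

print(expected_info_gain(prior, box1))  # 0.0 bits: shaking tells you nothing
print(expected_info_gain(prior, box2))  # 1.0 bit: shaking settles the question
```

Under these toy numbers, shaking the pencil-or-beanbag box is strictly more informative, which is the choice the children make.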
So now I've shown you kind of a very small tip of what could be an entire talk on "Look how smart children are. Look at all the wonderful, smart things they do that are kind of hard for us to understand and to figure out." And usually, depending on who I'm talking to, the audience is more or less bristling at this point, and saying, "Yeah, yeah, yeah, but if you read the newspaper at all, you've heard of the crisis of STEM education in America. Don't you know people are bad at science?" You know, "If they're so good at this, why are our fifth graders incapable of answering remedial questions," et cetera, et cetera.
So I think it's important to ask: if children are so smart, if they can do all these kinds of things-- and again, there's evidence across a host of labs for the emergence of these abilities in preschool and in toddlerhood-- why is learning hard? What are some of the problems? Again, I'm going to illustrate this with some research from my lab, but I think you can find it across labs. And I'll try to gesture at a few of these ideas.
So one reason, of course, was illustrated, actually, by an experiment that set out to show why children were so smart. So this is actually the first experiment I think I ran when I started at MIT. And we were just going to give kids a recurring variable here, A. And A was always going to be paired with an effect, but it was going to be paired with a constantly changing variable-- B, C, D, E, F, G. And basically, we wanted to know if, at the end of the day, kids were given a choice between A and G, would they pick A instead of G as the cause of the effect? This seems very simple, and of course it is.
We gave them stories. Monday morning Bambi runs in the pine grove, gets excited, runs in the cattails, and has itchy spots on his legs. That keeps happening. He keeps running in the cattails, with a different second place each day, and getting itchy spots, all the way until Sunday morning, when he runs in the garden and in the cattails and gets itchy spots. And you want to know, well, why does he have itchy spots? You really should be able to say the cattails. He doesn't always have itchy spots in the afternoon, so there's no confound trailing along here. And this should be relatively easy, if the kids are tracking the evidence.
What we were interested in was that, in this case, kids have no prior beliefs. There's nothing at stake in why you might get itchy spots in the world. We compared it with a case like this, where kids do have a stake in what causes tummy aches. And it happens, we know from previous research on kids' intuitive theories, that they believe that eating things that are bad for you can cause tummy aches. But they do not believe in psychosomatic causality. So even though they are subject to it-- they sometimes get tummy aches when they're scared-- they don't believe in it. And they will deny that being scared or being afraid can make you have a tummy ache.
So on Monday morning, Bunny is going to think about show-and-tell and eat some cheese and get a tummy ache. And then Tuesday he's going to think about show-and-tell and eat a Popsicle and get a tummy ache, et cetera, et cetera. And on Sunday he's going to eat a sandwich and think about show-and-tell and have a tummy ache.
And if kids are integrating their prior beliefs with the data they get, then even though being scared occurs just as often as the cattails do, children should be less likely to endorse the A variable in the psychosomatic condition than in the theory-neutral condition. But they should still be more likely to endorse it after the evidence is repeated than when they just initially see the data.
And in fact, that's exactly what we see with the four-year-olds. Right? So you see this very nice pattern in the four-year-olds, where the kids basically [INAUDIBLE] within the main condition after they see the evidence. And they substantially increase their endorsement of being scared after they see the evidence, even though they initially do not believe in it.
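This prior-plus-evidence pattern can be sketched as simple Bayesian updating. The numbers here are illustrative, not the study's: the psychosomatic hypothesis starts with a much lower prior, so it needs the repeated covariation evidence to catch up, but it still moves.

```python
def posterior(prior_a, likelihood_ratio_per_trial, n_trials):
    """Posterior probability that candidate A is the cause, starting from
    prior_a and seeing n_trials in which A covaries with the effect while
    the alternative candidate changes every time.

    likelihood_ratio_per_trial: how much more probable each trial's data is
    under 'A is the cause' than under the alternative (assumed constant).
    """
    odds = prior_a / (1 - prior_a) * likelihood_ratio_per_trial ** n_trials
    return odds / (1 + odds)

# Illustrative priors: cattails causing itchy spots is theory-neutral, so
# roughly even odds; being scared causing tummy aches violates children's
# intuitive theory, so it starts very low.
for name, prior_a in [("cattails", 0.5), ("being scared", 0.05)]:
    print(name, [round(posterior(prior_a, 2.0, n), 2) for n in (0, 3, 7)])
```

With these toy numbers, the theory-violating cause is endorsed less than the theory-neutral one at every point, yet after seven trials it, too, ends up favored over the alternative-- the qualitative signature of the four-year-olds' data.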
So that seems very clear, and that seems like another "OK, well, kids are smart" result. But I want to show you the three-and-a-half-year-olds. Three-and-a-half-year-olds look just like four-year-olds when the evidence is theory-neutral. But they're completely flat when the evidence is theory-violating. They absolutely never chose being scared.
And there are a couple of possibilities. Maybe three-and-a-half year-olds have stronger convictions. Right? They just really, really, really don't believe that being scared can cause tummy aches, and it's going to take twice as much evidence, or three times as much evidence to convince them.
But there's another possibility, which is that this task is hard for them, and that any kind of additional conceptual demand or additional load makes it hard for them to answer these questions. And one reason to think that might be the case is that if you look at the data from the three-year-olds, they don't learn in either condition. Right? It looks trivial to you, but these kids have to sit down and go through a seven-page storybook with a stranger and answer questions about it.
So what we did in this case was do a training study. I don't need to walk you through all of it. But the important point is we just taught three-and-a-half year-olds to sit through a lot of books that were not about psychosomatic causality, but which did have recurring variables in them. So they got used to this task and this kind of presentation. And after having done that, then we asked them to explain the cause of a tummy ache, and the trained group did much better than the control.
The only reason I'm walking you through all of this is to say it would be really foolish to be a developmental psychologist of any kind and say that information-processing abilities don't matter. "Information-processing abilities" is an extremely vague term. It's kind of an umbrella for all the things that we don't tend to talk about when we're talking about computational models and the process of learning.
But we're talking about things like the ability to sit still. We're talking about the ability to inhibit responses. We're talking about the ability to direct your attention and not get distracted. We're talking about how well you're following the language of the task. We're talking about a whole range of skills related to executive function, to inhibition, to just the ability to keep track of and remember data. Those things matter. They can affect performance, and they can affect performance hugely, which is part of why it's really important, if you're testing children, not to mistake performance deficits for competence deficits. It's easy to make kids fail. It's easy to confuse them. You often need very, very simple tasks to show these abilities. And it's important to note that these abilities fall apart under other kinds of [INAUDIBLE], because we've built a lot of cognitive tools to support us in this way. And that's what enables us to do some of the sophisticated reasoning we do, and a lot of what happens in science learning is the ability to use those tools.
Kids are not very good at these things. Hand them a ruler and ask them to measure things, and they're hopeless. It takes them a long time just to learn to line up a ruler with an edge, let alone to make a little table or a little chart where they can track what happened. And those kinds of tools-- I think we shouldn't underestimate how important they are to what we do.
Here's another thing that seems really basic, but I don't think we can emphasize enough just how big an impact it can have on the kinds of behavior and responses you get from kids. This is an old task. It's from Piaget and Inhelder, and it's founded on the fact that children play with blocks. You've seen from Josh a lot of really good intuitive physics about blocks.
If you ask for children's explicit theories about blocks, they undergo a transition. The little ones don't really have an intuitive theory about blocks. [INAUDIBLE]. But somewhere around six they have a theory, and they will explicitly tell you, at a certain age, that blocks balance on their geometric center. So if you ask them to point and make a prediction about these blocks, they will tell you that each one of these clearly asymmetric blocks balances right here at the midpoint, which is a pretty good generalization from symmetric blocks, but really doesn't work very well for these kinds of blocks. And it's only a little bit later that the kids shift and realize that, actually, you need to balance it here, over the center of mass. Now, this is an explicit prediction. Proprioceptively, of course, the kids compensate. But if you ask them what their theory is of how it works, they have a wrong theory initially.
So what we did in this study is we asked kids to make predictions like this. And we sorted the kids who on three out of three trials picked the center from the kids who on three out of three trials knew to adjust. And we got 32 kids in each condition.
But I want to point out that these kids are in the same classrooms. You couldn't tell them apart. But they have really different theories about how the world works, or different world knowledge in a certain sense, even though they're closely matched in age. I call these the Mass Theorists and these the Center Theorists.
And then we ran a little experiment like this. We let them play with the block. We let them find out it was asymmetric, get familiarized with it. And then we opened it up like this, and we brought out a brand new toy. They played with that toy for a while, and we walked away. We wanted to know what the kids would do. And the prediction is that kids selectively explore evidence that violates their intuitive theories. For the Center Theorists, this should be really surprising, so they should go back to the familiar toy. But the Mass Theorists should go ahead and play with the novel toy.
And of course we compared it with a condition like this, where we set it up just like this. And the prediction, again, is that kids selectively explore theory-violating evidence. So now the Mass Theorists should go back and play with the familiar toy, and the Center Theorists should go ahead and play with the novel toy. Is that clear?
And of course that's what we found. So the kids are seeing the same evidence, but their prior beliefs about what should happen are affecting their pattern of exploration, and also what they told us. It's not just affecting their exploration, it's also affecting their explanations.
When you play with this, you might wonder how it is that we managed to get an asymmetrically weighted block to stand up like this. And again, we're at MIT. We can do sneaky things. We put a magnet under there, and it sticks. And to be fair, we put a magnet in both conditions, and every child discovers the magnet while playing. So all the kids spent some time playing with that block, and all of them find the magnet. And in all cases the magnet is a sufficient explanation for why the block stays up. If I ask why that block is staying up, you can always refer to the magnet.
But what we predicted is that the children would selectively appeal to the magnet just when the evidence violated their theories. So that's what we looked for, and in fact that's what we found. Although all the children had found the magnet in play-- you can see that from the way they played with the toy-- if you ask them to explain the evidence, and the evidence is consistent with their theories, they don't refer to the magnet. If the evidence violates their theories, they do.
What does this have to do with the bigger picture program? World knowledge affects how children are going to learn about the world, and it's going to affect what they're going to explore and how they're going to explain data. Such theories can get entrenched. Right? If you really think the world works a certain way, if you really have a theory, that might entrench your learning in particular ways.
Which brings me to what I think may be the most interesting reason that learning is hard, which is that the exact same things that make us good at learning and that make us smart also make learning hard. It's part of a single process. You all know the heuristics-and-biases literature. You know that there are heuristics that get in our way, and that causes problems. Sometimes we're flawed. But I think that one of the really interesting tensions is not that we have different things that cause us to behave in different ways, but that we have a single way of thinking about the world-- a set of inductive principles that are guiding us-- that does both. It makes us smart, and it also makes learning hard.
Let me show you an example. I'm going to introduce you to a brand new world with just a few, simple causal laws. These are the only laws that operate in this world.
In this world there is an indicator. If you mix it with an acid, it turns blue. If you mix the indicator with an oxyden, nothing happens. And if you mix the oxyden with an acid, it explodes. I want you to learn this. This is your world. Three causal laws.
Now you go down into your grandmother's basement, and you pull out a vial. And because you're an intrepid scientist, you mix it with the acid. And guess what? It turns blue. Now, you've never seen that vial before. You're not totally sure what it is. But how many of you intrepid scientists think you can now go ahead and safely mix the contents of the mystery vial with the oxyden?
So, hands? Do you think that's probably OK? You're drawing an inference from very small data, right? Constrained by these prior beliefs. But sure: if you mix this with the acid and it turns blue, it's probably an indicator. And if you mix it with the oxyden, it's probably going to be just fine.
So you can draw a really rich, rapid inference from very sparse data, because you have these abstract laws here. You actually don't know what it's going to do. You've never seen it. But you can make a prediction. Right? That's good.
What if you're wrong? What if you go and do that and it explodes? Now what are you going to do?
Well, there are a bunch of things you can do. You could say, this is some brand new substance in the world. When you mix it with an acid it turns blue. When you mix it with an oxyden, it explodes. It's something I've never heard of or seen before. You could do that. Or you could explain away the data. You could say, maybe the mystery vial just got a little bit contaminated with the acid here, and that's what caused the explosion. So you have the option not to posit a brand new entity. You could just explain away the data with reference to your existing rules. Right?
So the question we wanted to ask is whether children, first of all, would use sparse data to form these abstract laws that would enable rich inferences. And then, if they got evidence that violated their beliefs, would they go ahead and try to explain it away with reference to those initial principles they had learned? So we did it like this. But the point I want to make is that the constraints that let you learn rapidly [INAUDIBLE], that let you make a quick inference about that vial, can also make you intransigent in the face of counter-evidence.
Here's how it works with children. We're going to show them a red block and a blue block. If you put them together, you get a train noise. If you put a blue block and a yellow block together, it makes a siren noise. The red and yellow block do nothing. So it's exactly analogous to what you just saw. OK?
So the kids learn this about these kinds of blocks in the world that we show them. If they are just learning about these particular blocks, all they know is about these particular blocks. If they're just concrete learners and they only learn from these, that's the whole world. But if they're making a generalization about [INAUDIBLE] kinds of blocks, then that's what their world looks like.
Now we show them a mystery white block, and we put it with the blue block and it makes a train noise. If there are three kinds of blocks in the world, the kids should think, well, that's a lot like the red block. Right? And if it's a lot like the red block, then if you put it with the yellow block it shouldn't do anything at all. So if you see the white block go behind the curtain where the yellow block is, and in fact it makes a noise, you might think, well, there's not one block behind the curtain. There's two. Everyone got it?
By contrast, if you see the red block hit the white block and make a train noise, you should say, oh, well it's a kind just like the blue block. And blue blocks do interact with yellow blocks. Blue blocks do make noise. So if you see the white block go behind the curtain [INAUDIBLE] you don't have to posit another variable. This evidence is perfectly consistent with [INAUDIBLE].
So a slightly complicated experiment, so I'll keep it up there for just a second here. But it's really exactly the same as the chemistry experiment you all just went through. And what we wanted to know was, children are all seeing the same evidence, and they've never seen anything behind the curtain, but would they go ahead and posit the existence of some other entity back there, just to explain away the data?
Now, we actually-- this is everything I just showed you. We actually ran a whole set of other conditions where we flipped it. We changed the color laws so they'd make opposite predictions. I'm not going to walk you through all of that, except to say it flips the conditions under which you should think of the blocks.
And what I want to show you-- oops. I didn't need to do that because I'm only going to show you the first few data. The other ones look like that.
We let the kids reach. They always got a white block on the first trial. We looked at whether they reached again. And the kids in the condition where the evidence had violated those initial laws did reach again. They did think there was something else hidden back there.
The reason I've walked you through all of this is I want to think about whether this is a good inference or a bad inference. The kids went ahead and said that in this case there's something else back there. And if there are only three kinds of blocks in the world, that's a really, really insightful thing to do. Right? If there really are only the three kinds of blocks that we have, they're right that there's some other unobserved variable that explains the data away, and that's what they should think. In our experiment they were right. There were only the three kinds of blocks. We had to use another blue block back there to get the noise out of it.
But if they're wrong, then they look really silly. You've just seen the white block hit the yellow block and make a noise. Why are you denying the evidence of your own eyes and positing something else out there? Why can't you just learn that the white block actually did hit the yellow block and make a noise?
And the really important point is there's no fact of the matter here. Right? It depends on what the world is really like. If there really are four kinds of things, and you just happen not to have seen that additional kind of thing before, you're going to be less likely to believe in it. And that's going to make you look like not a very good learner. But if there really are only three kinds of things, then you're going to look extremely smart. And it's exactly the same principle that lets you look smart in one case and makes you look kind of dumb in the other case. It's not different heuristics trading off with each other. It's a single set of inductive biases that just happens to cut both ways. Is that clear?
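This "no fact of the matter" point can be put in numbers. The sketch below uses illustrative values of my own, not the study's, and a deliberately simplified likelihood: under "only three kinds," the white block acted like the red block with the blue block, so white plus yellow should be silent and the noise requires a hidden block; under "a fourth kind exists," white plus yellow may make noise on its own.

```python
def p_hidden_block(prior_three_kinds, p_fourth_kind_noise=0.5):
    """Posterior probability of a hidden second block behind the curtain,
    given that the white block apparently made a noise with the yellow block.
    """
    # 'Three kinds only': the noise can only come from a hidden block.
    joint_three = prior_three_kinds * 1.0
    # 'Fourth kind exists': the white block itself might make the noise.
    joint_four = (1 - prior_three_kinds) * p_fourth_kind_noise
    return joint_three / (joint_three + joint_four)

# A strong inductive bias toward three kinds makes the child posit a hidden
# block -- the smart move if the world really has three kinds, the wrong one
# if a fourth kind exists. The inference itself is identical either way.
print(p_hidden_block(0.95))  # ~0.97: confidently posits a hidden block
print(p_hidden_block(0.30))  # ~0.46: a weaker bias would not
```

The posterior is the same computation in both worlds; only the world decides whether the conclusion counts as insight or intransigence.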
AUDIENCE: So in this experiment it was pretty easy for-- well, there was a lot of ways in which the intermediate [INAUDIBLE] and so it's a little bit maybe different than the vial experiment? Or the description at least, where it's perhaps somewhat unlikely that you would get a contamination. Whereas here there's a very large space--
LAURA SCHULZ: So, for instance, had we gotten rid of the curtain and actually [INAUDIBLE] all the kids would have slipped up. Right?
AUDIENCE: Yeah, maybe if the curtain was shorter.
LAURA SCHULZ: Every one of these inferences is predicated on a whole bunch of information working together. Right? So it is clearly the case that if the evidence is super strong, it's very hard to explain it away with another variable. [INAUDIBLE] you're much more likely to actually have to change your theory and say, actually, this overturns what I believe; there really is something else out there. If the evidence is a little weaker, or your theories are super strong, then you should not. So all of these are graded. All of these depend upon inference. We could have easily gotten rid of these results by removing the curtain.
AUDIENCE: Trying to wrap my head around [INAUDIBLE]. So why would we assume that there's a hidden object. It's simply basically a collision between the red and blue blocks would cause the music. So if it's just a contribution of the blue blocks together with a collision with whatever block, it doesn't have to be--
LAURA SCHULZ: No, they know that's not true. Right? They know that red and yellow don't do anything. They know that red and blue make a train noise, and that red and yellow do nothing. So the effects are selective to the kind of thing each block is. It's not just any kind of collision [INAUDIBLE].
AUDIENCE: [INAUDIBLE].
LAURA SCHULZ: Right. So we did this in a directional way, so we didn't select the order of them.
So what I want to suggest with all of this is that there are lots of reasons, as I said, to think that children are using a bunch of very fundamental epistemic principles that we think are startlingly accurate that let them get the world right in a lot of ways. That doesn't mean that they always do get the world right in all of those ways, of course.
And when I was speaking to philosophers, we talked about this as bounded rationality. But note the term rationality here. [INAUDIBLE]. It is really true that your information, your world knowledge, and the mere fact of these inductive biases themselves are going to affect how well, how accurately [INAUDIBLE] the world, even if all of these principles are in play. And there are lots of good reasons to think that they are.
This is where I'm going to end this talk. There is another half hour that, depending on what we do this afternoon, I may or may not get to, where, having shown you that kids understand what information is valuable and selectively explore in ways [INAUDIBLE] for information gain, information is also costly. And children are sensitive to those costs, and that affects not only how they learn from others, but also how they learn about others. If we have time this afternoon, I'll try to get to that as well. Thank you.
[APPLAUSE]
AUDIENCE: [INAUDIBLE].
LAURA SCHULZ: I think it's a hard question to answer in exactly that form, because of course what we have are computational models, again, of how, for any given set of problems, you might be able to solve them and what kinds of representations might be at play. And we have lots of reasons to think that limitations on working memory or attention or other things might limit what you can ask a child at any given moment. The deeper question would be, does the computational architecture that's available to a child actually change as this information processing develops? I've referred to information processing as involving knowledge [INAUDIBLE], things that feed into a set of fundamental principles that are essentially constant over development. Right? So you could say, well, these basic probabilistic programs and principles are in place very, very early, and then what changes is basically how long those programs are, how complicated they are, how many iterations they have to run over, things like that, the specific nature of the representations. But the basic architecture is in place very early. Another thing you could potentially say is that effectively you learn to program yourself, and as these abilities change, you actually rewrite the controls under which you construct this underlying [INAUDIBLE].
AUDIENCE: [INAUDIBLE].
LAURA SCHULZ: Another way people have talked about this is: do we know the algorithms that children are using to implement these things? Do we know which particular processes are at play? And the answer is no, I think. In some cases there are experiments where people have suggested, look, here's one algorithm, and it would generate this kind of data if it's doing this, and here's another one that would do something else, and tried to distinguish them experimentally [INAUDIBLE]. But for most of the kinds of developmental experiments that have been run, the answer is no.
AUDIENCE: [INAUDIBLE]
LAURA SCHULZ: We're trying to get there.
AUDIENCE: [INAUDIBLE].
AUDIENCE: [INAUDIBLE]. But I think his conclusion was that we're bad at teaching, as opposed to learning being hard.
LAURA SCHULZ: Probably both of those have some element of truth. I think that there are a lot of reasons to think that learning is hard. In some of the experiments, you will see what in some sense looks like giving kids good evidence against a belief that they have [INAUDIBLE]. For many people, the simple question is, why doesn't that work? Right? How much evidence does it take, under what circumstances? What are they doing with it?
And there are lots of very good reasons to think it shouldn't overturn their belief, because there really are a lot of cases where you shouldn't change your mind. And telling the difference between the cases where you should [INAUDIBLE] and the cases where you shouldn't is, I think, actually, really, genuinely hard.
AUDIENCE: [INAUDIBLE].
LAURA SCHULZ: And I think that, in a way, this is the whole point of learning not being deductive. Right? It's about probabilistic inferences, so you can get the world wrong. And the bets that you place are hard to overturn. If you're going to bet one way, that means you're not betting another way.
I have actually a bunch more, if there's time too, I can show you some other examples where something that's a very, very sensible thing to do and consistent with the data can nonetheless get you stuck. And that is clearly true in science and that is clear here, too.
And I think something that I didn't talk about now, but maybe later, learning is also costly. Right? You have how many years of graduate school? What are you going to do with it? Right? How much time or resources? How many grants are you going to write to do which kinds of experiments? That makes learning hard. There are resource limitations both internally and externally, and choosing where the greatest potential for information gain is, knowing that in advance, there is [INAUDIBLE]. You don't know what the world is like. You don't know what's going to pay off. Right? So there isn't a fact of the matter about what is best. There's just an attempt. And sometimes that's going to pay off, and sometimes it isn't. And that is part of what makes learning hard-- and irreducibly hard.