Developing Intelligence Mission
December 5, 2022
November 4, 2022
Advances in the quest to understand intelligence
Joshua Tenenbaum - MIT BCS, MIT CSAIL, MIT Quest, MIT CBMM
Rebecca Saxe - MIT BCS, MIT Quest, MIT CBMM
Laura Schulz - MIT BCS, MIT Quest, MIT CBMM
PRESENTER: OK. So again, I'm introducing the Developing Intelligence team. I think today, we have Josh Tenenbaum, Rebecca Saxe, and Laura Schulz presenting on behalf of that team. And maybe also Vikash Mansinghka.
So I'll turn it over to my colleague Josh, who's a professor in Brain and Cognitive Sciences, and one of the scientific directors of the Quest for Intelligence, Josh.
JOSHUA TENENBAUM: There are going to be four more talks today. Three of them, under the heading of Developing Intelligence, by me, Rebecca, and Laura. And then Vikash, who's a key part of this mission but is also heading his own new mission, will be talking about Scaling Inference.
As many of the speakers have done, including what you saw when I talked in the morning, we think about what we're trying to do in natural intelligence, often by reference to what we know how to do and understand on the engineering side. And the gap that I highlighted in the morning and that we've seen in many of the other talks, is this one here.
It's that, while pattern recognition is, as a driver of AI technology, an amazing tool, and it's certainly part of human intelligence, human intelligence is so much more than that. It's all the ways in which our minds model the world and ourselves in it, and all the things we can do with that.
So all the ways we explain, understand, imagine, plan, solve problems, learn by building models, and doing this all collectively and socially. Sharing our models to learn and to make joint plans with others and to grow knowledge culturally. So if we had AI that could do this, it would be amazing. If we understood how humans do this in computational terms, that's what we're trying to do.
And we are far from that point, at least at the level of maturity that we have been in pattern recognition. We're not even at the level of a four-year-old child. We're not even at the level of a 1 and 1/2 year old child.
But imagine if we could get there. Imagine if we could build a machine that grows into intelligence the way a person does. That starts like a baby and learns like a child. This may not be our only bet for building real, general AI, but it could be our best bet. Much as [? Mehrdad ?] said before, there's exactly one example in the known universe of a learning system that reliably, reproducibly, safely, grows into full human intelligence, starting from much less, and that's it.
So if we could understand that, and even make small steps towards that in engineering terms, it would be extremely valuable. But most fundamentally, it would be a way to understand where we come from, where our own minds come from and all the value that that would unlock. So that's what's motivating our team.
And again, there's Rebecca, Laura, and Vikash over there. Some of the robotics people, like Leslie and Russ Tedrake, have also been key team members, as well as other people who have been part of CBMM, and our collaborators outside MIT, like Tomer Ullman and Liz Spelke at Harvard, and Danny Gutfreund at IBM.
Now, it's a great team, but we're far from the first people to think about this as a grand challenge. In fact, it goes back to Alan Turing, and really all the great computer science and AI researchers have posed some version of the challenge over the years. Turing put this out there as a route to building a machine that could pass the Turing test, that could achieve general intelligence.
He suggested, in effect: I don't really know how to do this, but maybe my best bet, instead of trying to build an adult machine, is to build a child machine, and teach it the way we teach children. Why? Well, I don't know. Presumably, it will be simpler.
Or as he put it, "Presumably the child's brain is something like a notebook as we buy it from the stationers. Rather little mechanism with lots of blank sheets."
And that quote encapsulates why I think so many people have failed to deliver on this vision, because if your view of the way children start and grow is this blank slate idea, then it's--
Well, we see it hasn't worked. And most importantly, it's fundamentally not the way it works. And this is one of the many reasons why we think now, and with this team, we can make progress.
Turing was brilliant, and he was also wise. So he said "presumably." He knew that he didn't know. But now we know. I mean, there's still a lot we don't know.
But thanks to colleagues like Liz, Rebecca, and Laura, and many others, we have started to answer the basic questions of how we start and how we go beyond that. And it turns out that each of those is much more sophisticated than Turing, and many others in AI and philosophy, have speculated.
There's a lot more built in than you might have thought, even though you can't see it, obviously, looking at those little babies, who don't appear to be doing very much. And learning is a lot more sophisticated than just copying things down from the blackboard or soaking in patterns from big data.
So the starting premise for this group is to take the insights from several decades of developmental science (you're going to hear about some of those here) and put them into engineering terms, in our computational models. At the same time, many of the big questions in developmental science are still open, as are many of the details. So there's really a bidirectional feedback loop, with models and experiments driving each other forward.
And the team here, as I mentioned in the morning, we've actually been working on this for a couple of years now. I mean, like a lot of people, we were slowed down by the pandemic. Although, as you'll see, in some ways, we were also accelerated by the pandemic. But we're excited that we have some steps to report.
And again, thanks to the generous support of the Siegel Family Endowment. And also some big external grants separate from CBMM, like the DARPA Machine Common Sense program, we've already had some resources to get started on this.
The starting point, for me as a computational cognitive scientist, is to take some of these basic abilities that you can see even in young babies. I'll show you here two videos of 1 and 1/2 year olds that show some of the things Spelke calls core knowledge, especially intuitive physics, or knowledge of objects and their causal interactions, and intuitive psychology, which in some form seem to be built into our brains.
These are 1 and 1/2 year olds, so they're already quite experienced. But, though I have to skip the details, as Rebecca and also Nancy Kanwisher and their students have shown using fMRI, many of the basic perceptual capacities that support understanding of objects and intuitive psychology, capacities their labs had shown to be present in adult brains, can be seen in proto-form pretty much as young as you can look: at six months, and in some cases even in three-month-old babies.
So there's increasing behavioral and brain evidence suggesting that these kinds of capacities for understanding the physical and social world are, not completely by any means, but in some important respects, built in through evolution. By intuitive physics, we mean things like what this 1 and 1/2 year old is doing in stacking up these cups: being able to understand them as objects, to have a goal, quite a non-trivial one, involving a bunch of objects, and to make subgoals as part of plans to achieve it.
In this case, you can see he's trying to put two things together into a stack of two, to put them on top of a stack of three, to make a stack of five. And he came up with that idea all by himself, including the bugs that he then has to debug. He's not just copying someone, and he's flexibly solving problems to make that plan work.
We've made great advances in robotics, and you saw some of those from Leslie's group earlier. But we still don't have any robot that can do what that 1 and 1/2 year old can, and if we did, it would be remarkable.
On the intuitive psychology side, consider this video here from a famous study by two psychologists, Felix Warneken and Michael Tomasello. It's giving us that red wheel of death, but there it goes. All right.
In this video, the participant is the 1 and 1/2 year old in the back. The experimenter is the big guy there. And we'll watch it again, so don't worry if you missed it. What happens here is the baby watches an adult do something that's a little bit weird, that he's never seen before, and that you probably haven't seen, unless you've seen this video. Kind of bangs up against that cabinet with these books.
And then ask yourself: what has to be going on inside the kid's head to understand what this weird action is about? He's literally reading the adult's mind, and you can see that when the adult stops and steps out of the way, and the kid comes up and helps him. The best part, I think, is right after here, in a second: he steps back, takes a look up, makes eye contact, then looks down at the hands. It's as if, and more than just as if, he's really understanding the intention and checking to see, did I get it right.
So what I'm showing you here are sketches of the computational models we build of the internal mental models inside kids' heads, the models that let them grasp what's going on in the physical world and inside the minds of other agents. This picture here is not a picture of how your brain works; it's a picture of how your brain thinks other brains work. And it might also have a lot to do with how your brain works, but the key is that this is an intuitive theory of mind.
Now, sketching the models out is just the first step. To actually build them as engineering models, we need some important technical building blocks.
Number one is this idea of probabilistic programs and probabilistic programming. Now, you're going to hear a lot more about that from Vikash in the Scaling Inference mission later. But one way to think about these models, or these ways of building models, is as a vocabulary and a toolset for integrating the best ideas about intelligence that we've had over multiple decades.
So not just neural networks for pattern recognition and function approximation, but tools like probabilistic inference in causally structured generative models, or Bayesian inference, and symbolic languages for knowledge representation, abstraction, and reasoning. These tools are absolutely critical, we believe, and we have a lot of evidence to back that up, if you want to build these mental models and understand how they're used for online inference: not just long-term learning, but the computations going on in the moment to make sense of a sparse pattern of data and infer the underlying latent causes.
What's out there in the physical world? Or what's inside someone's mind to explain what you're seeing? Or to be able to do abstract reasoning. So integrating these tools, that's one of the key technical building blocks for this approach to intelligence.
The other one is rich kinds of simulation programs. There's a slogan I like to put in quotes here: the "game engine in your head." It's more of a metaphor, although you can actually realize some versions of it.
But it's this idea, and Tomer Ullman has really helped to articulate it (Tomer is now a professor at Harvard after a lot of work with us here, and we continue to collaborate): the game industry has developed tools for simulating a world, to create a rich, immersive, interactive experience for a player, tools that let a game designer do that without having to write everything from scratch. They give you ways to simulate graphics, how light bounces off surfaces, how objects interact physically, even game AI systems to simulate non-player characters' goals, plans, and percepts.
Those tools are, at this point, quite efficient approximate ways of simulating a wide range of physical systems. And we can think of them not just as a way to simulate, say, training grounds for a reinforcement learning agent, but as an actual model of what might be inside a kid's head, when, for example, he thinks: if I roll this ball up the stack of blocks, will it fall over or not?
Will the bird that's sitting on top fall over? Do I want that? Do I not want that?
Using this internal simulation model to figure out what might happen, as a way of planning and checking our plans, is, I think, a powerful idea. And it's probabilistic inference, because you're not sure exactly what's out there in the latent variables and the physics. So running a small number of simulations under different parameters and making a good guess: that's where the idea of approximate structured simulation meets the idea of probabilistic model-based inference.
So, working just with adults, we used these tools to build some of the first quantitative models of complex visual intuitive physics, like taking a stack of Jenga blocks and predicting people's judgments of how likely it is to fall over. This shows an example of the quantitative fit between people's judgments of how stable a stack of blocks is, on the y-axis, and our model's predictions, on the x-axis. These models give a pretty good account of this, as well as of many other questions you could ask about this or many other scenes.
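To make the idea concrete, here is a toy sketch of probabilistic, simulation-based physical judgment: run a handful of forward simulations under noisy percepts of a block stack and count how often it topples. This is an invented illustration, not the actual models discussed in the talk; the 2D stability rule, the noise level, and all coordinates are assumptions.

```python
import random

def tower_falls(block_xs, noise_sd=0.2):
    """Crude 2D stability check: a block topples if the center of mass
    of everything above it lies outside its footprint. block_xs are the
    (perceived) horizontal centers of unit-width blocks, bottom to top."""
    xs = [x + random.gauss(0, noise_sd) for x in block_xs]  # noisy percept
    half_width = 0.5
    for i in range(len(xs) - 1):
        above = xs[i + 1:]
        com = sum(above) / len(above)
        if abs(com - xs[i]) > half_width:
            return True
    return False

def p_fall(block_xs, n_sims=200):
    """Probability of falling, estimated from a small number of simulations."""
    return sum(tower_falls(block_xs) for _ in range(n_sims)) / n_sims

print(p_fall([0.0, 0.05, -0.05]))  # well-aligned stack: lower
print(p_fall([0.0, 0.45, 0.9]))    # staircase stack: higher
```

The same Monte Carlo scheme extends naturally to the other judgments mentioned here (which way the blocks fall, how the answer changes if one material is heavier), by changing what statistic you read off the simulations.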
So if they fall, which way will they fall? How far will they fall? The same model can predict those judgments, as well. Or adding in some little extra complexity.
Suppose I tell you that the gray stuff is 10 times heavier than the green stuff. How will that affect which way the blocks fall? Notice we have pairs of towers, which are just recolored, same geometry, but they're recolored. And if the gray stuff is 10 times heavier, then it will affect how they fall, for both people and our model.
Or to work backwards. If you see something that's surprisingly stable, can you infer that one color material is much heavier or lighter than the other? All these inferences, and innumerable others, are supported by the same underlying probabilistic inference in a physical simulation model.
On the intuitive psychology side or planning, we have what we call goal inference by inverse planning. You'll hear a little bit more about this from Rebecca. And a lot of this work really has been, for the last 15 years, joint work between our labs.
So how do you look at somebody acting like this? You'll see this woman in a second reaching for one of the objects on the table. Which one is she reaching for? Raise your hand when you think you can tell which one she's reaching for.
It's in slow motion.
OK. Most of the hands went up around now, which is also about when that dashed line went up. That's the prediction of our model, not the empirical data from people, and what it's based on is this idea of inverse planning: assuming that agents plan physically efficient actions.
So she has a goal, which is a source of internal reward. She understands that moving in the physical world is costly. And she, like any agent, tries to plan efficient action. So you can work backwards to figure out what was her most likely goal. Building that notion of inverse planning on top of your physics model.
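As a rough sketch, goal inference by inverse planning can be written as Bayesian inference with a likelihood that penalizes inefficient, detour-heavy paths. This is a minimal invented illustration, not the talk's actual model: the Boltzmann-rationality likelihood, the straight-line cost, and all coordinates are assumptions.

```python
import math

def goal_posterior(path, goals, beta=4.0):
    """Goal inference by inverse planning: assume the agent moves roughly
    efficiently, so a goal is likely to the extent that the observed path
    is close to the shortest route toward it."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    path_len = sum(dist(path[i], path[i + 1]) for i in range(len(path) - 1))
    scores = []
    for g in goals:
        # Detour cost: observed prefix + remaining distance, minus direct route.
        detour = path_len + dist(path[-1], g) - dist(path[0], g)
        scores.append(math.exp(-beta * detour))  # Boltzmann-rational agent
    z = sum(scores)
    return [s / z for s in scores]

goals = [(2.0, 0.0), (2.0, 2.0)]              # two objects on the table
path = [(0.0, 0.0), (0.7, 0.6), (1.3, 1.2)]   # partial reach, curving upward
print(goal_posterior(path, goals))            # upper-right goal dominates
```

Note that the posterior sharpens as more of the trajectory is observed, which is why confidence in the inferred goal (the dashed line in the demo) rises mid-reach.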
And that doesn't just apply to normal situations of reaching, but also to a weird situation like this. What makes this look like reaching? How can you figure out what that guy is reaching for?
Watch it one more time. You see, he's reaching over-- What's he reaching over? A piece of glass. Yeah, do you see that? It's hard to see. It's almost literally invisible.
But you know there has to be an obstacle there, because otherwise it wouldn't be an efficient reach. And the guy who's watching and trying to help him can figure that out, figure out what he's reaching for, and anticipate it. And as one of our other collaborators, Shari Liu, together with Liz Spelke, has shown, even three-month-olds appreciate that idea of efficiency in reaching over something.
So one of the things that started this research program was taking these models of adults and using them to build some of the first quantitative accounts of common sense in infants. This is an example of an experiment we collaborated on, now 10 years ago or more, with Luca Bonatti's lab and Erno Teglas, which used the standard paradigm for studying what infants know: the so-called looking-time, or violation-of-expectation, paradigm, in intuitive physics.
Objects bounced around inside this little gumball machine, and after a brief period of occlusion, one object emerged. They varied the color of the object, how many there were of each color, and how far away they were at the point of occlusion: all factors which determine, according to the probabilistic physics simulation, how likely it is that one color or another will emerge.
And it turns out, on the y-axis, that infants look longer at the outcomes which are less probable according to our model. Now, this is not the first study to show or claim that looking time is inversely related to probability, but it was the first one to have any quantitative model.
And in other kinds of work (again, this is from Shari Liu and Tomer), we take the same logic to intuitive psychology. Here, for example, think again about inverse planning and cost-sensitive planning: when an agent declines to jump over a medium barrier for one object or agent, but jumps over the same barrier for another, infants can infer that the red one likes the yellow one more than the blue one.
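A toy version of this looking-time model: estimate the probability of the observed outcome by forward simulation, then model looking time as surprisal (negative log probability), so rarer outcomes predict longer looks. This is an invented sketch, not the published model; the one-step random-walk dynamics, the noise level, and the positions are all assumptions.

```python
import random, math

def p_color_emerges(positions, colors, target, exit_y=0.0, n_sims=500):
    """Estimate, by forward simulation, the probability that an object of
    `target` color is first out of the occluder: each object takes a noisy
    step, and the one nearest the exit emerges."""
    hits = 0
    for _ in range(n_sims):
        ys = [y + random.gauss(0, 0.5) for y in positions]
        winner = min(range(len(ys)), key=lambda i: abs(ys[i] - exit_y))
        if colors[winner] == target:
            hits += 1
    return hits / n_sims

def predicted_looking_time(prob, scale=1.0):
    """Looking time modeled as surprisal: rarer outcomes get longer looks."""
    return scale * -math.log(max(prob, 1e-6))

# Three yellow objects near the exit, one blue object far away (toy numbers).
positions = [0.5, 0.6, 0.7, 3.0]
colors = ["yellow", "yellow", "yellow", "blue"]
p_blue = p_color_emerges(positions, colors, "blue")
p_yellow = p_color_emerges(positions, colors, "yellow")
print(predicted_looking_time(p_blue), predicted_looking_time(p_yellow))
```

The point of the sketch is only the mapping: simulation gives outcome probabilities, and surprisal over those probabilities is what gets compared to infants' looking times.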
And that's because it's willing to pay a higher cost. For a lower barrier, the red one will go to the blue one; but if you're willing to pay a higher cost, you must like the target more. And it doesn't matter whether it's going over a barrier, going up a ramp, or jumping across a gap: infants seem to be sensitive to the physical cost of action, the amount of work you have to do.
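The cost-based preference inference above can be sketched as interval reasoning: paying a cost for a target puts a lower bound on its value, declining puts an upper bound, and comparing intervals ranks the preferences. This is a hypothetical illustration with made-up costs, not the actual infant model.

```python
def preference_bounds(observations):
    """Infer (lower, upper) bounds on how much an agent values each target
    from the costs it was willing to pay: jumping a barrier of cost c for
    target t implies value(t) >= c; declining implies value(t) < c."""
    bounds = {}
    for target, cost, acted in observations:
        lo, hi = bounds.get(target, (0.0, float("inf")))
        if acted:
            lo = max(lo, cost)
        else:
            hi = min(hi, cost)
        bounds[target] = (lo, hi)
    return bounds

obs = [("yellow", 2.0, True),   # jumped a high barrier for yellow
       ("blue", 2.0, False),    # declined the same barrier for blue
       ("blue", 0.5, True)]     # but crossed a low one for blue
print(preference_bounds(obs))
# yellow: value >= 2.0; blue: 0.5 <= value < 2.0, so yellow is liked more
```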
So those are the foundations. And what this mission is about is scaling this up. How does this idea scale up to a full reverse-engineering account of cognitive development? Up to, say, age four? Or even age 1 and 1/2? Or even just age six months?
If you're asking that question, good; you should be asking that question. It's not enough just to model one-off studies. And how is this going to integrate all the core human intelligence capabilities? Embodied intelligence, language, moral understanding, other things that develop. Or even just objects and agents in the immediate spatial environment, or even just intuitive physics.
Even just starting with intuitive physics and six-month-olds, the lowest aspiration level here, there are already a lot of interesting findings from decades of research: work by people like Spelke, Renee Baillargeon, and many others.
I'm not going to go through any of the details, but setting up lots of little one-off experiments like this, showing infants surprising scenarios, and seeing at what age they seem to be sensitive, that gives us a rich picture of what seems to be emerging over the first year of life. And lets us ask the question of, well, can we understand these trajectories in terms of a sequence of probabilistic inference in simulation programs? Can we understand development, learning, as something like growing the program, refining the program, improving the program, adding to the program? That's the research idea.
To make this real, this is where the Quest approach comes in: to say, OK, we have to go through a bunch of steps. We need to implement that virtuous cycle of growing models and testing them on much more than just one-off little studies here or there. And we need, as we'll see, to design and implement a much more scalable version of the model. This is where probabilistic programming comes in.
We've made some pretty good progress on those first two steps. But then we also need to go beyond: for example, to later-developing things that aren't just there in infancy, to more interesting kinds of social inference, and to really taking this idea of program learning seriously. A lot of what we've made progress on isn't really learning; it's really about the starting state of intuitive physics or agent understanding.
So one of the first things we did, starting a couple of years ago, I think in 2019, was to build a benchmark data set inspired by those infant experiments you saw, where we set up a world of objects going behind screens. The training set just allows any approach, whether it's a machine learning approach or our approach, to tune whatever parameters it needs to see this world. I mean, you can take it in in about 10 seconds, like that.
Then we set up various test sets of things that were not seen in the training set, but that represent the kinds of scenarios that infant experiments have done for a long time. We made many different versions. I'm showing you four different scenarios here, but we can procedurally generate many different movies of this sort. And then, we build models and test them on this to see if they behave like people.
So for example, we built a minimal probabilistic game engine model called the ADEPT model. And there's lots of things which are very heuristic and hacky about this model. It was the first one of its sort, but it implements a minimal starter game engine.
It actually uses a neural net for perception. The system uses some of the neural nets for vision to identify what objects are in the scene and to set up one of these simulations, which you can then run forward with a few guesses to predict what's likely to happen, especially for the objects you can't see. So that's the basic flow of information in this model.
We can show it these stimuli, and you can see what's going on inside it. Those little dots represent hypotheses inside the model's head, statistical guesses about where it thinks objects are at any one time. And the model can also generate a surprise signal when its visual input is inconsistent with what it hypothesizes it should be seeing, based on those internal traces.
And you can get this for all these different kinds of scenarios: objects passing behind screens and then mysteriously disappearing, or objects mysteriously appearing. Or this famous case here, where the screen passes through where the ball should be and the ball seems to disappear. That produces a surprise signal, because even though you can't see it, the ball should have blocked the screen. And then the ball reappears, so there are two surprise signals.
So this is a way, for example, to test those surprise signals from the model against people. And long story short, this model does pretty well. And more generic, just pure machine learning, neural network approaches, don't do so well.
So it gives some evidence that maybe this model approach is on the right track, but doesn't tell us what's right or wrong about it beyond that, compared to just a generic deep network.
We did something similar in the case of intuitive psychology with the agent benchmark, and you'll see a poster on this if you're interested. I'm not going to go through the details, but again, a similar thing. Our inverse planning models do qualitatively better than just a generic deep learning model. Even ones which were designed to model theory of mind, but which try to learn from scratch some of the things that our model just has built into it.
And at this point, in 2022, there are so many benchmarks. There's the IntPhys physics benchmark. There's this one from DeepMind, from some former colleagues of ours. There's even a new intuitive physics benchmark literally released today, or at least that's what Twitter told me. They're literally coming out every day.
There's also other intuitive psychology benchmarks, like this one. And so in some sense, that's great. We can make a brain score for babies or something like that, or a baby score. And we are working on that with the great help of the Quest engineering team that you saw earlier.
But there's a problem, which is that actually none of the models works on all of these. In fact, there's no model that works on more than one of these benchmarks without having to be retrained. Some of the models can be retrained for each benchmark, but no model handles all of the benchmarks without retraining, and babies handle all of them out of the box. Let alone that we don't have any model that actually works in the real world, [LAUGHING] as opposed to these simulations.
So this is where the work Vikash's team has been doing comes in, using the Gen probabilistic programming language to build this core agent. And you'll hear a lot more about this from Vikash, including what probabilistic programming is about and how it's different from other ways that people do modeling and inference.
But the key, for us at least, is that it's a powerful set of tools for writing down programs that describe the knowledge in your head, as well as programs that describe the online inferences you can make about what's going on in the world: the good guesses that model helps you make.
And we've been building this out, as part of the team. Again, this is led by Vikash and his students, building this core agent, which consists of multiple scales of representation about objects, agents, and places. And at this point, it's quite exciting what the team is able to do.
This is just a demo fresh off the presses from last night of an agent that is using these core knowledge representations to just explore space. It sets its own self generated exploration goals. You can see the world from its perspective as it moves around, and the exploration plan it's generating. As well as the 3D world model inside its head that it's building up from that.
Now, what's cool about this, you can imagine all the things you could use this for. We're using that same model, to model what's going on in infant intuitive physics experiments. As well as actually deploying in a real robot with some of our colleagues, in this case, from IBM.
So it's an important step towards really getting this virtuous cycle going. And the next step is now to test this on a much wider range of physics and agent scenarios. You'll hear about the tools that are going to let us do that from Rebecca and Laura later, as well as the foundations from Vikash.
So I just have a couple of minutes left. And I'd love to tell you about these other developments, but I don't have that much time. So I'll just tell you two little highlights.
One is: how do we go beyond this initial state of knowledge that we've captured here? Well, one exciting direction, deeply important to all of us as humans and an area where we're really making a lot of progress, is social cognition and theory of mind. So not just what goal someone has when they reach, but all the inferences we make about what's going on inside someone's head from seeing what they do, or maybe what they say. Just from seeing their actions, we can infer what people want and what they think, and do that with quite high quantitative accuracy.
I'm not walking you through the scenario here, but compare these scatter plots to the ones I showed you for intuitive physics. Our models of theory of mind are actually the best quantitative models I've ever had any role in building, in terms of how well a model with few free parameters, if any, fits human judgments. Very high correlations, and not just for judgments about what people think and want, but about what they might be feeling, what they're intending to do even when they make mistakes (a very important part of understanding other people's actions), and what they should or shouldn't have done: moral evaluation.
These are all areas where this approach, especially in collaborations with Rebecca, Laura, Vikash, and a number of others, is really making exciting progress. And tracing that out in a developmental sense will be important for us.
Lastly, on the issue of learning. Up to now I've talked about development, but I haven't talked about the learning algorithms that could actually build something like a mental simulation program. If your mental model of physics is something like a physics-simulating program, then your learning algorithm has to be something whose output is that program.
In other words, it has to be something like a program-learning program: a program that takes as input the current program I use to simulate the world, plus some data or experience, and then updates it to a better program.
Now, you probably won't hear that much about this today, but we've heard a lot from Laura Schulz over the last two decades about the rational learning mechanisms children use to learn about causal structure, whether it's intuitive physics or anything else, from experience.
Again, children are remarkable in their ability to learn from just a little bit of data and update their model accordingly. But algorithmically, it's a really hard problem. It's much harder than the problem of learning in a neural network in terms of the search problem.
One of the reasons why artificial neural networks are so compelling as technology, and so scalable on big compute platforms, is this idea you've probably heard: the network is end-to-end differentiable. That means there's a smooth error landscape, where you define an error function for how well you're classifying some patterns, let's say, or predicting the next image or word in a sequence.
And in a space that isn't two-dimensional, but is now literally billions or even trillions of dimensions, where each dimension is a weight in the neural network, the function that describes how poorly the network is doing is smooth. So you can just use calculus, basically multi-dimensional derivatives, to find the direction of steepest descent and stochastically move down it. And if you're willing to wait long enough and have a big enough network, at least you'll get somewhere useful.
But search in the space of programs doesn't have anything like that nice geometry or topology. Somehow, though, children manage it. At least that's the hypothesis. And we're really trying to understand how.
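The smooth-landscape point can be sketched in a few lines: on a differentiable loss, plain gradient descent from any start reliably walks downhill. This toy example uses an invented quadratic loss, not an actual network; the contrast is that program search has no such gradient to follow.

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    """Plain gradient descent: because the loss surface is smooth, repeatedly
    stepping along the negative gradient keeps reducing the loss."""
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return w

# Toy smooth loss: L(w) = sum((w_i - t_i)^2), with gradient 2 * (w - t).
target = [1.0, -2.0, 0.5]
grad = lambda w: [2 * (wi - ti) for wi, ti in zip(w, target)]
w = gradient_descent(grad, [0.0, 0.0, 0.0])
print(w)  # converges toward [1.0, -2.0, 0.5]
```

In a discrete space of programs there is no analogous derivative: a one-token change to a program can change its behavior arbitrarily, which is what makes the search problem so much harder.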
What kinds of learning programs can build and refine and find better programs? So one place where we've made some progress on this recently is the PhD thesis work of Pedro Tsividis and a number of other colleagues, who have been looking at learning Atari games. So again, as you're probably quite familiar with, these classic Atari video games--
The fundamental challenge here, if you want to build a learning algorithm that can build algorithms, is how do you solve the hard problem of search? How do you generate the data? How do you navigate yourself through the space of possible models?
So we've worked on this in a number of contexts, including learning to play Atari games or as we say, learning Atari the human way. That's about when the light went off last time, just checking.
Classic Atari games were one of the first places where deep learning and reinforcement learning together made some noticeable improvement to the state of the art. But as these learning curves show, that's not the case for many games, like this game here, Frostbite.
So if you've never seen this game before, you can watch a person playing the game, and get a pretty good idea of what's going on in the game, what you might do, what you shouldn't do, maybe how you might score points, or even win the level. As you see this player doing.
And humans can watch somebody playing the game, and literally in a minute, learn how to play the game. Or they can just do it on their own. And when people do it on their own, they tend to learn within about 5 minutes.
But the standard deep reinforcement learning algorithms can play this game for 1,000 hours and basically make no progress, get barely better than random. Now, this is an extreme case. But in general, as you'll see, it's almost in the nature of deep reinforcement learning, or just reinforcement learning as it's usually used in AI, that it starts off with a long phase of just doing random things and using that to generate data for your learner.
But because it doesn't really have a mental model of the world, and because it has such a weak notion of exploration, that's why it's so inefficient. But humans are much more efficient model builders and explorers.
So Pedro Tsividis, the student whose thesis work this was, built an agent architecture called EMPA, which is similar in some ways to [INAUDIBLE], but it really emphasizes Bayesian model learning, model-based perception and planning, and a rational way of exploring, setting goals for itself to learn how causes work by trying to make things happen. It also, like me when I think about fire alarms, assumes that if something bad happened once, don't do that again.
So this system is able to do that kind of thing, to figure out very quickly, even from just one example: if I touch something and I die in the game, don't do that again. That allows it to learn to play Atari games extremely quickly. Here is the system learning to play a few classic Atari games; this is literally the first time the system is playing these games, like Pong, or Breakout, or Space Invaders.
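The "one bad outcome is enough" idea can be sketched like this (this is my own minimal illustration, not the actual EMPA architecture, whose model learning and planning are far richer):

```python
# Minimal sketch of one-shot avoidance learning: if an action in a
# situation ever led to death, rule it out forever. Illustration only;
# not the actual EMPA architecture.
import random

class OneShotAvoider:
    def __init__(self, actions):
        self.actions = actions
        self.forbidden = set()   # (situation, action) pairs seen to be fatal

    def act(self, situation):
        # Explore freely, but never repeat a known-fatal action here.
        allowed = [a for a in self.actions
                   if (situation, a) not in self.forbidden]
        return random.choice(allowed) if allowed else None

    def observe(self, situation, action, died):
        if died:                 # a single example is enough
            self.forbidden.add((situation, action))

agent = OneShotAvoider(actions=["left", "right", "touch"])
agent.observe("near_enemy", "touch", died=True)
# From now on the agent never touches the enemy again.
```

Contrast this with standard deep reinforcement learning, where a single fatal experience only nudges a value estimate and the same mistake is typically repeated many times.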
It's not perfect, it's not very good at first. It dies a little bit. But it very quickly figures out how to play the game.
And here, we can actually benchmark it with human performance. So the green curves show a human player playing one of these Atari games and what their score is over the first 20,000 game frames. That's like 5 minutes of gameplay. And EMPA is usually in the same range. Not always, but usually in the same range, sometimes a little better, sometimes a little worse.
In contrast, do you see that gray bar in some of these curves? That's just random. That's just pressing buttons at random.
And standard reinforcement learning algorithms, even the best ones, are basically no better than random. Occasionally, they're a little better than random. They're always worse than humans or this EMPA architecture. And usually, they're basically just random, because they're designed to do that.
You can wait a long time and they might do fine in some of these games. But that ability to very quickly and very flexibly learn to do just about anything you can do, that's what we're trying to capture in core human intelligence. And we've at least made a step towards that, in a system that learns a very simple game-engine program from this rational exploration behavior.
How do you go beyond this, though? How do you go beyond learning core knowledge to all the things that humans can learn over their lifetime? So this is mostly a step looking forwards, an idea that in a recent position paper we called "The Child as Hacker." We mean the MIT sense of hacker, not the bad guys who break into your email and steal your credit cards, but creative exploration of code, and making cool things, making things awesome.
The idea is that, in some sense, not just core knowledge, but all the knowledge that humans learn over a lifetime of experience in different domains, and that we build culturally, is some form of code. Think about all these different domains of human expertise. Some kind of language of thought, or programming language, is really our best universal representation for all of human expert knowledge. And then learning is coding, basically: all the ways we make our code more awesome.
So the long-term question we're really interested in is: how do we build learning algorithms that can do all the things that hackers do when they're making their code more awesome? Here's just one, again small, step towards this.
In the recent PhD work of Kevin Ellis, he built a system called DreamCoder, which can learn to write programs in many different domains. Here, I'm showing a number of different ones. And it does it, in part, by making up problems for itself to solve. It's called DreamCoder because it dreams up, imagines, problems.
For example, take a drawing domain: if you remember Logo, the simple turtle-drawing programming language, you might eventually draw all sorts of fancy recursive structures like ferns or snowflakes, but you start off with just the simplest primitives: pen down, pen up, go forward, and turn.
So initially, if I make up problems for myself to solve by just randomly trying things out, I get drawings like that. Not very interesting. But then there are 20 cycles of learning, where there's both an internal dreaming process and also trying to solve whatever problems are out there in the world, bootstrapping your way to expertise that lets you solve more and more problems. And it does that both by growing its programming language and also by learning pattern recognition, actually using a neural network. But not to recognize patterns in the world: to learn patterns in its own thought, the patterns it experiences while making up problems for itself in dreaming.
After 20 cycles, the dreams that this thing comes up with are quite interesting, more interesting, at least. And they represent its growing domain expertise. And you can see the same thing in many of these other domains.
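The bootstrapping shape of such a system, solve tasks, compress solutions into a growing library, practice on self-generated "dreamed" tasks, can be sketched as follows. This is a toy arithmetic domain of my own invention; the real DreamCoder also trains a neural network to guide search and does much more sophisticated abstraction.

```python
# Schematic of a DreamCoder-style loop: solve tasks, fold solutions
# into the library as new primitives, and dream up practice tasks
# from the library. Toy domain only; not the real DreamCoder system.
import itertools, random

def apply_prog(prog, x):
    # A program is a tuple of functions applied left to right.
    for f in prog:
        x = f(x)
    return x

def solve(task, library, max_len=3):
    # Wake phase: enumerative search using the current library.
    for n in range(1, max_len + 1):
        for prog in itertools.product(library, repeat=n):
            if all(apply_prog(prog, x) == y for x, y in task):
                return prog
    return None

library = [lambda x: x + 1, lambda x: x * 2]   # starting "language"

for cycle in range(3):
    task = [(1, 4), (2, 6), (3, 8)]            # target concept: 2x + 2
    prog = solve(task, library)
    if prog is not None:
        # Abstraction phase: the whole solution becomes one new
        # primitive, so later searches over it are shorter.
        library.append(lambda x, p=prog: apply_prog(p, x))

    # Dream phase (sketch): make up a task from the library itself
    # and practice solving it.
    f = random.choice(library)
    dreamed = [(x, f(x)) for x in range(3)]
    solve(dreamed, library)
```

After the first cycle, the same task is solved by a length-one program using the newly learned primitive, which is the compression effect that lets expertise compound.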
So again, it's just a small step towards the idea of a learning algorithm that can learn to write code, in this case by learning its own DSL, its own domain-specific language, and how to use it. But it's a first step in what's really the long-term virtuous cycle that you'll see much more of from Vikash, and from Rebecca and Laura a little bit, not just today but also over the next period of the Quest, where we're trying to build models that can capture how we start and also how we learn, and get those into a virtuous cycle with experiments on both what infants know and also children's learning.
Turn it over to Rebecca and Laura to talk about some of the work they've been doing, both on building that virtuous cycle, and some of the questions that we don't yet know the answers to.
Rebecca Saxe, my great friend and colleague. We have our offices right next to each other. And I've been so grateful to be her colleague for so many years.
She's a professor in Brain and Cognitive Sciences. She's also an Associate Dean in the School of Science. And what you'll see in the work that she talks about is part of a general spirit that she's been trying to lead in our department, at MIT, and beyond, in terms of how we make scalable, reproducible, cumulative, progressive science. And we're really excited to be on the forefront of that here in this project.
REBECCA SAXE: Even before I saw all of the amazing talks today, I was already thinking: it's really clear that if there's a virtuous cycle that's going to happen here, where computational models articulate our best hypotheses, are driven to improve by the scientific data, and the science likewise drives the models, if that's going to happen, the empirical measurement has to keep up with the incredible and accelerating progress on the models.
So I knew I was going to say that even before I saw the talks. I feel this much more strongly after all of today, that there's a challenge for how empirical science can keep up with the pace of progress on the models. And many people across the world are making different bets for what it means to accumulate empirical evidence.
For example, about babies: the initial state of babies, what they learn from the world, and how they learn. How could we accumulate empirical evidence that could keep pace with the progress on models?
Many people around the world are betting that what we need is large naturalistic data sets where we instrument and record infants experiences, in their homes, or in their lives. I do think that's incredibly valuable. But I think, as many talks said earlier today, there's too much of that and not enough of another thing.
So I'm going to articulate an alternative, which I think we're pursuing more here at MIT. Our approach is that, rather than accumulating large naturalistic data sets, we need to be able to scale up the experiments that we can do with infants.
So the data that Josh showed you earlier, the basic observations about infant cognition that inspired this work, come from experiments. And I think we need to keep doing experiments. We need to put babies in non-natural situations where we're deliberately testing the predictions of hypotheses and models.
So why? One thing is that a huge amount of what we know about infant cognition comes from their reactions to impossible events that would never happen in their natural world. So for example, a lot of what we know about infants' knowledge of the physics of the world is from having them react to impossible things, like floating balls.
These are a recent replication in my lab of a well-known finding that babies, these are 7- to 9-month-olds, who are interacting with the world already, look longer at the floating ball than they would at a ball that was supported by a surface. This is even true of 4- to 6-month-old infants, who can barely interact with the world at all, and who are already expressing surprise at a ball apparently floating.
But to know that, we had to be able to do experiments. We couldn't just record them in their homes, because in their homes, balls never float. That's the point.
Another thing that experiments are needed for is to disentangle the natural confounds in infants' experience. This is something I'm interested in in my lab, where we mostly focus on intuitive theory of mind.
So I'll just briefly tell you about this research program. For 15 years now, Josh and I have been working on how infants, and indeed all of us, understand other people by inferring that other people are basically rational: that if you work harder or travel further to pursue a goal, you must want it more. So if you go all the way out of the building and down the street to get a milkshake, you probably like that milkshake better than the free coffee that was right in front of you.
But if, instead of going all the way across the street to get a milkshake, what you do is lean over and take a sip out of your friend's milkshake, what we learn from that is not how much you like milkshakes, but how much you like your friend. And this is the general case of the model: we can use how people act to infer not only how much they like things, but how much they value people, in what ways they value those people, and how those people make them feel.
So that's a big idea. But this is not only true of us as adults; it's also true of infants. In a study that recently came out of my lab with Ashley Thomas, we compared what infants inferred after seeing perfectly positive interactions between a central character and one person, and then the same central character having an interaction with another person, this time involving being willing to share saliva, mouthing the same food.
And then we asked the babies to predict: later on, if that same central character is in distress, who do the babies expect to comfort that character? What we found is that toddlers look first to the character who shared saliva, and keep looking at that character like, you're the one who's supposed to help. And this is not true if we swap in a different puppet.
So in general, we need to do this kind of experiment, where we put babies in non-naturalistic situations that don't occur in their real lives and measure their expectations.
The challenge is that each one of those data points in the plots I showed you takes an epic amount of work, and that vastly slows down research progress. The traditional workflow here involves first creating the stimuli, which takes a lot of time; then recruiting families to come to MIT, bringing them in, settling them down for the experiment, measuring the looking time; and then hand-annotating those videos for where the baby was looking. A conservative estimate is that we spend an hour and a half per data point.
That's part of the main limitation on this science. So what we have in mind now is an automated workflow for this entire thing that could be much faster and therefore more scalable. A huge effort, led by Laura Schulz in her lab, has been to replace the manual process of recruiting participants and running them at MIT with an online platform called Lookit, which is already massively scaled up in terms of the number of families registered to participate in infant science.
A project in my lab and many other labs, accelerated by the pandemic, has been to move from in-person testing to webcam-based testing, so that we can test infants in their own homes. We now train parents to be the experimenters, to advance the experiment in response to their infant's behavior. And we can validate that parents are pretty good: the green bars show parents doing as well as experimenters at advancing the experiment in response to their infant's behavior.
And then we replace human annotators of the resulting data with iCatcher. This is, again, a massive multi-lab collaboration, led at MIT by Shari Liu, which uses machine learning to identify infant faces in online videos.
This is video from a webcam. You can see a small green box has identified the baby's face; the system codes their gaze on the screen, and also identifies the point at which the baby has lost interest, so that the study can advance. And iCatcher, at least for good-quality data, is very highly correlated with the human annotators.
The result is that the total time per data point in this workflow could be at least 12 times faster than the traditional workflow. Possibly even faster than that. And so I just want to say that in terms of the experiments, keeping up with the models, we have some hope of pushing this virtuous cycle forward through building platforms that accelerate the empirical contributions to this entire research program.
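As a back-of-envelope check on what that speedup buys: the hour-and-a-half figure is from above, and the automated per-point figure below simply restates the "at least 12 times" claim rather than being an independent measurement.

```python
# Back-of-envelope throughput comparison for the infant-testing
# workflow. 90 minutes/data point is the figure given in the talk;
# 12x is the stated lower bound on the speedup.
manual_minutes_per_point = 90
speedup = 12
automated_minutes_per_point = manual_minutes_per_point / speedup

# Data points collectable in a 40-hour work week under each workflow.
points_per_40h_week_manual = 40 * 60 / manual_minutes_per_point
points_per_40h_week_auto = 40 * 60 / automated_minutes_per_point

print(automated_minutes_per_point)   # 7.5 minutes per data point
print(points_per_40h_week_manual)    # about 26 points per week
print(points_per_40h_week_auto)      # 320 points per week
```

Going from roughly 26 to roughly 320 data points per researcher-week is what makes model-scale sample sizes plausible.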
This is a massive project involving many people, so I just want to make sure to acknowledge them. And you can find out more about the progress that we're making in the poster session, where both Gal and Shari will be presenting.
JOSHUA TENENBAUM: And again, these were projects that the Quest engineering team, like you heard from Catherine before, made really key contributions to. And we've been incredibly grateful for you guys' hard work on that, and for the support of the Quest in making that happen.
And last, and very much not least, my other great friend and person who has the office on the other side of me, Laura Schulz, a professor of Brain and Cognitive Sciences and an associate department head in BCS. OK, Laura.
LAURA SCHULZ: Thank you very much. In 10 minutes, in the last talk of the day, I'm going to try to say a few words about how children learn and how we can learn from children.
So many hours ago, Josh, Tommy, and Jim started us off with bets that we might make on intelligence. And one of the bets that has paid off greatly in my field is that Piaget was wrong, that Alan Turing was wrong, that hundreds of years of philosophers were wrong: children, babies, do not start out as blank slates. There's never a stage of life at which infants are only sensorimotor learners, or at which children are only concrete learners.
That actually, the way babies and children learn is in some respects startlingly similar to the way we generate new knowledge in science. They start with abstract causal structured representations, world models. They evaluate those models based on the evidence they observe. They selectively explore evidence that is surprising or confounded. And they generate and learn from informative interventions.
This shouldn't be that surprising, because science is of course a cultural cannibalization of human cognition quite broadly, and helps us learn any number of things. So the work in my lab has been largely an effort to try to bridge the gap between the messy, sometimes chaotic behavior, or seeming behavior, of children and our best computational models of learning, in a way that will let us do some of the work that Rebecca and all my colleagues have been alluding to, this virtuous cycle going from computational models to cognition, in a way that not only lets us make precise quantitative predictions about children's behavior, but lets us actually understand much better how and why we learn the way that we do.
I'm going to show you one tiny example. And compared to all of the enormously sophisticated work that you've seen across all these talks, I'm going to show you something dead simple.
We gave children a small task. We told them we're going to pour one of these tubes of marbles into this box. We're going to do it behind a screen, so you don't know if it's the nine red marbles, or the three green marbles going into this box. But we're going to give you the box and you can shake it and you can try to guess. OK?
So let's say nine marbles actually go into the box: children hear clankity clank clank when they shake it, and they can make a guess. Very simple; you can do this in preschool.
But we can also compare these children to a group of children who we tell: either we're going to pour these nine red marbles or these eight green marbles into the box, and you get to shake the box and find out what's inside. Now, let's say nine marbles are actually in the box. From a sensorimotor perspective, it's the same box. If children are shaking that box just based on what they hear and feel, they should shake it in exactly the same way.
But if children instead are shaking that box based not on what they hear, but on what they don't hear, an alternative hypothesis, a simulation of what that evidence would be like and how hard it would be to discriminate from what they actually hear, well, this is a much harder discrimination problem, and children should shake that box much longer.
So I'll show you the task again. It's about as simple as I just described.
All right. So remember, could either be the one green or the eight red. And when you know, you can put your answer right there. So go ahead and play.
One. OK. All right. So remember, there could be four yellow or five blue. And when you know, you can put your answer right there. So go ahead and play.
Very simple experiment. But we're at MIT; we can put lots of different marbles, different tubes, different contrasts in the box, ranging from really simple discriminations, like nine versus one, all the way up to complicated ones, like four versus five.
And because I have sophisticated computational modeling colleagues and people who work in perception, we can use models of signal detection to say exactly how discriminable these contrasts are from each other, quantitatively. That's what you're going to see on the x-axis. On the y-axis, you're going to see how long children shake that box.
And really strikingly, we preregistered predictions for some of children's shaking behavior. What you see is an exact lining up between the difficulty of the discrimination problem and how long children explore, how long they shake the box. Again, they are not shaking based on what they hear, but on what they don't hear.
Their exploration time is independent of the number of marbles in the box. It tracks precisely with the difficulty of the discrimination they are trying to make. Which suggests everything you've been hearing about: really rich simulation engines for generating ideas about what the physical stimuli would sound like, something like an intuitive psychophysics, and an intuitive power analysis, knowing how much data they would need to make this kind of determination.
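One hedged way to sketch the signal-detection idea behind that x-axis (the Gaussian noise model and every number here are my own placeholders, not the lab's preregistered model): treat the heard rattle as a noisy signal whose discriminability is the standard d-prime index, and predict longer shaking when d-prime is smaller.

```python
# Sketch of the signal-detection idea: how discriminable is
# "n marbles" from "m marbles" by sound? Model rattle intensity as
# a Gaussian with mean equal to the marble count. All parameters
# are invented placeholders.

SIGMA = 1.5   # assumed perceptual noise (standard deviation)

def d_prime(n, m):
    # Discriminability index for two equal-variance Gaussians.
    return abs(n - m) / SIGMA

# The qualitative prediction: harder discriminations (lower d') should
# yield longer shaking, regardless of how many marbles are inside.
contrasts = [(9, 1), (9, 3), (4, 5), (9, 8)]
for n, m in contrasts:
    dp = d_prime(n, m)
    predicted_shake = 5.0 / dp   # toy inverse relation, illustration only
    print(n, m, round(dp, 2), round(predicted_shake, 2))
```

Note that 4-versus-5 and 9-versus-8 get the same d-prime and hence the same predicted shaking time, even though the boxes contain different numbers of marbles, which is the signature in the data.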
So this is great. Everyone believes play and learning are connected, and this seems like the best evidence I'm ever going to get for how they might be: children's ability to explore more as there is more uncertainty, the kind of thing we can precisely model with ideas about expected information gain.
And the only problem is after I left the office, I would go home to something like this. So this is my daughter. And what she's doing is absolutely mundane, repeated many times a day, in many households, all over the world. And it looks nothing like a straight line to learning. What it looks like is rich, arbitrary, hard to predict, idiosyncratic goals that are much more complicated than reaching for grapes or even shaking marbles in boxes.
There are many goals. There are a million plans. And I cannot predict from one moment to the next what she's going to do, or what another child would do with the same stimuli. What it resembles is nothing so much as a microcosm of human cognition broadly, where we have rich, idiosyncratic goals.
Everyone in this room is interested in how the mind and brain work and how we could engineer it. But all over the city, the state, the country, people have very different goals and are working on very different problems, ranging from writing the great American novel to decorating strawberries to winning hot-dog-eating contests. And there is a real question about what kind of mind generates, and can generate, such a proliferation of goals, and what that ability says about the kind of cognition that we do.
The really hard problem, if we're really talking about scaling to human intelligence, is this: lots of things see, lots of things reach, but this is distinctively human intelligence. How do we make scientific progress on behavior that looks like what children do and what adults do?
And the answer to that is, I do not know. I can't give you the answer, but I can give you a tiny start and a little bit of a paradigm we're trying to use. That's going to look much reduced, but still portrays a lot of complexity.
This is an eight-by-eight grid. We call it the button board task. We just put it up online, and we told adults: go ahead and explore this, do whatever you like; we'll pay you if you do it for at least five minutes on MTurk.
And in there, we've hidden some sounds. So if they hit certain buttons, they're going to find that some of those buttons quack and some pop and some make tones and dongs. So they can explore and exploit those sounds. And I'm going to show you what an adult did.
So they explore pretty efficiently, pushing adjacent buttons; then they find some sounds. And then they go on to do all kinds of things: to set up their own space of arbitrary, idiosyncratic, hard-to-predict goals, for which they can make very consistent plans. And they go on for twice as long as we pay them. One went on for four times as long.
So at a cost to themselves, because it is rewarding. They invent new goals on their own, and they develop them, even in an eight-by-eight grid. And this is the norm: 70% of our participants created brand new designs and goals. There were no clusters; it's about an even, uniform distribution across every kind of design you can imagine, in both adults and children. In fact, in this case adults did it more.
So it's not just a behavior specific to childhood. And we are beginning, as you've seen with DreamCoder and some other work here, to make progress on how we might formally think about and model even behaviors like this, which, again, just hint at the richness of cognition. But I think that we can start making progress there.
So that's a little bit about learning. How can we learn from children? Rebecca said a lot about it. I'm just going to echo a lot of what she said.
The traditional way, if you wanted to test babies: we test babies, children, and adults in my lab, but for baby studies, you really are looking at looking. It's the behavior they can do most reliably. You show babies an expected event, like a ball rolling downhill, or an unexpected one, like the ball rolling uphill. And if the babies look longer at the unexpected event, then you want to say something like: oh, six-month-olds understand gravity. Which is great.
But there are a few problems. You never get 100% of your babies doing something. So let's say a statistically significant majority do it. What does that mean? Does it mean all six-month-olds actually have this ability, and the rest is just noise in the data? Babies sneeze and are inattentive.
Or does it mean there's a real developmental difference between the babies who are succeeding and the babies who are failing? There are many other possibilities. If they expect the ball to roll down an inclined plane, do they understand gravity?
If you drop the ball in midair, do they expect it to go down and not up? What if it's a bottle, not a ball? On and on.
Traditional laboratory experiments can't answer these questions, for all the reasons Rebecca said. You have to recruit the baby; the family has to find a parking spot; they have to bring the baby up and sit them in the lab; and then the baby has to not fall asleep and not fuss out.
So it's very, very difficult to get even a single data point, let alone repeated measures on children. What can we do now, today? We can go from something like this to something like this.
Oops. Yeah, we can test very, very many stimuli, and you can test them on very many different aspects of intuitive physics. And that's, in fact, what is going on right now. So when we talk about the game engine in the head, intuitive physics, we have the largest sample ever collected of infant behavior.
Up to 12 sessions per baby, 1,000 study sessions on this kind of behavior. The kind of data we were just never able to collect before, which will let us have detailed maps of conceptual change, and data rich enough for testing the predictions of quantitative models.
That's only one reason to put studies online. There are many, many more. And that is what we have done.
When the pandemic hit, we launched this open-source, open-access platform. We had Lookit, which is a platform that allows automated testing. You don't have to schedule an appointment. You can go online and test your baby whenever you want, or your child, all the way up through adolescents and adults, on that system. You give consent through the video camera.
We merged it with another system we'd put up, Children Helping Science, which was just a Squarespace-based page. We launched it during the pandemic, because every developmental lab in the country had to shut down. So everyone was going online, and we wanted all the families to be in the same place, because a three-year-old who participates in a study here can do one at Stanford, can do one at the Max Planck, can do one anywhere in the world.
So that's what we went ahead and built. There are 90 labs from 70 institutions across the US on there, and many in other countries. And this is one of the cases where the horrific pandemic made a vast difference for something, like telehealth, that was long overdue.
We have 8,000 participant families on there right now, and nearly 1,000 scientists on that platform.
So that's a little bit about how children learn and how we can learn from children. So I think there's a real possibility now of understanding how we start as babies, and how we learn as children, and connecting them to our best models of engineering, AI, and innovation. It's a bit of a moonshot right now, but I think we can reach for the stars.