Roadmap for the day
December 2, 2022
Advances in the quest to understand intelligence
Josh Tenenbaum, MIT BCS, MIT CSAIL, MIT Quest, MIT CBMM
JIM DICARLO: Next, I'd like to introduce my colleague and friend, professor Josh Tenenbaum. Josh is a co-founder of CBMM, along with Tommy, and one of the scientific directors of the Quest. And Josh is going to give us an overview of the program and map the course of what you're going to hear throughout the day. Josh.
JOSH TENENBAUM: Welcome, everyone. It's so great to have you here joining us for what is a very exciting day. And I'm going to try to give you both a roadmap for the day and a sense of how we, in the Quest leadership, are thinking about what we're doing and why. So when we think about what we're going to be doing in this big enterprise, as you've seen from Jim and Tommy, we're building on a great success.
There's one success story which many of us here played a role in, along with a lot of people outside MIT: the success story of deep learning with neural networks. It's the remarkable idea that starting from just the mathematical abstraction of how a single neuron works, then scaling that up to bigger and bigger networks and running it on bigger data sets, can build useful computer vision technology that you can deploy in the world. And it might actually tell you something about how the brain works, how the first stages of vision work in the brain. It's remarkable.
But then, as Jim and I like to say, we look forward with a combination of humility and optimism about how much more there is to do. Because as Jim said, we still don't have any general-purpose AI: anything with the flexible, common-sense understanding of the world and what you can do in it, that can do every one of the things each and every one of you can do, without having to be built specially for just that one thing by a big team of engineers. And we still don't know how the brain works.
So what we're mindful of is these questions. Where do we go next? Why and how? And that's what you're going to see today. Very exciting. We have a packed program for you. And I want to try to give you some sense of what the day is going to present to you. And again, how we're thinking about how we're spending our time and resources in the Quest. So think about the Quest as a portfolio of bets on ways to answer these questions, what next, why, and how. There's some big bets and some smaller ones.
Now, academic research at a place like MIT, or any university, is traditionally thousands of small bets distributed across all the faculty labs and all the PhD and postdoc projects. And this approach has proved its value, as we've seen over and over: a breakthrough discovery in one lab, or one person writing down a single equation, can change the world. And we're not going to stop doing that; we're going to keep doing that. But in the Quest, we also know we have to do things a little differently, in several ways.
First of all, we have to see all of the bets as part of a single bold proposition like the one Jim laid out: the idea that the science and engineering of intelligence work better when they're tightly coupled, and that ideally, the very same models are both our best scientific understanding of how the mind and brain work and the basis for AI technology. And second, we have to make a few big bets in addition to many small bets.
We aren't going to stop making the small bets on elemental research in brains, minds, and machines. But in order to really reach our goal, to really understand and build human intelligence, we need to make big bets. So today you're going to hear about both big bets and small bets, and about bets at different points of maturity: some teams that are just getting started, and others that have been going for several years and are already producing not just great research but platforms and products for really scaling up our research and the virtuous cycle between science and engineering, which many others outside MIT are already using and joining us to help build. In particular, you're going to hear about three big bets, three missions that had their start in some form in CBMM and which the Quest is now ready to supercharge. These are Embodied Intelligence, Language Intelligence, and Developing Intelligence.
But these all represent different and synergistic lines of attack on a single grand challenge, which you could call the problem of core human intelligence. I like to define it this way: it's the basic ability to understand the world and yourself in it, along with all that you can do based on that understanding, an ability that develops naturally in every human child by age four or five, in every culture, independent of formal education. This is what some people mean by common sense. It's certainly what I mean by it.
And each of these missions represents a take on that. Embodied Intelligence, in a sense, is the most direct one, coming out of what you've seen from CBMM and the success story that Jim talked about. It starts with perception, but not just vision: all the ways that we perceive the world around us and ourselves in it, what we can do with our bodies and with our minds to control our bodies, and the plans we can make, alone and together with others.
Language Intelligence is about this remarkable and truly unique human ability to describe what we're thinking, to externalize it to share with other people and to learn from them, just like what we're doing right now. And then Developing Intelligence addresses, in some sense, maybe the ultimate origin question: where does this come from? How do our brains start out in babies and grow as children grow?
The Embodied and Language Intelligence missions are really just getting started but are already doing really exciting things. Developing Intelligence is maybe the most mature one, because it started several years ago, at the founding of the Quest and inside CBMM, with generous support from David Siegel and also from the DARPA Machine Common Sense program, which we helped to get started and which is spreading this idea to many places outside MIT.
And you're going to hear about progress there, as well as about a new mission, the Scaling Inference mission, which has emerged out of Developing Intelligence as we realized that the platforms we've been building for our basic computational models of cognitive development in children are actually ready to scale way beyond that: to enable things we can do in all the other missions in the Quest, and even well beyond the Quest, for researchers and engineers outside MIT.
So we're very excited about that set of things. But you're also going to hear about new, smaller bets in the space of core human intelligence that connect to the bigger picture, and about big and small bets that we like to call looking down, looking up, and looking forward. In contrast to the individual single organism of a human mind and brain, or a robot, looking down means going to the basic elements, the cells and circuits that make that machine work.
And then looking up means going to groups of people, collectives, and organizations, which in some sense are really where human knowledge in its biggest form, take science, comes from. Because no one of us figures out really anything for ourselves; it's the collective processes that build knowledge across individuals and across generations.
So that's, basically, the picture for the day. There are three sessions, one more in the morning and two in the afternoon, which roughly correspond to the core human intelligence missions, looking at the adult state but also the things that are already there in any child; then looking up, down, and forward to what our next big bets might be; and then, at the end, closing with where it all comes from and how we can scale it up.
Now, before turning things over to my colleagues in Embodied Intelligence, I just want to dig in a little more to this big bet of core human intelligence, and say a little bit about why this, why now, why here, and how we're going to do it. That gives you, again, a sense of how we're thinking about how we use our time and resources in the Quest and how we make bets, and it also highlights some themes you're going to see for the rest of the day.
So why study this problem? Well, it's instructive to consider the contrast between today's remarkable AI technologies and something even more remarkable, which is human children. And I want to take two case studies. First, autonomous driving, where again, there have been really amazing real-world deployments from companies like Waymo, Cruise, Tesla, using the Mobileye technology Tommy had talked about, and so on.
And yet, after more than $100 billion of investment, we are still far from solving the full self-driving problem at even the level of a human 16-year-old, who, as we all know (we've all been one, and some of us have one), can basically figure out how to drive within just a few minutes, feel comfortable within a few weeks, and maybe even be safe after a few months.
And it's not just the 16-year-old, but even the four- or five-year-old. Here are a few pictures taken from the genre of YouTube videos you can all find of four- and five-year-olds driving cars. It could be driving the family car around in the woods or out in the mountains, or driving other vehicles like golf carts, whether on a golf course or in a housing development, or actually out in the woods with other kinds of heavy machinery.
And in each of these cases, this is the first time the four- or five-year-old was behind the wheel. Just the first time, and they already basically understand what they're doing and how to achieve some goal. It doesn't mean they're totally safe. [LAUGHS] And it doesn't mean they can reach the gas pedal. But that understanding in their mind and brain of what's going on in the world, their place in it, and what they can do, that's remarkable.
Or think about language production. Again, there have been amazing advances in language technology from companies like Google, OpenAI, and so on. I'm sure many of you have seen, for example, the big language models like GPT-3; you can't not read about them, and many of you might even have used them. And yet these models are trained with data sets where, again, we're talking numbers above 100 billion (in this case, 100 billion or maybe half a trillion words) and enormous dollar and energy budgets.
But they still struggle to communicate a deep, genuine understanding of the world and all the possible worlds we can imagine. Especially when you contrast them with a four-year-old who, with 1,000 times less training data, as all of us who've had four-year-olds know, has the ability to really communicate their understanding of the actual world. And often it's clearest when they contrast it with imagined worlds and show you, through that contrast, that they really understand what the world is.
So instead of telling you more, I'm going to let you hear it in the words of two four-year-olds. One is the daughter of Ev Fedorenko, who's one of our mission leads for language. And this is her daughter talking about-- well, being not herself, but a different character named Owly.
OWLY: Oh, hi. My name's Owly. I don't know who you are. But I have a beak. But you have a nose. Beaks and noses are the same thing, but beaks are a little bit longer. You know that, right?
JOSH TENENBAUM: And from one of our other Quest partners, Tomer Ullman's four-year-old. Tomer is a great observer of his children and has an internet-classic tweet describing the experience of his four-year-old in distress on the sofa. Dad asks, what's wrong, Tutu? And she says, if my fingers were markers, they would ruin the sofa. But your fingers are not markers. I said, if. [LAUGHS]
So that understanding of the real physics, the value and the cost, and also the counterfactual contrast with the world, is amazing. So we look at the contrast and these numbers of 100 billion plus, whether dollars or words, and ask: where's the gap? We've made so much progress, and yet there's still so much more to go, let alone all the other intelligent capacities that every four-year-old has and every human adult grows into.
And this presents us with what is both a timeless question for science and a timely opportunity for engineering. The timeless question is: how do human minds get so much intelligence from so much less? And the timely question is: can we build AI more efficiently? That means not just faster, better, and cheaper, but also much more general, robust, and safe. With this tight coupling, this true integration between science and engineering, our bet is that, yes, we can deliver on that.
Lastly, why here? Why now? Well, again, there's something distinctive about what we've been building at MIT for a while. It's this idea you've heard of supercharging CBMM. This vision of core human intelligence really is what CBMM was founded to pursue. It was in the founding vision and DNA of both BCS, Brain and Cognitive Sciences, and CSAIL. And it's just now, I think, that we're ready to do it.
We've built the teams of cognitive scientists, neuroscientists, and AI researchers who can talk to each other, learn from each other, and build together, and in some cases even build things that industry is ready to scale. Going back to what Tommy said about contrarian bets: even inside Google there are some contrarians, and they're actually investing in some of the things we're doing building on this human vision, as you'll hear about in Scaling Inference.
But we also have a distinctive approach, a vision for how to think about intelligence in computational terms that is very different from the one getting all the investment in engineering: the idea that, basically, we can build intelligence by just taking a single simple mechanism for pattern recognition and function approximation and pouring in more and more scale.
Whether we're coming from cognitive science, computer science, or neuroscience, we have a different perspective. In cognitive science, we think about all the ways human intelligence isn't just pattern recognition. Pattern recognition is part of intelligence, but there are all these ways that we model the world and ourselves in it, and all the things we can do with those models.
So our ability to explain and understand what we see, to imagine things that we could see but haven't yet, to solve problems, and make plans to achieve those goals, to build new models of the world as we learn more, and to share our models to make joint plans, and solve problems together and grow our knowledge culturally. That's what human intelligence is about, and we need to understand all of these in engineering terms.
Or take computer science (this is one of Leslie's slides you're going to see later): we understand that there isn't just one single magic equation. Rather, there's a rich theory of many different kinds of computation and computational principles, built over decades, that we need to draw on. And bringing these together as part of an integrated platform that underlies all the models we build, whether of humans or of robots, is what we need.
And you're going to see that very rich, developing theory, which we see as the foundation of a science of intelligence in computational terms. Lastly, from the biological perspective, there isn't just a single mechanism that biology scaled up to produce human intelligence. Brains are machines, the most remarkable machines ever built, and they're built by multiple processes that unfold over different scales of time, space, and structure, with fundamentally different mechanisms that have different profiles of resource usage
and remarkable efficiency, each in their own way, in how they use data, time, space, material, energy, and so on. Ultimately, our science needs to take all of them seriously. Brains are not built starting from nothing with a single learning algorithm and massive data. Rather, evolution is a mechanism that builds brains over millions of years, and then each individual brain, over months and years of development, builds the mechanisms to learn new things in one shot,
like learning a new concept from one example that lets you think about new things, in just milliseconds, much better than anybody ever could before they had that concept. And then all of that works together with culture, building things collectively over generations of human time. Understanding those mechanisms and their interactions, mechanisms that adaptively build new adaptive mechanisms, as evolution has done, as culture has done, that's ultimately where it's at.
And that's especially important when you think not just about individual human minds and brains but look up and down. When you look down to the cells and circuits that make up the brain, there's their remarkable efficiency in energy usage: only about 20 watts of power makes this thing go, rather than having to run a huge data center, in the way the big companies are scaling those up to truly planet-sized energy users.
There has to be a better, more efficient way. Or looking up: where does knowledge ultimately come from? Science and other cultural processes are the most efficient means of generating knowledge about the world, and they don't go on inside individual human minds; they go on through other kinds of mechanisms. So ultimately, a true science and engineering of intelligence that integrates across all those scales is going to be the way to build a vision of artificial and natural intelligence working together to make us truly, actually smarter and better off. So that's our roadmap. And let's hand it over to Leslie now.