Computational Models of Cognition: Part 3 (41:13)
Date Posted:
August 16, 2018
Date Recorded:
August 16, 2018
CBMM Speaker(s):
Joshua Tenenbaum
Description:
Josh Tenenbaum, MIT
In this third lecture, Josh Tenenbaum first provides an overview of recent efforts to formulate neurally plausible models for face recognition, intuitive physics, and intuitive psychology that integrate the probabilistic programming framework with deep neural networks. Some preliminary results of empirical studies provide hints about the brain areas that may be engaged in these computations. The final part of this lecture reflects on how intuitive physics is learned from infancy through childhood.
JOSH TENENBAUM: But I do want to take a little bit more time to show you how this can link back to the brain since this is the Summer School for Brains, Minds and Machines and say a little bit about learning. So I'll just spend a little bit of time on these last two questions.
On the neural circuits side, what we might be asking here is this. Here's the picture you've seen from DiCarlo and Dan Yamins and others: the striking success in relating a computational model in the form of a neural network, in particular a neural network trained to do object recognition, to the part of the brain that does that thing, the ventral stream, the part of the primate brain that seems to be primarily doing object recognition.
So we can ask, what are the prospects for doing something like that here? OK, well, that's going to require a number of things. We're going to have to find the relevant parts of the brain, and we're going to have to build neurally plausible or mappable models. And we've been starting to do all of that.
So just to show you what that might be like. First we might ask, well, can we start to build neurally plausible models of how you might invert probabilistic programs? In a sense, you could think of this as almost an alternative theory of the ventral stream, all right?
If we see vision not as solving a recognition or classification task but as actually trying to invert graphics, what would that look like? Can we build neurally plausible models of that? Now, I already said we could: those are these fast recognition models, and they're really good models because they are much more satisfying from an engineering point of view, and they're also mappable onto the brain.
So here's an example of something that started, in a way, with the Summer School. It's a collaboration with Winrich Freiwald, who you'll hear from tomorrow. He's one of the experts, along with Doris Tsao, in studying monkey face perception. These days he's really broadened out into a lot of other areas of social perception and other kinds of monkey cognitive neuroscience.
But I'll show you some work that we started doing with Winrich that has really been driven by Ilker Yildirim, who's a research scientist working in our lab and with Winrich. We said, OK, can we develop basically a neural network to do inverse graphics, as an alternative to the standard classification view of the ventral stream? And it's especially interesting in the context of faces because, as you'll hear from Winrich, and I'm not going to steal his thunder on this, I'm just going to show you what we've been doing together.
But as you'll hear, he and Doris and others, who have really done a lot of pioneering work here, have been able to characterize in unprecedented, precise detail what the neural representations at different stages of the ventral stream look like for faces.
And they build on work that you'll hear more about from Nancy Kanwisher on functional localization of specific modules, so-called patches, of cortical circuitry that really seem to be very specific to processing faces. OK? In particular, using monkey fMRI, they identified a network of six, and now more, little patches along the stream in macaque IT cortex that all seem to be responsive to faces. And what we did with Ilker was to basically build one of these inverse-graphics networks. It combines today's deep network technology with an approach to training that's inspired by something called the wake-sleep algorithm. If you've heard of this, it's a really nice idea from Geoff Hinton, one of the deep learning pioneers and the inventor of that phrase, deep learning.
But before he had the phrase deep learning, he had this idea, and I think it's even better than deep learning. It's a way to use a generative model, like a probabilistic program, although he used a much simpler generative neural network; we're using a probabilistic graphics program. You use that to train what's called a recognition model, which is a neural network trained to do the inverse probabilistic inference.
So it looks a lot like a convnet, and it is a kind of convnet, but its outputs are not class labels like object labels or identity labels; its outputs are the inputs to the graphics engine. And importantly, that gives it two special properties. One is that it's a much richer space of outputs: it's not just a discrete label but the whole 3D shape and texture, for example.
The other is that, even though the outputs are much richer, the training data don't have to be provided by the world or a human labeler or whatever; it's what we call self-supervised. The generative model, the graphics engine in your head, provides all the training data you need, because of what you could call the sleep part, or the dream learning part.
With the generative model you can imagine different faces from different viewpoints; you just synthesize your own images in your head. That's your top-down imagery system, if you like, or dream system, and it provides arbitrarily large amounts of very rich training data for these bottom-up recognition networks.
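To make that training scheme concrete, here is a minimal sketch, not the actual face model: a stand-in "graphics engine" (here just a fixed random projection, purely a placeholder) generates images from sampled latents, and a recognition network is trained to map those images back to the latents. All names and sizes are hypothetical.

```python
# Minimal sketch of the "dream"/sleep phase of wake-sleep-style training:
# the generative model supplies its own labeled data (latents -> images),
# and the recognition network learns the inverse mapping (images -> latents).
import torch
import torch.nn as nn

LATENT_DIM, IMG = 80, 64
W_render = torch.randn(LATENT_DIM, IMG * IMG)   # stand-in "graphics engine" weights

def render(latents):
    # Placeholder renderer: scene latents -> image pixels (not a real graphics engine).
    return torch.tanh(latents @ W_render).reshape(-1, 1, IMG, IMG)

recognition_net = nn.Sequential(                # stands in for a convnet
    nn.Flatten(),
    nn.Linear(IMG * IMG, 256), nn.ReLU(),
    nn.Linear(256, LATENT_DIM),                 # outputs graphics-engine inputs, not class labels
)
opt = torch.optim.Adam(recognition_net.parameters(), lr=1e-3)

for step in range(1000):
    z = torch.randn(128, LATENT_DIM)            # "dream" a batch of faces from the prior
    images = render(z)                          # self-generated training images
    loss = nn.functional.mse_loss(recognition_net(images), z)
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is just that the training signal comes entirely from the generative model, not from externally labeled images.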
So Ilker built one of these models, and this just shows that it can effectively invert the graphics engine. But the key cool thing he showed, from a neuroscience point of view, comes when you take data like this. This is from one of Winrich and Doris's classic papers by now, where they recorded from three sets of face patches: the so-called middle face patches, the anterior ones, and ones in between.
So there's AL, which is sort of in between in the feedforward path. Just like in the rest of the ventral stream, there's a primary feedforward circuit here that drives the initial wave of activity when you flash up images; there's also feedback, but there's definitely a feedforward path through the circuitry, with three stages from ML/MF to AL to AM. They've characterized these in different ways, but in this paper they characterized them in terms of these similarity matrices.
If you know the RSA-style similarity matrices, or RDMs: these are 175 by 175 matrices, where each row and column is a particular face seen from a particular viewpoint, and they're blocked by viewpoint. So there are 7 by 7 blocks for the viewpoints, and within each block there's a 25 by 25 structure over the different face individuals.
OK, and what you see as you go up the hierarchy, further along in processing, is that you go from this 7 by 7 blocky structure to this more banded structure, where those bands represent basically individual-level similarity. The way they originally interpreted it is that this is your face recognition: it has come to represent invariant identity, independent of pose. Whereas early on, you can hardly see the identities in there at all, but you really see the poses.
And in between there's this interesting intermediate stage of what they called mirror symmetry, where you have a lot of similarity between the two different profile views, or even somewhat between the two different 45-degree views. So it's still viewpoint dependent; there are little bits of identity-invariant stuff in there, but it's got this mirror-symmetric response.
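As a concrete illustration of what these matrices are (a toy sketch with random placeholder data, not the recorded responses), an RDM can be computed as one minus the correlation between population response vectors, with the 175 stimuli ordered so that viewpoint forms the outer blocks:

```python
# Toy sketch of an RSA-style RDM: 25 identities x 7 viewpoints = 175 stimuli.
import numpy as np

n_ids, n_views, n_units = 25, 7, 200
# responses[i] = population response vector to stimulus i (random placeholder data),
# ordered so viewpoint is the outer block and identity the inner one, matching
# the 7 x 7 block structure described above.
responses = np.random.randn(n_ids * n_views, n_units)

rdm = 1.0 - np.corrcoef(responses)    # 175 x 175 dissimilarity matrix
# View-specific coding shows up as low dissimilarity within the viewpoint blocks;
# identity-invariant coding shows up as bands linking the same identity across blocks.
```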
And what Ilker showed, and I still find this remarkable every time I look at it, is that a model trained to invert the face graphics engine basically produces those same kinds of patterns. You can quantify this; here I'm just showing a picture. He also contrasted it with many other ways of building models, and he spent a long time developing and testing many alternative models.
This is a model which is state of the art, or not exactly state of the art; it's a few years old at this point. But it represents more or less the state-of-the-art approach that's been practiced in computer vision for face recognition over the last few years. It's the so-called VGG-Face network, and it's really good at face recognition, including for these images. It's just not the way the brain seems to work.
It produces just very different patterns, and I won't go through the details. But it's interesting: this is a network trained like the ventral stream models that Jim DiCarlo talked about, trained to classify objects, in this case face identities, as opposed to inverting the graphics engine, and it just does a rather less good job. You can also test these models behaviorally with various illusions.
Another thing that is cool about them is that you can interpret their intermediate representations. One of the contentious issues, especially as deep learning and deep networks really start to transform what models we can build of the brain, is whether there is something interpretable in these models, or whether the goal of neuroscience is now just to generate a lot of training data and explain a lot of variance. I don't know if Jim DiCarlo talked about this much in his lecture, but it's something we debate a lot around CBMM.
But what's nice about these models, partly in virtue of how they are built, is this: if you have a causal model, especially one that has multiple interpretable stages, as you see on the left there for a face graphics engine, then the most efficient way to invert it is basically to turn the arrows around, which exploits what's called conditional independence in the causal direction.
And that leads to a model which can naturally be interpreted: you can consider different layers and ask whether they seem to represent, in the recognition pipeline, an approximation to some stage in the generative model.
And interestingly, this middle patch level here, which I was describing, really the first face-selective stage in IT cortex proper, has this kind of blocky structure by viewpoint, but it's not arbitrary.
There is also similarity structure among the viewpoints, which is very well captured not by the raw pixels of the face but by what are sometimes called intrinsic images. It's very similar to Marr's 2-and-1/2-D sketch, a map of surface normals or depth, basically what people in vision call a mid-level vision representation.
Basically, a representation of the visible surface properties of faces seems to be a reasonable approximation to what's being computed in that middle stage of processing. And that is also a standard representation in the graphics pipeline.
Especially in game graphics engines, as you go from 3D to 2D it's very useful to pass through a 2-and-1/2-D surface representation, because if you want to render in real time as things change, it's really in your interest to ignore all the parts of the world that you can't see. And something like a 2-and-1/2-D sketch or an intrinsic-image depth map does exactly that.
It represents, say, the depth or the surface orientation of just the nearest visible surface along any direction from where I am. For those of you who know the history of computational vision, David Marr played an important role in that, as did the idea of intrinsic images, which is something my dad actually helped to invent and pioneer, so I have a lot of affection for these ideas. They were early ideas about how to build computer vision systems, well before deep learning and neural networks.
David Marr's book on vision is a great book. People usually recommend reading it for the philosophy while saying he's probably not right about any of the details, but I actually think he was right about important details, especially the 2-and-1/2-D sketch.
And I think we've started to understand why, because of the role that that mid-level representation plays in doing inverse graphics. We can now actually build systems in which something like it emerges through a combination of design principles and learning, and we can even see evidence for it in the brain. So I think that's pretty exciting, and this isn't common sense; this is just what we got when we started to think about how these tools meet up with the brain.
In work that I mentioned a little bit before, Jiajun Wu, an advanced PhD student working partly with me and with a number of others at MIT (he's a super collaborator in addition to a super researcher), has used similar ideas, but now in computer vision settings well beyond faces, to build systems that can, for example, infer the 3D structure of common objects like chairs, also using these mid-level 2-and-1/2-D-type representations. Since time is short, I'll just refer you to his work if you're interested.
Check out, for example, his NIPS paper from last year, which was called MarrNet, or his recent ECCV paper, which has some other name having to do with incorporating a shape prior; it takes the same idea but also adds in a prior on shape.
And it's really pretty cool that right now we can build systems that will take an image of a chair. They're built with a lot of knowledge of 3D chairs, so this is not a system designed to discover the 3D world; it's just designed to see in 3D.
But it's pretty cool at this point that we can build systems that can take many different kinds of real images of chairs and then produce, well, his system produces this. It's not perfect, but it's producing these pretty rich 3D models of both the visible and the non-visible parts of the chair.
A lot of progress is still being made on this, and there are a number of other groups that work on it. A lot of people, for example in Jitendra Malik's group at Berkeley, have made very nice progress on these things, and I think together the computer vision community is really making great strides.
The problem of 3D object perception from 2D images is one that will be solved in the relatively near future using the combination of tools I've been talking about here: probabilistic generative models, in this case probabilistic graphics models, and deep learning systems that can be trained to provide fast approximate inverses.
Now what about actual common-sense physics and psychology? Here we have to start off by finding the right parts of the brain; we don't already have something like the ventral stream. There's work you'll hear about from both Winrich and Nancy Kanwisher in the next couple of days. Nancy has been working on this for longer than anybody, really, using fMRI to try to find the parts of high-level vision, if you like, where perception meets cognition.
Recently we've done some work together. This is work mostly done by Jason Fischer, who was in her lab as a postdoc and is now at Johns Hopkins, and it has been excitingly extended, and is being studied further, by Sarah Schwettmann, a current PhD student with Nancy and with me.
But we've tried to identify what we think of as the brain's physics engine. Basically, we give people a number of different tasks like the ones you've seen, and we ask what parts of the brain functionally and selectively underlie these computations.
And we found a network, published two years ago in PNAS, of premotor and parietal areas, which seems to be key for the computations I've been talking about. Now, there are a number of interesting things about this that motivate further study of these areas.
Our goal in this work is not to stop here, this is just the first step. Having found what we think are the relevant parts of the brain, now we want to study how they work. But one thing is that we're far from the first people to focus on these parts of the brain.
Similar, if not perhaps identical, brain networks have previously been identified with a number of other functions, especially as a network of areas involved in action planning and tool use.
I have mentioned tools before, like the crows and the other animals and the kids, and I've also mentioned action planning; I mentioned the MuJoCo physics engine. So this, to me, was, I can't say I predicted exactly this, but it was exactly what I was hoping to see, in a sense.
That's based not on my personal preference but on what I know from the engineering and robotics side. At this point, if you look at people who build humanoid robots and try to get them to walk around, especially over complex terrain, or to pick things up or manipulate objects, the best ways of doing it and producing really natural humanoid robot motion take these physics engines and try to calculate the physically most efficient path for the robot given the physical constraints of its body, the environment, and the objects.
From that point of view, it wouldn't be a surprise to say that, given the need to do that, if evolution effectively found a similar solution in the brain, then that same solution should be useful for a lot of other things besides just moving your body around. Just like MuJoCo or other similar engines that roboticists have been building, it's all about forces and masses and friction.
OK, so this does seem to be true, or at least there is suggestive evidence that that's what's going on. Sarah has been trying to actually decode key physical properties, starting with mass, which is one of the most important.
And she finds that she can decode, from individual subjects' brains, the mass of objects after they've observed various physical interactions. She's worked hard to control for and tease apart a number of possible confounds, and at this point we're really pretty confident about this.
You'll hear a little bit about this also from Nancy. Most excitingly, she's used what's called a searchlight method, which, if you don't know it, means you basically look everywhere in the brain to see where you can reliably, above chance, decode a certain property like the mass of an object. And it turns out it picks out exactly the same areas we independently identified as seeming to underlie physical prediction. So it's really striking that she's found this.
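For those who haven't seen the method, here is a minimal toy sketch of the searchlight idea, with random placeholder data and hypothetical sizes: for each voxel, take a small spherical neighborhood and ask whether a cross-validated decoder can predict the property of interest (here, heavy versus light) above chance from just that neighborhood.

```python
# Toy searchlight decoding sketch (random data; not the actual analysis).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

n_trials = 120
volume_shape = (20, 20, 20)                       # toy brain volume
data = np.random.randn(n_trials, *volume_shape)   # trial x voxel activity
mass_label = np.random.randint(0, 2, n_trials)    # heavy vs. light

coords = np.array(np.meshgrid(*[np.arange(s) for s in volume_shape],
                              indexing="ij")).reshape(3, -1).T
flat = data.reshape(n_trials, -1)

def searchlight_accuracy(center, radius=2):
    # Select voxels within `radius` of the center and decode mass from them.
    mask = np.linalg.norm(coords - center, axis=1) <= radius
    scores = cross_val_score(SVC(kernel="linear"), flat[:, mask],
                             mass_label, cv=5)
    return scores.mean()

# Map accuracy over (a subset of) centers; peaks mark where mass is decodable.
acc_map = {tuple(c): searchlight_accuracy(c) for c in coords[::500]}
```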
And you'll hear a lot more from Nancy and Winrich about analogous brain networks for understanding agents. I think I'll skip this, other than to advertise a cool paper that we helped to inspire. It's really great work by the roboticist Marc Toussaint, who, inspired by these ideas about tool use and physics engines, has gotten robots to do the things that crows or apes, as well as humans and children, can do, which is to flexibly use objects in all sorts of complex ways to achieve goals.
He's actually put these sorts of intuitive physics engines into robots, combining them with symbolic action planning in order to get robots to do this kind of thing. For example, say the goal is to get that object and those are the tools you have, so you have to figure out that you should move the object to the wall and then slide it along.
Or here's another way to use that object to achieve the goal, which is to bat it toward yourself. Or there, there's a piece of paper, so you could push the ball onto the paper and then pull the paper. And here, for example, is a classic experiment that's been done with animals: use the small hook to get the big hook.
So this is really great stuff in robotics. I can brag a little about this paper because I had almost no role in it: it won the Best Paper Prize at the most recent major robotics conference, RSS.
Of course, people are very excited about the role of deep learning in robotics, as in many other fields, but there's no deep learning in here at all. There is differentiable physics, so it is using an interesting differentiable program.
But the key is that it's used in the context of a symbolic action planning system which can form goals and subgoals and can think, I need to get the ball, so I need to get this, so I need to get that. That symbolic layer, supported by an underlying layer of differentiable physics to actually take those plans and use intuitive physics to make them real, seems to be an exciting way to get robots to flexibly produce behavior they were never programmed for or trained on.
OK, and one small contribution we made is to show that humans do roughly similar things, with roughly similar distributions of actions. So I think we can point to where in the brain, and maybe even how, that's being done, but none of these models gets to an actual neural circuit implementation.
So if we want to be able to do the thing that DiCarlo and Yildirim and many others have been doing in the ventral stream, relating these networks to data at the single-cell level, then we actually need neural implementations of these physics engines.
Again, you could try to build one of these as a deep network. I hope I've already convinced you, or at least I've argued, that you don't want to build a neural network that just implicitly solves one particular physics task; you want to build a system that actually models the physics.
But then you can ask, well, what do you start with, what do you learn, and how do you build that? For a while, Geoff Hinton and colleagues and other people were interested in building recurrent dynamics models, for example this paper by Hinton on the recurrent temporal restricted Boltzmann machine.
They were interested in modeling similar kinds of intuitive physics situations, but they were really interested in learning from scratch, not putting in any notion of objects or any other symbols, so they trained purely from pixels to predict other pixels for objects in motion.
There were some initial results that were impressive for the time, but what people have found since, in trying to do this, is that if you build, call it a neural physics model, that doesn't explicitly have objects and just goes from pixels to pixels, it doesn't work very well. You can train it with a lot of data, but it doesn't really generalize.
What people have found exciting is that you can get much more impressive generalization performance by explicitly putting in many of the concepts of the physics engine. In this sense, these are models that hybridize, in another way, the symbolic probabilistic program approach with a neural network.
One of these was literally called the neural physics engine, by Michael Chang and colleagues; that was work Michael did as an undergraduate at MIT, and he is now a PhD student at Berkeley.
But there are many other versions of this idea. Battaglia and colleagues have something called interaction networks, which is very similar, and there are a number of other ideas here. The basic idea common to all of them is that they're not going from pixels to pixels, although they can be interfaced with pixels.
Actually, Jiajun did this very nicely in some work that I will mostly skip over, but I'll show you a little of it in a second. Instead, these models work with a symbolic representation of discrete objects and their relations or interactions.
That could include things like spring couplings or collision events, like a physics engine in a game engine; it represents the world in terms of discrete objects, like object files. Where the learning, the neural network part, comes in is in how the force dynamics work.
So it's not given F equals ma; it's given the overall algebraic structure. It knows about pairwise interactions between objects, which are kinds of forces, and objects can have properties, like mass, that modulate how the forces add up. You basically add up the proto-forces, add in some object-specific properties, and out comes, basically, acceleration.
So they're sort of given the general idea of F equals ma, they learn how the forces work, and they can even learn the concept of mass, at least in some form. These networks can be trained on a certain number of objects and then generalize to a much larger number of objects, or to different configurations, like walls with different geometry. Because they're not bound to pixels, they generalize much better.
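Here is a minimal sketch in the spirit of these object-based models, not the published code, with made-up names and sizes: the structure of "sum pairwise effects, modulate by object properties like mass, output acceleration" is built in, and only the small learned networks that fill in how the interactions work are trained.

```python
import torch
import torch.nn as nn

class TinyPhysicsNet(nn.Module):
    """Interaction-network-style dynamics model over discrete objects (sketch)."""
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        # phi_pair: the "force-like" effect of object j on object i, from their states.
        self.phi_pair = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # phi_obj: [own state, mass, summed pairwise effects] -> 2D acceleration.
        self.phi_obj = nn.Sequential(
            nn.Linear(state_dim + 1 + hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, states, masses):
        # states: (n_objects, state_dim), e.g. [x, y, vx, vy]; masses: (n_objects, 1)
        n = states.shape[0]
        effects = []
        for i in range(n):
            pair_inputs = torch.stack(
                [torch.cat([states[i], states[j]]) for j in range(n) if j != i])
            effects.append(self.phi_pair(pair_inputs).sum(dim=0))  # sum of pairwise effects
        return self.phi_obj(torch.cat([states, masses, torch.stack(effects)], dim=1))

# Because the model is defined over objects rather than pixels, the same network
# applies unchanged to scenes with more objects than it was trained on.
net = TinyPhysicsNet()
accelerations = net(torch.randn(5, 4), torch.rand(5, 1))   # 5 objects -> 5 accelerations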
They still don't generalize nearly enough, though. You train them with lots and lots of data of balls bouncing off each other, and they model balls bouncing off each other. That's not like what we have in game physics engines, which can model blocks and balls and fluids, and a glass of ice that we pour water into, so there's water and ice in the glass and you can spill it or drink it.
We are far from being able to build neural circuits with the level of expressiveness and flexibility that we have in today's symbolically structured physics engines. But maybe this is a step in that direction.
I think our current best models for how the ventral stream and, let's say, these physics areas fit together are the things that Jiajun Wu and Ilker have also been working on. Say we think the ventral stream is doing something like inverse graphics, and we've found the physics parts. Then what we have right now are deep networks like the ones I showed you for inverse graphics; Jiajun has something he calls Neural Scene De-rendering, which is a more flexible, object-based version that can see multiple objects.
And then our best model of the physics engine is still not these neural circuits; it's still the kind of thing we have in game physics engines, though those neural models are stepping in that direction. What's cool is that we can glue these things together now, not trained end to end, although you can fine-tune it end to end if you want.
You can take a deep network for 3D object perception in multi-object scenes and glue it directly onto a game physics engine, and you get a system that can work very quickly, because the things that would otherwise be slow in top-down MCMC inference are done in a bottom-up way by the network, mostly; there could be a little top-down refinement, but it's mostly bottom up. It can very quickly look at a scene like that and do basically everything I think you need to do, which is perceive the 3D structure and then know what's going to happen.
So it knows, for example, whether that scene is unstable. If you look at these scenes here, think about how the blocks will fall. Imagine these are the first frame of a movie and imagine what you're going to see when I show the movie; then I'll show you the actual movie, and I'll show you our system imagining the same thing.
Really, just a quick glance is enough to figure out what's going to happen. Now I'll show you what actually happens: on the left is what actually happened, and on the right is what our system imagined. Hopefully you imagined something similar. Yeah, did you?
OK, I'll show you another one here. Take these scenes and imagine what's going to happen; then you can see what actually happened, and you can see our system. Hopefully, both in your own intuition and in our system, what you see is that you're probably not perfectly imagining exactly what happens. It's not exactly right, but it's right at the level of intuition.
It's pretty close, and it's also what you'd need (here are a few more examples) to actually plan interventions effectively, like where you would hold it if you want to keep it up, or where you would attach something. We can use the same systems to plan interventions that keep it from falling or make it fall.
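Here is a schematic sketch of that glue in pseudocode-style Python. Everything in it (perception_net, physics_engine.simulate, the stability score) is a hypothetical placeholder standing in for the actual components, just to show the shape of the pipeline: bottom-up perception proposes a symbolic 3D scene, the physics engine rolls it forward, and the same rollout is reused to score candidate interventions.

```python
# Schematic sketch (hypothetical interfaces, not the actual system).
def predict_and_plan(image, perception_net, physics_engine, interventions):
    scene = perception_net(image)                    # image -> objects: pose, shape, mass
    predicted_future = physics_engine.simulate(scene, seconds=2.0)

    def stability(final_scene):
        # Hypothetical score: penalize how far the objects end up moving.
        return -sum(obj.displacement for obj in final_scene.objects)

    # Simulate each intervention (e.g. "hold block 3 here") and keep the best one.
    scored = [(act, stability(physics_engine.simulate(act(scene), seconds=2.0)))
              for act in interventions]
    best_intervention = max(scored, key=lambda pair: pair[1])[0]
    return predicted_future, best_intervention
```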
There are other ways we've been working on integrating physics engines and bottom-up perception, but I'll skip those, although if you're interested, check out the Galileo work by Wu, Yildirim, and colleagues, which I think is another nice stab at that.
Those are our best models right now, I think, of how to integrate bottom-up perception and mental intuitive physics for perceiving actual physical attributes like mass and making inferences about them. That's the thing Sarah is starting to decode, so I just want to point out where these ideas are starting to come together.
This is probably a good point to end and go to lunch, but give me two more minutes to show you what the future might look like, which gets at the question you started asking, and the question I also started asking: how do you actually build these?
There's going to be a lot more next week on learning and development, so you'll get plenty of that. But suppose we ask: how could we build some learning algorithm, some program-learning algorithm, that could actually build a physics engine?
It's going to be some combination of nature and nurture, or evolution, development, and learning. Our best evidence is that it's not just built in your lifetime; these are genetically unfolding programs shaped by evolutionary exploration, plus some actual learning mechanisms operating in your life.
When I showed that thing with Turing, I said that we now understand that children's learning is much more sophisticated than just writing things down. Again, you'll get to see this next week, especially in Laura Schulz's and also in Sam Gershman's lectures.
There's what people sometimes call the child as scientist: the idea, inspired by the way scientists build their own physics, that coming to a physics wasn't done by finding patterns in data; it was done by exploring a space of theories and seeing what made sense with the data, perhaps very limited data, but with a lot of creative, what Laura Schulz calls goal-directed problem solving, basically goal-directed exploration in an intuitive space.
So there's this idea that children do some similar sort of theory-driven search process. Again, there's a whole line of work, which I don't have time to tell you about but you'll hear some of next week, studying children's learning from this point of view, including some of the ways they play with objects as an intuitive form of experimentation. What we want to know is: can these methods somehow explain how you could effectively learn to program the game engine in your head?
If evolution gives you something like a general game engine, then learning is figuring out the game of your actual life, or the different games. Just as with game engines, we can learn many games over the course of our lives, but somehow we have to do that.
So what we'd like to ask, and again this is just the program of research going forward, is: can we build, effectively, program-learning algorithms that learn to program the game engine in your head, and that might be able to capture some of the developmental data? You'll hear about this; the infant data are mostly not from Spelke's lab, although I should tell you a little about that trajectory, but from work by some of the other great infant researchers like Renee Baillargeon.
You can map out, very roughly, the things that infants know about physics over the first few months and years of life, and then you can ask: can we capture these different stages with, say, different stages of a game physics program, and can we have a learning algorithm that, in response to the experience a kid might have, reliably develops you from one stage to the next?
Now, if you actually want to build that kind of learning, it's a very hard problem. We sometimes call it the hard problem of learning, because it is much harder than the problem of gradient descent in neural networks.
The reason people like end-to-end differentiable systems is that you have algorithms like stochastic gradient descent, which turn the problem of learning into the problem of just rolling downhill. And it's a lot easier to roll downhill than to write a program, if you have the right data set, which gives you the right cost-function landscape.
But if you want to talk about search in the space of programs, that's just much harder. I think these slides don't have the modification I made, which is just to jump to the end. Where we're going, to complement what you'll hear about next week as the child-as-scientist view, if you like, is an algorithmic perspective that we call the child as coder, or the child as hacker.
At MIT we think hacking is a good thing; the rest of the world, as Laura is always pointing out to me, thinks hacking is a bad thing, like breaking into people's email or credit card accounts. But what we mean by the child as hacker, or coder, is the idea that, effectively, learning is like programming.
If your knowledge is a library of programs, then learning has to be modifying your knowledge to make it better, and that's basically programming. Think about all the activities you do when you're coding, when you're hacking on your code to make it better. I think all of those have correspondences with what children do.
At the moment, we only have effective algorithms for a couple of these, like tuning parameters of existing code; that's basically what you do with gradient descent or other parameter-optimization algorithms.
But there are all these ways in which you actually modify the code, or write new code, or write whole new libraries of code, so-called domain-specific libraries that capture the functions you need to use and reuse in a domain, or even write whole new programming languages.
The activities of cognitive development look like all of those; children do all of those kinds of things as they come to learn about the world, especially once they are able to access natural language, a language like English.
So I think the frontier is working on algorithms that can do this. Steve Piantadosi, a former student from our department, is one of the leading people on this, so if you're interested, I would point you to his papers. A limitation of his work, though, is that in a lot of his best work what he has basically done is develop some really cool tools for doing that top-down, MCMC-type inference, the kind I showed you was the slow inference in face perception, but in the space of programs.
So he does a random walk in the space of programs, and again, if you're willing to wait long enough, you can come up with a good program, like programs that capture concepts of number, or programs that capture all sorts of interesting expressions in language.
The basic problem is that, just as we don't want to wait that long to see faces, well, if we're modeling cognitive development we can wait a while, that's OK, development does take a while.
But Laura in particular has pointed out, and others have too, that a kid's learning is probably not just a random walk in the space of programs. It's got to be more like the way real programming works, where you have goals, those goals give you subgoals, and you actually try to solve problems. It's more systematic.
So the last thing I want to leave you with is one of the most exciting developments both on the machine learning side, and this is really technical stuff, so I'll just point to it and you can explore it more if you want, and also on the cognitive development side: the idea of coming up with algorithms that write code in a systematic, rational, kind of Bayesian-learning sense.
Kevin Ellis is a student in our department who works partly with me and also with Armando Solar-Lezama, who is a faculty member in computer science at MIT in the field of programming languages. So Armando is not a cognitive scientist or neuroscientist; he's not even an AI researcher. His field is designing new programming languages.
Armando's specialty, and it's been sort of a niche area inside the PL field, as they call it, is programming languages that are incomplete, where the system fills in the code that the user hasn't gotten to yet.
In his PhD thesis about 10 years ago, he built something called Sketch, where a programmer writes a sketch of the code and the machine fills in the rest. It's neat if you can do that; effectively, what he's done is develop very efficient tools for solving these hard search problems of coming up with code that solves a problem.
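To give a feel for the hole-filling idea (this toy is mine, not the Sketch language itself): the programmer supplies the structure of the code with holes for the parts they haven't written, plus some examples, and the machine searches for hole values that make the examples pass. Real synthesizers like Sketch use SAT/SMT solvers rather than brute force.

```python
# Toy illustration of hole-filling program synthesis (not the Sketch language).
def linear_sketch(x, hole_a, hole_b):
    # Partial program: the structure "a*x + b" is given, the constants are holes.
    return hole_a * x + hole_b

examples = [(0, 3), (1, 5), (2, 7)]           # input/output pairs the code must satisfy

def fill_holes():
    for a in range(-10, 11):                  # brute-force search over hole values;
        for b in range(-10, 11):              # real synthesizers use SAT/SMT solvers
            if all(linear_sketch(x, a, b) == y for x, y in examples):
                return a, b
    return None

print(fill_holes())                           # -> (2, 3)
```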
Kevin has been combining this with a Bayesian learning paradigm where you try to find the best code that solves some learning problem. He's applied this to a bunch of problems in linguistics, which I won't tell you about. Sorry, in my earlier version of the slides, before my PowerPoint crashed, these were all hidden, so I'm just scrolling through them.
He's applied these to really cool things like interpreting line drawings, where he can look at a hand-drawn diagram. This is a nice interface between neural networks and, again, these program synthesis methods: a neural network can be trained to do the perception part, going from a pixel image of a diagram to a symbolic trace of the basic objects, and then the program synthesis methods look at those and come up with more abstract code that captures the repeated motifs.
So, for example, each of these cases shows a hand-drawn diagram; the machine sees it, parses it into the objects, but also sees the abstract program structure and can extrapolate it, adding layers to the diagram that were not drawn, because the machine is able to understand the program that's implicit in those scenes.
So you could ask: could we scale these ideas up? Could you actually take all of the things that a child learns over their lifetime as they're growing up and explain them as coding? I'm optimistic, in my crazy way, that we might be able to do that.
Kevin's latest work is a version of this idea, what he calls DreamCoder, which is inspired by some of the ideas of sleep consolidation, where you take things that happened to you in waking life. You don't just do wake-sleep learning to find patterns, although these systems do use neural networks with that wake-sleep style of learning to learn efficient inference.
They also do a kind of sleep consolidation, where they take problems they've solved in waking life and abstract out common elements, which can become new abstractions that effectively write whole new domain-specific programming languages. So these are ways to progressively build up, over time, whole new languages of thinking.
Again, I'm just trying to show you where the future might be. Can we engineer this at the scale that Google wants? Not yet, but stay tuned. All right, so wrapping up then, I know I tried to do maybe a little too much, but thanks for your patience, and we'll all go to lunch now.
Just to wrap up, what I tried to show you here was a broad view from cognitive science of the problems that we really care about at the heart of human intelligence. Can we get at this common-sense core? Can we reverse engineer these abilities? Can we understand them at the level of minds, brains, and machines, and how they might even be built?
I've tried to show you some of the themes that cognitive science has pointed to, which are much, much broader, even if you don't care about intuitive physics or intuitive psychology: these ideas that intelligence is about thinking causally and thinking compositionally, having models with parts that can be combined and recombined.
And models where the structure corresponds to the actual causal structure of the world, like objects and forces and agents and their plans; and the ability to have hierarchies of different levels of representation and to do inference at different levels, to explain both how we can perceive in the moment what we see and how we can learn over longer time scales.
These are ideas which I think are really general themes in the study of intelligence, and I've tried to show you a toolkit that has been emerging over a number of years now, but it's getting to the point where it can be very broadly useful for, I think, a lot of people.
These ideas are: probabilistic programs as the basic knowledge representation and inference framework; specific programs to capture, say, the heart of common sense, drawn from game engines; and then these new tools coming from program synthesis, and integrations of those with, say, Bayesian learning, that might allow you to actually grow new code.
What this gives us is potentially a roadmap for this big problem of AI, what might be AI's oldest dream: can you build a machine that grows into intelligence the way a child does? It's a roadmap for saying, well, what do we start with and what might the learning mechanisms be? I'm sure these answers aren't right, and I'm sure they're very incomplete, but at least we have a roadmap and a place to start.
So if you're interested in this, it's a very big project. Some of you are maybe even working on little bits of it, like we all are, in your own work or in some of your projects here. I'd love to talk with you about the bigger questions and other ways we could all work together on these things. So thanks very much.
[APPLAUSE]