Doing for robots what nature did for us
Date Posted: February 5, 2020
Date Recorded: February 4, 2020
CBMM Speaker(s): Leslie P. Kaelbling
Brains, Minds and Machines Seminar Series
Description:
Abstract: We, as robot engineers, have to think hard about our role in the design of robots and how it interacts with learning, both in "the factory" (that is, at engineering time) and in "the wild" (that is, when the robot is delivered to a customer). I will share some general thoughts about the strategies for robot design and then talk in detail about some work I have been involved in, both in the design of an overall architecture for an intelligent robot and in strategies for learning to integrate new skills into the repertoire of an already competent robot.
TOMASO POGGIO: I'm Tomaso Poggio. Welcome to the first talk of the CBMM series of this semester. As some of you probably know, CBMM's mission was to push research at the interface of neuroscience, cognitive science, and computer science. And the goal for the first five, six years for the center was really to create a community of researchers in this new area, the intersection of natural science and engineering.
And I think we have by now achieved the-- you know according to most measures of this goal, we have done that through a lot of collaborations, of talks like this one, a very successful summer school. And we have now a community of 25 or so faculty members, many at MIT, but including Harvard and a couple of other institutions, and probably more than 200 or so researchers, postdocs and students and so on.
And during this time, which means the last six or seven years, it made sense for us to nurture this community, to preserve its identity. And now is the time, we decided, to essentially broaden out, to try to convince people-- researchers, other faculty at the boundary of computer science and neuroscience-- why this synergy between machine learning, and neuroscience, and cognitive science is important, why it's relevant. Of course it cannot be all the research done in AI, but I think an important part of this research is really the combination of the natural science of intelligence and the artificial engineering of intelligence.
So with this in mind, and a number of discussions and meetings of the minds, we decided to invite a few more members into CBMM. Leslie is the first one. Daniela Rus is another one, and Aude Oliva, and Nick Roy. And this will also help eventually merge CBMM with the Quest for Intelligence, given that all the people I mentioned are part of the Quest. So with this, I am very happy to have Leslie as our first speaker this semester.
And Leslie has been at MIT for a number of years. She has been the face of the AI lab for many of us, and a happy face I must say. She did a lot of things I'm not going to go through, but after getting a PhD at Stanford, she made the very good decision to move to this side, to this coast. And among other things, she is the founder of the Journal of Machine Learning Research, which has become one of the top-- I would say the top journal in machine learning. So Leslie.
LESLIE KAELBLING: Thank you. OK. Thank you for inviting me. And I should say that I really enjoy backtalk and stuff like that, and questions. And so I'm going to give kind of an informal talk. I have way too much stuff that I could talk about. So I'm going to adapt as I go. And I really, really welcome questions, or complaints, or anything. And I should say, you know, I picked this title. I've used it before when I talk to engineering-type people, and it's fine. When I walk now into a room where people actually know something about nature, then I feel nervous, because I don't really know very much at all about nature. So you can help me out with that.
But so what I want to do personally is make general purpose intelligent robots. I want to make robots that are indistinguishable behaviorally from you. And I would like to just understand how to do that. So that's my kind of overall goal. And just to help think about that as a problem, because that's like much too hard, lately I've really enjoyed a proxy, which I'll tell you about, because you might want to think about it too. And actually, in some sense, I assigned it as a homework assignment in a class last semester, which is, to make tea in any kitchen.
So I would like to make a robot that could go to your house, to go to anybody's house-- I think approximately everyone in the world puts leaves of some sort into hot water and then drinks it, right? And so if you went to somebody's house, you could figure out how to do this. You could look around, you could find the hot water, you could find the leaves, you could figure out how to, you know--
So what would it take to make a robot that could do that? So let's just take that. That's a simpler goal than all of intelligence. But it's actually really quite ambitious. And I don't know how to do that either. But it scopes the problem a little bit. So one thing I think I know is that it's probably not just a reinforcement learning problem. Imagine the robot going to your house. You take the robot out of the crate and you say, robot, make me tea, and it says, OK. And then-- OK, so probably, that's not it. So the question is, well, how do we make this work out?
And actually, I'll just tell a little bit of the story, because actu-- my PhD thesis was about reinforcement learning. I'll tell you a little story about how I got there, and then why I diverged.
So I showed up at SRI when I had my undergraduate degree, and they had this robot, and they wanted it to navigate through the hallways of SRI. And none of us working on that project really knew anything about robots. It was embarrassing. We didn't know anything about control. We really didn't know anything about anything. And it was my job to write a program to make the robot go down the hallway using the sonar sensors. These terrible sonar sensors.
So I would write a program, and the robot would crash. And I would bring it back, and I would change the program, and the robot would crash. And I did this over and over for weeks. And eventually, I had a program that made the robot go down the hall using the sonar sensor.
So that was good, sort of. But what had happened in that process was that I learned how to navigate down a hallway using sonar sensors. And I decided that I did not want to be in that loop ever again. That the robot should learn, and not me. So that's what really motivated me to think about robot learning. So that's the story. Oops, wrong button.
So then I did this terrible reinvention of reinforcement learning. The only thing that was good about it was that I had pleasure instead of reinforcement. But then eventually, I figured out about actual reinforcement learning literature, and I had a little robot that could do reinforcement learning. So that was pretty cool. That was in my defense.
And then-- when was this? 1995. Oh, for those of you who are young-ish, this is how we used to make talks, by writing on pieces of plastic with colored pens. So these are scans of some old slides of mine, just so you know.
By '95, I was trying to figure out how to take what I understood from reinforcement learning and actually use it to solve big and interesting problems, and I wrote the following thing. "The romantic ideal of a big pile of circuitry that learns to be an intelligent agent can't be achieved. We have to design an agent with lots of structure and many small circumscribed learning problems."
OK. So I still believe that. So I think-- and general purpose goo that you're just going to train up by RL is not going to get to a robot that can make tea in your kitchen. So then the question is, well, what is? And I think, meh, some combination of design and learning. And so the burning question is, well, what kind of a combination?
OK. So here's how I want to think about the problem now, just a little bit more formally. So I think of myself as a robot factory. I gave this talk somewhere, and people thought I was talking about factory robots. I'm not talking about factory robots. I'm talking about a factory that makes robots.
And if I'm the robot factory, I'm going to make these robots, and they're going to go to everybody's house and do what they're supposed to do when they get there. I have to put a program in the head of the robot. Some program, pi. I don't know what it is, but I'm going to put a program in there.
And this program, what is it going to do? Its job is to map the history of observations and actions that this robot ever sees into its next action. That's what it must be. It's all it can be. And that's just how it goes. So methodology-- there's no dogma here. This is just what has to be with a robot if I deliver it to your house.
OK. So the question is, of all the programs I could put in the head of the robot that I'm going to deliver, what program should I put there? And how can I think about that?
Well, the way I want to think about it, anyway, is the program should perform well in expectation over the houses it's going to have to go work in. So I don't know exactly what job this robot is going to have to do when it gets deployed. Maybe when you came and ordered robots for me, you gave me a spec, and you said, well, here's a distribution over places this robot might have to work. So I want to put a program in the head of the robot that's going to work well in expectation over the places that it could be.
And so if in fact, if that distribution is really narrow, if in fact it's a factory robot, then there's hardly any variability in the distribution. I'm taking the expectation over, and I could write a very particular program. If on the other hand, it's all the houses in the world, well, that's a lot of variability, and the question of what program works best in expectation over all those environments is kind of an interesting and complicated one.
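To make this concrete, here is one rough way to write down the objective being described. This is my own formalization sketch, with my own notation (D for the given distribution over deployment environments, U for the performance measure), not anything from the slides:

```latex
\pi^* \in \arg\max_{\pi} \; \mathbb{E}_{W \sim D}\left[\, U(\pi, W) \,\right],
\quad \text{where } \pi : (\text{history of observations and actions}) \to \text{next action}.
```

On this reading, the factory's job is to search for the program pi that does well under this expectation, whatever form that search takes.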
OK, so this is the way I want to think about the problem. So if there's good news and bad news, one is that, in some sense, we don't have to fight about whether it should be Bayesian, or reinforcement learning, or what should be in there. There is an optimal-- if you give me the distribution, which is hard, I understand. But if it were specifiable, there would be an optimal program to put in the head of the robot. No arguing required. Yes, good.
AUDIENCE: So what [INAUDIBLE] worst case? I was confused by the expectation.
LESLIE KAELBLING: OK, good. OK, good. Excellent. I'm being lazy, and you could put some aggregator. You wouldn't want the worst case, but you might want some more risk-averse measure than expectation up there. Good. Good. So I'm not allowed to burn down your house, for instance, but do really well in somebody else's. Yeah. OK, good. So we can say, I'm going to optimize some risk-sensitive measure there. Good.
But if we pick a measure, there is in some sense an optimal program to put in the head of the robot that's going to work best, according to some metric like that. So that removes all the religious arguments from AI, I believe, which I think would be kind of awesome. Except it doesn't completely, because it might remove the religious arguments about what goes in the program, but it does not remove the problem that we, the engineers, now have, which is finding that program. If you gave me a specification, you said, oh, here's a distribution over domains that the robot is supposed to work well in, now me, the engineer, my problem is finding that program that I could put in the robot's head that would work well. And we don't really know how to do that. And that's what we fight about, actually, I would say, much of the time.
Is this setup OK? Does anybody want to ask more questions, or something? Yeah.
AUDIENCE: I think what you're calling the religious arguments, because you're not-- this is not general-- you can just scope to the problem, as you said. Is that the religious argument that you're trying to remove?
LESLIE KAELBLING: Well, no. I think the religious argument I'm trying to remove is, you might say this robot has to be a reinforcement learning robot, or it has to not have something built into it, or it does have to have something built into it. And the answer is that if you scope the problem, then it has to have built into it what it has to have built into it.
But general intelligence is also a scoping. We're not generally intelligent in a kind of very fundamental sense. We're good for the distribution-- for our niche. So these robots just have to be good for their niche, whatever that might be. Yeah.
OK. So how are we going to do this? Now I would say, well, the problem is the factory. How is it that we-- methodological problem-- how is it that we are going to find a good program to put in the head of the robot?
And there's a bunch of strategies. Like, you can say, well, the robots are going to learn everything from scratch. So that's kind of like the hardcore RL version. I put the robot in your house-- the slides don't advance-- it ruins the kitchen. So that's not a good program to put in the head of the robot. It's lazy. Lazy engineering. It says, ah, let the robot just learn it all.
The classical view of robotics, since like forever, has been, well, you get really smart programmers, and they think hard about the problem, and they just write the program. But we're not that smart, I don't think. So we engineers are not so smart. We're not so good at actually just straight-up writing a program. We can't write face detectors. We can't write all kinds of things. We use some other method to find it.
Reverse-engineering humans. Well, that would be a strategy, and if you guys would figure out the whole story, then maybe we could do that. There's some reasons why that might not be the whole answer. First of all, it's a hard biology problem. Second of all, my spec might be for robots that do some distribution of problems that's not like the distribution that humans do. And I would be interested in having a methodology for finding good programs for a different niche.
And then we could, like-- I don't know, try to just do-- this is a little flip, but search offline in programmer, in the factory, for a good program. So I think when we're doing machine learning, a lot of the machine learning we do-- and I'll come back to this in a different direction in a minute-- is in the style that offline, we're going to do some searching for a program that we think is going to work well when we field it. So that's another strategy. None of these totally-- I don't totally love any of these, but I think that's kind of like the space.
OK, so let's think about how-- I would like to think a little bit about how reinforcement learning fits into this. So one version of the problem of when we need to do learning is when the engineers don't know very much about the domain. So that's the case where the domain distribution is really big.
And that's a case where the agent might have to do its own learning in the world. And in fact, if you go back to the earlier parts of reinforcement learning, it was meant to be a story about how individual agents learned in their world. The bees and things learn in the world by doing trial and error. And so that's a setup of reinforcement learning, and that's one where the engineers don't have to work too hard. They have to invent the reinforcement learning algorithm, but after that, they're done.
I would argue that, at this moment, if you go and read papers in [INAUDIBLE], and ICLR, and all the places where reinforcement learning papers appear, almost none of them are actually really operating in this setting. The setting in which they're thinking about the agent actually learning in its actual world.
And I would argue that they're not measuring the performance of those algorithms in the way that they should if that's what they care about. So if you were trying to design an algorithm for an agent in the world, it hurts every single time it runs into the wall or does a stupid action. You're not interested in the asymptotic optimality of the policy that you learn. You're interested in how many times, how much it hurts while you're learning, so that you want to integrate the reward over time. And that really needs to be the measure that you use.
But almost nobody makes this curve. This is cumulative reward, penalized all the way from the beginning. But if you say, I'm making an algorithm for an agent that's behaving in the world, I think that's the curve you should use.
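As a concrete illustration of that measure, here is a minimal sketch in Python, with assumed agent and environment interfaces (act, learn, step, reset) rather than code from the talk, of scoring an agent by the reward it accumulates from the very first step rather than by the quality of the policy it eventually converges to:

```python
# Hypothetical interfaces: this illustrates the evaluation criterion,
# not any particular RL library's API.

def lifetime_return(agent, env, num_steps):
    """Total reward collected while the agent is learning in the world."""
    total = 0.0
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)              # may be exploratory or just bad early on
        obs, reward, done = env.step(action)
        agent.learn(obs, reward)             # learning happens online, in the world
        total += reward                      # every crash into the wall counts against it
        if done:
            obs = env.reset()
    return total
```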
OK. So what's the other setting of reinforcement learning? The other setting of reinforcement learning, I like to call it learning in the factory as opposed to learning in the wild. So learning in the factory is when I, the engineer, I actually know a lot about the domain. Maybe my domain is rotating a cube in my hand, or something like that. I know a lot about it. I know enough about it to make a really awesome simulator of it, in fact. But I don't know enough to just write the program.
So in that case, I do something that kind of-- it's kind of like a pun. It doesn't have to be like this, but it just turns out that a good way to compile a simulator into a policy is to run an algorithm that looks a lot like the algorithm that we would have run if the agent was actually executing in the world. And most of the reinforcement learning stuff that people are doing, I would argue, now falls better into this setting than into the previous one.
So different setting. Different goal. And the idea is-- I think of it as a compiler. You wrote the simulator. That was hard work for the engineer. But the engineer found it easier to make the simulator than to make the policy. Then this is a crank you can turn to make a simulator into a policy. Then you put the policy out there, and the robot does what it does.
But if that's the game you're playing, I want to argue that you should measure the performance of your algorithm in a really different way. It's a compiler. It's a compiler that takes the simulator and makes a policy. And all that matters is how long it takes. It's just a computational problem. It's a computational crank you turn. What matters is how much computation you need.
But people don't normally make this plot. So what matters is how much computation you take and how good the policy is that comes out. Normally, people plot on the x-axis, like number of interactions with the world, but the world is a simulator, and so it doesn't really matter. It doesn't hurt it. You can do anything you want. But that--
So this is the right way. This is the plot everybody should make. Nobody makes it, and this is a plot everybody should make if they're doing reinforcement learning in the factory. OK. Have I annoyed anyone yet? No. OK, good. Good.
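For that factory setting, the measurement would look something like this sketch, again with invented interfaces and nothing from the talk itself: treat the learning algorithm as a crank that turns compute plus a simulator into a policy, and record how good the resulting policy is as a function of computation spent:

```python
import time

def compute_vs_quality(train_step, evaluate_policy, budget_seconds, eval_every_seconds=60):
    """Return a list of (seconds of computation, policy quality) points."""
    curve = []
    start = time.time()
    next_eval = start + eval_every_seconds
    while time.time() - start < budget_seconds:
        train_step()                        # one chunk of simulator-based training
        if time.time() >= next_eval:
            curve.append((time.time() - start, evaluate_policy()))
            next_eval = time.time() + eval_every_seconds
    return curve
```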
All right. So what do I want to do? Well, I want to try to figure out how to really aggregate a whole bunch of these approaches. So I think somehow, we should take-- we the engineer. So I'm an engineer. I have this problem. I don't think I can do pure reinforcement learning. I don't think I can do pure software engineering.
So what can we do? We can take constraints that we understand from the physical world-- that matter coheres, and so on. Stuff we might have learned from studying humans and other natural creatures. We should also be developing engineering techniques that work better. Strategies, things like meta-learning, and transfer learning, and so on. And eventually make agents that do go out into the world and then do actually learn from their own experience.
So the question is, how do we aggregate all these sources now of insight, and technique, and knowledge, and ideas about how to make a system, and use them all as best we can? So that's what I really want to do.
OK. So what am I really doing? What I'm really doing lately is trying to think about a set of computational mechanisms that I feel reasonably comfortable about building into my robot. Now, almost everybody has gotten comfortable about building in convolutional layers in a vision system. And we're comfortable about that because, I don't know, it makes sense from a signal processing perspective, and we see stuff like that in brains, and we say, OK, I'm going to build that stuff in. I trust it. It seems like a good idea. I'm going to learn the filter weights, but I'm going to build in the structure.
So the question is, what 10 other things are there like that that we could build in that we would feel pretty confident about building in, and that we could learn the parameters of? So that's what I would really like to do. And so what I want to tell you about first in this talk is an exercise that we did for several years of just trying to build, actually, a robot system by hand in order to try to understand what a set of good mechanisms would be that we could then use, and that we could do learning with respect to later on.
So what are the kinds of things that I would want to build in? Well, convolution over space and time. The idea that I'm a kinematic agent embedded in space. That I can reason forward and backward with some kind of models. Abstraction over objects, state aggregation. There's a bunch of things. I'm not particularly wedded to this list, but I think everybody here would maybe have some list sort of like this, things you'd be willing to build in.
And so what we've been doing is building in those algorithmic mechanisms and hand-building the models. And then eventually, what we'll do is talk about learning the models.
So let me say something about the strategy that we built by hand, just to give you an idea of a point in space that works sort of OK, actually. We don't want to say it's the greatest thing in the world, but it's not so bad. I use this as my motivating picture. It is not my kitchen. But imagine that you had to clean that kitchen, or make tea in it, or something.
So what makes it hard? Well, one thing that makes it hard is that we're going to have an actual robot, which has actual perception and motor control where the motors are not so great, and so on-- so there's just the fact of actual physical interaction. There's a lot of objects. So robotics people like to talk about how many degrees of freedom their robot has. But then I like to ask, well, how many degrees of freedom does this kitchen have? How many?
It's not a well-formed question. A degree of freedom is like kind of a state variable that you can vary. But you could think about the positions and orientations of all these objects, but there's like-- I don't know, what's the salinity of the grapes, and how many lettuce leaves do I have, and just-- you could go into as much detail as you wanted to model this kitchen, so I don't know how many degrees of freedom it has.
Long horizon. So if you thought about how many primitive actions you'd have to take to clean the kitchen, it would be really a lot. And the uncertainty is kind of pervasive and fundamental. Not only-- in the robotics world, sometimes you say, oh, I worry about uncertainty, and the answer is, well, just get better sensors.
But you can get better sensors up to a point, but better sensors won't tell me what's in the oven, or when the people are coming home. So there is some uncertainty that you just have to actually act to dispel, and other uncertainty that really is very difficult to dispel. So maybe you can make predictions about the people, but really, how are you going to sense when the people are coming home? So this collection of issues makes a problem like this really difficult. And I can't do this problem, but it inspires me.
OK, so let me tell you about a pretty classical old school fuddy-duddy thing that we've been doing to try to solve this problem. And I'll just skate through some high-level points about our solution, and show you some examples, and then I'll talk about learning.
So the first thing that we take to be pretty important is the idea of actually explicitly representing, somehow, the robot's knowledge state. What does it believe about the state of the world? So you can think of that belief as a probability distribution over possible worlds, although there might be certainly other kinds of ways of representing this. But if you're ever going to reason that you should do an action in order to find something out, like ask somebody a question or look in the cupboard, then you have to, in some sense, know you don't know something so that you can maybe resolve to figure it out.
So we take this decomposition. It's kind of a standard decomposition where you say, well, I have one module whose job it is to aggregate my observations over time into some representation of what I know about the world, and another module whose job it is to take action based on that representation.
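In code, that decomposition might look like the following minimal sketch (the class and method names are mine, not the actual system): one module folds each observation into the belief, and the other chooses actions from the belief alone.

```python
class BeliefStateAgent:
    def __init__(self, estimator, policy, initial_belief):
        self.estimator = estimator        # (belief, last action, observation) -> belief
        self.policy = policy              # belief -> action
        self.belief = initial_belief
        self.last_action = None

    def step(self, observation):
        # Aggregate the new observation into what the robot believes about the world.
        self.belief = self.estimator.update(self.belief, self.last_action, observation)
        # Act on the belief, not on the raw observation.
        self.last_action = self.policy.choose(self.belief)
        return self.last_action
```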
OK. And our belief state is this very complicated thing which I'm not going to talk about, but there's kind of a database of hypotheses about what objects might exist, and distributions over their properties. There's some representation of the space around the robot, and our certainty about whether it's traversable or not, and there's some representation about what objects tend to be nearby each other. So this is a whole giant research enterprise all by itself.
There's another set of issues which comes up uniquely in these kinds of robotics problems, which is, for some reason, it's not so hard to solve robot motion problems anymore. And it's not so hard to solve discrete AI planning problems. But problems that exist at the intersection, where you have to both move the robot and make choices about which objects to move before which other ones, or how to grasp them and where to put them, those are tricky. So a big chunk of our intellectual effort is focused on these questions of, how do you integrate discrete and continuous planning, basically.
So another high-level point here is, what do we do about the fact that we have some fairly fundamental uncertainty about the world? So another big piece of my former research life was studying POMDPs-- partially observed Markov decision processes. It's a very beautiful formal framework. If you formalize a problem as a POMDP, you find out that there are exact solution algorithms that are unde-- the problem is undecidable, or maybe only doubly exponential. So basically, although it helps you think about the problem, it does not help you solve the problem, and exactly perfectly solving these kinds of problems is very difficult.
So what we do is gross, gross, disgusting approximation. And we learned a little thing from control theory, which is, your controller can be kind of not so good as long as you very quickly re-evaluate the effects of the action that you just took and redesign. Try to just have a tight feedback loop.
So we operate on this principle of bad models, not very good planners, but really good feedback control. And that makes up for a fair amount of badness in the models and the planners. Not all of it, but a fair amount.
So the strategy that we have is to say, well, we're going to plan in belief space. And there's a whole story about that which I'm not going to tell that I can answer questions about it. We're going to make a plan. We're going to say, ah, yeah, we're going to clean this kitchen. And we execute the first step. And then-- and that's going to change the state of the world, and we're getting observation, and we're going to incorporate it into our belief. And then we're going to see if we like that or not, and then we might replan.
And what's interesting is that if you take this view-- there we go-- then from the planner's perspective-- so from my perspective, my reasoning about what actions I'm going to take, this whole gray thing is the plant. So the plant is-- this is a control person's description for the environment that I feel like I'm sensing and controlling.
So from the planner's perspective, everything is in belief space. The goal is in belief space. I say, I want you to believe with high probability that there's a cup of coffee on the table in front of me. The robot can't make things true in the actual world because it can never know what's true in the actual world. So it's going to try to drive its own belief into a state that I like.
And so it's kind of a funny stance to take toward planning, but it makes a bunch of things work out fairly nicely. So that's how we think about this problem. And it's simpler than exact planning under uncertainty, because we're actually not taking into account-- we are willfully not taking into account all the possible things that could happen. We're mostly planning for the most likely outcome.
Now, if the most likely outcome doesn't occur, then we replan. But we're not really hedging our bets in a perfectly optimal way. That gets very, very expensive, so we're not doing it.
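Put together, the execution loop being described is roughly the following sketch (my own rendering, with invented helper names standing in for the real machinery): plan in belief space for the most likely outcome, execute one step, fold in the observation, and replan whenever the belief no longer supports the rest of the plan.

```python
def plan_and_execute(goal, belief, planner, executor, estimator):
    plan = planner.plan(belief, goal)                    # goal and plan live in belief space
    while not goal.satisfied_by(belief):
        action = plan[0]
        observation = executor.execute(action)           # act in the real world
        belief = estimator.update(belief, action, observation)
        if planner.still_applicable(plan[1:], belief):   # did the likely outcome happen?
            plan = plan[1:]
        else:
            plan = planner.plan(belief, goal)            # cheap, frequent replanning
    return belief
```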
OK. So that's that. Another thing that we do to make the problem tractable is make serious use of hierarchy. So the story goes like this. We start with some high-level objective. Maybe the breakfast is made, or that I am in San Francisco. And make a plan at a fairly high level of abstraction, leaving out a lot of details. Ah, I'm going to get to the Boston airport, and then I'm going to get to San Francisco airport, and I'm going to do something.
And our planner can compute with the Gs here. You could think of them as subgoals or pre-images. They're sets of states such that if I'm in one of those states, then I think the rest of my plan will succeed. And so what I do is I take that first pre-image and say, well, that's a subgoal. That's a set of states I would like to get into. How can I get into that set of states?
So I make a plan, a little more detailed plan for that. How am I going to get to Boston airport? Well, I'm going to call an Uber and do some stuff. So then I think, OK, calling the Uber, how am I going to make that happen? And I plan in more detail, and eventually, I'm getting my phone out of my pocket.
And I start doing that. I start executing this level of a plan. As long as the actions are permitted, I execute. And then if something surprising happens, like I call Uber, there's no Uber, no cars available, then I can rethink. But I can actually use the structure-- ooh, that was not-- huh, cool. This is not my normal clicker, and so I don't really know how to-- I'm doing reinforcement learning. There we go. I won't shine it in your eyes, though. I know that's not good.
OK. So I might discover that there's something-- that I can't execute the rest of this plan, right? So that the expected outcome of taking some action doesn't happen. What's good with this hierarchical structure is that I don't have to actually rethink my career choices or the fact that I wanted to go to San Francisco. I can just kind of pop this plan off the stack, maybe, and replan for this.
So by having a hierarchical structure like this, it enables a bunch of interesting reasoning about when to do reconsideration and how much to reconsider when things go wrong. So that's kind of fun. And it also, obviously, makes things much more efficient. It does incur some risk, right? I am assuming, implicitly, that I will be able to walk through the San Francisco airport once I get there. And I don't know why I believe that. I didn't make a plan in detail.
And it would be ridiculous to make a plan in detail, because I don't know what gate I'm going to arrive at or which slow people are going to be in front of me in the hallway. But I believe that I can do that, and I believe that maybe from previous experience and some other kind of high level knowledge. So there's a really interesting question about, how do we acquire the models that let us do these kinds of hierarchical planning? For the robot manipulating stuff, it's not so hard.
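Here is a rough sketch of that hierarchical execution strategy, again with invented helpers (plan, current_step, preimage) rather than the actual planner: pre-images become subgoals planned in more detail, and a surprise only forces replanning at the level where it occurred.

```python
def hierarchical_execute(goal, belief, planner, executor):
    # Stack of (plan, subgoal); the most abstract plan sits at the bottom.
    stack = [(planner.plan(belief, goal), goal)]
    while stack:
        plan, subgoal = stack[-1]
        if subgoal.satisfied_by(belief):
            stack.pop()                               # subgoal achieved; resume the parent plan
            continue
        step = plan.current_step(belief)              # which step applies, given the current belief
        if step is None:                              # surprise: the plan no longer applies
            new_plan = planner.plan(belief, subgoal)  # rethink only this level...
            if new_plan is None:
                stack.pop()                           # ...or give up and let the parent reconsider
            else:
                stack[-1] = (new_plan, subgoal)
        elif step.is_primitive():
            belief = executor.execute(step, belief)   # act, observe, update the belief
        else:
            # Refine: the step's pre-image (states from which the rest of the
            # plan should succeed) becomes a more detailed subgoal.
            stack.append((planner.plan(belief, step.preimage()), step.preimage()))
    return belief
```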
OK. So now I'm going to show you a movie. When I show this movie, I have to first give some apologies. OK, what are the apologies? The main apology is that any given thing that this robot does, five undergraduates could make it do better. But it's the same program that's going to do a whole, whole bunch of things, and so it's pretty general purpose. And it can stand a bunch of messing around. So that's what I'll say.
So here's a robot. We told it to put the blue box down where the soup can is. It found the soup can. It said, hm, got to get that out of the way. Move the blue box. What does it know? It knows about space and objects and picking up. Here, we told it we wanted the green box to be on the corner of the table. It has to push it, because it's too big to pick up. So it figured that out. And then it realized it had to move the orange one out of the way and never bothered putting the orange one down. It also knows its pushing is unreliable, so it looked to see if it was working.
Here, we told it to go out of the lab. It knows that it can't move through objects. It sees these objects. Here, we thought it was going to put the chair down back there, but it just brought it with it. So that's an instance of, be careful what you wish for. Here, we asked to put a full oil bottle on the table. It's gathering information by picking up these bottles to see what's in there. And this is some other silly thing.
OK. So what's interesting about this? What's interesting about this is-- OK, what's interesting-- first of all, this was mostly coded in Python by me and Tomas. So professors can still do stuff. But also, there is no machine learning in there anywhere-- not a shred of it, right? So that was just programming. So that was just semi-clever engineers writing a program. And it's not sustainable. The amount of hassling we had to do, and tweaking of the models, and so on was substantial.
So what it is, in my view, is-- what it did do is it gives me kind of faith in the bones of the architectural structure of that thing. So I think it worked out, actually, pretty well. You may not think that that was an awesome robot. But as robots go, it's actually kind of pretty awesome, if you look at all the robot demos in the world.
So the question is, can we keep some of the bones of that thing and learn the models, right? So I would like to get out of the loop again. So how can I get out of the loop again? And the question is, can we learn some of these things? So one thing that I think is interesting to think about is, let's say you buy this architecture. This is not gospel or anything. It's just the thing we did. It's interesting to think about all the different kinds of things that you could learn if you keep the bones of this, but say, I'm going to learn models.
And I also think there are two really interestingly different kinds of learning that go on in a system like this. So the first kind, which I call green-- learning about the world. This is, I would say, actual information gathering-- actual acquiring information about the world around you. So a lot of learning goes into object detectors, and perceptual learning, and stuff like that.
Right now, the majority of learning in robotics is learning control policies, how to manipulate stuff, and so on. There's a ton of that. But we might also learn some transition and observation models at a higher level of abstraction. Those are important to be able to do long-horizon planning.
OK. So that's kind of learning something about how the world works. Another kind of learning, which is just as important, is, I would say-- I don't know. There, I called it learning to reason. Or you could call it analytic learning. It's learning where no new bits of information enter the robot's head, really, but where you re-represent what you already know so that you can reason more efficiently.
And so I would argue that the learning that you do when you learn to play Go is the blue kind of learning, right? In some sense, once I tell you the rules of Go, if only you were a better computer, you could compute the optimal first move. But you're not a very good computer, and it helps to have that information represented in a different way. And so by some kinds of mental simulation and practicing, you can acquire a different representation that's much more computationally-efficient.
And so we also explore a bunch of different kinds of learning that make the reasoning more effective. So two really important kinds of learning. And so another little hobbyhorse-- which, actually, I'm right now typing a proposal about, and so I don't really have anything to say. But I think the other thing that's really critical in learning for a robot that's going to go out and work in your house, or that I'm going to train in the factory for a while, and then it's going to go work in your house, is that the kind of learning that we do is modular and incremental, right? So that I don't think about training a whole thing end to end, but I can train pieces. And I can accrete. I can learn new things that don't interfere with the things that I knew before. So I'm into modularity.
OK. So good. So let me just tell you-- let's see. Yeah. Let me tell you just about a piece of work-- a piece of concrete learning work-- that we've been doing lately. And that will probably be enough, and then I will take questions. OK. So let's think about learning transition models. So imagine that we have a robot that's already competent-- it already knows how to pick things up and put it down-- and you want to teach it to do a new job, like pour tea or cut up a cucumber, right?
So it would be ridiculous to imagine that the robot would have to reacquire the ability to pick things up and put them down in order to pick up the teapot, right? That's ridiculous. It knows how to pick up the teapot. It just has to figure out how to pick up the teapot, and why, and when, and what to do with it once it has it. But it should be able to use what it already knows to do these things. So that's the context we're thinking about. Robot's already pretty competent, but we're going to add a new skill.
And I'm just going to-- OK. So let's think about this. So we'll just look at it in 2D. So imagine that we've already done some basic reinforcement learning, and we've acquired a controller for pouring. Maybe it looks at what's coming out, and it's got some gains and so on. So we have some control program pi which pours.
Now I'm interested in understanding how to use it-- when to call the pouring program. What are the preconditions under which, if I call the pouring program, the stuff will end up where I want it to go? Because I want to be able to take-- the fact that I learned to pour is good. It's useful. But it's not really actually all that useful unless I can integrate it into the rest of the stuff I know how to do, and actually use it to achieve some bigger job.
So I might say, OK. Well, I think that there are these various parameters, like the relative position of the thing I am pouring from and into, and there are shapes and sizes, and how I'm holding the object. All these things, somehow, are relevant to the question of whether, if I call the pouring program, the stuff will go where I want it to.
So the way we think about this-- and this is kind of on the engineering side. It makes some people just not willing to listen further, but whatever. We say, OK. Well, the way I'm going to think about that is I'm going to say, for right now, the engineer says, I think these are the variables that matter. These are the aspects of this situation, which is-- I also want to say that this is a description of the pouring-- the effects of the pouring operator-- which is lifted, right?
So the idea of lifting is, it's not about this cup and that cup. It's about a source and a target in general. And it's articulated in terms of their properties, not in terms of their identity, right? So it's nicely abstracted [INAUDIBLE] individual. And what I want to learn is a constraint. I want to learn some condition on these variables, these properties of the situation, such that, if that constraint holds, if those variables stand in that relation to one another, and I call my pouring program, then most of the stuff will end up in the target vessel.
So that's my job. Engineer decided what the features were, in this case. And my machine learning job is to figure out the relationship on those features such that this thing is going to work out. And so I am not going to do the details here, but we think of this as a Gaussian-- OK, also, experience on a robot is expensive, so we really want to be careful about how we do data gathering and so on. So we're going to treat this as a Gaussian process regression problem.
We take tuples of these inputs, which are basically attempts to pour in various situations. We score them. And we want to learn this mapping. Let me say something about why I want to learn this mapping. I don't want to learn one way of pouring, right? You might say, oh, I just want to learn, how should I pour?
But the fact is, well, first of all, some days, I'm going to have a big bottle to pour from and a small thing to pour into, and other days, I'm going to have different types of vessels. So I have to learn something about this bigger space. But also, it might be that my standard, most favorite way of pouring with my right hand along this axis is occluded, for some reason. I can't pour it this way. I'm serving people, and the person is in the way, and I have to do the back-handed wine waiter pour, right?
So you'd like to know a whole space of ways of doing this job so that, when you're planning in a bigger context where there are more constraints, you're not just totally lost when your most favorite way of doing it is no longer feasible. So we want to learn a kind of whole space of good ways of pouring.
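As one way to picture that regression step, here is a minimal sketch using scikit-learn, with made-up feature names and data arguments; the real system's details surely differ, but the shape of the idea is to fit a Gaussian process from the lifted features of a pour to a score, and then read off a whole region of settings predicted to work, not just a single favorite.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_pour_model(attempt_features, attempt_scores):
    """attempt_features: one row per past pour (e.g. relative pose, vessel sizes, grasp);
    attempt_scores: fraction of the material that ended up in the target vessel."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(np.asarray(attempt_features), np.asarray(attempt_scores))
    return gp

def confidently_good_pours(gp, candidates, threshold=0.9):
    """Keep candidate feature vectors the model is fairly sure will work, so the
    planner has a whole space of pours to choose from, not one favorite way."""
    candidates = np.asarray(candidates)
    mean, std = gp.predict(candidates, return_std=True)
    return candidates[mean - std > threshold]
```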
OK. So we do this using-- I'm going to play this like a movie now. We do a bunch of GP stuff. The story of this is that Tomas and I did a version of this, and it wasn't very good. And then we engaged the students, and they did a better version. So that was good. And I like to play these movies.
So then we gather data on the actual robot for pouring and scooping chickpeas. Now, every single time Ze and Kaitlyn ran around-- [INAUDIBLE] do this part-- they ran around and picked up the chickpeas and the little people and stuff, right? So experience is expensive.
OK. So we get experience of these things. We tried different bowls and scoops and stuff. And we can acquire-- we can, in the end, acquire a representation of that constraint. When is it good for pouring? And then-- and this is slightly dissociated, because this is work that's going on right now. So in this case, the robot has learned something about pushing and something about pouring, but no scooping. And we've given it a bunch of different goals.
We can put these objects on the table pretty much any way we want. And in each case, we've told it to do something. We give it different goals. Put the stuff in this bowl or that one. You'll see some crazier arrangements in a minute. In this case, that tray is supposed to be, I don't know, a serving tray or something. There, it just reasoned, by geometry, that it had to move the green thing out of the way, because it couldn't pick the object up the way it wanted to. It found a good way of pouring.
Here's another one. We told it that, now, it had to serve the stuff by putting it on top of there. It picks the thing up and pours it. This is the one where-- no, not too bad. We'll watch one or two more of these. The thing is, this is very general purpose. We never told it steps to take. It's just doing means ends reasoning about how to get the stuff in the cup. But it learned constraints.
Oh, yeah. Wasn't that good? There. That's a good one, right? It had to slide the bowl over with one hand so that it was in its workspace, conveniently, so that it could pour with the other hand. It just did that. Again, we never told it that that's a thing. It just knew that, in order to get the pouring to work out, it needed to have the bowl close to it. OK, so this is pretty good. It's pretty general. I don't know. We're kind of happy.
So OK. So there is something that was slightly distasteful about that, which is that we had to say, what were the important features? So another thing we've been working on is trying to learn ways of figuring out what the important features are when you want to solve a problem like this. And so what objects and properties are relevant? And I'll talk about this briefly, because it will make some old people in the room maybe happy.
But so the way we were thinking about it in this work-- so the question is, let's say I want to push an object, or pour, or something like that, and I want to learn a description that's generic, right? It's not about this particular scene in front of me or these particular objects, but it's pretty generic.
And I want to say, well, if I want to pour from this object into another object in this particular world that I'm in right at this moment, how do I figure out which objects in the scene are important for making the prediction of what will happen, right? And I have to describe those objects in some way that's kind of generic.
And the way we're going to describe them is actually in terms of how they relate to the objects that we already know we need to operate on. So there's this notion of a deictic reference. [INAUDIBLE] this is pointing. In natural language, we say this thing or that place. Those are kind of deictic references, because they define things relative to the speaker or to some other objects that we already kind of understand.
So the way that we apply that in these geometric scenes is to talk about-- and in this work, we define some specific relations, like above, or below, or on top of, nearby, and so on. And these relations-- these things, when you apply them to an object, they denote another object or set of objects. So it's a way of taking one object that you know about, like the thing you're trying to push or pour, and using it to recruit some other objects in your scene.
And in this scene, it might be one thing. In another scene, it might be 10 things. But it's a way of naming objects relative to the object that you know about already. And so if you do that, you can write these kinds of operator descriptions that might say, well, if I want to push some object o1, then I'm interested in the object that's on top of it, or maybe the object that it's on top of. And maybe the properties of those objects are actually relevant to my ability to make predictions.
OK. So in a scene like this, we might say, oh. Well, if I'm thinking about pushing object A, I could make a graph of the relations among the objects in my scene. And then if I have a list of these deictic references and a graph of the relations, I can use those to decide which particular objects in this particular scene I really mean to talk about.
OK, I'm going to go quickly now, because the details here are not so important. But it does give us a way of learning rules that name other objects relevant to the things that I'm trying to do. And then we learn a neural network that maps some properties of some objects in the state at time t to some properties of some objects at the next state.
And the reason that this is an interesting representation-- and it's kind of related to the thing I started with before-- is that, independent of how many objects I have, I have this neural network that has the same input and output size, because it's being applied to the objects I'm operating on and the ones that got captured by those deictic references. So what this means is that we have a strategy for learning both the structure and the parameters of these kinds of lifted, sparse prediction rules.
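A toy sketch of how those deictic references recruit objects might look like this (the scene format and the relation names are invented for illustration): starting from the object being acted on, follow the named relations through the scene's relation graph, and hand the properties of just those objects to the prediction model, no matter how many other objects the scene contains.

```python
def resolve_deictic(relations, anchor, relation):
    """relations: dict mapping (relation name, object) -> set of related objects."""
    return relations.get((relation, anchor), set())

def gather_rule_inputs(relations, properties, target, references):
    """Collect property vectors for the target object plus whatever objects its
    deictic references (e.g. 'on-top-of', 'below') denote in this particular scene."""
    objects = {target}
    for relation in references:
        objects |= resolve_deictic(relations, target, relation)
    return {obj: properties[obj] for obj in objects}
```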
OK. And now I'm going to play this like a movie for a minute. We're almost done, right? And so we apply this in some scenes, and it works sort of well. And it works sort of better than some other things, because everything does, or I wouldn't tell you. So, right. So in the world, I think we're making great strides on solving small problems, right?
Lots of really interesting papers about machine learning that address this piece, or that piece, or another little piece over there. And I am really interested in this question of, how do we design a whole story so that we can do learning of pieces and parts, and so that we can do the learning over time so that we can build in some things that we understand about the world and learn the things that we don't understand?
So I'm going to end with this, which is-- so I gave a talk at IJCAI in 1997, and this is my conclusions slide. I had graduated from [INAUDIBLE]. And then I recently gave a talk at IJCAI, and I used this as a little bit of a contrast. So this is my conclusion, right? And it's because, when I saw this slide, I thought, man, I could use that slide today.
So there has been major progress in algorithms for supervised and reinforcement learning. Check, right? And so now, can I say, well, really a ton of progress, right? OK, good. But this doesn't directly yield solutions for building autonomous agents. I think that that is absolutely true. Maybe it yields solutions now for building certain kinds of simple things that drive, but it certainly doesn't give us robots that can make tea in your kitchen.
Human insight is needed to complement the strengths of these algorithms. I totally believe that. I think, now, I have a clearer view that, at least in the approach that I'm taking, that insight comes in the form of algorithmic and structural biases. And so-- so, good. So this is work that I've done with a lot of people, including even some here. And what I'm going to do now is show the robot messing up. And thank you for your attention, and offer to answer questions. So thanks.
[APPLAUSE]