Conveying Tasks to Computers: How Machine Learning Can Help
Date Posted:
September 26, 2024
Date Recorded:
September 10, 2024
Speaker(s):
Michael Littman, Brown University
All Captioned Videos Brains, Minds and Machines Seminar Series
Description:
Abstract: It is immensely empowering to delegate information processing work to machines and have them carry out difficult tasks on our behalf. But programming computers is hard. The traditional approach to this problem is to try to fix people: They should work harder to learn to code. In this talk, I argue that a promising alternative is to meet people partway. Specifically, powerful new approaches to machine learning provide ways to infer intent from disparate signals and could help make it easier for everyone to get computational help with their vexing problems.
Bio: Michael L. Littman, Ph.D. is a Professor of Computer Science at Brown University and Division Director of Information and Intelligent Systems at the National Science Foundation. He studies machine learning and decision-making under uncertainty and has earned multiple awards for his teaching and research. Littman has chaired major conferences in A.I. and machine learning and is a Fellow of both the Association for the Advancement of Artificial Intelligence and the Association for Computing Machinery. He was selected by the American Association for the Advancement of Science as a Leadership Fellow for Public Engagement with Science in Artificial Intelligence, has a popular YouTube channel and appeared in a national TV commercial in 2016. His book, "Code to Joy: Why Everyone Should Learn a Little Programming" was published in October 2023 by MIT Press.
LESLIE KAELBLING: I am so happy to introduce Michael. I made notes this morning. But, I know. So Michael is good at all kinds of things. Research is one of them. And when I first met him, he was like the first person-- so he's done everything before it was cool. So one of the things that he did before it was cool was meta learning.
So it wasn't called that. But he was working on combining evolutionary algorithms and reinforcement learning in 1989. OK, that was way before either of those things was particularly cool. And so he was doing Lamarckian evolution. So that you could say, oh, I'm going to have these critters and they learn some things. And then the ones that learn better, maybe they pass on what they learned. OK, that was one thing before it was cool.
Another thing was he has basically the first paper, I think, on the theory of why AlphaGo works, in the sense that why is it that if you do two-player zero-sum games, you can kind of do all the standard reinforcement learning algorithms and prove that they'll arrive at a minimax optimum. So that was way before it was cool.
Another thing, way before it was cool-- I was slightly involved in this-- was basically the first POMDP algorithm that got published in the AI literature. So it existed in the operations research literature. And so, an awesome algorithm and then more and more things. But so he's got a great talent at seeing what matters and making beautiful formulations that are clean and clear. And other people can read the papers and get what's going on, so super good.
I could go on and on, but I'm not going to. And I don't really know anything about the current work, so I'm looking forward to learning about that. He also has an awesome YouTube channel, which you should look at for entertainment and education. And he can sing and juggle. And at one point, he could jump through his hands. So, Michael.
MICHAEL LITTMAN: Through the leg, yeah. I'm not going to do it now because I'm 58. But, yeah.
LESLIE KAELBLING: He could anyway. OK.
MICHAEL LITTMAN: All right.
LESLIE KAELBLING: Yay!
MICHAEL LITTMAN: [LAUGHS]
[APPLAUSE]
All right, that was an intimidating introduction, if I ever heard one. And just to be clear, Leslie downplayed her contribution. But she was my PhD advisor-- not here. So I like to think of academics as kind of like a family where you have students and they're like your kids. And then they have students and they're like your grandkids. But Leslie, who would be my academic mom, remarried.
So she was at Brown University, and then she moved to MIT. So she's here now. So she has all these new kids from a whole other family that I don't know nearly as well. But, yeah, the POMDP work, she was visionary in sort of deciding, hey, this might actually be a useful way to think about things in artificial intelligence. And it really remains a core model in the community ever since.
So if you don't like this work, it's because it's just not cool yet. That's what I learned from the introduction. So hopefully you will like this work. It is a little bit out of sync with some of the things going on. But I think you'll see some themes that are interesting and exciting. And I'm very excited to be here to share it with you. So thank you for having me here today. So, OK, so just let me get rolling.
So I'm going to start with a really basic sort of idea, that I'm really interested in behavior, in acting, doing things in the world. Sometimes it's purely a digital world, but sometimes it is the actual physical world. So just as an example task that I'm going to use throughout this talk-- taking care of tomato plants, watering tomato plants. Is that the most important thing to do? Well, tomatoes are good for you. And Leslie, I think, actually has a family history of taking good care of tomato plants. Is that right?
LESLIE KAELBLING: Yeah, that's a tomato farm.
MICHAEL LITTMAN: Tomato farm, right? OK, so this was a handpicked, as it were, example. All right, and so a person who knows how to take care of tomato plants maybe has information in his or her head that is going to allow that person to take care of the tomato plants. And maybe there's rules or things like rules that say stuff like, if the soil is too dry, then maybe you should add water, things like that. Obviously it's more complicated than that.
But now there's going to be an interaction between being able to look at the world, check the soil moisture, see how the tomato plants are doing, and then act on the world to actually try to make the tomato plants more healthy. So that's great if you know how to take care of tomato plants. I don't know how to take care of tomato plants. So me in the same situation, I just have an empty head. So there's just the tomato plant and there's me and just cluelessness in between.
All right, so human beings have this wonderful thing where we can actually convey tasks to each other. So the person who knows about taking care of tomato plants can actually convey to me in a process that we sometimes call teaching, but it has a bunch of other names as well, to fill in the bubble in my head so that I actually know what to do. And now that person can go away and I can take care of tomato plants myself.
If you don't know how to take care of tomato plants, here's a short video that I found online that I just love. And so I'm going to share this video with you. And you'll know how to take care, at least in a very basic way, of tomato plants.
[VIDEO PLAYBACK]
[ACOUSTIC FOLK MUSIC]
- Hi, I'm Ashleigh. And today, I'm going to show you how to water your tomato plant. The easiest way to determine if it needs water is the finger test. Simply stick your index finger about an inch down into the soil. If you bring it out and it's dry, it needs water.
When watering a tomato plant, you want to make sure you don't water on top of the foliage, if at all possible. That can cause disease. Always water at the base of the plant and make sure it's a nice, thorough watering. This plant will love it, as it has great big roots and they love a nice deep water.
[END PLAYBACK]
MICHAEL LITTMAN: All right, now you have everything you need to know to go and take care of your tomato plants. So the reason that I love this little video-- it's super short-- the reason I love it so much is because she demonstrates a whole bunch of different ways that we as people convey tasks, like behavior, to other people. And I like to put them into a 2 by 2 grid because, I don't know, 2 by 2 grids are cool.
So one of the things she did, and you may remember from the video, she says, she says the words, "Stick your index finger about an inch down into the soil." So she's giving us an explicit instruction. Here are the steps that you need to take. And often when we think about teaching, that's what we think about. It's like, well, I'm just going to tell you what to do. But she used a bunch of other things as well.
So another thing she did is she showed you how to move the leaves away so that you don't water the leaves. She said don't water the leaves, but she didn't tell us how to not water the leaves. But if you watch the video, you can see she puts the flat part of her arm against the tomato plant and pushes it out of the way. Why that instead of just like grabbing it and pulling it away?
Well, she didn't say. But presumably because you'd break the tomato plant if you did that. So she gave us this really nice example that we can then use as an example of the kind of instruction. She didn't have to tell us the words, but she just, through her actions, demonstrated it. But even that's not all she did. She also mentioned, "that can cause disease."
So don't do this, don't let the leaves get wet, because that can cause disease. How is "that can cause disease" helping you water tomato plants better? It's not actually an instruction you can follow. But it is really useful information because it means over the long run as you're watering the tomato plants, if at some point you see disease on the leaves, you're like, oh, yeah, that's bad, I should avoid that.
And it's probably because I watered the leaves. So I should take more care and do that better in the future, right? So she kind of gave us a lesson that will help us train ourselves to be better at it. So she gave us an explicit incentive, an explicit goal that we can actually execute and we can see whether or not we're succeeding.
And then finally, because again, it's a 2 by 2 grid, I wanted to fill in this last box. It's not as common and it's not a great picture of her. But there's the notion of an example incentive. So she wanted to convey the idea that the plants really love a good, thick watering. And so she showed us with her face.
Now, again, it's not the best screenshot. But in the context of the video, what she was conveying is, like, the plants love this. This is a really good thing, right? And so again, she was conveying a notion of an incentive, but she was doing it by example, by showing how she was responding as if she were the plant. And so all these sort of things were at work in this one short little video.
So I connect this sort of 2 by 2 grid with a saying that I found that you can see on the internet in various places. But it originally came from William Arthur Ward in the '60s. And he said the mediocre teacher tells, the good teacher explains, the superior teacher demonstrates, and the great teacher inspires. And this just strikes me as being extremely judgy.
[LAUGHTER]
Because really good teachers actually do all of these things, right? It's phrased as if you don't want to tell or explain or demonstrate, but of course you want to do those things. But you also want to inspire them. All right, so I think these are really important things to keep in mind as we're conveying information to each other. I think a lot of, again, good teachers do this very naturally. But we can be explicit about it: these are the main modes of getting this task information across.
So what I want to do next is say, well, this isn't really a talk about telling people what to do. This is really a talk about telling computers what to do. So what's the connection here? Well, the connection is that before, well, there was me with a big question mark. And now there's a computer with a big question mark. We want the computer to carry out a task on our behalf.
And so we don't call that teaching, typically, we call it programming. But we have a process by which we tell the computer what information that the computer needs to be able to carry out the task on our behalf. Now, we have computers that can actually do the watering task. They can actually sense information about plants and they can actually act in the world by spritzing water or whatnot.
And I thought of this example first because I thought it was sort of simple and it got the idea across. But then in my role at the National Science Foundation, I actually was working with folks who run an AI Institute on farming. And one of the demos they had, they had this little kit that they were giving out, which is all the things you need to make an automated plant waterer.
I'm like, yes, that's exactly what I was thinking. So from a computing standpoint, what we're really doing, the actions are basically, well, for my example, you can do a spray today or you can not do a spray today, you can wait till tomorrow. So what we have to do is convey to the computer, well, when do you take this action? When do you take that action?
So the high level point that I want to make is that the four boxes that you can see in the Ashleigh Lemon video-- by the way, great name for somebody who takes care of plants-- actually map onto ideas that we use in programming and machine learning to convey tasks to computers. So the idea of telling or explicit instructions is what we think of as traditional programming. You just write out the rules. You tell the machine what to do.
But we also have a notion of giving explicit incentives that the system can follow. And that's studied, in machine learning anyway, in the field of reinforcement learning. We also have the notion of giving examples of the instructions, examples of the behavior that we want. And that's studied in the field of supervised machine learning where you demonstrate.
And then finally again, this is the weakest box, but this notion of inspiring sort of corresponds to using examples to infer basically a reward function, infer what the objective is so the machine can then act on that objective. So what I'm going to do is step through all of these probably. I don't know exactly what the crowd is here. Though, I do note that there is kind of a dead spot in the--
[LAUGHTER]
I do like how they have assigned seating here as long as you're in seat F. Did you notice that? I just couldn't figure out where I'm supposed to sit.
LESLIE KAELBLING: They all say F.
MICHAEL LITTMAN: They all say F.
LESLIE KAELBLING: They all say F. [LAUGHS]
MICHAEL LITTMAN: It's very confusing. All right, so what I'm going to do is step through these boxes to make the connection a little bit more solid and then talk a little bit about how we might be combining these, which I think is really key. All right, so first, the notion of coding or control flow where you can write explicit instructions that the computer can follow, things like repeat the following steps forever-- spray today, delay watering, repeat.
So it'll water the plant every other day. So if that's what you want to do, it's fast, we can express that with code. You can do more sophisticated things where you actually use information about the world. If it's not raining or I haven't just sprayed, then spray today, otherwise delay. So you can imagine much more sophisticated programs.
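Just to make that concrete, here is a minimal sketch in Python of those two styles of watering program, with made-up stand-ins like spray_today() and is_raining() for the real sensors and sprayer; this is illustrative, not an actual controller.

```python
import random
import time

def is_raining():
    # Hypothetical sensor stub; a real controller would read a weather service or sensor.
    return random.random() < 0.3

def spray_today():
    print("spraying the tomato plant")

def wait_a_day(seconds_per_day=1):
    # Stand-in for letting a day pass (shortened so the sketch runs quickly).
    time.sleep(seconds_per_day)

def water_every_other_day():
    # "Repeat the following steps forever: spray today, delay watering, repeat."
    while True:
        spray_today()
        wait_a_day()   # today passes
        wait_a_day()   # "delay watering": skip a day, then repeat

def water_when_sensible():
    # "If it's not raining or I haven't just sprayed, then spray today, otherwise delay."
    sprayed_yesterday = False
    while True:
        if not is_raining() or not sprayed_yesterday:
            spray_today()
            sprayed_yesterday = True
        else:
            sprayed_yesterday = False
        wait_a_day()
```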
Coding is, of course, probably the most successful method in history that people have for telling machines what to do. And there's many, many, many different kinds of programming languages. There's more sort of visual block-based kinds of languages and there's more textual kinds of languages. But they're really all getting at the same ideas. These are all sort of fundamentally equivalent at a low level.
OK, so how does programming work as a way for maybe people to convey tasks to machines? And by people, I want to be inclusive of not just folks who are trained to program like computer scientists. So it's really interesting to think about, OK, well, how well do people do at this sort of coding problem?
There's a classic example from the 1980s where students at a not so far away Ivy League institution were asked to-- not the super close one, but a little further away-- they were taking a programming class. And they were most of the way through the programming class. And they were doing great. And they were good students. But they were asked to do what I think sounds like a relatively simple task-- write a program to compute the average of a list of numbers. The list is terminated by the special number 99999.
So the students were asked to do this, and 86% of the beginning programming students could not do it. In the next semester's class, I think it went down to 66% or something like that who couldn't do it. But this is really simple. If we can't ask a computer to do that for us after being an Ivy League student with a semester of computer science under our belt, maybe this is hard for people. And so this has been dubbed the rainfall problem.
And it's been very well studied in the computer science education literature. And there's a lot of debates about what it really tells us about how hard this is. But it's clearly not trivial for people to do this. And even if we set that aside for a moment, there are plenty of examples where professional developers have created buggy code, programs that actually do bad things, that crash spacecraft or cars or power grids, the entire internet just due to coding errors. Coding can be really difficult.
And I think it's not just programming. So what we've been doing in my group is we've been giving people tasks and asking them to tell another person to do that task. So a programming-like task, but instead of the difficulty of having to express it as a computer program, which you could argue is a very difficult act anyway-- no, no, no, don't worry about that.
We'll just do what Andrej Karpathy said. The hottest new programming language is English. So then we're done, right? Because we speak English. So then it should be easy, right? Well, what we've been finding is, at least in our preliminary data, that getting one person to explain a task to another person is almost as hard as programming, that we're getting error rates like the 86%, a little bit less. But still people are struggling to convey the idea of, this is what I want you to do to another person.
So just very quickly, this is roughly the experimental design that we've been using. We give a fairly long-winded description of a task. Well, OK, let's just make it easier here. We give a long-winded description of a task. We recruit pairs of people on the internet separately. And then we pair them together. One we dub the sender and we give the instructions for the task to the sender.
Then we give them an evaluation to see if they actually understand what it is that we asked them to do. Then we ask them to convey that to a recipient, to another person. So write out what that other person should do. And then we ask that person to do the task too, and we see whether that person was successful. And so in the cases where the sender actually got it, again, more than half of the recipients don't get it from what they're told.
And we've done a bunch of different variations of this idea. Sometimes we give the instructions to a chatbot, to ChatGPT or Llama or any of the other kind of instruction-tuned language models to see how they do. And sometimes we allow back talk. So we actually allow the recipient to ask questions, sort of clarification questions to the sender, which is a little richer than what you typically have when you're doing programming.
So what did we find? One of the things that we found is across the different tasks that we had people do, people who came in-- and we did a little mini-evaluation to see if they knew anything about programming-- people with programming experience or programming knowledge were actually better at telling the other person what to do, at least for these tasks. A lot of the tasks that we built were a little computer sciencey, so it kind of makes sense that they would have experience with that.
But nonetheless, having programming experience was really helpful in talking to other people, presumably because it's about structuring what it is that needs to be said to be clear. And they had more experience with that. The programmers were also better at following our directions. So when we told them the task, people who had programming experience were better at doing the task that we told them to do.
The opportunity to ask clarifying questions helped the recipient, but it also helped the sender. So when it became a conversation where the sender describes the task, the recipient asks some clarifying questions, then when all that's done and, OK, the task has been conveyed, then we ask both of them to do basically a test to see whether or not they understood the task. The senders were better for having been asked clarifying questions-- which, I don't see a lot of people nodding.
But if you've ever taught a class, it's remarkable how much you learn trying to explain material to a class. The things that I know best are the things that I've had to teach. Why? Because I started to understand how it could be misunderstood and how I internally was misunderstanding it and didn't realize it. So that I thought was really cute.
And the other thing that was really important, and this kind of brings us back to Ashleigh Lemon a little bit, is that when we ask people to give examples-- we said to the sender, convey this task, and would you mind giving them some examples as well? And those examples tended to help. So people had a clearer idea of what the task was when they were given examples along with the description.
All right, so maybe one way we can make programming better or easier for people is if we allow them to include examples and if we maybe allowed the computer to ask clarifying questions somehow. But another scheme that people have talked about is, well, maybe we could just make the programming language itself simpler? So there's a whole push for a different style of programming called trigger-action programming. If you're familiar with the website IFTTT, they support that. Or Alexa supports that.
Basically, any end-user programming mode that apps tend to give you tends to have this form where you can choose a trigger. So when this happens, do this thing. So there's a list of triggers and a list of actions. And then you, as the programmer, just have to put them together in the right way. So in the context of watering tomato plants, it might be, if, and then this is one of the choices. If rain is forecast tomorrow, then I can do the delayed watering action.
This is much simpler. There's not all the complicated control flow. There's no variables or anything. It's just choose a trigger, choose an action. It's more limited than general programming. But it does seem to be easier. And in fact, it does seem to be easier. So we've done some evaluations where we recruited people off the internet and asked them to create programs in this format. People without programming experience were generally pretty comfortable building programs of this form.
But even this kind of programming can be hard. So 40% of the users in our study actually erroneously interpret the trigger conjunction. So if you have not just, "Rain is forecast tomorrow," but, "Rain is forecast tomorrow and the temperature is above such and such," they don't know what to do with the "ands" and they make up their own interpretation of them. And well, 40% of the time they're not using it the way that it was intended. And so even super simple, scaled down programming can be really tough for people.
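To make the conjunction point concrete, here is a minimal sketch of trigger-action rules as plain data (not IFTTT's or Alexa's actual format), where an "and" trigger is meant to fire only when every one of its conditions holds:

```python
# Each rule pairs a list of trigger conditions with an action.
# The intended reading of "and" is that ALL the conditions must be true.
rules = [
    (["rain is forecast tomorrow"], "delay watering"),
    (["rain is forecast tomorrow", "temperature is above 90F"], "delay watering for two days"),
]

def actions_to_take(state, rules):
    """state maps each condition name to True/False for the current day."""
    return [action
            for conditions, action in rules
            if all(state[c] for c in conditions)]   # conjunction: every trigger must hold

# Rain is forecast but it's cool out, so only the single-trigger rule fires.
today = {"rain is forecast tomorrow": True, "temperature is above 90F": False}
print(actions_to_take(today, rules))   # ['delay watering']
```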
All right, so that's explicit programming. Maybe we shouldn't have people program. Maybe we should let machine learning figure it out just from other cues. So the next thing I want to talk about is this sort of reinforcement learning approach where what we need to tell the computer is what we are trying to accomplish, what we want the computer to accomplish, what actions it has available, and then just let it figure it out. The details are too much for me.
All right, so this has actually been a really successful paradigm in the context of getting computers to play games, which is something that AI scientists have been trying to do since the earliest days of the field. Like, oh, maybe we could do chess or something like that. Maybe we could have it play video games. Video games are nice because there is this sort of clear signal of, did I win? Did I get points?
So you could imagine just telling the computer, well, if you want to maximize points, here's the joystick commands. Now just practice, play a lot, a lot, a lot until you figure out how to do well. And in fact, the folks at Google DeepMind applied these to a bunch of really tough problems and did remarkably well. So they had a program they called Deep Q-Networks that actually played Atari games, which to me, this is what a video game looks like because I was a kid in the '80s.
But video games are more sophisticated now. But it actually did really well in these games, arguably as well as people. And it learned to play all these different games. And, of course, as Leslie mentioned, AlphaGo in the introduction, AlphaGo is a program that plays the game of Go. And the DeepMind folks that built this program used this reinforcement learning paradigm to create the program.
They didn't sit down and say, well, this is what's good to do. You want to control the center of the board. You don't want to allow this sort of thing. That eye is dead. They didn't do any of that. They just said, here's the rules of Go. Here's what it means to win and lose. Come back to me when you're good. And it got really, really good. So that's awesome. So reinforcement learning, I should just end the talk here. Reinforcement learning is awesome, we can all agree.
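For the curious, here is a minimal sketch of the core learning rule behind that paradigm, tabular Q-learning against an assumed environment interface with reset() and step(); AlphaGo and DQN layer neural networks and search on top of ideas like this, which is well beyond a short sketch.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Assumes env.reset() -> state and env.step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)   # Q[(state, action)] = estimated long-run score
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly pick the action that currently looks best; sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the estimate toward the reward plus the best estimated future.
            target = reward + (0.0 if done else gamma * max(Q[(next_state, a)] for a in actions))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q   # act greedily with respect to Q once training is done
```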
The notion of the machine learning part of this is because the system has to experience the world to know whether or not it's successful in actually completing its objectives. You can use this idea in other settings where you actually don't need to do that, but to me, this is the interesting case. This is less used in the world, certainly, than programming, but also than supervised learning, which I'll get to next. But there are some examples.
The Nest thermostat was actually created in a way that allows it to figure out how to minimize the deviation from your preferred temperature while also minimizing energy used. And to do that, it actually learns about the thermodynamics of your house. So over time by turning the temperature up and down-- I don't think it does a lot of just crazy experimentation-- but it starts to notice, well, the temperature outside is such and such, which it knows because it's on the internet.
And you set the temperature to this, the temperature inside is this, and I've run the heater for this long-- this is how long it takes the actual temperature to rise to the point that I want it to do and this is how much energy it costs to do it. And it learns all those connections and ultimately figures out, OK, here's how I need to actually run the heater to keep that energy cost low while still matching what the target is on your thermostat. So that's really neat.
YouTube uses a version of this to recommend videos. So it basically presents videos in a way that tries to maximize the time that the user remains on the site. We can argue, we'd probably agree, about whether this is a societally beneficial thing to be doing or not. But nonetheless, it's nice because measuring how long the user remains on the site is something they can do. They can experiment. The system can experiment with different videos at different times. And it can learn to optimize this objective.
Well, I most recently learned that NVIDIA designs their AI chips using reinforcement learning where they know what it is they want the chip to do and the scenarios that they want it to do it, but they're not sure exactly how to lay it out or what formats to use and so forth. And a reinforcement learning system kind of searches that space and figures out, OK, this is a design that accomplishes your objective and you can now implement this. So that's super neat.
But it is sort of problematic in various ways. And the YouTube example brings this up. But there's other examples as well. So let me give an example that was popular for a little while. So this is a video game called Coast Runners, where you're the white boat and you're trying to get around the track and beat all the other boats.
So that's what it looks like. And so you can define the objective. In this case, there's a notion of points. You get points for doing various things in the game. You lose points for doing other kinds of things in the game. You can see it actually ran into some of those little markers. And it got a boost, boost, boost. So you can measure how many points that the system had. Let it play the video game for a really, really long time. And then see the amazing policy that results, the amazing behavior that comes out of it.
So this is what the behavior, the learned behavior actually looks like. So there's the white boat. It's going, if you notice, backwards on the track. It seems dumb. It is now leaving the track entirely after running into a number of different boats. It's now in the harbor where it is exploding all the fuel containers and smashing itself against the side repeatedly.
And you can just let it run. It will continue to do this loop that it's in now where it smashes into the dock and then smashes into these things. And you think, oh, my learning algorithm didn't work. And then you analyze it really carefully and you say, no, the learning algorithm worked better than I thought it would. What it has figured out is the only path in the game where you can obtain the points that come from those turbo boosts with minimum time between boosts.
So you can see it's figured it out. These things are disappearing when you run into them and then reappearing just at the moment that it's going to run into them again. So in fact, it gets way more points by doing this than it would by actually playing the game. So that's not what I wanted. But that's what I told it to do. That's what I expressed in the objective function, in the reward function.
So it's worth keeping in mind that if we're trying to convey tasks to, well, people-- this happens with toddlers sometimes, where you give them a rule and then they find a way to thwart the rule-- or machines. So it's not trying to be difficult, but it's just trying to do what you told it to do. And maybe you told it to do the wrong thing. So it's a kind of bug that you can get with this paradigm of telling machines what to do. Do we know when I should stop?
LESLIE KAELBLING: When you want to.
MICHAEL LITTMAN: All right. Thank you, everyone. Oh, no, no. But, like, within the hour, right?
LESLIE KAELBLING: Yeah.
MICHAEL LITTMAN: OK, all right. All right, good, good. So we've been trying to play around a little bit with the idea of how does reinforcement learning work as a programming paradigm for naive users, for people who haven't learned to program. And so we've been doing this experiment where we came up with a little mini robot world where there's a make pretend robot. This is all in simulation.
It would be nice to actually graduate to doing it with real robots, but we're still playing with it in simulation. There's a little house with four rooms. There's the user person. And sometimes things like coffee appear. And the people who we recruit to do the study, we give them one of these tasks. We give them a possible programming interface and we give them a task.
So a task could be like coffee delivery. Deliver coffee from the kitchen-- that's the kitchen-- to a person in an unknown room. She's on the porch at the moment. The person's location varies, all right? So the people, the participants in the study, have to construct a program. They have to tell the robot how to behave to accomplish that task.
And we run it in four different paradigms. One is we give them kind of a traditional programming paradigm where the controls are things like move to such and such a room, check to see whether such and such is true. If such and such is true, then do this, loop back-- the standard things that you have in programming. So we call that Seq for sequential.
We also had a trigger-action programming version of this, which was kind of neat. All the same conditions that you can check in the regular programming paradigm, you can actually treat as triggers. And then you can actually have all the actions be actions, and you can actually set this up as a trigger-action program. And then we did a kind of a more reinforcement learning sort of idea where we ask people, using the same triggers, to specify what the goal looks like. So use these actions however you want to accomplish this goal.
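As a rough illustration only (these are my paraphrases, not the study's actual interface), the coffee-delivery task might come out something like this in the three paradigms:

```python
# Sequential ("Seq") style: explicit steps and control flow.
seq_program = [
    "go to the kitchen",
    "pick up the coffee",
    "for each room: go there; if the person is here, hand over the coffee and stop",
]

# Trigger-action style: condition/action pairs the robot checks continuously.
tap_program = [
    ("in kitchen and coffee is here and not carrying coffee", "pick up coffee"),
    ("carrying coffee and person is not in this room", "move to the next room"),
    ("carrying coffee and person is in this room", "hand over coffee"),
]

# Reinforcement-learning style: just describe what success looks like.
rl_goal = "the person is holding the coffee"   # reward arrives when this becomes true
```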
In general, the reinforcement learning type programs were the shortest for all these different tasks. It was just easier. There was basically one line and you could nail it if you got just the right line. But in fact, what we found is that it performs similarly. People were able to accomplish tasks sort of similarly successfully, though the most successful was more traditional programming actually. And trigger-action programming was close. But some of the tasks that we have, the trigger-action programs were really gnarly.
And I was actually surprised that the participants were able to get all the pieces to come together to actually do these things successfully. So I don't know, I don't know. Yeah, this is in preparation. This is not published. But I like to think of it as a plus for this paradigm. But it's not clear that this is a plus for this paradigm. Because it really did the worst out of the three things that we tested. It's like, yay, it did the worst.
So this is something that we're still kind of playing around with and thinking about. But nonetheless, the idea that we could get people to do this at all is notable. Reinforcement learning is generally considered a fairly advanced topic. The idea that we could just recruit people off the internet and get them to set up problems as reinforcement learning problems was neat. But we still have a ways to go on the interface.
All right, the next section is about demonstration or supervised learning. So basically the idea here is that we're not going to tell the machine anything. We're just going to give it lots of examples of what we want and let it figure out how to produce those in novel scenarios. So really, this is about mimicking the expert. So if this were the tomato plant example, again, we might build a big table that looks like this.
So I watched Ashleigh Lemon one day, it was raining. There hadn't been a spray the previous day. The temperature outside was high. The height of the tomato plant was short. And she decided to delay. She didn't water that day. And I just record her for hundreds of days, so maybe five years of data or something like that. And just look at all the different scenarios and then try to more or less interpolate between them to say, hey, we want a rule that does these kinds of things in these scenarios, but also does something similar enough in novel scenarios that the system has never seen.
So the basic idea here, the way that supervised learning works is that there's an objective function defined on behaviors with respect to this data. Maximize the matches to the expert choice, the label, and also keep the rule as simple as you can. Roughly speaking, that's all machine learning. So actually, it's very nice, right?
Because in some sense, Ashleigh Lemon didn't have to do anything special. She just was herself. She did what she had to do to water the tomato plants. And then just the exhaust fumes from her behavior could all be hoovered up and then turned into behavior for the machine. This has been wildly successful. I'm sure everybody here knows that.
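A minimal sketch of that table-plus-objective recipe using scikit-learn on a made-up version of the data, where the tree's depth limit plays the role of "keep the rule as simple as you can":

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up log of days: [raining, sprayed yesterday, hot outside, plant is tall] (1 = yes).
X = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
# The label: what the expert actually chose to do on each of those days.
y = ["delay", "delay", "spray", "spray", "delay", "delay"]

# Match the expert's choices while keeping the rule simple (a shallow tree).
model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Ask the learned rule about a day it has never seen.
print(model.predict([[0, 0, 1, 0]]))   # likely ['spray'] on this toy data
```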
There's tons of things that are happening in the real world now that are powered by these kinds of supervised learning methods, things like spam filtering in your mail system. Face and image recognition has gotten way, way better. Speech recognition is now basically a thing that you don't even have to think about so much anymore. Translating between languages has become much, much easier.
This is an example that I personally like because it comes up a lot, this sort of context-sensitive grammar checking where I'm typing in Google Docs and I type a thing like, "I've hired manny talented people for domestic help." But I could also type, "I've hired a talented manny for domestic help." So if you don't know the word manny, it's a word that is meant to mean like a male nanny, like a person who's going to help around the house who happens to be male.
So, here, manny, you could argue that's just not a word. But Google Docs is OK with that word in this context. It actually works in this context. But it's not OK with it here because I meant "many," I just spelled it wrong. And it can tell the difference between these two things. How? Because it's actually paying attention to the context. It's seen so much text that it's actually able to figure out that this is the wrong "manny" for the situation. So I don't know, I particularly like that one.
But this is still a hard way of telling computers what to do. It has been wildly successful, but I don't think we're done yet. Because, well, for one thing, you need an awful lot of data. So if we've got to watch Ashleigh Lemon for five years before we can actually water our plants, maybe I'm not even interested in tomatoes anymore at that point. I'm more into arugula now. I don't know. But there's other issues too, because it's kind of an inefficient way of telling the computer what to do.
So an example from this paper from Ribeiro, Singh, and Guestrin, I really like this one. Because what they did is they trained a visual classifier on pictures of husky dogs and wolves. And the goal was to tell the husky dogs apart from the wolves, which, by the way, is a task that I was not that good at. I would have, yeah, that could be a wolf maybe. But what was really neat is they trained it and it was getting 95% accuracy, but it wasn't getting it for the right reason.
So actually, if you gave the learned classifier sort of data that it hasn't quite seen before, it can mess up. So in particular, we gave it a picture of a husky in snow. We gave it this picture. That's a dog, just to be clear to everyone. Because maybe you don't have a lot of experience with this. That's a husky dog. That's a husky dog. These are nightmares. These are actual wolves. This is why there are so many fairy tales about wolves, because they are horrifying.
And so these guys are in snow and these guys are in a domestic scenario. But if you gave the classifier that they trained this picture, they'd be like, oh, that's a wolf. And then you do an analysis to say, well, why do you think it's a wolf? It's like, yeah, because look, there's all this wolf stuff behind it, you know? That is to say snow. Because most of the pictures of wolves, the wolves were outside. Don't invite-- OK, lesson-- when you go home, don't invite wolves into your house. This is just not a good move in general. Majestic creatures, beautiful, intelligent, deadly-- just don't do it.
But the point is that, what do you do at this point? So you train up the classifier and it mostly gets things right. But when you give it these kind of weird cases of-- well, presumably, it would also have trouble with this. I have trouble with this. You give it a picture of the husky in the snow and it comes back with, well, that's a wolf, you want to just say, no, no, you're paying attention to the wrong thing.
But there's no channel by which we can do that directly. What people do is they go out and get more data. They're like, oh, it gets that wrong. Let's get a bunch of pictures. Let's everybody get your huskies. Take them outside. Just wait a little while, it's going to snow in a few months, I'm sure. Take pictures. That's a very inefficient way of getting across a relatively simple point. Well, it's hard to say. Is it simple or not?
We don't know how to write programs that do that. As much as I want to make fun of machine learning for getting this wrong, nobody knows how to write a program just using their hands that can actually do any better. Actually, generally much, much worse. So this is the best way we have of doing recognition. But it has this weird quirk, which is that it's sometimes hard to tell it what to do when it makes a mistake. It's hard to debug without just going out and collecting more data, which can be expensive.
All right, a quick aside on language models. Does everybody already know? OK, I don't know. All right, who's sick of hearing about language models? OK, I'm going to say a little bit about language models. Because some people were slow putting up their hands. All right, so language models are basically next word predictors. They basically say, given a context of words, what words do I think will come next?
And it predicts a probability distribution over possible next words. The way that these things are trained is using supervised learning. But it's what's sometimes called self-supervised learning, which is to say, you just take text and you make learning problems out of it by saying-- oh, well, here's a part of a phrase, the next word is "machine," but let's not show it that.
Let's make it guess that. We know what the right answer is because the text is what it was. And we can just eventually expose to the machine, oh, this is what you should have predicted in that case. It should have been "machine" after that context. We can use supervised learning: "machine" is the next thing that should happen.
So any block of text can be turned into a bunch of little puzzles, which is for each word in the text, guess what the next word is, given the preceding words. All right, so that becomes just a regular supervised learning problem, a machine learning problem. It turns out it's a little bit hard to get the right prediction architecture and to train it well. But nonetheless, at the end of the day, we've got a system that can actually do a really good job of predicting next words.
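Here is a minimal sketch of that bookkeeping, turning a snippet of text into (context, next word) training pairs; real systems do this over subword tokens and feed the pairs to a transformer, but the idea is the same.

```python
def next_word_examples(text):
    """Turn a block of text into (context, next word) supervised training pairs."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in next_word_examples("conveying tasks to a learning machine"):
    print(context, "->", target)
# ['conveying'] -> tasks
# ['conveying', 'tasks'] -> to
# ...and so on; the model is trained to predict each target word from its context.
```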
And what's remarkable about this is that given enough training, these large language models as they're called, these predictors of the next word given text, using a particular kind of neural network structure called a transformer, can actually write new things in different styles. How? Because given the context, it's supposed to predict the next word, and we can then actually just choose from that distribution. If the next word is most likely to be machine or highly likely to be machine, I'll choose that.
And that gives us a new context. We can predict the word that comes after that using the same machinery. And we can just spool out next words. And that's how language models actually write. How do they do it in different styles? Well, again, they've been trained on so much text with so much diversity in it that they can kind of mimic the styles of the texts that they've been trained on.
They can translate from one language to another. They can engage in conversations. It's all just by predicting the next word effectively, given enough context-- answering questions, solving basic problems, even taking direction. Even writing programs where you say, hmm, the other day I saw a program and it computed this thing. And the program went like this. And then you wait for it to fill in the next word. And word by word, it'll fill in the program that actually does that thing.
So, for example, I actually just took ChatGPT and I gave it the rainfall problem that I mentioned earlier. So I said to the chatbot, write a Python program to compute average of a list of numbers terminated by 99999. So if you've ever played with ChatGPT, it gave a little discursive thing about well, it's really important to do this. And then here's the program, blah, and then it spits out the program.
And it's like, son of a gun, this is the program. This is better than what the students were doing in the '80s. It even does the 99999 to stop. There is a bug in this program. I think of it as being a bug. We could argue about whether it's a bug. Can anybody spot it? Does anybody know enough Python to be able to say, I have a problem with this?
AUDIENCE: Continue in an exception?
MICHAEL LITTMAN: That actually works.
AUDIENCE: OK.
MICHAEL LITTMAN: Yeah, it doesn't feel like it should. The thing that I don't like is this.
AUDIENCE: Oh.
MICHAEL LITTMAN: So it takes whatever number I type in and then it turns it into an integer before it makes it part of the average. I didn't tell it to do that. Like if there's 4.5 inches of rainfall, that seems reasonable to me. This program won't allow that. It'll turn it into 4 or 5.
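For reference, here is a minimal sketch of how the loop could read with that fixed, using float so a reading like 4.5 survives; this is my reconstruction, not the program that was on the slide.

```python
def average_of_inputs():
    total, count = 0.0, 0
    while True:
        try:
            value = float(input("Enter a number (99999 to stop): "))
        except ValueError:
            print("That wasn't a number; try again.")
            continue
        if value == 99999:
            break
        total += value
        count += 1
    return total / count if count else 0.0

if __name__ == "__main__":
    print("Average:", average_of_inputs())
```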
But you don't spot that unless you actually know a lot about Python and you don't get confused by the whole exception handling stuff. So I think it's really interesting. So it can actually write the program. But it then becomes easier for us to write the program because it wrote the program, but harder for us to spot the error because it's in there someplace. But this whole thing is super conceptually weird, right?
So we wanted, for various reasons, a program that can predict the next word. It's super useful for speech recognition, for optical character recognition. It's an important task. We, as human beings, didn't know how to write that program. So instead, we used machine learning. We used supervised learning to write that program. The program that came out of that can actually itself write programs. You tell it things to do and programs can come out of that.
And that's just-- that's surprising. I would not have believed that that would happen. This is already helping engineers code faster. But I think in the context of this talk where we're allowing a broader set of people to write code, it's just not clear. Since it's for people who are less good at spotting bugs, it's not clear that this is the right mechanism for helping them write code. Because it'll write code for them, but they won't be able to tell whether it's actually what they wanted. So I feel like this is still an open area of study.
All right, lastly, I want to briefly mention the counterfactuals idea. So here, the notion is we're going to give examples, and from those examples, the machine is going to extract incentives. Once it has those incentives, then we can plug it into the reinforcement learning methodology that says, oh, here's what you need to do to make those incentives actually happen. So instead of us having to write potentially the wrong reward function, we're letting the computer write the reward function by watching us behave.
So here's an example from my group. So we've got a robot that's in this kind of grid world space. We're looking at it from above. It's wandering through the space. And it goes and it goes and it stops. So we watch the robot do that. And then we ask the question, assuming that each of the different colors has some point value associated with it and that the robot is trying to maximize its points, what do we know about the points by observing that this is what the robot chose to do?
So it's an example of the behavior that we want. And what we're trying to do is extract its incentives. What is it trying to accomplish by behaving like that? Anybody have any insights looking at this? How do we feel about white?
AUDIENCE: Indifferent.
MICHAEL LITTMAN: Yeah, sort of neutrally, right? Are these going to appear one at a time? Yeah, OK, all right. So maybe we'll give that a neutral number. What about green? Yeah, green looks good, right? So we'll give that a high value. $0.46, obviously, that's what you would choose.
All right, what about blue? Blue seems not so good, right? So we don't know that it was avoiding blue, but it kind of looks like it's avoiding blue, sort of skipping squares that have blue in them. So maybe that's bad. Maybe that's negative a quarter. What about red? Whoops. You're right, negative 6 cents, that's exactly what I was thinking. So red is harder to tell because what do we know, relative to green?
AUDIENCE: It didn't stop at red.
MICHAEL LITTMAN: It didn't stop at red, tomatoey red. If red was really bad, then it wouldn't have bothered going to green. Even if it really wanted to go to green, it wouldn't have bothered going to green. So it's probably bad because it seems to just skim over it. But it's probably not as bad as green is good. So it's some small-ish negative value. All right, what about yellow? I won't advance this time so you have a chance to think about it. Yeah? OK.
AUDIENCE: It looks good.
MICHAEL LITTMAN: Yellow, it's not terrible, right? Because if it was terrible, there was an opportunity to go around it and not hit yellow. But it's not great either because it would rather leave yellow and go to green. So I don't know, maybe that's almost background. All right, yeah, because, again, if it was better than background, if it was like plus $0.01, then it could have actually skimmed through the yellows on the way to the green. And it didn't do that, so it's probably relatively unimportant.
So we have a program that does this. This problem is called inverse reinforcement learning, where you try to go from the behavior to the incentives, whereas reinforcement learning is going from the incentives to the behavior. And this is what our program spat out for this single example. It's kind of neat how much information you can get from just one example by doing these little counterfactual what ifs in your mind. And so it's a really powerful tool.
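A heavily simplified sketch of that counterfactual reasoning: guess point values for the colors and keep only the guesses under which the path the robot actually took scores at least as well as some alternative paths it could have taken. Real inverse reinforcement learning algorithms are far more careful than this, and the paths and candidate values here are made up.

```python
import itertools

# Paths are the color sequences of the squares a route passes through, ending where it stops.
observed_path = ["white", "red", "white", "green"]          # what the robot actually did
alternative_paths = [
    ["white", "white", "white", "white"],                   # it could have stayed on white
    ["white", "blue", "green"],                             # a shortcut through blue
    ["white", "yellow", "yellow", "green"],                 # a detour through yellow
]

def score(path, values):
    return sum(values[color] for color in path)

# Try coarse candidate values for each color (white pinned at 0 as the background).
consistent = []
guesses = [-0.5, -0.25, -0.05, 0.0, 0.25, 0.5]
for g, b, r, y in itertools.product(guesses, repeat=4):
    values = {"white": 0.0, "green": g, "blue": b, "red": r, "yellow": y}
    # The "what if" test: the observed choice should look at least as good as the alternatives.
    if all(score(observed_path, values) >= score(alt, values) for alt in alternative_paths):
        consistent.append(values)

print(len(consistent), "candidate value settings are consistent with the behavior")
print("one example:", consistent[0])
```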
One of the things I learned relatively recently, this is a paper from just last year, that Google ran this kind of algorithm on all the roads in the world, literally all the roads in the world, 360 million segments of road covering the entire globe, to try to figure out which roads do people like to drive on versus not like to drive on. And so they did this inverse reinforcement learning thing based on the driving data that they have, creepily, about all of us driving and choosing roads. In many cases, if you use their app, they know where you're trying to get to so they know what the goal is.
And they see what you chose to do. And by using the utilities that are actually extracted from this process, they get something like a 25% improvement in terms of matching people's own preferences about what roads to go on. So they can do a much better job of reflecting, given your own choice, where would you go? That's what we're going to suggest to you to make it easy for you. So it's really neat. I hadn't seen anything at that scale before.
All right, so the thought that I want to end on is the notion that all these things are actually really powerful. They're doing really amazing things, these four different mechanisms, this programming, reinforcement learning, supervised learning, and this kind of inverse reinforcement learning all have a role to play. But when you listen to Ashleigh Lemon tell us how to water tomato plants, she doesn't pick any one of those and just use that. She uses the combination.
And I think what the combination is doing is it's providing additional sort of buttressing to make sure that we actually get it. All of these different methodologies that I talked about, they all have their holes, their flaws. But when you combine these different modalities of describing a task, they can actually make up for one another's weaknesses. And I think that's actually really remarkable.
There's other examples that I've seen. I got to speak to a submarine commander once on a project. And I'm like, how do you tell a submarine commander what to do? Like, do you give them a program? Do you give them examples? They say, we do all the things. I tell them-- the Admiral will tell the submarine-- the thing about submarines is once they're doing their job, you can't talk to them anymore.
Because that's the whole point of a submarine is to be stealthy. So the conversation ends. So you have to have something like a program, something that's conveyed, and then it's hands off. So what they do is they write down what all the things that they want. They describe things in terms of examples. They describe what the goal is. They give the instructions. They give all the things.
Then they have a conversation with the submarine commander. And they're like, what would you do in this situation? Sub commander is like, well, because you told me to do that, I would definitely do it this way. I see. What about this? Well, then I would do this other thing. The Admiral is like, eh, OK. Yeah, I get that. And only once that conversation has ended in a way that's satisfactory, do they say, go underwater. And so that really struck me.
Because the reason I was talking to the submarine commander in the first place is because I wanted to argue that machine learning is better than all these other things. And they're like, it's just not that way. We're not going to pick a representation. We're going to give everything we can, broad spectrum. I recently read a dishwasher manual and they did the same thing in everything that they said. They would give instructions. They would give examples. They would say, this is what you're trying to do. They combined all the different methodologies.
But yet, the current approaches that we have for making software are generally mutually exclusive. You're either going to use machine learning or programming, but not in any kind of conjunction with each other. And I think that's a mistake. I think we need to be finding ways of getting tasks out of people's minds that use all these different methodologies in tandem.
There are some examples of this that I really like. I decided I'd lead with examples from MIT just randomly. This is a paper that I just adore where people can draw a diagram. That diagram is then visually recognized and turned into code. And then the code re-renders the diagram much better than I could draw it. And what's nice about it is the code is actually exposed to the end user.
So the end user can be like, oh, well, actually, I want more rectangles than that. And you just change one number in it. And suddenly it makes four rectangles on a stack instead of just three. So it's really neat that you actually now can convey what you want drawn by using your own little example, but then also by being able to edit the code that comes out of that. You don't have to write the code. Maybe that's not what you're good at. But maybe editing the code is not as bad.
There's a lot of projects connected with this, one way or another, through the project that's run out of MIT on neurosymbolic programming, neurosymbolic systems, where the idea is that it's got learning in it, but it also has these kinds of rules, these instruction-like items. And of course, probabilistic programming, which is very popular here and other places, also has elements of this, where there's the ability to learn from the data. But also the thing that you're working with is instructions. And so that gives you some of the power of what instructions have to bring.
I thought I had another. Yeah, I was going to say one example from my lab is that we looked at home automation. And we had people just use their office for a few days and we extracted what their rules were, like when did they turn the lights on, when did they turn the lights off. When did they turn the temperature on, when did they turn the temperature off? But instead of just using the learning to say, and this is what the system will do from now on, instead we showed them what the rules were that the system had extracted.
And we gave them an opportunity to choose which rule is really saying what they wanted to say. They don't have to write the rules, but they can recognize the one that actually matches their intent. The Snorkel project is another really cool example where they're actually training up classifiers in a way that would just involve machine learning, but they throw the ability for people to write rules into the mix. And what the rules do is they help take unsupervised data, data that doesn't have a label and that no expert has looked at, and basically apply labels to that based on the rules that the people wrote.
The rules don't have to be perfect. The labels don't have to be perfect. But the two of them together can actually triangulate on what the person really wants. And they're showing benefits over doing either of them alone. All right. And just in case you're wondering, because I was wondering, do language models already do this? And to some extent, they do. But I think, like many things in these language models, it's not a fully developed skill. Sometimes it works, sometimes it falls on its face.
But I found an example that works and I thought that was really neat. So I'm trying to get GPT to produce a particular sequence. And so I give it instructions, just the words-- make a list of numbers that aren't prime. And it comes back with 1, 4, 6, 8, 9, 10. That's not actually what I wanted. Like, it's defensible, but it's not what I actually wanted. So instead, I erased that. Forget I said anything.
Let me just give you examples. So make a list of numbers like 4, 6, 8, dah, dah. So it's like, yeah, I can do that. 4, 6, 8, 10, 12, 14. Hmm, no, those are the evens. That's not what I wanted either. What I really wanted, if you combine these two things, make a list of numbers that aren't prime like 4, 6, 8, dah, dah, dah. It comes back with 4, 6, 8, 9, 10, 12, which is what I actually wanted. So it's actually getting information both from the examples and the instructions and somehow, again, triangulating, picking out what the actual concept needs to look like.
Yeah, I haven't studied this at scale yet. And I have reasons to believe that actually language models will typically not do great at this. But the fact that it can mush all these sources of evidence together is really promising. All right, so that's kind of the story that I wanted to tell you about these four different ways we have of telling machines and people what to do and how we should be combining them to really get the most accurate possible way of conveying things.
And I will plug two books very briefly, both by MIT Press, so MIT Press. My book on Code to Joy, which is the book that convinced me to give this talk. I wrote the book and I'm like, oh, yeah, he's got a good point there. So this is supposed to be fun for everybody to read, but mostly computer scientists are buying it because I think that's who my friends are.
And I also wanted to point out a book that is not out yet, but it's coming out soon. This is by Eugene Charniak, who was Leslie's colleague at Brown, my colleague at Brown. He died about a year ago, just after he had finished writing the book, but before it had gone into publication. And so I worked with MIT Press to get this book out. And it's actually coming out next month.
So I'm super excited about it. It's really cool because it's kind of his own little personal take on this moment that we're in, this machine learning moment that we're in, and what it means for AI as a field up to this point and moving forward. So something to think about if you're interested in these topics. All right, that was it. That's all I got to say. Thank you so, so much.
[APPLAUSE]