Patrick Winston: The Story Understanding Story
Date Posted:
June 9, 2014
Date Recorded:
June 9, 2014
CBMM Speaker(s):
Patrick Winston, Brains, Minds and Machines Summer Course 2014
Description:
Topics: Brief history of AI and arguments against the possibility of artificial intelligence; emergence of symbolic processing capability through evolution; strong story hypothesis: ability to tell, understand, recombine stories distinguishes human intelligence from that of other primates; understanding the story of Macbeth: how to answer questions about information that is not explicit, such as whether Duncan is dead at the end; use of inference rules, explanation rules, concept patterns; Genesis system for story understanding that can find connections between events, integrate cultural background of reader, answer questions about motives, assess similarity between stories, and interpret stories from different domains such as politics and conflict, e.g. understanding analogies between US-Viet Cong and Arab-Israeli conflicts; social animal hypothesis; directed perception hypothesis
PATRICK WINSTON: I have spent almost a year of my life in San Diego in two-week chunks. And every time I go to San Diego, first thing I do is I head for the zoo. And the first thing I do when I get to the zoo is go look at the orangutans. And I say to myself, wow, if those guys would just pitch back a few degrees, they'd be here instead of me. And I'd be in the San Diego Zoo being watched by things with orange hair.
They're really smart. And after spending a lot of time at the zoo, I've seen a full range of behaviors. And it took me almost 20 years before I took that picture of an orangutan using a tool. Pretty smart.
On the other hand, not smart enough. Not smart enough to be with us. We wonder why that is. And that's the stuff I'm talking about today. Why is it that we're here instead of the orangutans?
Today, to tell the story, I'm going to start with right now and go back about 50,000 years. I'll conclude that this is because we can tell stories and they can't, at least not to the same degree.
I'll talk about what follows from that in terms of what kind of research you have to do. I'll tell you about the state of the art in story understanding systems. And I'll show you that on top of a very simple substrate, you can build quite a lot of stuff.
And then I will conclude by talking about why none of this matters unless it's connected to vision, ultimately. So that's the agenda for the next 45 minutes or so.
And of course, you can't give a talk on AI these days without the obligatory Watson, Siri, Deep Blue, and self-landing airplane slide, which I borrowed from [INAUDIBLE] some time ago and rearranged a little bit. So maybe 500 years from now, when people are talking about artificial intelligence, they will say it began with these things.
But I rather think the anchor will probably be about 55 years ago, when Marvin Minsky published a paper titled "Steps Toward Artificial Intelligence." It told us what to do. Unfortunately, what he told us to do wasn't quite the right thing.
But it did launch a great deal of activity, including, the same year, a program that could do symbolic integration the same way an MIT freshman does symbolic integration. And boy, did it seem like we were on the way to really rapid progress. When you can make a program do that, surely the rest will be easy.
Turned out the rest wasn't easy. And here we are 55 years later still without a very adequate theory of what goes on inside our skulls and how we think.
But maybe that won't even be the anchor, even though it did launch decades of work that led to the sorts of things I put on the slide. All these sorts of things have lots to do with application. Any artificial intelligence course in the world is a selection from this menu.
But still, that paper by Marvin Minsky may not be considered the beginning. This will probably be considered the beginning: the original paper by Turing on computing machinery and intelligence. It's the paper that's known for the introduction of the Turing test.
I think Turing didn't give a damn about the Turing test. But Turing's paper actually contains about three pages on the Turing test. It contains about 10 pages on telling you what a computer is. After all, this was a day when computers weren't much there. It contains additional material on why it is possible to make an intelligent computer. And it concludes with a program of research.
I read this paper 10 times because I assigned it to my class, so each year I reread it. And every time I read it, I became more convinced, far more convinced, that Turing actually didn't care about the test. He was just using that to get the question of what is intelligence out of the way because he really wanted to talk about why computers could be intelligent. This is a philosophy journal. He didn't want to argue about what intelligence is.
I think if Turing had taken Marvin Minsky's recent classes, he would have probably approached it a different way because Minsky has introduced the notion of the suitcase word. The suitcase word, it's a word you can stuff anything into. And he notes that words like intelligence, creativity, emotion, these are not really things. They're collections of things.
So if you say to me now, in light of what Marvin said about these words, you say is Watson smart? Is it intelligent? An easy answer is not to argue it, but to say, sure, it's intelligent. It's intelligent in some ways. Maybe not all the ways a human is. Maybe superior in some ways.
But since intelligence is a word that covers so much ground, it doesn't really pay to argue about whether something's intelligent. It might be better to argue about things on a finer scale. Does it have a model of itself? Can it talk about how it works? Can it have a model of somebody else? These are kinds of questions that derive from this notion of suitcase word.
Well, that's 1950. And at that time, there were a lot of [? positive ?] arguments against the possibility of computers being intelligent. Number five there, disabilities. People would say, well, you can't make a computer that is intelligent because they can't do something. Enjoy strawberries and cream. Compose a symphony like Mozart.
But of course that last one, compose a symphony like Mozart, that's a little dangerous to argue from that perspective because that means everybody except Mozart has never been intelligent. And even Mozart wasn't intelligent because he couldn't compose plays like Shakespeare. So that's one of the kinds of arguments that was popular then.
The Lady Lovelace one, that comes from 1840, when Babbage was building his computer. And even then, Lovelace is said to have said computers can only do what they're programmed to do [INAUDIBLE]. It would've been better if she said they can only do what they've been programmed to do and what they discover how to do on their own. But that's not [INAUDIBLE] quite the same [INAUDIBLE] facts.
So this is when people started thinking about whether [INAUDIBLE]. And if we go back a little more, maybe 2,400 years, we get to the point where people started thinking about thinking.
And I'll have to rely on Greek students to tell me if this is approximately true or not. Plato's most famous work was The Republic, except that that's a mistranslation of the Greek word politeia. And the Greek word politeia is untranslatable, but it means something like society or something like that.
And The Republic is actually about what goes on in here. It's a metaphor. What goes on in the state is a metaphor for what goes on inside your head.
So 2,400 years later, Marvin Minsky published a book called Society of Mind, in which he talks about stuff going on in here. It's like [INAUDIBLE] society and [INAUDIBLE] things. So it's really the same title as the one Plato used 2,400 years ago, thinking about the same kinds of things.
But I don't like to just go back this far. I like to go back 50,000 years because that's when we started thinking. And it's interesting that we had our modern anatomical form for about 200,000 years, but we only started thinking about 50,000 to 60,000 years ago.
For most of our existence, we were not much different from Neanderthals in what we could do. We made the same crappy stone tools for 10,000 years at a time without improvement. We just didn't seem to amount to much. We didn't build anything that lasted. We didn't paint caves like the ones we eventually started painting at Lascaux.
So what happened? It happened rather suddenly. About 50,000 years ago, according to paleoanthropologists, it was more a discovery of how to use stuff that had already evolved than [INAUDIBLE] something that had evolved. Something that came along all of a sudden. The question is, what was it?
Well, no one quite knows. The paleoanthropologist Ian Tattersall, who has written a great deal about the suddenness of this acquisition, says we became symbolic.
Well, what's that? I mean, that-- yeah, OK. So I guess those paintings at Lascaux, [INAUDIBLE] symbolic all right. But what exactly does that really mean?
Well, Noam Chomsky, who's an admirer of Tattersall, says that what it meant is that we could take two concepts and put them together to make a third concept, and we could do that without limit. And everything there is important, especially the without limit part, because maybe those orangutans can do it a little bit, but we can do it a lot. We could put concepts together to make new concepts.
So now fast forward a little bit, 50,000 years, and talk about what I find to be the most interesting set of experiments in modern psychology ever. This is the work of Liz Spelke, except for the rat part. She doesn't do rats. She does people. But the rat part is important as a prerequisite, as a preamble to describing what [INAUDIBLE].
Here's what you do. You take the rat. You stick it in a rectangular room. In each of the corners, there's a little basket or [INAUDIBLE]. And while the rat is watching you, you put some food in one corner [INAUDIBLE]. And then you see what happens after you spin the rat around to disorient it. And the rat, after being spun around, disoriented, goes to the diagonally opposite corners, as it should.
You can't go about these experiments without [INAUDIBLE] rats with big brains, because they're really smart. They got the geometry right. And you can repeat this with a small child or with a human adult and get the same answer. Are you with me?
Now you paint one wall blue, and you repeat the experiment with a rat. What do you suppose it does? It ignores the blue wall and goes to the diagonal corners with equal probability. So does a small child. It's only a genius like me who gets it right. And somehow [INAUDIBLE] geometry of the room with a blue wall.
But you know what? The last part of the experiment is the part that amazes and fascinates me, and that is this. You can engage a human subject in shadowing behavior. So you read something to the human. And the human says it back to you as you read it as if we were doing simultaneous translation into another language, except it's English to English. It's called verbal shadowing. And you start this process off before the person walks into the room. So they're continuously doing this verbal shadowing through the whole experiment.
So what do you suppose happens? They go to the diagonally opposite corner with equal probability. Even though they've seen the blue wall and will report that they saw the blue wall, they just didn't use that in figuring out what to do.
Oh, I've got food here. That's because at MIT, you can use food for adults too.
So this has a couple of messages for us. One of the things that we humans can do is we can put stuff together. That's what being symbolic means. You can build symbolic descriptions.
[INAUDIBLE] there are a couple of practical observations here too. Point number one is we only have one language processor and you can jam it. And you can jam it by overloading it with language. And so if you're reading your email right now, what that means is you won't remember a thing about what I said by dinner time.
And the other thing it means is that the more words you have on the slide, the less people will understand you because you're jamming that language processor. And they'll be reading the slide instead of listening to what you say.
We actually did some experiments on this at MIT. We had a student do a tutorial on a programming language to a bunch of volunteers. [INAUDIBLE] freshmen in his fraternity. Half the material was delivered on the slide and the other half was spoken.
And what he discovered was that they didn't get any of the stuff that was spoken. They only got the stuff that was written on the slide. And in fact, one of them said, I wish you hadn't talked so much. It was distracting. So it's an example of how you can overload the language processor.
So what I think is that the symbolic stuff makes it possible for us to build descriptions. And then you can string them into sequences. And then you get this. See, I know better than to talk while you're reading that.
Yeah. And if you believe that, then the question is, what do you do about it? And what I think we need to do about it derives from the methodological principles that were introduced by David Marr [INAUDIBLE] long ago. And you have to, first of all, understand what it is you're trying to understand. And story understanding is what we're trying to understand, and it's extraordinarily pervasive. You recognize it as present in all of these contexts.
And in fact, you might be surprised to see that last entry there, engineering. But if you listen to my colleague, Gerry Sussman, who's a [INAUDIBLE] electrical engineering, talk about this circuit, he's telling a story. He talks about how the signal comes in from the left and it walks right past [INAUDIBLE] invisible to it. And it presents itself at the base of the transistor, and how there's a [INAUDIBLE] between the base and the emitter and so on. And it's just like telling a story going from right to left. So even in engineering, [INAUDIBLE] story understanding.
Well, if you believe all that and you believe all those places where story understanding occurs, then you have to say, well, here are the steps we're going to take to deal with it. We're going to characterize some behavior [INAUDIBLE] finer grain. We're going to figure out what the computational problems are and so on. Do all those kinds of things. And here's some behavior we're trying to understand.
This is a particular example of a story that I had built into me by my father, who liked Shakespeare, when I was a kid. So eventually it got imprinted. And what we're trying to do with that story understanding is not understand real Shakespeare, but understand this summary of a Shakespearean plot. It's about 100 sentences in all. Simple sentences.
They're understood, incidentally, by Boris's START system, which does its syntactic analysis and produces for us a semantic [INAUDIBLE].
But that's the competence we're trying to understand: how we can take this story and understand more than what's in the story itself. For example, I can ask you, is Duncan dead at the end? And you would say, sure, somebody murdered him. But it's not explicit in the story.
I could ask you a harder question. Why did Macduff kill Macbeth? And you probably would have to kind of trace back through here. It'd take you a few minutes to figure that out.
And then I could say something even more sophisticated, like is there a Pyrrhic victory in this situation? And you might not even know what a Pyrrhic victory is until I remind you that it's something that first seems successful and then turns bad in the end.
So that's the kind of behavior we're trying to understand. How it's possible to read a story like this and get all that kind of stuff out of it.
So what are the computational solutions to some of those problems? Well, we [INAUDIBLE] by discovering that we can illustrate a lot of human story understanding capabilities with a very simple computational substrate, once we have what the [INAUDIBLE] system produces.
First of all, we can make inferences using inference rules of various kinds. We have seven kinds of inference rules now in our system, all of them driven into our system by the need to understand various sorts of stories.
So some of them are extremely simple, like if I harm you, I harm your friends. And others are less certain: if you make me angry, I may want to kill you. Fortunately that's not always true. But if I'm searching for an explanation and that's the only one I can find, then I might believe it.
So at the very bottom we have a variety of rule types that tend to put cause-like connections between the events in the story.
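To make that concrete, here is a minimal sketch in Python of how one such commonsense rule might put a cause-like connection into a story. The triple representation and all the names are invented for illustration; this is not the actual Genesis implementation.

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Event:
        actor: str
        action: str
        target: str

    @dataclass
    class Story:
        events: list = field(default_factory=list)
        links: list = field(default_factory=list)  # (cause, effect) pairs

    def rule_murder_implies_death(story):
        """If X murders Y, infer that Y becomes dead, and link the two."""
        for e in list(story.events):
            if e.action == "murders":
                inferred = Event(e.target, "becomes", "dead")
                if inferred not in story.events:
                    story.events.append(inferred)
                story.links.append((e, inferred))

    story = Story(events=[Event("Macbeth", "murders", "Duncan")])
    rule_murder_implies_death(story)
    print(story.events[-1])  # Event(actor='Duncan', action='becomes', target='dead')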
Then in addition to those rules, those commonsense rules that operate very locally, we have concept patterns that operate very globally. So here's a concept pattern, revenge: I harm you, and eventually, you harm me, [INAUDIBLE] some continuous path of connections.
This is really the system thinking about its own thinking. It uses the lowest level to create causal connections. That makes a graph. And then it thinks about its own thinking by inspecting that graph and reaching conclusions about concepts that it can find in that graph.
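Again as a hedged sketch, with the same invented triple representation: a concept pattern such as revenge is not a local rule but a search over the graph the rules produced, looking for one harm leading, along some chain of links, to a reciprocal harm. A Pyrrhic victory would be found the same way, with a different pattern.

    from collections import defaultdict

    def reachable(links, start, goal):
        """Depth-first search along directed causal links."""
        graph = defaultdict(list)
        for cause, effect in links:
            graph[cause].append(effect)
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    def find_revenge(events, links):
        """Events are (actor, action, target) triples; return matched pairs."""
        harms = [e for e in events if e[1] == "harms"]
        return [(a, b) for a in harms for b in harms
                if a[0] == b[2] and a[2] == b[0] and reachable(links, a, b)]

    events = [("Macbeth", "harms", "Macduff"), ("Macduff", "harms", "Macbeth")]
    links = [(events[0], events[1])]  # one causal chain connects the two harms
    print(find_revenge(events, links))  # the revenge pattern is present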
Here's an example. This is the so-called Genesis system that we built analyzing the story that you just saw in English. Everything that goes into the system is in English. The stories, the rules, the concept patterns, they're all in English.
And of course, you can't read that. But what you can do is you can see all this stuff in white is stuff that's explicit in the story. All the stuff in yellow has been inferred by some kind of commonsense rule. A few other colors there denote special cases, but that's the general trend. The stuff in white is explicit. The stuff in yellow is deduced.
If you zoom in on that, you can see some stuff. You can see that Macbeth murders Duncan, and that's connected to Duncan becomes dead. So that's how you know that Duncan's dead: because he's been murdered. You have that inference rule.
That's terribly simple. Here's a slightly more complicated one. This says Macduff is Duncan's friend, and the reason is because they have a common enemy. So it's a kind of the-enemy-of-my-enemy-is-my-friend rule. That was the reason for concluding that.
So those are rules at work. But these long-distance patterns, these concept patterns, are at work too. And here I've clicked on Pyrrhic victory down here, and suddenly all this green stuff has lit up. Those are the elements of the story that contribute to this being a Pyrrhic victory. So let me zoom in on that as well.
So what happens here is that on the very left, Macbeth wants to be king. And then shortly thereafter, he becomes happy because he murdered Duncan, and became king, and all that sort of thing. But eventually through a continuous chain of connections, Macduff harms Macbeth. So it doesn't turn out so hot, yeah.
So there's a Pyrrhic victory there. And that's determined by a search that finds that long-distance connection. Doesn't matter how long it is. It's not a local [INAUDIBLE] finds that long-distance connection.
So with that simple substrate, you can do a lot.
Here's the system working now on two versions. It's working on Macbeth again. By the way, it works on all kinds of stories-- political conflict, law, medicine, doesn't matter. Here it's working on Macbeth again, though, but now it's using a model of an Eastern reader, and separately, a model of a Western reader. And we think differently, you know?
Well, here's the difference. These were differences that were ferreted out by Morris and Peng back in the '90s. They did experiments with high school kids in a suburb of Beijing and in Wisconsin. And what they found was that there was this statistically significant tendency of the Asian students to see violence as a consequence of the situation that you're embedded in, whereas the kids in Wisconsin see it as a consequence of you, and [INAUDIBLE] something wrong with you inside. So that's what we modeled in this particular experiment.
So on the top is the Asian view. Macduff kills Macbeth right here because he's embedded in this revenge situation. Couldn't help himself. But in the Western view, Macduff kills Macbeth because he's crazy. Insane. So it's, to use the fancy terms, it's situational versus dispositional.
[INAUDIBLE] answer questions, of course. That's easy enough, in this case.
I'm switching now to a political domain. What happened in this particular case was that Estonia removed the Russian war memorial from the center of Tallinn. And the question is, what happened? And if you happen to be a friend of the Russians, you see it as teaching the Estonians a lesson by bringing down their national web. And if you're a friend of Estonia, what you see it as is a misguided retaliation.
So here's the sequence. I'll go over that pretty quickly. The Estonians remove a Russian war memorial. The next day, the Estonian national web goes down. Websites are [INAUDIBLE], all kinds of things happen. Everybody knows the Russians did it.
And the question is, why did they do it? And it depends on whose side you're on. On one side, you'd say that it's misguided retaliation. On the other side, you'd say that it's teaching a lesson.
What else can you do? You can even have some of the analysis given by the questions you ask the system. This is, again, from Morris and Peng. This is a story adapted from the Morris and Peng experiment on the high school kids. The story's about a graduate student who murders his professor, so don't get any ideas about [INAUDIBLE]. So it's a grisly story.
But the question is, why did he do it? And at first, the system has no idea why he did it. But then you ask the system a question. [INAUDIBLE] because Americans [INAUDIBLE] individualistic? And at this point, the Asian reader of the story inspects its own beliefs, decides that yes, it does believe that America is individualistic, inserts that fact into the story. And that unleashes a chain of reasoning that allows it to conclude, yep, on reflection, I do believe that. And he did [INAUDIBLE] individualistic-- whereas the Western reader has no such thing in its memory, its belief system, so it can't reach that kind of conclusion.
So here's one that I especially like because it's inspired by Shimon's work. Once you have this idea that stories have concepts, then you can begin to judge the similarities between stories on the basis of the concepts they contain, not just the words that they contain. So one story might be similar to another because they both involve revenge, even though neither story contains the word revenge.
So what we're doing there is we're measuring the similarity on the basis of these higher-level concepts. It sounds like intermediate features to me. In fact, that's where we got the idea, by reading Shimon's paper on intermediate features in visual recognition. [INAUDIBLE] we don't measure similarity by the low-level stuff, because that's everywhere, and not by exact conformity to a particular plot line, because that's nowhere. Instead, we look for the intermediate things, like repression, revenge, and so on.
And so up here, what we have is 15 conflict situations described and embedded by Genesis, and compared on the basis of the concepts they contain, whereas down here, they're compared on the basis of the words they contain. So the lighter it is, the whiter it is, the more similar things are judged to be. That means the diagonal is, of course, completely white, because stories are more similar to themselves than they are to anything else. And the important thing is that the two comparisons are different.
So over here, you can see that there's a similarity judgment between the Persian Gulf War and the Afghanistan Civil War. It's pretty bright up there at the concept level, but pretty dark down here on the word level. So stories are similar according to the concepts they contain, not according to the players involved, when you look at it from a kind of intermediate feature point of view.
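The idea can be sketched in a few lines, using Jaccard overlap as a stand-in for whatever measure Genesis actually uses. Concept extraction is stubbed out with hand-labeled sets; in Genesis it comes from the concept-pattern matcher. The words, concepts, and numbers are invented for illustration.

    def jaccard(a, b):
        """Set overlap in [0, 1]; 1 means identical sets."""
        return len(a & b) / len(a | b) if a | b else 1.0

    # Word-level views of two conflicts may share almost nothing...
    gulf_words = {"Iraq", "invades", "Kuwait", "coalition", "liberates"}
    afghan_words = {"factions", "fight", "Kabul", "falls"}

    # ...but concept-level views of the same two can overlap strongly.
    gulf_concepts = {"aggression", "revenge", "Pyrrhic victory"}
    afghan_concepts = {"aggression", "revenge"}

    print(jaccard(gulf_words, afghan_words))        # 0.0: dark at the word level
    print(jaccard(gulf_concepts, afghan_concepts))  # ~0.67: bright at the concept level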
[INAUDIBLE] is something we do in preparation for doing analogical reasoning. Here is a rendering that I was told about by a political scientist. So I'm not vouching for the validity of any of this. I'm only telling you what a political scientist told me.
So he was talking about why we need intelligence systems that help in doing political analysis. And this is the example he gives, a description of the Tet Offensive in Vietnam. So you can read through there. You can see that, yeah, they're mobilizing. [INAUDIBLE] think they would attack because we knew they were losing.
So now we fast forward a few years to the Arab-Israeli conflict. [INAUDIBLE] of course they promptly attacked. And so according to one school of political thinking, they attacked in both cases for a political rather than a military motive. But the point is, these things do run parallel. And so how can you make a judgment about why these things are running parallel, and how do you put them into alignment?
We actually borrowed technology from molecular biology to do this, because who were the custodians of the knowledge about how to align sequences? It's the molecular biologists, who align protein sequences and DNA sequences.
So we adapted that technology to align story sequences. And we put the Tet Offensive and the Arab-Israeli War into correspondence. That enabled us to fill in the gaps in one with the other. And that's how we were able to do that kind of alignment.
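A minimal sketch of that adaptation: the Needleman-Wunsch global alignment algorithm from molecular biology, applied to sequences of story events instead of residues. The event labels and scoring values here are invented for illustration; they are not the actual data or parameters used.

    def align(a, b, match=2, mismatch=-1, gap=-1):
        """Needleman-Wunsch global alignment of two event sequences."""
        n, m = len(a), len(b)
        # score[i][j] = best score aligning a[:i] with b[:j]
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                score[i][j] = max(score[i - 1][j - 1] + s,
                                  score[i - 1][j] + gap,
                                  score[i][j - 1] + gap)
        # Trace back to recover the aligned pairs; "-" marks a gap.
        pairs, i, j = [], n, m
        while i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
            elif score[i][j] == score[i - 1][j] + gap:
                pairs.append((a[i - 1], "-")); i -= 1
            else:
                pairs.append(("-", b[j - 1])); j -= 1
        while i > 0:
            pairs.append((a[i - 1], "-")); i -= 1
        while j > 0:
            pairs.append(("-", b[j - 1])); j -= 1
        return score[n][m], list(reversed(pairs))

    tet = ["mobilize", "feint", "attack", "lose militarily", "win politically"]
    arab_israeli = ["mobilize", "attack", "lose militarily", "win politically"]
    # The gap the alignment exposes opposite "feint" is exactly the sort of
    # hole one story can suggest filling in the other.
    print(align(tet, arab_israeli))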
Now it's showing how even in the fog of war, the system might be able to spot things. Not [INAUDIBLE] happened, but at least it might draw your attention to things you should be thinking about. We like to think of this sort of system in the future as a kind of [INAUDIBLE] stop that tells you what kinds of things [INAUDIBLE] concerned about. But those systems don't have a crystal ball [INAUDIBLE] might have.
So one system might know stuff that another system doesn't know. So you can think about one system teaching another system the things that it knows. So in this next demonstration, [INAUDIBLE]. In this next demonstration, one system is telling a stupider system, a less intelligent system, things that it thinks it [INAUDIBLE] well. So there's a little zoom in on what it [INAUDIBLE] in miniature.
So the teacher has a model of the student, which suggests that the student is from Mars and doesn't know anything. So at this particular point in telling the story, it has the option of spoon feeding and just saying, oh, Duncan becomes dead, or it can give a little better of an explanation by saying he becomes dead because he was murdered.
But of course, the receiving system has no idea whether this is just special for Macbeth and Duncan. Maybe other kinds of people don't get dead when you murder them. So it has the option of going into a more elaborate explanation that gives the principle behind the conclusion. So this is a system operating in teaching mode.
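As a sketch of that choice, with invented names throughout: the teacher picks how elaborate to be from its model of what the student already knows.

    def explain(fact, cause, principle, student_knows_principle, story_so_far):
        """Choose the least elaborate telling the student model permits."""
        if cause in story_so_far and student_knows_principle:
            return fact  # spoon-feed: the student can connect it up alone
        if student_knows_principle:
            return f"{fact}, because {cause}"  # supply the local connection
        # A student from Mars needs the general principle spelled out too.
        return f"{fact}, because {cause}; in general, {principle}"

    print(explain("Duncan becomes dead",
                  "Macbeth murdered Duncan",
                  "when a person is murdered, the person becomes dead",
                  student_knows_principle=False,
                  story_so_far={"Macbeth murdered Duncan"}))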
Now, my colleagues have seen this sort of stuff until they're almost ready to turn blue in the face, so I thought I'd better include something here that I couldn't have shown you 10 days ago, [INAUDIBLE] never seen. So I'm going to hazard a live demo. My computer, of course, is connected to the web, which means it's connected to Boris's parser back up at MIT, which means that we can actually run it.
And so what I'm going to do now is I'm going to run a version of-- I don't know. Well, we might as well use-- I'm [INAUDIBLE] point in my mind. It's going to be either Hansel and Gretel or Macbeth. I'll use Hansel and Gretel, I guess, because it's a little different.
And the question is, what do you think about the-- I don't know if you remember the story, but this guy leaves his children in the woods. His wife persuades him to get rid of them or something like that. He feels bad about it in the end.
So the question I'm addressing is, how do you tell a story to make someone look good? How do you tell a story to make someone look bad? And the answer is you make a person look good or bad by adjusting the details that you suppress and the details you emphasize and put in boldface, so as to draw out the kinds of conclusions you want. This was the subject of a master's thesis that was just submitted this semester.
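A minimal sketch of that idea, with invented valence tags standing in for the thesis's actual analysis: keep what reflects well on the hero or badly on everyone else, and suppress the rest.

    def slant(events, hero):
        """events: (actor, description, valence) with +1 good, -1 bad."""
        kept = []
        for actor, description, valence in events:
            if actor == hero and valence > 0:
                kept.append(description.upper())  # emphasize: hero looks good
            elif actor != hero and valence < 0:
                kept.append(description)          # keep: others look bad
            # Everything else (hero looking bad, others looking good)
            # is suppressed.
        return kept

    events = [
        ("woodcutter", "the woodcutter tells Hansel the truth", +1),
        ("woodcutter", "the woodcutter abandons the children", -1),
        ("witch", "the witch builds a candy cottage", +1),
        ("witch", "the witch traps Hansel in a cage", -1),
    ]
    print(slant(events, "woodcutter"))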
So up here, I select the demonstration of persuasion, Hansel and Gretel. Now we cross our fingers and hope that it works. It's reading in some [INAUDIBLE] and some concept patterns. Now it begins to create the elaboration graph. Finds various concepts in it.
Now it's done. But now it starts working again, because we've asked it to retell the story, using Boris's generator, in a particular way that makes the woodcutter look like a likeable person. So we'll just go in here and see what kinds of stuff it's concluded.
Oh, by the way, in making somebody look likeable, the other thing you do is make everybody else look unlikeable. So you'll find a mixture of those kinds of things in the story, as we have here.
Yeah, there's the witch trapping Hansel in a cage. That's because we're trying to make her look bad, because we're trying to make the woodcutter look good. Anything that makes the witch look good, like building a candy cottage, [INAUDIBLE]. [INAUDIBLE] wanted to help anybody. The woodcutter tells the truth to Hansel and Gretel about the plans. He's being honest now, so that makes him look good.
The wife locks Hansel and Gretel in a room. That makes her look bad. The woodcutter doesn't help. [INAUDIBLE] all that suppressed, because we're trying to make him look likeable. The woodcutter becomes relieved. Yeah, that makes him look likeable, so we highlight that. So now we're beginning to be able to tell a story with a certain sensitivity to what kind of impact we want to have on the reader, in terms of their opinions about various kinds of things like that.
So that's my demonstration [INAUDIBLE]. I'm glad I passed that part. Let me get back to my slideshow, because the next thing we'll talk about is a couple of other hypotheses. See, I've got two hypotheses so far. One is the inner language hypothesis, and that's the thing that makes us different. We have the symbolic capability that allows us to build descriptions, and that's our inner language.
Then we have the story hypothesis, which says that's the differentiating characteristic of us humans.
And now I have the social animal hypothesis, which is that it's a good thing that we can talk to each other, or we do talk to each other, and that amplifies everything else. See, the outer language-- [INAUDIBLE] wouldn't be an outer language unless you have somebody to talk to. And we wouldn't have any outer language at all if we didn't have anything to say-- a point that's often lost on the politicians, but nevertheless it's true in most circumstances.
So I think we make ourselves smarter when we talk to other people. We'll even make ourselves smarter when we talk to ourselves.
I always like to talk about this experiment that was done by a friend of mine, [INAUDIBLE], some years ago. She was teaching people how to do physics problems, elementary physics problems. You remember all this sort of thing. You know, force balancing and all that sort of stuff.
And what she did is she trained them up to be able to handle these kinds of problems in an elementary way. And then she gave her students a quiz. And what she found is that the top half did twice as well as the bottom half. And then she looked to see what kind of characteristics the top half had relative to the bottom half.
And by the way, she was asking them to talk out loud as they solved these problems. So now you know what the results are going to be, right? So the worst students said about 10 things to themselves out loud when they were trying to solve the problems. And the good students, the ones that did twice as well, said 35 things to themselves.
Some of this stuff was [INAUDIBLE] problem solving: I'm stuck. I think I'm making progress. Other things were physics facts: I better see if I can consider gravity, things like that. But the interesting thing is that the ones who did well talked to themselves a lot when they were solving the problems.
Of course, this is only one direction. We can't make too much of this, because what we don't know is if you tell the bottom half to talk to themselves more, whether it actually helps. But the point is that talking to themselves seemed to make them smarter. It's as if language provides access into knowledge. It activates indexes, brings stuff in.
Now I bring myself back to the vision world, because of course, all of this is just manipulating symbols inside of a computer that doesn't understand anything about the outside world [INAUDIBLE] ultimately connected to perceptual systems. So being connected to perceptual systems is very much part [INAUDIBLE] agenda. And everyone's given examples going in the other direction, so I would like to do mine-- because I think that our language system, our story system, has to make use of our perceptual systems to do what it does.
So I can say to you, John kissed Mary. Did John touch Mary? Yeah. How do you know? Did somebody tell you that? Did somebody offer it up in one of these cloud deals which is trying to [INAUDIBLE] all human knowledge? Or did you just imagine the kiss and read the answer off of the imagined scene with [INAUDIBLE] perceptual apparatus? Eventually it becomes part of your lexicon as a rule. But originally, I think you imagine it and you see it.
So there's your imagination, blurred out to be an imagination. But that's just what happens, I think.
But now let me give you a couple other examples that involve [INAUDIBLE]. So here's example number one. How many countries in Africa does the equator cross? Anybody know? Anybody sure enough to bet their life on the answer? Probably not.
But now you can know. And you can run that little visual routine, and cross that line, and count up the countries. And what are you doing when you do that? Your visual system is following a recipe. But following a recipe is like-- well, a recipe is a sequence of actions. So it's a kind of special case of a story. So all these things are kind of knitted together.
Here's a more grisly example. I like to make stuff. I have an engineering [INAUDIBLE]. So I was installing a saw a while back with a cabinetmaker who's a friend of mine. He said, by the way, you never wear gloves when you operate this machine. Never wear gloves. You know why? Some of you who do woodworking have been told [INAUDIBLE] do machine shop stuff, but others may not know why you don't want to wear gloves. But let me help you along.
You know what kind of gloves I'm talking about. Those kind of loose cotton gloves? And the question is, what kind of disaster can follow from wearing a loose cotton glove around a spinning saw blade? Now you've got it, right? You can almost feel your hand being drawn into the blade and getting mutilated.
So that's another example of something you'd know because you can generate that knowledge on the fly [INAUDIBLE] imagination system.
So every once in a while, someone gets out a [INAUDIBLE] and calculates how many facts you can know. And you can do that too. You can just say, well, let's see. I can get a fact every three seconds. That means I can get about 1,000 an hour. You can multiply that times 10 hours a day, because you need some rest. And you multiply that times 365 to get the number of facts you get in a year. And you multiply that times 20, so you know what an adult knows.
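Worked out (using the exact 1,200-an-hour rate, which the talk rounds to 1,000), the back-of-envelope numbers come to tens of millions:

    facts_per_hour = 3600 // 3            # one fact every three seconds: 1,200
    facts_per_day = facts_per_hour * 10   # ten usable hours a day
    facts_per_year = facts_per_day * 365
    adult_total = facts_per_year * 20     # twenty years of accumulating
    print(f"{adult_total:,}")             # 87,600,000 -- order of tens of millions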
But that's all actually a fraction of what you can potentially know. What you can potentially know is more derived from stuff like this, stuff you can imagine and conclude from what you already know on the fly as needed.
And now one more example of the intersection of story understanding and vision. You all know about Schrodinger's cat if you're a physicist. But if you're here in our world, you have to know about [INAUDIBLE] cat.
You actually saw this in one of Shimon's slides. I borrowed it from him. And I tell the story very frequently, because a good question is, what's the one-word answer for what the cat is doing? Not a trick question. It's drinking, right? So what's the one-word answer for what I'm doing? I'm also drinking.
And now if I put myself in this posture and salute you for this wonderful summer school that you're attending and your attention that you've paid to my talk, what am I doing? I'm toasting.
Which two look more alike? Well, the two that look more alike are labeled differently, because they tell a different story. And one is a story about thirst, liquid, satisfying thirst. And the other is about something very different. So ultimately, I don't know. That's why I ask so many questions about [INAUDIBLE] and whatnot. I don't know how much an animal without our storytelling, recipe-following capability can know about what's going on in the world.
To take a brief aside, [INAUDIBLE] tutored by the work of Matt Wilson recently because his work shows that even those rats have a pretty good sense of sequence as they move through a maze. It astonishes me.
It's been known through his work for many years that they dream that path as they go through the maze when they're sleeping. But what I learned from that quite recently is when they get to the goal and find some food, then they play the sequence in their brain backwards as if they're trying to remember how they got there the next time they're out [INAUDIBLE].
So this is where I think the Genesis system contributes in this enterprise. We have some hypotheses that are story-focused in terms of understanding what goes on inside of our skulls. We're attempting to follow sound methodological steps.
We, being engineers, believe you can't really understand it unless you build it. So we're building a system, and that system illustrates all these kinds of story-understanding capabilities. And I chose that word extremely carefully-- illustrates, not demonstrates.
We should all follow [? Andreas's ?] example and talk about what our systems can't do. And what our system can do is illustrate a lot of these sorts of things. But what it can't do: we can't scale it up to deal with real Macbeth. We can't scale it up to deal with dialogue. We can't scale it up to deal with Dickens.
We looked at Dickens. We looked at David Copperfield. If you look at some literature like that, what you discover is that first of all, there are no causal relations anywhere to speak of because he assumes you already have all your common sense.
The next thing you find is that sentences of more than 100 words are frequent. He had [INAUDIBLE] more than 100 words. Very complex stuff, [INAUDIBLE].
So meanwhile, before we can build [INAUDIBLE] for real people that have practical applications, we think we are shedding some light on the nature of human intelligence by looking at what it takes to understand a story. And that is the end of the story.
Questions? Yes.
AUDIENCE: [INAUDIBLE] two parts to the question. First of all, there is a big [INAUDIBLE] between [INAUDIBLE] presentation of [INAUDIBLE] and you need to take the language. So what happens between this gap, and do we need language to reason about vision, for example?
And the second question is could you show us an example of a different language [INAUDIBLE] electrical circuit? So could you inform it from sort of [INAUDIBLE] here, but on the [INAUDIBLE] or things like that?
PATRICK WINSTON: Yeah. Well, those are really interesting questions.
First of all, to answer your first question, which has to do with that blue wall business and do we need language to do that, what we need is we need a capability that seems to be closely connected with language. Namely, it's the ability to take information from two very different modalities-- namely, geometry and color-- and put them together.
So it's interestingly related to Chomsky's remarks about putting concepts together. That's what we seem to do when solving a problem. Put the two concepts together. And it doesn't seem to be the case that rats and small children do that.
Oh, I forgot to tell you, but when do small children become adults? And the answer is about five or six years of age. But what it actually correlates with is the onset of the child's use of the words left and right in their own descriptions of the world. So that's what seems to [INAUDIBLE].
To summarize that answer, it's the combination capability, the merging quality that seems to be intimately connected with language, that makes that possible. And so it's not [INAUDIBLE] we need one to do the other and the other [? can do ?] the one. They're all [INAUDIBLE] connected together.
And the other question was?
AUDIENCE: [INAUDIBLE] other languages [INAUDIBLE].
PATRICK WINSTON: Such as how do we view these other things as story understanding, like doing math and so on.
Well, we have gut feelings about that. We don't have any demonstrations or illustrations. We know that our story understanding capability can be used to deal with all the kinds of things you always learn by case study. Medicine, business, law-- we've done illustrations of all those kinds of domains.
We believe that the others are a matter of each specialty being a special case of story understanding. And then for each domain, you have a different kind of recipe. So we have thoughts about where to go with that, but we haven't proved anything yet.
Yes?
AUDIENCE: You think there's a limit to the number of concepts we can have? And do you think that we've reached that limit? [INAUDIBLE].
PATRICK WINSTON: Do I think there's a limit to the number of concepts we can have? No, because we can make new ones when we need them, or on the fly, as appropriate.
We have a song at MIT. It's a drinking song. And I always tell my students there are an infinite number of verses. What could that possibly mean? Well, it means that if there were a finite number, someone would make up another one, [INAUDIBLE] infinite.
So we use the same argument here. Our potential for having another concept is unbounded, even though at any given time we have some finite number.
And you can work out those numbers on the [INAUDIBLE] number too. You can say, well, each concept needs a story that goes with it. How many stories can you accumulate in 20 years [INAUDIBLE] by the numbers? It's not [INAUDIBLE].
AUDIENCE: You don't-- well, I mean, we can have more stories now than we could have had 10,000 years ago, right? Because there's more concepts in our environment that were given, right?
PATRICK WINSTON: I guess I would say that there are probably concepts that they had that we don't. For example, I know how to use a table saw, but it would baffle me if they put me in front of a stone and told me to make an ax. When I was an undergraduate, we learned how to do multiplication with [INAUDIBLE]. They don't do that anymore. So concepts come and go.
We have a finite brain so I would say that at any given time, we have a finite number of concepts. But we can generate new ones by blending old ones in interesting ways and by being creative about how we blend them. So we can get new ones by new stories being told to us or by our own creative generation of new stories.
Any others?
AUDIENCE: [INAUDIBLE].
PATRICK WINSTON: Yes.
AUDIENCE: So there's [INAUDIBLE] of a story that's part of [INAUDIBLE]. I was wondering if you can [INAUDIBLE], they're the same thing, like coherence or redundancy [INAUDIBLE]. Is it possible to [INAUDIBLE] what makes a good story and what makes a good concepts?
PATRICK WINSTON: The question is what about some other ways of looking at stories like coherence. And yes, we're starting to look at questions like that.
We've done some work with some students [INAUDIBLE] how to deal with story coherence. And is the story interesting? Is the story surprising? Is the story memorable? Those are all things we think we can put more substance into, by virtue of the fact that we have machinery for modeling those kinds of things in a way that builds on top of what we thought.
So coherence might be how long is the chain in the story, how much does it break up into little clusters of independently connected stuff.
AUDIENCE: Yeah. So there's a similar notion in concepts. Like sometimes they're hard to learn, but they're not supposed to be because they're less coherent or something like that. I'm wondering if [INAUDIBLE] is supposed to be the same.
PATRICK WINSTON: No. That's-- [INAUDIBLE] completely understand the question, so we'd probably better deal with that offline.
One of the things of course we're interested in is-- has at least a key word in common with what you said is we're interested in how we might have a system discover concepts on its own. Sometimes people complain that we're spoon-feeding everything that the system knows to it because we're providing the rules and we're providing the concepts. But then again, I think that mostly what we know is told to us too, so I'm not embarrassed by that.
Nevertheless, we are interested in, and have begun to build, systems that will take an ensemble of stories and look for common patterns in them, which will then become candidates for being concept patterns. They wouldn't have names like revenge, but they would be there [INAUDIBLE] as the equivalent of the concepts that we told it about.
One more. Yes.
AUDIENCE: Two-part question, [INAUDIBLE].
PATRICK WINSTON: Just two more, oh, OK.
AUDIENCE: Is there a difference between storytelling and explanation?
PATRICK WINSTON: Is there a difference between storytelling and explanation. Sure.
AUDIENCE: OK.
PATRICK WINSTON: I mean, I don't have a story to tell because I haven't thought about the direction of that. But, you know, I can tell you a story, and I may or may not provide the details, according to my model of how well you understand what I'm telling you. So an explanation would build into it my understanding of what you understand, so that I can be educational as well as merely relating events.
AUDIENCE: So is storytelling a superset of explanation?
PATRICK WINSTON: Let's see. I'm not sure which way the containment goes. I would say that a longer story would be one that provides not only the sequence of events, but a greater amount of explanation about how the pieces fit together. So I would say explanation is contained in storytelling.
AUDIENCE: Do you have any ideas why four-year-olds ask so many why questions?
PATRICK WINSTON: Do I have any idea why four-year-olds ask so many why questions? No, but I think we can model it, because when you tell a story, some of the elements of the story are on the left side and don't have explanations. So those would be the things you ask why about.
So in the Macbeth example, Lady Macbeth persuades Macbeth to want to be king. And there's no explanation in the story as we've told it, so it would be natural to ask why and get an answer about [INAUDIBLE] behavior or something.
So by asking questions about stuff on the left side, stuff that connects into the story, but doesn't itself have explanation, you can imagine [INAUDIBLE] building up.