Tomer Ullman vs. Laura Schulz debate: Theories, Imagination, and the Generation of New Ideas
June 6, 2014
Brains, Minds and Machines Summer Course 2014
Topics: (Tomer Ullman) What good is a theory; problem of search in theory space; stochastic search and relevance to cognitive development
(Laura Schulz) Issues with stochastic search: the search space is infinite and does not make use of knowledge and abilities that children seem to have; proposal for goal-oriented hypothesis generation; what does it mean to think of a new idea?
(Tomer Ullman) Response to the critique of stochastic search
TOMER ULLMAN: OK, so this is going to be more of a debate. We'll mix it up a little bit between me and Laura. And this whole thing is going to be sort of how do we learn practically. How do we come up with new ideas practically?
You've heard a lot about the computational models, but they tend to be pitched more at level one, you could say-- the computational level. And a lot of the questions I've heard-- I haven't been here the whole time-- but a bunch of the questions I've heard have basically been about the actual algorithms and implementations of these things. And I think that's a very relevant question for the whole of Thrust One, which is development, and how we actually implement this.
So the question is sort of, what does it mean in terms of being like a scientist? When we say the child is like a scientist, I find it quite persuasive, but how do we actually implement that in machine code, when we get down to it? So this is going to be a bit of a debate between me and Laura Schulz, where I'm going to defend the view that the algorithm we should look at for how we invent these things is something like a stochastic search algorithm.
And I'll explain what I mean by that, [INAUDIBLE] to say that no. And I should say I phrased it down there sort of Ullman versus Schulz, which is a bit pugnacious. But of course, we're on the same team. We agree about a lot of things. It's not like we're engaged in some sort of boxing match.
But having said that, I'll try to live up to that useful fiction as much as I can, not least because some of Laura's research, which she hasn't mentioned, shows that when you have two opposing views, it helps retention and memory and things like that in children when you present each view as belonging to a different person. So I'll attempt to present one view.
Another thing I should mention is that we already had this debate once. This is actually round two. And the last time I used this picture, it was at the developmental conference. And Laura told me, oh, Tomer, you lost from the get-go. You used robots at the developmental conference.
So instead of switching the picture, I decided to switch the audience. And here we are. But again, all this fun, and bells and whistles, and things like that-- we're on the same team. And this isn't actually a boxing match.
Now having said that, there will be a small bell to signify transitions between us when we change. So here's a 65-second prologue setting up what it is that we're talking about. And I think a lot of you sort of kind of know this already, and especially Joel set the stage up really well. But still, what is going to happen, or what is the story so far?
So the story so far is that we have this very interesting notion coming out of development, from Laura and others, which is children as intuitive scientists. And we also have built computational models that have been very successful in putting some formal meat on the bones of that idea. But one critique of these computational models, as I mentioned, is that they don't accurately portray the search process that any real, reasonable learner, like a child, would have to go through in order to hit upon the right theory.
Even if we all agree that defining theories as computational things like programs is the right move, what about the process of finding the right theory? And of course, oftentimes the space of theories is large-- very, very large-- infinitely large. How can we fit an infinite space into the head of a child? We can't possibly be earnestly suggesting that.
And the question becomes, how do children investigate the space? And I've kind of hinted at that when I gave the tutorial on Church, on how Church searches the space of programs. But we'll get into that.
Now, of course, this is an issue not just for children. It's an issue for computational models as well. We can no more fit an infinite space into the head of an iCub, shown over there, than we can into the head of a real cub, shown over there.
I mean, we have problems as computational scientists doing this well. And the thing that we often resort to is a search algorithm. For these higher-level structured theories of the sort that Josh mentioned, we usually use something called stochastic search.
OK, so the point is, child as intuitive scientists led to some ideas and theories of how we can do these computational theories, but there's a problem of large theory spaces, so we use stochastic search algorithms to search these large theory spaces. But then myself and others have suggested, well, if these are the algorithms that work for the large computational spaces, could this be what children are doing? I mean, we use this as sort of a practical way to search these computational spaces.
But then the point becomes something more. It's not just a way to search the spaces. It could be what children are actually doing. So it becomes more of a claim about level two, about the algorithms.
Laura has issues with this claim, as she'll highlight in a second. And this is a picture of Laura looking skeptical. Those of you who can't see her can just look at her right here.
OK, so the outline of the discussion is I'm going to give a bit of background. Again, I think you've gotten some good background so far, but I'll still give you a bit more of background, get you sort of up-to-date with thinking of theories like grammars. And then you'll probably understand a bit more about why certain theories are better than others a priori, because some grammars tend to produce shorter theories which are seen as better theories.
So we'll talk a bit about how good these theories are, representing a theory like a grammar, and then stochastic search-- how we actually find a good theory. That's just background. But then I'll reassert the claim that stochastic search is what children might be doing. Then I'll hand it over, with a ding of a bell, to Laura, who will raise some issues with that. I'll give a response, and then Laura will summarize and give her last word.
OK, so the discussion is the actually interesting part, I think, but still-- we're still doing some background right now. And of course, by the way, I should mention at the end, if there's time left-- which I hope there will be-- we'll of course open it up. But if there are any questions during this, this is a discussion between us, but it's also, maybe even more so, a discussion with you guys. So feel free, certainly if something is unclear, stop me.
OK, so what good is a theory? I think Josh has already done most of the work for me there. And probably everyone in this room would by now agree that having a theory is a good thing, where by theory we mean some sort of structured knowledge that is based on data but goes beyond the data. It organizes the data into abstractions, and allows you to do prediction and compression in a way that the raw data alone does not allow for.
So as a running example that some of you may have seen, imagine that you brought a child-- a scientist child-- into a room, and told them listen, here are some blocks. OK, and they all look the same. Play with some blocks, and figure out what is going on here. Collect observations and see what you can do.
And I'm going to tell you secretly that some of these blocks are magnetic. Some of these are metal. And some of these are plastic. They're just blue blocks, but they look the same.
So what's going to happen? Now, the child doesn't know this, of course, don't tell. So she begins to play with the blocks. There's metals, magnets, and plastics.
So she begins collecting observations. And sometimes she notices that well, you know, she puts them together, kind of like Laura's sticky things, and nothing happens. But sometimes, amazingly, they do stick.
You put them together, and they move around together. Well that's interesting. And she does this to all the blocks.
And now what is she to do with this data? How is she to explain it? OK, one way is to just say, we'll just use a big bag of data.
We'll build a giant matrix. There's ten objects, so we'll do 10 by 10 and just put a little mark where things attract. And I can say that A attracts B and B attracts A, C attracts B and B attracts C, and just go on like this. We'll have a 10 by 10 matrix. And that's all there is to it.
Now, of course, this representation is a kind of representation of the data, but it's not a theory. It's just a way of tallying and tabulating your data. And it's a horrible scheme, because, A, in terms of memory, if I go up to 100 objects, you'll end up with a 100-by-100 matrix. And certainly, if I give you even one new block, you won't be able to predict anything from your old data. You've just collected observations. So it's a bad scheme.
What else could you possibly do? Well, you could build what any child and scientist and person would probably do in this case, which is maybe build something like a theory. And let's use some concepts here.
I'm going to use schmagnet and schmetal, which is sort of the philosopher's favorite move to indicate that I'm talking about a concept. You should treat these as empty, abstract concepts. You shouldn't think of schmagnets and schmetals as actual magnets and metals.
Let's just label them as empty concepts for a second. I invented them. OK, I'm going to call these blocks schmagnets and these blocks schmetals.
And I'm going to come up with some rules. Like, for example, if something is a schmagnet and something else is a schmagnet, then both of those will interact. They should attract. Or if something is a schmagnet and something else is a schmetal, then they should interact. And interactions are symmetric: if x and y interact, then y and x interact.
And this uses sort of predicate logic, plus some sort of extension-- so not only do I need to define this theory, I also need to define its extension. I need to say what is a schmagnet and what is a schmetal in this domain.
But once I do that, I kind of have a generative theory that can tell me what things should attract what. And I can predict some observed data. What have I gained here?
Well A, I can predict my own observed data and feel good about myself. It's also a compressive scheme in the sense that yes, there's some overhead for keeping around these laws and keeping around what is a schmagnet and what is a schmetal. But these are vectors.
It's going to be like a 10 by 1 vector. And if I add another 100 objects, it's going to be 100 by 1 vector. It's not going to be this huge matrix. So I've gained something in terms of compression. Is that idea clear?
So I gained some compression. And also, I can predict new stuff. So if you show me something else, and you say, look, this thing attracts a schmagnet. And I say, oh, OK, I can already tell you what it's going to do with all the plastic that I've seen, and all the schmagnets that I've seen, and all these other things you could potentially show me, this infinite number of things.
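The contrast between the raw tally and the little rule-based theory can be sketched in a few lines of code. This is a hypothetical sketch: the labels and rules are just the running example from the talk, not anyone's actual implementation.

```python
# Hypothetical sketch of the running example: raw tabulation vs. a
# small rule-based theory. Names (schmagnet, schmetal) are from the talk.

# Raw-data approach: just tabulate every observed attraction.
# This grows as N-by-N and says nothing about a brand-new block.
observed = {("A", "B"), ("B", "A"), ("A", "D"), ("D", "A")}

# Theory approach: a per-object label (a length-N vector) plus two rules.
kind = {"A": "schmagnet", "B": "schmetal", "C": "plastic", "D": "schmagnet"}

def interacts(x, y):
    """Rule 1: two schmagnets interact.
    Rule 2: a schmagnet and a schmetal interact.
    Symmetry comes for free from using an unordered pair."""
    pair = {kind[x], kind[y]}
    return pair == {"schmagnet"} or pair == {"schmagnet", "schmetal"}

# The theory predicts a brand-new block from a single label.
kind["E"] = "schmetal"
print(interacts("E", "A"))  # schmetal-schmagnet: True
print(interacts("E", "B"))  # schmetal-schmetal: False
print(interacts("E", "C"))  # schmetal-plastic: False
```

The compression point shows up directly: the dictionary of labels is the "10-by-1 vector," and the two rules carry all the predictive structure.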
So theories are good. And you might even say, OK, that's a good theory. And you might say, well, we know from the research of distinguished people like someone sitting right over there, that it's quite possible children learn something like these theories.
And the problem in computational terms, at level one, is just to find the right theory. It's "simply"-- notice the quotation marks-- "simply" a matter of inference. Define the space of all possible theories, good and bad, see your data, use Bayes' rule, and churn out the right theory that best explains the data. Where by best, we mean that it does best on the prior and on the likelihood-- it best explains the data.
Now, we're still after the question of how we actually get this prior, and how we represent the theories, beyond how we find the theories, which I'll get to in a second. And a popular approach, used by people like Steve Piantadosi, and Noah Goodman, and other people in our lab, has been probabilistic grammars.
Now a lot of what I'm going to say is going to transfer just as easily to probabilistic programs, which Josh has been talking about-- the child's [INAUDIBLE] and things like that. But the example that we were using for stochastic search was the grammar. So stay with me.
This, for example, is a grammar that can generate logical predicates of the sort that you've seen before. When you run this grammar forward, how many of you have heard of-- I'm sorry, I haven't been here, so maybe you've heard of this and it's obvious to you-- but things like grammar trees, like Chomsky trees for languages and things like that. How many of you have heard of that? Raise your hands.
OK, good, so this is kind of like that, but for theories. In language, you have something like a sentence, which goes into a noun phrase and a verb phrase, and those expand in turn into further phrases, and you ultimately churn down the tree until this grammar yields a sentence like "The clown fish ascended stairs darkly."
Here, you end up with a theory, which might be a good theory or a bad theory. You start with something like a sentence symbol, and it goes into a law, and maybe into something that generates other laws. But that one ended up stopping.
OK, but let's keep going with this law. So this law will generate a left-hand predicate and a right-hand predicate, and maybe something like Add, where Add means either stop or add another right-hand predicate. You don't need to worry about the details of this implementation.
The take-home message, the high-level point is that we can define the grammar which, when you run it through-- kind of like producing language, which can produce an infinite number of sentences-- this can produce an infinite number of theories. In this particular sense, theories mean sort of logical predicates arranged into rules. OK, so we run through the law.
We get a left-hand predicate, a right-hand predicate and add. The left-hand predicate ended up being interact, which is a terminal. The right-hand predicate ended up being interact by x, and this one ended up being stop.
So now we have a rule. This [INAUDIBLE] rule says that if x and y interact and y and x interact, great-- we have symmetry. And we could have generated a bunch of laws. And we ended up generating a theory which is made up of three laws. That's good.
So this particular grammar that I showed before is quite simple. What it generates are the sort of things called Horn clauses, which seem very simple. But despite the simplicity, it can be used to capture a lot of interesting and very different theories, at least in simplified form-- like simplified magnetism over there, and simplified taxonomy, and maybe kinship, and a sort of lightweight version of preference and intuitive psychology.
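The forward-sampling process just described can be sketched roughly like this. It's a toy grammar: the predicate names and all the probabilities are made up for illustration, not taken from the actual grammar in the work discussed.

```python
import random

# Toy forward sampler for a Horn-clause-style grammar, loosely in the
# spirit of the one described in the talk. Predicate names and all
# probabilities are illustrative.

PREDICATES = ["interacts(x,y)", "p(x)", "q(x)"]

def sample_law(rng, p_add=0.5):
    """Law -> left-hand predicates => right-hand predicate.
    The 'Add' choice keeps extending the left-hand side, or stops."""
    left = [rng.choice(PREDICATES)]
    while rng.random() < p_add:          # Add: maybe another predicate
        left.append(rng.choice(PREDICATES))
    right = rng.choice(PREDICATES)
    return (tuple(left), right)

def sample_theory(rng, p_stop=0.5):
    """Theory -> one law, then stop-or-continue. Because every extra
    law needs another 'continue' choice, short theories are a priori
    more probable."""
    laws = [sample_law(rng)]
    while rng.random() > p_stop:
        laws.append(sample_law(rng))
    return laws

rng = random.Random(0)
for left, right in sample_theory(rng):
    print(" & ".join(left), "=>", right)
```

Running the grammar forward like this can, in principle, produce infinitely many theories, just as a language grammar can produce infinitely many sentences.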
OK, so again, just to hammer this point home-- this thing about finding theories, and how we actually do it at the ideal level. The ideal-level notion is that we have a theory space. You need to imagine a sort of two-dimensional infinite space of theories, where every point here is going to be a particular theory.
OK, I land over here. This means the three laws that I showed you before. I land over here. This is the simplified theory of taxonomy.
I land over here. This is some theory nobody has thought about. And it's terrible, and doesn't predict any data in the world. But it is potentially possible under the grammar that I've described before.
Now, without having seen any data at all, I already have a prior. And some people have asked [INAUDIBLE] priors on theories and things like that. And why is that?
Because when you define a grammar, a probabilistic grammar, it has these sort of nodes that you can keep churning out, and at some point you stop. But of necessity, shorter things will be more probable than longer things.
It's true that you could get a longer, and longer, and longer, and longer, and longer, and longer, and longer, and longer, and longer, and longer, and longer sentence, but it's unlikely that you've ever heard that before. And part of the point there is that you could churn it out, and at any point you could stop. So it's much more likely that you'll stop at some point than keep churning out things.
So already you have a simplicity prior. You'll tend to prefer theories that have fewer laws. Within the laws, you prefer things that don't use a ton of predicates.
And in fact, one of the nice things about the particular grammar that we've used, which actually Noah defined, is that it also favors reuse. So you'll tend to stick with the concept you've invented before. Again, this is purely mechanical. You haven't seen any data yet. You'll tend to prefer shorter things.
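The purely mechanical origin of this simplicity prior can be made concrete. If every law ends with a stop-or-continue choice, the prior over the number of laws is geometric. A minimal sketch, assuming a stop probability of 0.5 and ignoring the per-law choices (which only sharpen the preference for short theories):

```python
# Sketch of the simplicity prior that falls out of a probabilistic
# grammar mechanically: a theory with k laws requires choosing
# "continue" (k - 1) times and then "stop" once, so its prior
# probability decays geometrically in k.

def length_prior(n_laws, p_stop=0.5):
    return (1 - p_stop) ** (n_laws - 1) * p_stop

print(length_prior(1))  # 0.5
print(length_prior(2))  # 0.25
print(length_prior(5))  # 0.03125
```

No data has been seen yet; the preference for fewer laws is built into the generative process itself.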
But then you see some data-- so that's the prior. Data comes in. The process of learning described at the computational level, level one, is simply this: we have theory space. It's infinite. A priori, we have some sort of bumps and hills, where the higher they are, the more likely they are. So even before we see any data, there is a structure to it. Higher means more likely.
But now as data comes in, we shift the probability on these theories. And sort of theories that maybe were not that likely a priori because they were more complicated, like quantum mechanics, become more likely. Certainly, the child equivalent of quantum mechanics suddenly becomes more likely, maybe a false belief. You didn't start out like that. But you learn it. And that's what learning is at the computational level.
What's wrong with this story? I mean, if this was all there was to it, we could just go home now, or at least send you guys off to drink whiskey. I understand it's a favorite.
But this can't really be the suggestion of what's going on, right? I just told you this is an infinite space. What are you talking about? You're going to arrange the mass on an infinite space at the same time, in parallel?
And when we explain this notion of what we mean by computational models, we often hit this point where people say, yeah, but what are you saying? That the child has the notion of general relativity and just hasn't seen the right data yet to put the mass on it? That it just needs to see the right data? It doesn't have that concept.
And the point is that they're right, of course. We're not actually suggesting that inside the child's head is the infinite space immediately available at the same time to touch. But then again, in our computational models, we also don't have these infinite spaces, and yet we are able to make these claims about you see more data, you shift the mass. So how do we do that?
Well, it was never meant to be a description of the actual implementation. The actual implementation is a question for algorithms, for level two. And when we do that, we resort to search. We have to search this space somehow.
And more specifically, a lot of the current methods, like in certain probabilistic programming languages like Venture and Church, they rely on stochastic search. And in particular, MCMC with [INAUDIBLE]. So what do you do in stochastic search? So I've told you a bit about MCMC.
How many people are familiar with it, have actually implemented it, or tried to, or seen someone implement it? Raise your hands. How many haven't?
OK, good. So I'm going to give you the sort of--
AUDIENCE: I have one more question. How many people are familiar with simulated annealing? [INAUDIBLE] popular and closely related.
TOMER ULLMAN: Very good. OK, so consider simulated annealing as I talk through this, and I think you won't go far wrong. So the point of stochastic search, and MCMC, and things like that-- just to give you a very general sense of it-- is that we're at some particular point in theory space. Instead of looking at this entire space, we're here.
And this particular point happens to be theory A-- some sort of rules, some sort of things. But again, I said we're talking about grammars. You could imagine this as the space of all possible programs, with a particular point being a particular bit of code.
And MCMC, it doesn't even matter. This could be an [INAUDIBLE] model or something, for those of you who are familiar with it. The point is, you're in some sort of point in this space.
And now what you do is you propose a tweak. And the way that you propose a tweak, in particular of in grammars, and programs, and things like that, is that you look at this tree that you've built up. It's like the grammar that we had. And you just decide randomly somewhere to cut this tree.
OK, to basically say, imagine that I hadn't made that choice. And let's regrow the tree from there. You might end up looking at the roots and just regrowing everything, or you might end up looking at just the edge of it. And I'll give you a sense of what I mean by that in a second.
But the point is that you propose a new theory by tweaking your code, by regenerating your code from somewhere. You just pick a point at random in your current theory and regenerate from there. Now you have a new theory.
You say, huh. How much better does this theory do in explaining the data? And how much better is it a priori? Is it simpler and does it explain the data better? And if it's better, then I'll accept it.
And if it's worse, I might still accept it. This is one of the nice things about MCMC, which in this respect is similar to simulated annealing. You're able to travel the space in a sort of two-steps-forward, one-step-back fashion: even if something is a little bit worse, you'll still go there, with a probability that depends on how much worse it is. If it's infinitely worse, you won't go there.
But you're able to hop around in this space. You propose somewhere. Let's look at this thought. Maybe it's worse. I'll still go there. Let's propose this thought. Maybe it's better. I'll go there.
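The accept/reject step just described is roughly the Metropolis rule. Here is a minimal sketch, where scores stand for log(prior times likelihood); a real MCMC sampler over grammar derivations would also need a correction for the forward and backward proposal probabilities (the Metropolis-Hastings ratio), which is omitted here.

```python
import math
import random

# Sketch of the Metropolis-style accept/reject step: a better theory
# is always accepted; a worse one is accepted with probability that
# shrinks with how much worse it scores.

def accept(current_logp, proposed_logp, rng):
    """logp = log(prior * likelihood) of a theory."""
    if proposed_logp >= current_logp:
        return True                      # better (or equal): take the step
    # Worse: take it anyway with probability exp(difference), which is
    # near 1 for slightly worse theories and near 0 for much worse ones.
    return rng.random() < math.exp(proposed_logp - current_logp)

rng = random.Random(0)
print(accept(-10.0, -8.0, rng))     # strictly better: always True
print(accept(-10.0, -1000.0, rng))  # vastly worse: effectively never
```

This is exactly the two-steps-forward, one-step-back behavior: slightly worse proposals are often taken, which is what lets the search escape local optima.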
And you're kind of like a hiker in theory land, kind of walking around at random, looking at these different points, and trying to find a good point. Yeah.
AUDIENCE: [INAUDIBLE] close together in this theory space. Are they similar?
TOMER ULLMAN: That's a very, very good point. So in some cases, yes. I've done a bit of cheating by presenting this as a two-dimensional space, maybe giving the sense that it's easy to say this theory is near that other theory at some point. The trouble with theory spaces-- this is a very good question on its own, but it's even better in this context, because my next point was going to be that theory spaces tend to be horribly complicated.
I'll answer that in a second. But just to say, unlike some things you might find in, say, neural networks, which have more to do with convex optimization, you might find yourself asking, why don't you just go in the direction of the highest hill? You're here. Point your feet in the direction of up. Why are you going down?
Now, those of you who are familiar with simulated annealing and things like that, you know that you might get stuck in sort of local optima and things like that. And also, the space is just such that it's not nice and convex. It's horrible and complicated.
And we were talking about parameters versus structure. Once you introduce notions of structure, and theories, and grammars, and code, and things like that, you make one small move, and you might find yourself somewhere very, very different. So there tends not to be a notion of things being smoothly connected: you make a small change in theory space, and you might go somewhere wildly different.
So it's not so clear how to define a similarity metric exactly. And furthermore, in a way, everything is connected, because at any given point you could say, screw it, I'll start over-- you might just regenerate from the root node.
So in some sense, the whole space is connected, in the sense that you can get from any point to any other point. You're right that it's unlikely from any particular point. There are points that are more likely than other points.
And I've had some discussions with Steve Piantadosi about trying to actually visualize theory space and say, is there any sort of meaningful notion to say theories that are closer together should look more like one another. And Joel was sort of skipping over that point, because he was saying it's not relevant. And he was kind of right.
But in learning this notion of number that children go through, we're trying to do that. So we can talk, maybe, a bit more about that. But I think the sense that you should get from that is that it's a good question. It's not entirely clear how to define similarity on theories and the spaces are very complicated.
AUDIENCE: How does it relate to genetic algorithms?
TOMER ULLMAN: This is a very good point. I mean, you could see genetic algorithms as another way of searching through a very complicated space. I wouldn't say that it's quite MCMC, but it's very relevant in the sense that people have proposed that this is how--
AUDIENCE: It's another kind of stochastic search.
TOMER ULLMAN: Yeah, it's another kind of stochastic search.
AUDIENCE: This isn't exactly the focus of what you're doing, but I'm glad you raised that, because I think genetic algorithms are the one example of an actual thing known in nature that works something like this, and builds something like [INAUDIBLE]. If there was [INAUDIBLE] physics [INAUDIBLE] psychology, [INAUDIBLE] as developmental psychologists have argued, they're built by evolution. So they're built by some kind of mechanism like this that unfolds over millions of years of evolution. Here, the question is whether something like that could also work on shorter time scales in the brain to [INAUDIBLE].
TOMER ULLMAN: I think one major difference is that there is no notion of crossover, of blending theories together to make better theories, in the sense you usually find in genetic algorithms. I mean, you could have mutations. But people have argued that the main force of genetic algorithms is making those different kinds of proposals.

Having said that, I should say that the general argument of a blind stochastic process, where you randomly generate and find better things, I'm perfectly happy to extend to proposals like genetic algorithms as well, and I think Laura would be equally unhappy with them for [INAUDIBLE] reasons.
AUDIENCE: It's also because of this notion of crossing over. Because here it seems like maybe the [INAUDIBLE] were good, but then you change everything. But maybe what was wrong was only how you connected the lower branches. And there you don't have this notion of, if you change the upper part, you change everything below it.
TOMER ULLMAN: Well, here if you change the upper thing, you're more likely to change the lower things. But it sort of depends upon how likely it is to change the stuff that's high up. OK, that's a good point.
AUDIENCE: [INAUDIBLE] not as large, because you're starting as a baby. Your definition of the type of grammar you can use is starting from nothing is very simple. Maybe I would get yes, no, similar, not similar, less, more.
TOMER ULLMAN: So similar, not similar can be surprisingly difficult, right? No, this is a very good point. When we had this discussion back at the developmental conference, the next session was on same-different-- just the notion of same-different, which turns out to be surprisingly difficult when you ask when kids can get that.
And then you need to say, well, are they getting the notion of same color, same shape? Are they getting an abstract notion of same everything? For that you sort of need second-order logic-- for all properties, property x, property y-- which is already kind of convoluted.
But it's true. I think the point that you're making here is sort of a very deep and important one, which is about the primitives that kids might have in their grammars. And do they add sort of new primitives when they build up theories? How do they reuse them? Maybe we start from very, very basic primitives, [INAUDIBLE].
AUDIENCE: One example, even in the example you gave, there were many blocks. [INAUDIBLE].
TOMER ULLMAN: Right, yeah, but it's not so much a question of the data. I mean, in a way, you want your grammar to be very general-- able, in some important sense, to learn any interesting theory. And again, I would point you towards Steve Piantadosi, who has done a bunch of work on trying to find the basic primitives that you need in any language to reason about minimally interesting theories. Like you might want And. You might want Or. And you might want some notion of same and different.
AUDIENCE: [INAUDIBLE]. Maybe the grammar by itself is [INAUDIBLE]. Start with very simple logical [INAUDIBLE] grammar itself.
TOMER ULLMAN: Yeah, but the trouble is if you start with the very simple logical options in a grammar, like And and Or, and things like that, which might seem simple, you very quickly actually get an infinite space in some sense, like [INAUDIBLE] incomplete, and things like that. You get And and Or, and you might start with math. And you have all possible circuits that you can-- all possible [INAUDIBLE].
AUDIENCE: [INAUDIBLE] possible as an idea that they all [INAUDIBLE]. It's really possible that one way to view [INAUDIBLE] lambda and then you build everything out of that-- build any computer program out of that.
TOMER ULLMAN: Yeah, so it's not going to solve, I think, one critique that Laura's going to have, which is-- it's true, you might start off with small primitives, and that might get you around the issue of searching some more circumscribed notion of the space. Still, the space itself is going to be infinite.
So it's true that this is a question for the algorithmic level. The people who object to the computational-level point of view would still object to that. The computational level still, in theory, encompasses an infinite number of theories. You still have some [INAUDIBLE].
OK, so I was talking about hill climbing, and you were talking a bit about genetic algorithms [INAUDIBLE]. Some people, by the way-- maybe a bit following what you're saying, not exactly-- have taken this to suggest that the theory spaces here are just hopelessly complicated, and that we should stick with theory spaces, or primitives that generate theory spaces, or some sort of representation that looks a priori very simple and smooth and convex-- something we can search through.
I don't think that's true. That's a different discussion we could have, and I don't have a killer argument against it. But I think that whenever you get something reasonably complicated, reasonably rich, reasonably interesting-- which I think Laura and Josh have done a good job of convincing you that children have-- then you're going to get a hard search [INAUDIBLE].
It's going to be hard to search. And rather than focusing on very, very simple problems, we should just accept the fact that learning is hard, and try to find algorithms that do that. So let me just give you one more intuitive notion of how this might work-- of searching in that grammar that I was talking about before.
This is just, again, sort of saying, what does that searching in theory space look like? So you might start out by looking at these blocks. And you might start out by proposing a law-- doesn't matter-- some law.
We ended up proposing one law. And there's a predicate like interact(x, y). And you might propose one law, like if x is a P and y is a P-- some sort of predicate, it doesn't matter what-- then x and y will interact.
And because my prior tends to generate small theories, I've only generated one rule. Fine. I've initialized. I've started in theory space. And I see how well that does on the data-- predicts some of the data, not all the data.
Now what I do is I propose-- I cut and regenerated from that Stop terminal to get more rules. And I end up regenerating another rule. OK, so I tweaked my theory to include another rule, which I generated at random: if x is a Q and y is a Q, then x and y will interact.
Trust me, this is not as good a theory as the previous theory. It's clunkier. It predicts the same data, but it's clunkier and doesn't explain much more, so you might end up rejecting it.
But you might accept it. And if you accepted it, you might now tweak it-- not to include a new rule, but to change a predicate. Does everyone see? I changed Q to P, and things like that.
So now I've changed q to p. And again, you'll just have to take my word on it or read the paper that this is actually an improvement. So you've gone from sort of OK to worse to actually much better than where you were at the beginning.
So you might end up doing this, adding little laws; maybe a step will delete a law and end up being better. And you sort of keep moving around-- you might even delete the rule you started off with. This is how you move around in theory space.
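The loop being walked through here-- propose a random tweak, score the result against the data and a simplicity prior, sometimes accept a worse theory-- can be sketched as a Metropolis-style search. Everything concrete below is an assumption for illustration: the toy predicates, the observed data, the scoring weights, and the move set are all made up, a minimal stand-in for the grammar-based search rather than the actual model.

```python
import random

# Toy "theory": a list of rules, each rule a pair of predicates (p, q),
# read as "if x is p and y is q, then x and y interact".
PREDICATES = ["P", "Q", "R"]

# Made-up observed data: which predicate pairs actually interact.
DATA = {("P", "P"), ("P", "Q")}

def score(theory):
    """Data fit minus a complexity penalty (a crude simplicity prior)."""
    preds = set(theory)
    hits = len(preds & DATA)          # interactions the theory explains
    false_alarms = len(preds - DATA)  # interactions it wrongly predicts
    return hits - false_alarms - 0.5 * len(theory)

def propose(theory):
    """Tweak the theory: add, delete, or change a rule at random."""
    new = list(theory)
    move = random.choice(["add", "delete", "change"])
    if move == "add" or not new:
        new.append((random.choice(PREDICATES), random.choice(PREDICATES)))
    elif move == "delete":
        new.pop(random.randrange(len(new)))
    else:  # change the first predicate of an existing rule
        i = random.randrange(len(new))
        new[i] = (random.choice(PREDICATES), new[i][1])
    return new

def search(steps=2000, seed=0):
    random.seed(seed)
    theory = [(random.choice(PREDICATES), random.choice(PREDICATES))]
    best = theory
    for _ in range(steps):
        candidate = propose(theory)
        # Stochastic accept: always take improvements, sometimes setbacks.
        if score(candidate) >= score(theory) or random.random() < 0.1:
            theory = candidate
        if score(theory) > score(best):
            best = theory
    return best
```

With these invented numbers, the best reachable theory is the two-rule one that predicts exactly the data; the occasional acceptance of worse theories is what lets the search back out of local maxima like the clunky three-rule version.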
Now, as I've said before-- and this was sort of the point for the beginning of the discussion-- some people, myself included, have suggested that this sort of stochastic search method, which we've been using just for searching through these theory spaces, given a guess at the priors that we want for finding good theories, has something of the flavor of children's learning, certainly when compared with the ideal, computational level. There are the discrete, stochastic aha jumps-- the fact that it takes time unrelated to data. At the computational level that I described, you just have sort of probability mass.
And you get new data, and that changes the mass. You get more data. It changes the mass, the distribution of the probability mass in the space.
There's no notion of time there. It just happens. Stochastic search takes time.
You might see the same data, and mull it over for a while, and search through theory space, and come up with better and better theories, just because you're given more time. There are also these sort of fallbacks-- you get worse and then you get better, two steps forward, one step back. And there's the fact that, on average, if you average a lot of children together, they seem to show these nice, smooth transitions from one stage to another stage, if the notion of stages makes sense in development.
But any individual child, any individual run of the stochastic search algorithm, seems to go through this at its own pace. Some discover this first, some discover that first. It's sort of these discrete jumps.
This is just showing one run of these stochastic search algorithms. It sort of goes, bing. Oh, I get it. p's are not actually q's.
When you [INAUDIBLE] sort of how well the theory is doing, and you average all these together, you get the smooth, nice curve. So you can think of that smooth, nice curve as the population-- each individual child, each individual run of the algorithm, has these sorts of setbacks and [INAUDIBLE], and like that.
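The individual-versus-population point can be simulated in a few lines: each run is jagged, with setbacks, but the average over many runs looks smooth. The learner below is a made-up random walk-- a stand-in for any stochastic search run, not a model of a real task-- and all the numbers (step sizes, probabilities) are arbitrary.

```python
import random

def one_run(steps=200, rng=None):
    """One simulated learner: mostly improves, sometimes falls back."""
    rng = rng or random.Random()
    score, trace = 0.0, []
    for _ in range(steps):
        # two steps forward, one step back, at random moments
        score += 1.0 if rng.random() < 0.6 else -0.5
        score = max(score, 0.0)  # can't do worse than knowing nothing
        trace.append(score)
    return trace

def average_curve(n_runs=500, steps=200, seed=1):
    """Averaging many jagged runs yields a smooth population curve."""
    rng = random.Random(seed)
    runs = [one_run(steps, random.Random(rng.random())) for _ in range(n_runs)]
    return [sum(t[i] for t in runs) / n_runs for i in range(steps)]
```

Plotting a single trace shows the discrete jumps and fallbacks; plotting `average_curve()` shows the smooth stage-like transition you get at the population level.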
AUDIENCE: [INAUDIBLE] time here is more experiments? How do you measure time?
TOMER ULLMAN: Time here is-- think of it like clock cycles or something like that. It's not more experiments. It's just the notion of, I've done my experiments. I've collected my data. Now I just think about it.
AUDIENCE: [INAUDIBLE] time scale. You don't know if it's really fast or really slow. Maybe you can do a trillion jumps in five seconds.
TOMER ULLMAN: That's a good point. And I think Laura will get to some of that point over there, like what do you know? How much time does it actually take to search this thing?
The point was just to say time is important. And time is important for children, certainly, and for adults as well. It seems to be the case that we can think about something, see the same data, and it still takes us time to get to the right theory. It's not so much a matter of seeing more data. It's a matter of just thinking it over.
And stochastic search has something of that flavor. Other algorithms have that flavor, too, I should say. It's not just this. But the computational level doesn't do a good job of explaining that, and it wasn't meant to. As I said, I'll get back to that in a bit. Yeah.
AUDIENCE: So it seems like it might be of some use to have some computation after you propose a theory, because something could be not difficult. It could be [INAUDIBLE]. You can kill off a lot of these things before you start.
TOMER ULLMAN: Very good. I'm not answering that specifically, because I think Laura will address that. And I won't here.
So let's do a quick summary of where we got to here, just to make sure everyone's on board. As I said, just to recap the move that we did-- theories are useful. And a rich, structured theory defines a rich landscape.
There's no getting around that. If you want to have interesting theories, you're going to have an interesting landscape. It's problematic to search an interesting landscape. And one algorithmic solution that people have hit upon is a stochastic search in that rich landscape. And the claim is that that's maybe what children are doing.
LAURA SCHULZ: All right, I really think this is the most exciting and interesting part of the day. Too bad it's on a Friday afternoon, when you guys are probably burned out. But this really is the one that [INAUDIBLE].
I've been involved in [INAUDIBLE] for a long time. [INAUDIBLE] new ideas. It's the hardest problem that there is. [INAUDIBLE] a huge amount of that work really comes down to selection among [INAUDIBLE], but actually generating new ideas, actually coming up with them, [INAUDIBLE] because half the work is generating new ideas.
There's this idea of stochastic search. And as I was doing this talk, I kept thinking, what [INAUDIBLE] elegant exposition of a formal model and attendant experiments and quantitative data, Laura proceeds to wave her hands around. And that is all I'm going to do. I don't know the answer to the question. But I want to wave my hands around some of the things that I think are interesting and telling, that I hope you all can talk [INAUDIBLE] the world think about more [INAUDIBLE] place.
But part of what I was [INAUDIBLE] is why am I speaking in this archaic language? What's up with this funny, enriched location? And [INAUDIBLE] because of being inspired by actually what I consider [INAUDIBLE].
AUDIENCE: It's upside down on your [INAUDIBLE].
LAURA SCHULZ: And it's upside down. Like that? OK, so-- [INAUDIBLE].
LAURA SCHULZ: Still can't hear me? I could yell. This is now up in the air, though. Is this right? Can you hear it back there?
LAURA SCHULZ: All right, I'll speak more loudly. And I'll try to speak more slowly. OK.
So as I was saying, part of what I was stuck on is, why am I using this really archaic locution in which, blah, blah, blah. And it was because I was reminded of what I think is a seminal chapter on stochastic search, which illustrates many of the problems with it. So those of you who are familiar with A. A. Milne-- this is a chapter from Winnie the Pooh. And it's, "In Which Christopher Robin Leads An Expedition To The North Pole".
And I just want to pull it up for the sake of some illustration here. So in this chapter, Christopher Robin organizes a search for the North Pole. And [INAUDIBLE] algorithm, he has a few problems, which is he knows a lot about searching, but he doesn't actually know what the North Pole is. And he doesn't know where the North Pole is. But he does know a thing about searching. You start where you are, and you gather all your familiars together, and you iterate over processes within a fixed terrain.
And Eeyore was not optimistic about this. Eeyore says, you can call this an expedition if you want, or you can call it gathering Nuts in May. It's kind of all the same to me.
And I think the really important point Eeyore's making here is, that although you are surely searching, and you are surely starting from a point, and a fixed point-- which is going to in some sense determine what you do if you search out and start in different directions from there-- the fact that you're searching for the North Pole, and that it's a particular kind of thing with particular kind of properties, is neither here nor there with respect to the search process. It's not informing how you conduct your search in any meaningful way.
And that said, I actually think that Christopher Robin and colleagues have a certain advantage over Tomer and colleagues here, which is, first of all, that the North Pole really is there. It's only the Hundred Acre Wood, and they really will find it in this kind of context. And the other thing is, it turns out they actually know a little bit more about the North Pole than they first believed.
And that is a really important point. And it's going to help them a lot. And I'm going to try to introduce that help to these kinds of processes here.
Which I should start out by saying, in case you can't tell, I deeply admire this kind of approach-- the idea of how you might learn new ideas without new data. It's the heart of what we need to understand to understand thinking. So I think it's hugely promising, although I'm going to make a lot of fun of it for the next little bit here, but just in the hope of inspiring you to understand it a little bit.
I gave you one toy, totally fictional example. Now I'm going to give you another toy, but non-fictional, example. This is from my daughter when she was four years old.
We were on an airplane. And Adele proceeds to learn something that is not predicted by any of her prior knowledge about airplanes or about telephones, which is that you are required to turn off your cellphone when you get on an airplane. And she immediately says, oh, I know why you have to turn off your cellphone when you get on the airplane.
Now, I know she knows nothing about FAA regulations and their infinite wisdom. So I say, oh, really? Why is that?
And she proposes immediately an explanation. She says, because when the plane takes off, it's too noisy to hear. OK, this answer is wrong. It's not particularly cute or clever.
And nonetheless, I was struck by it. And I was struck by it for a number of reasons. Because although this is a wrong answer-- wrong even about the causal interaction, it is a reasonable wrong answer.
And consider the number of other things she might have said that are simple, that are logical, that make use of familiar causal or constitutive predicates, that are consistent with her prior knowledge, that did not come out of her mouth. She did not say because airplanes are made of metal and so are phones, because airplanes fly over the earth and the earth has phones, because airplanes are big and phones are small, because airplanes and phones are both made in Ohio where her grandparents live. There are a lot of facts she could have put together in a whole number of ways.
And somebody pointed out, these are non-starters. They don't even begin to answer the question. And possibly, she generated and rejected all of them. Because of course, none of them are going to better explain what is not predicted in the evidence she saw.
But there are an awful lot of them, even prima facie, that she would have to generate and reject. And again, I just want to prime your intuition for just how large the space is, and how rapidly we generate made-up explanations for things. And this is the kind of puzzle that I want to get at.
And I want to get at some ideas about what could constrain this, so you don't just have to do a stochastic search about everything you know that might be simple, logical, familiar, causal, [INAUDIBLE] logical, and small steps away, and consistent with your grammar of your intuitive theories. Because that's kind of what a stochastic search looks like.
So how do we converge so rapidly on the kinds of explanations that, even if they are wrong, are good wrong answers? It's an interesting feature of human cognition that we are capable of generating good wrong answers. And that's what I want to get to here.
Having given you now a fictional example and a non-fictional toy example, I want to suggest this is a real problem. And it's actually, I think, a problem for everyone in this room. Because it kind of-- I sometimes used to wonder, since I don't do computational modeling, why does it take so long?
You stick these things in, you run the simulation. What's taking so long? Why do you have to wait overnight? It's a computer. What's it busy doing?
And what it's busy doing is proposing a lot of non-starters. That seems to be a lot of what happens. Iterations are spent searching in places where the answers are not even wrong-- there's just no possibility of finding a solution.
I want to give you an example to actually understand this-- incredibly elegant, influential, even ground-breaking work. This is from [INAUDIBLE]'s paper. And I want to show you what they did here.
If you really want to solve the CAPTCHA problem, you can do it in a lot of ways-- edge detection and things like that. But what's cool about this program is it's quite flexible. It can identify CAPTCHAs and shapes and objects in the road.
But that flexibility is predicated [INAUDIBLE] being very unconstrained. And I want to show you a little bit about what that looks like. So here's the right answer in this case. There's a square over here.
So it's trying to get to this true representation. And it's thinking. You're going to watch it think, a little bit like watching Watson think, as we did a couple days ago.
I want you to watch what it proposes. And the thing is, it's proposing a lot of things over here. And we look at and say, well, that's a white space. There's no possibility that the answer is there.
Ideally, it would constrain its proposals to the lower left-hand quadrant. But of course, it doesn't know from lower left-hand quadrants. That's not part of the representational space.
It can't say all my answers are here. I don't know what shape it is, but at least I should search down here. That information is nowhere present.
It has to search in all kinds of places that sometimes are not even wrong. It's just not going to be a possibility. What's interesting about human learning is that we do do flexible learning over all kinds of search spaces, and yet we seem to do it in a constrained way.
So now I want to wave my hands a little bit more, and suggest that it's not just that the search space is very large. That's not the only problem with stochastic search. It's that humans are smarter than that.
There's a lot of information that is prima facie available to us, even explicitly about our problems-- that this isn't making use of. And it would be really nice if we could figure out how to make use of the kind of information that we have, which is to say long before we solve our problems, we know a lot about what the solution has to look like. We know a lot about what that solution has to do, what kind of work it has to do for us. That information is present well before we actually know the answer. And the question is, can that kind of information, a representation of the goals of the solution, the desiderata, the criteria, the abstract form of the solution, help constrain how we think of new ideas?
In the case of Adele, when the pilot says you have to turn off your phone when you go on the airplane, it's not just that she has some random unpredicted fact about airplanes and phones. She has a fact of a particular form. There's an incompatibility between airplanes and phones. Something about airplanes doesn't go together with something about phones.
And if that's the case, then if she could search for hypotheses just dealing with something about airplanes is in tension with something about phones, she may generate answers that are wrong. But they're going to be good wrong answers. They're going to be wrong answers of the form, well, it has to be quiet when you talk on the phone and take-off is noisy, so maybe you have to turn off your phone so it's off because the planes are too noisy.
And this is a kind of representation I believe that actually Christopher Robin and colleagues benefited from. Because they know something about poles. And well, if it's a pole, then it should stick in the ground, because there'd be nowhere else to stick it.
And if you know something about that, then you can succeed more accurately in looking for the north pole. You may not know where it is, but you know something about what it ought to be, what it ought to look like. I'm going to try to spell this idea in a little more detail.
And it is that there's a lot of information in the problem itself, in the kinds of problems we solve and attempt to solve. And in the nature of the problem, in the nature of our goals that we have, there's information that we're not making use of, which is to say I don't think any of our models or algorithms are taking the fact of having a problem as input to how you should search the space for that problem. And that, I think, is an important part about how you might cut down an infinite search space and make use of knowledge that we definitely do have-- information that is available.
Consider, for a toy example, even this interesting fact of question words-- we ask questions about the world. We pose queries. And whatever that question is about-- and it could have any kind of content-- the fact of asking a question in a certain way constrains what the answer has to look like to a very great degree.
We only have a few question words, right? And if you know your question word is where, your answer is going to be in some sort of form of a map. If your answer is something about when, it is going to be somewhere along a timeline, some sort of two-dimensional space. If your answer is why, maybe it's going to be a causal network, or how, a sort of circuit diagram.
So just having a question tells you something about what the answer's going to look like. If I ask you a why question, and you answer with a timeline answer, that's the kind of answer that's not even wrong. You shouldn't do it. You shouldn't be generating from all the space of your prior knowledge.
You should be generating from the space that conforms to the representation of the kind of problem you have, the kind of goals that you're setting for yourself-- ideally. Again, I can wave my hands about it. Then I'd have to say how it actually works.
But I do want to suggest the information we know is available to us, because every model that we use uses this kind of information to decide whether to accept or reject a hypothesis once it's generated. Either it explains the data or not. Either it's a good answer to the question or it's not.
So we can use, in some sense, these criteria to say no, that's a bad hypothesis. Throw it out. And the question is, could we use some of the same kind of information to constrain the generation of the hypotheses as well?
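To make this contrast concrete, here is a toy sketch in which the form of the question-- "something about airplanes is incompatible with something about phones"-- filters generation itself, rather than only accept/reject after the fact. The fact lists and the conflict table are entirely invented for illustration; this is hand-waving in code, not anyone's actual model.

```python
# Invented toy knowledge about the two domains.
PLANE_FACTS = ["is_metal", "flies_high", "is_big", "is_noisy_at_takeoff"]
PHONE_FACTS = ["is_metal", "is_small", "needs_quiet_to_hear", "emits_radio"]

# The question's form-- an incompatibility schema-- encoded as which
# property pairs are actually in tension with one another (made up).
CONFLICTS = {("is_noisy_at_takeoff", "needs_quiet_to_hear"),
             ("flies_high", "emits_radio")}

def generate_unconstrained():
    # Every pairing of a plane fact with a phone fact is a candidate,
    # including non-starters like ("is_metal", "is_metal").
    return [(a, b) for a in PLANE_FACTS for b in PHONE_FACTS]

def generate_goal_constrained():
    # Only propose pairs that instantiate the incompatibility schema.
    return [(a, b) for a in PLANE_FACTS for b in PHONE_FACTS
            if (a, b) in CONFLICTS]
```

The unconstrained generator produces 16 candidates, most of them not even wrong; the goal-constrained one produces 2, and both are "good wrong answers" in the sense above-- they have the right form even before any data has been consulted.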
And I also want to suggest that it's not just question words. For all the kinds of problems we have, we have a lot of information that acts as a constraint. We want to solve problems in different forms-- as I said, navigation and explanation. The kind of representation you should be looking for differs depending on those kind of epistemic rules.
But you also have social epistemic goals. You might want to find a solution for persuading, or for instructing, or for deceiving. Each of these kinds of goals, again, constrains what the right form of the solution ought to be. And we have non-epistemic goals. And in some sense, there's this massive proliferation of goals that we have-- we can have innumerable goals, but each goal can only be fulfilled in a very small number of ways.
So we can want to do anything we want. We can want to spend a beautiful Friday afternoon in Woods Hole, in the sunshine, sitting in a dark classroom learning about computational models of cognition. That's a quirky goal, but we can have it. And that's fine.
And we could also have the goal to go sailing. And we could have any of these goals. But for any one of our goals or problems, there are only a small set of things that are going to count as solutions.
And when you're dealing with an infinite search space, the fact of having goals that have certain desiderata for their fulfillment might be exactly the kind of thing you need to think of new ideas efficiently. It might help constrain how you propose solutions. So although we can take on these innumerable goals, this actually is potentially a helpful constraint.
So the goal here is not to do away with stochastic search. At some point, if you don't know the answer, you're going to have to search around. You're going to have to propose things and just try them. I don't see any better answer to it.
But the question is, can you do it in a narrower way? Can you do it, not over the entire space of everything consistent with your prior knowledge and the grammar of your intuitive theories, but over those things that would count as solutions for the goals you have or the problems you have? And if you could do that, you would cut down a lot of the search.
You would have things that were wrong, but you wouldn't have so many things that weren't even wrong. So the idea is to give Christopher Robin a bit of a compass. I've been talking about this a little bit as goal-oriented hypothesis generation.
I want to give you another example. Because I can only wave my hands, I'm going to wave my hands in lots of different ways. And I want to suggest to you that this has real impact.
So people who know me know that I am really, really bad at spatial relations. I do not like Roger Shepard's kinds of tasks where you have this-- and I don't understand why people would ever spend their time doing puzzles. And so when I have to do a puzzle, I don't actually know what counts as a solution.
I look at the puzzle piece. And as soon as I look away, I lose the representation of what I'm going to be looking at. And so I kind of randomly search for things of kind of that color, all of that.
But if you're good at puzzles, that's not what you do. You do have a very clear abstract representation of what kind of solution you're looking for. You don't know where it is, but you know how to guide your search.
It's going to be something with this pattern of concavity and complexity. And you're going to be much more efficient at finding it. And that's the idea-- that knowing how a problem is shaped, what its form is, what's going to count, might guide search.
And indeed, we use this word. We say some problems are tractable and some are not. What does that word mean formally? What would it mean to spell out what it means for a problem to be tractable?
What I want to suggest is that it means knowing a lot about the form of the answer before you start, so you don't have to do a stochastic search, or you don't have to do it over such a large space. And if we think that we can't-- if we think that we don't even know what would count as an answer, that the problem isn't well posed-- I think we don't bother searching, because we know that's too costly. That's going to be too [INAUDIBLE].
But if we do have a precise enough representation to guide the search, then we have something we can work on. And I think that's what we would like to give our models, to the degree that we can. I also think that this kind of approach explains a lot of quirky facts about human cognition that are true, but for which we don't really have any good account, and which are not much discussed in psychology.
And that is this-- we have the sense that we're on the right track of something. We go, yeah, yeah, that's a good idea. And we're doing that in advance of the data.
It might not better explain the idea. The idea might still be wrong. But nonetheless, we feel we're making progress.
What does that mean, except that we have the kind of answer that's beginning to fit the form of the solution? It's not data driven. And it's not because we're able to make better predictions.
We can think an idea is a brilliant idea, even when we know it is wrong. We can look back on the history of science and say, well, it's off, but that was just brilliant. You can look at your students and say, well, we know that's not true. [INAUDIBLE] really smart, really thoughtful answer. And we do that kind of thing all the time.
Well, what does it mean for something to be a great idea when it is actually absolutely patently false? Again, one answer is that we are very, very sensitive to ideas that fit the abstract criteria, the desiderata, in the abstract, even though they turn out to be wrong. And we have this ability. So it would be nice to know what they mean, and how we can [INAUDIBLE] them.
So I want to suggest that we can constrain our proposals on two separate dimensions. One is, of course, how well they fit the data. We have all kinds of verification and fact-checking procedures to see if we're better predicting the data. Those are the hypotheses we should accept, and we should reject the ones that don't.
In addition to that, I suggest that maybe we [AUDIO OUT] ability that maybe Stephen Colbert makes fun of, but I think it might be a very important part of human cognition, which is, how well this idea would solve our problems if it were true. If this idea were true, then it would be a great idea. We don't know yet if it's true, but it would be an excellent solution if it were.
And I'm going to gesture at the very end towards imagination. But this, I suggest, is a really big problem in cognitive development, which is that our smartest [INAUDIBLE] are spending a lot of time in an imaginary world. And I think, what does that mean?
And I want to suggest it's a process, maybe, of setting up a lot of problems and solutions that would work if they were true. And that's a very important part about being able to go beyond the data, which is exactly what human [INAUDIBLE] do. And that's a very important part of thinking.
And that's certainly as hand-wavy as I can get. But I really think that there's going to be something important about understanding how problems could be right, the right form of the solution, even if they turn out to be false. And I want to say just a few other things.
This is not just about radical conceptual change or theory change. So I don't know how much you've talked about that. Maybe not at all.
You've talked about core knowledge. But the same developmental psychologists who have spoken beautifully about core knowledge and about intuitive theories have also talked very often about theory change, and about children's transition from the kind of knowledge they might have in infancy to the kind of knowledge that's really, maybe, even incommensurate with that early knowledge. And those are kind of big leaps in conceptual development.
But this kind of thinking of new ideas doesn't require radical theory revision or radical conceptual change. It's just simply the kind of thing Adele said-- just a new idea-- thoughts she didn't have before, a hypothesis that wasn't present. And I think it's interesting that we can reliably make up new and relevant answers to any kind of ad hoc question.
And they may be trivial, and they may be false. In [INAUDIBLE] department we were talking about this as, in principle, the ability to model a bullshit generator. You can make up an answer. And it may be wrong, but it sounds good. It's truthy.
And that's an interesting ability, because these ideas are genuinely new. You didn't have it until you thought of it. And it's genuinely made up.
You didn't get it from evidence. You didn't get it from testimony. You haven't fact-checked it. You made it up in your mind.
And it answers a question. It's not an object. So again, the only way I think that's actually possible is if you can use the form of the question itself to guide the search.
I have some examples. Well, maybe I'll give you a couple examples. And then I'll offer an experiment, but I might stop in the interest of time to discuss it.
So let me give you some [INAUDIBLE] things that are totally trivial that you can probably do. What's a good name for a new theater company? Stop and think about that for a minute.
I could pose [INAUDIBLE] problem. These are not just causal problems. They're not just deep scientific problems about magnetism.
Ordinary thinking-- how do manufacturers get the stripe on peppermints? I want you to think about these things for a while, and think about what you think about the answer. I don't know what you're thinking, but I know you're probably not thinking McDonald's is a good name for a theater company, right? Because you already know the criteria for a good name-- it has to be new. It can't be [INAUDIBLE].
You're also probably not thinking and rejecting names of very tragic diseases. That's not going to be a good start. So just there's so many things you might think of, but you probably don't want to have to think of and reject, even in this trivial case.
Stripes on peppermints is a kind of complicated thing. But you know it's not just something that splashes on, or that you just submerge it in. That just doesn't have any of the right properties.
The stripes are alternating. They're a pattern. You need something that's pattern generating.
And so somehow, before you've polled people to say, is this a good idea, or no, this idea is almost surely wrong-- you can say that Fresh Ink is probably a better name for a theater company than that unpronounceable thing, and that a pendulum mechanism is probably better for peppermints than splashing. Even though, again, I don't know how you get stripes on peppermints, I would be shocked if it were a pendulum.
So that's just, again, to give an intuition for how readily we do this, how strong and robust our intuitions are, and how little we understand. But I don't think you get that by stochastic search. You could. There are a lot of neurons. Maybe they can do anything.
But I think we know more. The fact that we actually have available explicit [INAUDIBLE] knowledge about the form of our problems, even when we're young children-- we can say something's wrong here, there's tension between these two things, or this thing's going off, or whatever it is-- suggests that we can say something about what a solution ought to look like.
I was going to walk you through an experiment, which I am not going to do, because I don't need to. I think I've made the point. And I will just end by saying this-- again, I think this is a real problem for cognitive development.
We have no respectable theories of this type of thing. We're working hard enough just to understand how we get the world right. We don't really have any account to speak of for why we spend so much time not worrying about whether we're getting the world right.
A lot of things that we're doing within our minds we know aren't true. We're making things up. We're daydreaming.
And I think there's an interesting property of these kinds of representations, though, which is for narratives, for stories, for these cultural universals and making things up. What are you doing? You're setting up problems and you're proposing solutions.
They don't have to be true. That's not a criterion of a good story. But good narratives do have constraints.
And those constraints are that they set up problems and they generate solutions that are consistent with the kind of problem or kind of goal set up in that narrative. If it were true, it would solve the problem. And that seems to be really, really important for human cognition, because we do it all over the place. And we do it especially in a lot of early child [INAUDIBLE].
So that's, again, my last and most speculative hand-waving [INAUDIBLE]. I think I will turn it over to [INAUDIBLE]. Oh, except questions, yes.
AUDIENCE: [INAUDIBLE]? Are you suggesting you have the space of theories and you basically have some kind of a metric or a cost function in the space?
LAURA SCHULZ: We don't have a space of theories. That's a metaphor we use.
AUDIENCE: That's what [INAUDIBLE] was suggesting, right?
LAURA SCHULZ: Yeah, but of course we have to build that space of theories. We talk about it. I think we use metaphors that are somewhat-- we all know the questions they're demonstrating, but they can mislead us even when we're trying to be conscientious about it. We act as if there is a space of theories that we are searching. But as [INAUDIBLE] points out, since none of us is sitting here in our minds with an actual answer to exactly how children learn or what a computational model of imagination is, you don't have that in your theory space. And you're not searching for it. You're going to build it. You're going to generate it from something.
AUDIENCE: It's actually all possible solutions. So there is a space. There is a space that contains all [INAUDIBLE].
AUDIENCE: [INAUDIBLE], you say there is a space. You don't necessarily mean there is a space inside your head right now, and every possible point in it is substantiated in your head at this moment. It's implicitly specified by the mechanism [INAUDIBLE].
AUDIENCE: Yeah, so it's a space, and it's dark, and I have [INAUDIBLE], and I can [INAUDIBLE] for me to look further away from. Away from. But the solution--
AUDIENCE: [INAUDIBLE] something on-- that's certainly very much what [INAUDIBLE] is talking about and what I was proposing as a place to start off with. Because basically that, [INAUDIBLE] computational [INAUDIBLE], that's really the only way we know to formalize the problem. And Laura's raising some questions of that.
And then again, [INAUDIBLE] maybe there are ways which, within that kind of metaphor, that kind of model, you can address [INAUDIBLE]. This is the question, right? So [INAUDIBLE] do that.
LAURA SCHULZ: There are other questions out there. Do you want to hold them all, or?
AUDIENCE: Well, go for it.
AUDIENCE: Do you think this restriction of [INAUDIBLE] by [INAUDIBLE] problem what they might be like, can account for strokes of genius which, like a scientist [INAUDIBLE] dreams up things like what is mother? Have we thought about a way? [INAUDIBLE] violates everything that looks reasonable as an answer, and yet it--
LAURA SCHULZ: I think the way you set the problem matters. And I think that in the chapter I wrote about this, I quoted one of our resident geniuses. And you know, and [INAUDIBLE] has some line about what's really important for discovery is how you set up the question, and how you set up the problem.
And what I'm suggesting is that's important because the problem and the information in the problem is critical to constraining [INAUDIBLE]. So I think it's actually a friendly amendment, really having done all this hand-waving. It's a very friendly amendment to the work that Josh and [INAUDIBLE] did.
And I think it's a hard problem, obviously. But I think that the idea is to say, can you use the information in your-- we know about bulbs. We are able to say something specific about what a representation of a [INAUDIBLE] would be, or of our problem would be.
There's something in the kind of problem you ask, in the way you pose a question and the form it takes, that can actually be used to constrain the way you generate answers. I think there will be a bunch more. Maybe [INAUDIBLE] it a little bit so then I'll let [INAUDIBLE].
AUDIENCE: [INAUDIBLE] rebuttal and then summarize?
LAURA SCHULZ: Yeah, if I need to, or we can just move right to the discussion.
TOMER ULLMAN: Should I take a mic?
LAURA SCHULZ: [INAUDIBLE].
TOMER ULLMAN: Laura, how quickly-- do you need to go at 4:00? So should I talk [INAUDIBLE]?
LAURA SCHULZ: [INAUDIBLE] see what's going on. I'll try to linger a little bit.
TOMER ULLMAN: OK, because you need to do the summary. If I have to do it, I'll just say bad things.
LAURA SCHULZ: Yeah, I'll definitely try to stay [INAUDIBLE].
TOMER ULLMAN: OK, and I'll try to talk [INAUDIBLE]. OK, so my turn. First, I think if I can summarize crassly some of the points that Laura made-- and she made a lot of good points-- but I'm going to only summarize the ones that I'm going to address.
The general critique was that stochastic search is wrong. And I think the first critique, for those of you who can read it, is that A, it doesn't make use of or account for some of the abilities that we know people have. Like when she was talking about goal-directed things, and things that children can do, and coming up with wrong right answers or right wrong answers and things like that.
And the second part is that stochastic search is sort of not smart enough prima facie. Even if I was building some sort of search algorithm and I didn't know about humans, there seems to be something wrong about a stochastic search algorithm, in the sense that it can just propose weird things and crazy things, and we just have to go on rejecting them until something's right. So it doesn't make use of some things that ideal, smart agents should make use of, regardless of what we are doing.
So the way I'm going to address that is by doing a Hannibal-like pincer move, which is I'm going to cave in the center, and attack on two sides. The first part of my coming attack is going to be to disagree with Laura, which is to moderately suggest that perhaps her intuitions-- as she puts it in her hand-waving stuff-- are misguided, and are based on a view of stochastic search as, yes, this process that is very slow, and very non-parallel, and does a lot of stupid things-- which is still true. But maybe instead of a stupid and slow process, this is a really stupid and fast process.
And a lot of the things that go on under the hood that you just don't notice are really dumb. And the only things that you hit upon are things that kind of look smart. And you think, ah, where did that come from? So we'll talk a bit about that.
And the second part of this cunning scheme is going to be to agree with Laura, and say that she's right. Stochastic search in its vanilla form, as I outlined it, isn't smart. But we can try to make it smarter.
And I think some people here were suggesting a few things, like making better proposals and perhaps genetic algorithms are more of a way of doing it. And I think that I'm going to describe very, very briefly some new work that's trying to get at that coming out of computational land, which I think in some ways is addressing what Laura's saying, but importantly is going to keep her unhappy in other ways. And in detailing these views, I'm going to highlight some recent research by Owen Lewis. Is he here, by any chance?
Owen Lewis, Steve Piantadosi and [INAUDIBLE]-- so any credit should go to them, and any fault in what I'm describing should go to me. So the first part of the pincer move, which I'm going to go over quickly, is that we can make quick proposals. I mean, yes, stochastic search-- right.
The point is going to be that that intuition is wrong. What if we make many, many, many, many hypotheses and we're just not aware of most of them? We know that there's a lot of subconscious processing.
In fact, most of what we do in terms of processing and searching and things like that is probably subconscious. Maybe the things sort of come to light [INAUDIBLE] the system are the result of trillions of searches. We don't really know what it is that learning is, so maybe it's like that.
But this requires the ability to suggest many hypotheses in parallel, rather than having sort of one hiker that's walking around. Either that hiker needs to move at super speed, or we need many, many hikers. Or we need to do something like a genetic algorithm, which also involves multiple proposals at the same time.
But the particular work I want to highlight is by Steve Piantadosi. And this is really cool work that you guys should know about and ask him if you're interested. But I should say he hasn't bet on any one of us.
He's not suggesting this is necessarily what people are doing. But it's sort of I'm suggesting that maybe that's what they're doing. But don't fault him for this.
So his point was just to say, well, maybe we can parallelize all these stochastic search algorithms. Maybe we can run many, many chains in parallel. And again, using that sort of landscape metaphor, instead of having one hiker proposing, saying maybe it's here, maybe it's here, maybe it's here, maybe we can plunk down thousands or millions of search processes at the same time, searching all over the place.
Now, as I said, this requires parallelizing it. And what Steve has done is sort of said, well, in order to do that, we need a lot of computers. And we need to run it in parallel.
So his point is to take advantage of the GPU architecture, or the graphical processing unit, that you have in your computers, rather than the CPU. I kind of hesitate to say these things in front of an audience where I'm sure some people know a lot more than I do. But for those of you who know maybe just a tiny bit less, I'll say that the point is that the CPU can do many things very generally. And the GPU is about doing the same dumb thing across many, many small processes running in parallel, like the sort of computations that you might need for graphics.
And Steve has figured out a way of doing the sort of stochastic search, what I described as the sort of cutting the tree. You have the hypothesis tree, and you cut it at some point and you regenerate it. That move-- to do it in parallel. So making it into something that's stupid and can be run on a GPU.
And suddenly, for a few hundred dollars, you can order the ability to plunk down millions and millions of suggestions. If you don't want to talk about GPUs and CPUs-- it's Steve plus computers equals awesome. But the point is that, for example, you can look at this data from Galileo, which is about fitting a parabolic curve to how things move when they're rolling down a steep incline, and propose millions of ways that could be done, in seconds. Yeah.
AUDIENCE: [INAUDIBLE] with the login numbers to get [INAUDIBLE]?
TOMER ULLMAN: Yes, at the end-- well, the point is that at the end, you can propose sort of a summary of these things. That can be a short operation, where he's not sort of comparing all these to see what's best necessarily, but sort of saying, OK, you guys run in parallel. All of you are going to get rejected.
But after a few seconds, let's just take the mean of what you guys came up with. It can be a short operation. Behind the scenes can look like this, when you take the mean, it can be relatively fast. [INAUDIBLE].
I'm not suggesting this is how Galileo did it-- that he necessarily went, maybe it's cubic, maybe it's this, maybe it's that. But something with the flavor of that ability-- being able to run millions of processes very quickly, where only the somewhat smart things come to mind, though behind them there could be many millions of proposals.
Does that answer some of your question? It also addresses a little bit what you asked about before, which is this notion of time-- how much time do we have? How many proposals can you make?
The answer is, we don't know. We don't know if people are doing this. We don't know if this is what brains are doing-- they're doing stochastic search. If they are, what it looks like, how fast it is-- we don't know. I'm just saying from a computational point of view.
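The parallel-chains picture can be sketched in a few lines of code. This is a toy illustration, not Piantadosi's GPU implementation: the data, the tiny hypothesis space of exponents, and the greedy acceptance rule are all assumptions made up for the example, and the "chains" run sequentially here rather than on a GPU.

```python
import random

# Toy data in the spirit of Galileo's incline: distance grows as time squared.
data = [(t, t * t) for t in range(1, 6)]

def score(power):
    # Negative squared error of the hypothesis d = t**power on the data.
    return -sum((t ** power - d) ** 2 for t, d in data)

def run_chain(steps, rng):
    # One dumb stochastic searcher: propose a random exponent and keep it
    # whenever it scores at least as well as the current hypothesis.
    current = rng.choice([1, 2, 3])
    for _ in range(steps):
        proposal = rng.choice([1, 2, 3, 4])
        if score(proposal) >= score(current):
            current = proposal
    return current

def parallel_search(n_chains=1000, steps=20, seed=0):
    # Plunk down many searchers, then summarize with the most common answer.
    rng = random.Random(seed)
    results = [run_chain(steps, rng) for _ in range(n_chains)]
    return max(set(results), key=results.count)

print(parallel_search())  # the chains overwhelmingly land on the quadratic law: 2
```

Each individual chain is as dumb as before; the work is in running enough of them that the summary looks smart.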
AUDIENCE: [INAUDIBLE] one of the talks about his [INAUDIBLE] trajectory.
TOMER ULLMAN: Sorry, perhaps you could remind me.
AUDIENCE: [INAUDIBLE] trajectory of a cube falling off a table, or a ball that goes through a trajectory, a possible trajectory [INAUDIBLE] some kind of mean of [INAUDIBLE].
TOMER ULLMAN: OK, so I'm sorry not to dwell on that, because I have to get to a few more points and give Laura time. But I'm happy to talk a bit more about that at the end or afterwards.
So the second point is to say, well, Laura, actually the second part of the pincer move is, you're right. Stochastic search is not smart enough. But there are ways that we can make it smarter.
And I agree there seems to be something inherently wrong about an algorithm that can take some problem like why are these blocks sticking to one another, as I said before, and suggest that maybe it's because people have more than two children on average, or maybe it's because the moon is larger than most insects. That just seems wrong.
And so that's one part of the problem, right? It's that we don't want something that proposes just crazy things. The other thing that we want is, we don't want something that proposes the same dumb thing over and over again.
We don't want something that says, how about x? And say, no, reject that. It says, oh, well, how about x? No. Have you thought of x?
And that can happen in stochastic search. Although I have to say, since we gave this talk, my son Gabriel has learned to talk. And I'm not so sure that this is not what children are doing.
AUDIENCE: Or professors. [INAUDIBLE].
TOMER ULLMAN: [INAUDIBLE] things like that. What can we see here? It's like a giraffe, and it's a giraffe, and it's a giraffe, and there's a giraffe.
So I think ideally, we would want something that, as you were saying, sort of makes smarter proposals, even before it rejects them, and doesn't always do the same dumb proposal over and over again. So we want the ability to learn from our proposals and to see the way ahead. And I don't know if Josh was able to show you the sort of a Lamarckian learning type figure?
TOMER ULLMAN: OK, so this is kind of the difference, I guess, between genetic learning and stochastic search and the kind of learning that Laura's more pointing to, which is Lamarckian learning, which we know is wrong. But as Laura was saying about ideas, wouldn't that be a great idea if it were true?
I mean, imagine you're the giraffe, and you're trying to think, well, I really want to get those leaves. What am I going to do? Am I going to grow a dorsal fin, or [INAUDIBLE] shorter. Maybe I'll just die, which evolution does frequently, until you hit upon that point of making your neck longer. But what you want is some ability to say no. I'm going to look at my goal and propose something smarter, like making my neck longer.
OK, so here I'm going to highlight two pieces of work. One is by Owen Lewis about making relevant moves-- so not necessarily the smartest move, but they just have to be relevant. And work by Eyal Dechter, which is really interesting as well, which is about remembering good moves.
So maybe you've made moves before that you didn't know a priori were good. But now that you know they're good, stick them in your bag and make sure that you use them next time. So Owen Lewis's work, if I can describe it briefly-- are we OK with time, Laura? Should I zoom through this?
LAURA SCHULZ: Yeah, [INAUDIBLE] a little bit.
TOMER ULLMAN: OK, I'll still do it.
LAURA SCHULZ: I have a picture of the beach.
TOMER ULLMAN: Yeah, OK. So Owen Lewis's work-- I can sort of hand-wave about it a bit, but he recently had a [INAUDIBLE] paper where you can read more about this. Suppose that you have a concept that you're trying to learn. OK, so the concept is going to be something like this. OK, what's the concept?
AUDIENCE: Blue square.
TOMER ULLMAN: Blue square, good-- a very good concept. What about now? This is another example of the concept. This is a set and that's a set. So what are sets? Josh, what do you think sets are?
TOMER ULLMAN: They're squares. OK, what about this? It's also a set.
JOSH: PowerPoint shapes.
TOMER ULLMAN: PowerPoint shapes, good. OK, well, here's another set, here's another set, here's another set. At this point, some of you have said--
TOMER ULLMAN: Squares and circles, good, or maybe squares and red circles-- maybe that. That looks good. Yeah, I like that theory.
Maybe it's PowerPoint shapes. That's too general in a way. It's good, but there's [INAUDIBLE] there.
OK, so the point is, you can describe a grammar to capture these simple concepts, these simple shapes. It's built like a tree, and you're building up features, where you're saying either it's a square or it's a red circle. So this is basically saying the concept is either a square or a red circle.
And this is similar to what Steve's done, and what we've done and such-- it's a grammar that can generate concepts. The concepts here are simple, but they're supposed to be of this form.
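A concrete toy version of such a grammar (the feature names and the disjunction-of-clauses form are made up for illustration) treats a concept as an OR over feature-test clauses, so "square or red circle" is a two-clause tree:

```python
import random

# A toy concept grammar in the spirit of the talk: a concept is a
# disjunction of feature conjunctions, e.g. "square OR red circle".
SHAPES = ["square", "circle", "triangle", "octagon"]
COLORS = ["red", "blue", "orange"]

def sample_clause(rng):
    # A clause tests a shape, optionally conjoined with a color.
    clause = {"shape": rng.choice(SHAPES)}
    if rng.random() < 0.5:
        clause["color"] = rng.choice(COLORS)
    return clause

def sample_concept(rng, max_clauses=3):
    # A concept is an OR over one or more clauses.
    return [sample_clause(rng) for _ in range(rng.randint(1, max_clauses))]

def matches(concept, item):
    # An item satisfies the concept if all of some clause's tests pass.
    return any(all(item.get(k) == v for k, v in c.items()) for c in concept)

# "Square or red circle" as a concept in this grammar:
concept = [{"shape": "square"}, {"shape": "circle", "color": "red"}]
print(matches(concept, {"shape": "square", "color": "blue"}))   # True
print(matches(concept, {"shape": "circle", "color": "blue"}))   # False
```

Stochastic search over this space would repeatedly call `sample_concept` (or resample part of an existing concept) and score the result against the observed sets.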
So suppose I showed you this as a set. So how do you change your theory? Suppose you're in some sort of [INAUDIBLE] theory space. This is where you are. What do you propose?
Well, you're not going to propose, well, maybe it's an orange octagon. That doesn't account for the previous data, and it's not even relevant to the new thing which I just told you. It's wrong according to your current theory.
And you might not say, well, then, maybe it's a square. What you want is a way to say, where am I in theory space? This is my current theory space, and this is the new thing that I got wrong.
And to use that information about what I got wrong to make better proposals. So the point is, you can look at this tree here, and say, where am I going to cut this? And you can already eliminate a bunch of these things as no matter what I'm going to propose, it's not going to be relevant for that.
So the point is that I want to stick on this new thing. For example, it should be a triangle of size two. That's the way to describe this thing.
Or it could be a triangle. It could be a red triangle. That doesn't even matter.
There's a proposal that I have for this bit of the concept, and I need to stick it onto my [INAUDIBLE] tree. And then I can do, as you were suggesting, a bit of pre-processing and say, it's never going to work for any of these nodes, no matter what it is that I generate for this proposal. It's only going to work for that node, so I need to stick it on over there.
And again, it might be the wrong proposal. Maybe it's just a triangle, not of size two. But the point is to be able to look at your nodes and sort of say, no matter what is going to be next, because I [INAUDIBLE] this particular error, I should cut the tree here. As I said, I'm kind of hand-waving about it, but hopefully, you got the general notion of the [INAUDIBLE] paper.
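A minimal sketch of that relevance idea, in the spirit of the description above rather than Lewis's actual algorithm (the clause representation and the candidate set are my illustrative assumptions): when a positive example is missed, generate only edits that could possibly repair that specific error, instead of regenerating arbitrary parts of the tree.

```python
# Relevance-guided proposals for a disjunctive concept: when a new
# positive example is missed, every useful edit must add a clause that
# covers it -- cutting anywhere else in the tree can't fix the error.

def matches(concept, item):
    return any(all(item.get(k) == v for k, v in c.items()) for c in concept)

def relevant_proposals(concept, missed_item):
    # Candidate clauses generalize the missed item by dropping features;
    # by construction, each one covers missed_item, so each proposal
    # repairs the error (though it may still be the wrong generalization).
    keys = list(missed_item)
    candidates = [dict(missed_item)]                   # the exact clause
    candidates += [{k: missed_item[k]} for k in keys]  # one-feature clauses
    return [concept + [c] for c in candidates]

concept = [{"shape": "square"}, {"shape": "circle", "color": "red"}]
missed = {"shape": "triangle", "size": 2}
for proposal in relevant_proposals(concept, missed):
    assert matches(proposal, missed)   # every proposal fixes the error
```

The pre-processing Tomer describes is the `by construction` step: irrelevant cut points are ruled out before any proposal is scored.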
OK, Part B of this is the notion of remembering moves. And we already made this point when we were proposing this notion of stochastic search. We were saying, well, surely it's not a very simple stochastic search. There needs to be this notion of templates.
And templates are things that are true across domains. So as I mentioned before, this grammar can capture simplified magnetism, and taxonomy, and kinship, and things like that. But there are some things that are similar about kinship, and taxonomy, and magnetism in their abstract form, like transitivity rules.
If p(x, y), and p(y, z), then p(x, z). You see that rule? If x relates to y, and y relates to z, then x relates to z.
Like in taxonomic relationships, you might say, well, if a condor is a bird and a bird is an animal, then a condor is an animal. That's an example of transitivity. Transitivity could be true for a lot of other domains, again, in this abstract way.
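The transitivity template is easy to state abstractly. A small sketch: compute the transitive closure of any relation, and apply the same code to the taxonomy example (the relation-as-set-of-pairs encoding is just one illustrative choice).

```python
def transitive_closure(pairs):
    # Abstract transitivity template: if (x, y) and (y, z) hold, add (x, z).
    # Repeat until no new pair can be added (a fixed point is reached).
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# The same template applied to a taxonomic "is-a" relation:
is_a = {("condor", "bird"), ("bird", "animal")}
print(("condor", "animal") in transitive_closure(is_a))  # True
```

The same `transitive_closure` call works unchanged on a kinship relation like "ancestor-of", which is the sense in which the template is domain-general.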
So the point is, how do we get these templates? Are we just born with them? Or is it that we look at some sort of domain where we've learned something, and we abstract away some structure?
And Eyal's work is sort of about that. It's about using something called the Exploration-Compression algorithm, which starts with a very, very, very simple library, kind of like what you were asking about before. One of the examples he was using was Boolean circuits. Let's start with really, really basic primitives, like these notions from combinatory logic-- just a basic way of putting functions together, and the notion of an answer. That's it. In principle, I can build any possible Boolean function, any possible circuit, any possible function going from one truth table to another truth table.
But of course, that's not really great, because a lot of circuits that are useful and out there in the real world use things like AND and OR and NOT. So what Eyal's algorithm does is solve a lot of these circuit problems that you guys may have seen as students at MIT. Like, here's a truth table. Here's the input-- 1010101010. Here's the output-- false.
Here's a new input to the same circuit. It's true. Things like that. Like this circuit-- there's an x and a y. If you put in 1 0, it comes out false. True for that. And the point is, you have a lot of these problems, and you try to solve them. But at the end, when you've solved them, you look at what sort of structure is shared between all of these problems. And you say, ah, let's use that structure as a new primitive.
So for example, I can use this NAND gate-- sorry, a NAND gate and these sorts of primitives-- to construct an AND gate, or a NOT gate, or even this new sort of weird function that's not typically found in logical circuits. But it turns out it's really, really useful when you're constructing these circuits.
And now the point is, when you go in and do your stochastic search, and say how should I solve this Boolean circuit, you're not only using your primitives. You're not only using the NAND gate. You're saying let's use a NOT. Let's use this encapsulated thing that I solved for before and I gave it a name, or this weird thing that we'll call E2 or an AND gate.
And now when you have these new primitives, you can reuse information, and you make smarter proposals. This is another pictorial way of saying: suppose the space is very, very large. These are the good solutions over here. They range from short programs that'll solve what you want to long programs that will still solve what you want, but it would take an infinite amount of time to get to those programs, because you're searching a very large space.
This is supposed to tell you whether these programs are good. And what you do is you search an [INAUDIBLE] area within this space. This is the only thing you can search, because you can only search a small part of it.
But once you've searched that, you abstract away the good solutions for that space, make them into new primitives. Now that you have new primitives, some sort of really complicated programs are only complicated when you have this language of NANDs. When you have this new language, new primitives, that should become simpler.
And the effective space-- things get sucked up into the effective space. And again, there's a paper on it. Those of you interested in the more technical details of the EC algorithm should definitely either contact Eyal or read that.
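The library-building move itself can be sketched concretely, assuming nothing but a NAND primitive. The dictionary-of-named-functions setup here is illustrative; Dechter's actual EC algorithm searches over functional programs, but the compression step has this flavor:

```python
# A toy version of the library-building idea: start with NAND as the
# only primitive, discover useful compositions while solving circuit
# problems, then name them and reuse them as new primitives.

def nand(a, b):
    return not (a and b)

library = {"nand": nand}

# Useful shared structure found across solved problems: NOT, AND, and OR
# are short compositions of NAND, so compress them into the library.
library["not"] = lambda a: library["nand"](a, a)
library["and"] = lambda a, b: library["not"](library["nand"](a, b))
library["or"] = lambda a, b: library["nand"](library["not"](a),
                                             library["not"](b))

# Later searches can propose circuits in the richer vocabulary:
print(library["and"](True, True))    # True
print(library["or"](False, False))   # False
```

Once `not`, `and`, and `or` are in the library, circuits that were long strings of NANDs become short programs, which is exactly the "effective space" shrinking that the pictorial argument describes.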
OK, so in the semi-conclusion-- and I'm happy to take up this debate with Laura for all the time that she'll allow us-- I think we have a long way to go to live up to children. I think there's a long way to go to meet up to Laura's critique. There are a bunch of things that she said that I haven't addressed that I have some thoughts on.
But I think these are still early days. And it's kind of hard to say what is actually at hand. And we shouldn't be led astray by our intuitions of, like, well, when I come up with things, it seems like they came out of the blue. Or it seemed like I'd solved the problem, and the solution presented itself obviously, which looked like one sample, while in the background there could have been lots of samples.
Or it could be that you started with something really, really dumb. And it's just that because you're now all of four-- which is a long time in a developmental sense-- you've got all these useful new primitives. But it took you a long time to do that. And the way you did it was stochastic search. And in the end, the process is still stochastic search, just with better primitives-- maybe. And the point is, people in development, and all of us, should continue to care about search algorithms, to everyone's benefit.
LAURA SCHULZ: [INAUDIBLE] going to spend more time adjusting the sound system, I think, than speaking here-- just to say this. I think I ended on this last point, that we want our stories, our answers to satisfy the abstract constraints, for them to satisfy these goals. And there's something, I think, also very satisfying of each of the kinds of accounts that Tomer has given. And I admire them. I admire them greatly.
And I think Owen's is very cool. It's error-driven proposals. It's a lot like the example I showed with the [INAUDIBLE] where you get something wrong, and then you try to propose answers that at least take into account the particular kind of thing you got wrong, instead of everything else. And I think the one thing to be said is that they are still driven by the data.
But the problem inheres in this triangle-- what to predict, and how do you account for the triangle? And ideally, you want something that isn't driven only by the data. It's given by the question you ask, or the problem you're posing, or some slightly more abstract constraints. Which is, I'd like to find a way where you treat the problem both as part of the data and in a really simple way, to explain the importance of it [INAUDIBLE].
You can query your data in innumerable ways. And the ideas that you come up with should not be determined by the data, exactly, but by the relationship between the data and the query, and what you're asking. And that's going to affect how you form assumptions.
I think this notion of Eyal's is lovely. And actually, I've been inspired by it in a number of ways. I think being able to get greater and better representations of complexity is super interesting. But again, it's not data driven. You're just thinking about what's a better representation-- a more complex, more efficient representation.
And Hedro, who's working in my lab with Josh and with Eyal has been doing this in the domain of tangrams. We're looking at compositional concepts, and how as you get richer ideas of how these things could be put together, you might get more and more efficient solutions to problems. And again, that's a really interesting problem.
And really, my only gripe with it is the space of kinds of things we can think about. Not all of the kinds of things we can think about-- how do you put stripes on peppermints? How do you name a new theater company? All the innumerable questions I could ask you on the fly that you can propose answers for-- they don't all require it. Eyal's approach is in some sense a really good answer for really hard problems of cognition, problems that really require changing the representational format in which you are representing the facts involved, like going from a Roman numeral system to an Arabic numeral system.
That's a big change. It allows new kinds of manipulations you develop. I think those are deep, hard problems.
But there are a lot of trivial problems that aren't about representational format that we don't know how to solve. In some ways, I think we just have problems of ordinary thinking-- ordinarily being able to generate new ideas and responses to problems. And I would like to know how we do that-- how we do ordinary thought. Steve's account just might be true.
So Lamarckian evolution is not true of evolution. You might think it's true of thinking. You might think that thinking is a form of intelligence, that it doesn't just rely on variation and selection.
But it doesn't have to be that way. It could be the case that we can think about so many things at once, so, so quickly, that we can generate everything. And that could be true.
That is what an expedition might be-- a long line of everybody. And if you have a long line of everybody moving really, really, really quickly, then even if you have a million-acre wood, maybe you can find the kinds of things you're looking for in it. So it might be true, but it's not as good a story. So I'm going to go ahead and end right there. And we'll just turn over for questions.