Tutorial: Probabilistic Programming (1:09:50)
August 15, 2018
Brains, Minds and Machines Summer Course 2018
Kevin Smith, MIT
Probabilistic programming languages facilitate the implementation of generative models of the physical and social worlds that enable probabilistic inference about objects, agents, and events. This tutorial introduces WebPPL through example models and inference techniques.
KEVIN SMITH: So for this tutorial, I'm going to be your guide through it. Kelsey, who's sitting in the back there, is also going to help out when we start getting to the exercises. We'll both be walking around. We'll both be helping you guys out and answering any questions you might have.
The third person or the other person I want to thank for this is Toby Gerstenberg. He provided all the slides. He gave a great tutorial last year on this, but he couldn't do it this year because he's moved on to a better place. He's now a professor at Stanford.
So since he's just developing his lab, and he gave me the slides, I figured I'd do him a solid. If you're interested in this stuff, either looking for a postdoc, or know undergrads who are interested in this sort of stuff, especially as it relates to causality, responsibility, and blame, you should look him up. He's a good guy.
So to start out, we're just going to go through a simple example of the game of tug-of-war. In this game, you have two teams of people pulling on a rope in opposite directions. And the goal is just to pull the other team across the line. So while this is simple, it does require thinking about a lot of different concepts, like what makes up a team, how strong people are, how much effort you're putting in. And yet we can make judgments about these sorts of things very easily and naturally.
So for instance imagine you had this game, where Tom and Steve are playing a game of heads-up tug-of-war. So it's just Tom versus Steve. And I tell you Tom wins the game.
Now, I want you to guess, from a scale from very weak to very strong, how strong do you think Tom is? You're going to see this little slider move along the bottom, here. And as it's moving, I just want you to clap when you think the slider's around at the area where you think represents Tom's strength. You guys ready?
All right, seems like a lot of it was here. It's a little stronger than average. He did beat Steve, after all, but he's no Superman. How about now, you see Bill play against both Mark and Nick, and Bill wins. Again, the slider is going to go. Clap when you think it's at the right strength.
Yeah, people seem to think Bill is pretty strong. He beat two people at once. That generally takes some strength.
How about now? Now you see Tom beat Steve. You also see Bill beat Steve, and you see Ryan beat Steve. Now, how strong do you think Tom is? Ready?
These are all-- well, yes. This is an entirely new set of people. You have not seen these people before. Don't trust the evidence from before. All right, how strong do you think Tom is?
Yeah, much more spread out-- kind of in the middle, here. Now, the interesting thing to note about this is you have the exact same evidence about Tom as you did from two times ago. You've seen him play in one game. You've seen him beat Steve. And yet, your judgments about how strong he is are different because of this non-Tom-related information.
Let's try it again. Now, Tom beats Steve-- different set of people. Different Tom beats different Steve. But different Bill plays against different Steve and loses to Steve. And different Ryan plays against different Steve, and Steve wins again. Again, how strong do you think Tom is?
Yeah, you changed your decision back to being fairly strong. Again, this was just because of a change in non-Tom-related information. All right, yeah, Steve does not like that.
So these sorts of judgments you all made pretty naturally, pretty consistently. And yet they required combining a lot of different concepts-- things like, what is a person? How do people compose into teams? How strong are people? What does it mean to put in effort? How do strength and effort combine into the force a team pulls with? And what does it mean for a team to win?
Well, the nice thing about probabilistic programs is this gives us a way of formalizing all of these things into a simple thing that looks like this. This is basically a model of this tug-of-war game that you just saw, where we define all of these concepts and how we think the world works. We say what did we observe? And then we ask, make an inference, about this.
And I bring up this example because, A, by the end of this tutorial, you're going to be able to play around with programs like this and make your own inferences. But it's also a simple model that does a really good job of capturing human judgments. So we can take this model and use it to ask what the prediction of somebody's strength is after observing a sequence of games, and then have people do that same thing. They observe a sequence of games, and we ask them, how strong do you think this person is? And we get this very nice correlation.
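To give a feel for how such a model works, here is an illustrative sketch in plain JavaScript of the inference pattern behind it-- a rejection sampler conditioned on "Tom beat Steve." The actual model in the slide is written in WebPPL; the Gaussian prior, the names, and the conditioning scheme here are assumptions for the sketch, not the slide's code.

```javascript
// Sample from a normal distribution via the Box-Muller transform.
var gaussian = function(mu, sigma) {
  var u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
  var u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
};

// Posterior over Tom's strength, conditioned on "Tom beat Steve":
// sample both strengths from the prior, keep only the worlds where Tom wins.
var samplePosterior = function(n) {
  var samples = [];
  while (samples.length < n) {
    var tom = gaussian(0, 1);
    var steve = gaussian(0, 1);
    if (tom > steve) {        // the observation we condition on
      samples.push(tom);
    }
  }
  return samples;
};

var post = samplePosterior(5000);
var mean = post.reduce(function(a, b) { return a + b; }, 0) / post.length;
console.log(mean); // roughly 0.56: stronger than the prior mean of 0, but no Superman
```

This is the shape of the judgment from the first clapping exercise: beating one opponent shifts the strength estimate up, but only somewhat.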
But the very nice thing about this is we can also change up the question as well. So if I asked you, not how strong is Player X, but do you think Player X was trying in one of these games? That seems like a fairly natural extension to what you're doing. You don't really need to change your concept of the world or how tug-of-war works. All you have to do is change the way you're querying that model.
And again, the nice thing about probabilistic programs is we don't have to change our model of the world. All we have to do is change one line that references that query, and we can get the same sorts of judgments out. We can look at what are the predictions for did Player X try hard in the first game versus people's judgments? And we also see this nice correlation.
So for this tutorial, what we're going to first do is ask this question of what is thinking? And describe kind of the foundational concepts that go behind probabilistic programs-- things like what kind of programs do we need and the probabilistic language of thought hypothesis. I'm going to try to go through this fairly quickly because this is a lot of what Josh is going to talk about in his lecture tomorrow morning, which you should all attend. But it's going to give you enough of a baseline that we can then move on to the do-it-yourself part. We're going to go through learning the WebPPL probabilistic programming language and how to build these generative models ourselves and do inference on them.
So to start it off, what is thinking? Now, the general way of thinking about this is kind of two very related questions. First of all, how do we describe the intelligent inferences that people make? And how do we build intelligent inferences into machines?
And the whole idea behind the Center for Brains, Minds, and Machines is that these two questions are very, very related. If you can reverse-engineer human inferences, you can then build them into machines. But in order to do that, we have to ask, well, what is the right way to think about these inferences and how people do this?
And for that, what we can do is turn to the computational theory of mind. The idea behind this is that we can treat the mind as an information processing machine or a computer. And under this analogy, all of our mental representations, our concepts, our models of the world, can be thought of as different computer programs. And then thinking is just taking one of those programs and running it.
Of course, this leaves the question of what kind of program? What programs do we need to explain human intelligence? And while there's a variety of them, one of the core ones is this idea of a working model of a domain. So this is a way of encoding knowledge about how the world works from cause to effect.
So one example that's near and dear to my heart is the intuitive physics engine. And the idea behind the intuitive physics engine is that through perception, we can build up a 3D model of the world in our head. And then we have this mental machinery that knows something about the laws of physics and can run these representations forward in a noisy fashion, because there's a mismatch between what's in our head and what's in the world, and physics itself can be unknowably random.
But once we run that forward stochastically, we can get a probability distribution over possible worlds. And using that probability distribution, we can arbitrarily query it to answer questions. Like, if we take this tower of blocks, we can ask, do you think it's going to fall? If it falls, which direction is it going to go?
These are reasonable questions to ask, but they don't require changing anything about our knowledge of physics to change the way we answer them. And when we do that, we can capture people's predictions of things like stability to a very good degree. So once we know something about how the world works, we can use this, and we can flexibly combine these sorts of programs to answer lots of questions, make inferences, make judgments.
And so in order to build these working models, though, we think there are two core components that we need to focus on-- structure and probability. And the idea behind structure is this encodes the sorts of knowledge that we have-- this idea that we have some sense of how physics is going to work, for instance. If we didn't have knowledge of physics, we'd do a pretty bad job of making predictions. So we need that sort of structure.
But we also need uncertainty about that. So as I mentioned, there is going to necessarily be a mismatch between the model of the world in your head and what's out there in the world. And we need to account for the fact that we're not perfectly sure about where every object is or how it's moving.
And similarly, when you roll dice, we can consider that a random event, basically. And so when we think about unfolding the world, we need to account for this sort of randomness if we want to have a sense of what's going to happen. We'd do a very bad job if we just had a single prediction and we got it wrong.
So when we think about things like structure, some of the classic ways of approaching intelligence, like classical AI in the '60s and '70s, had a lot of structure. We built out these logical or rule-based programs to try to understand the world and make explanations. But they did a very bad job when any sort of probability or uncertainty got added to them.
Similarly, more modern approaches like neural networks do a very good job of encoding statistical relationships between items, but it's very hard to build structure into them other than adding inductive biases based on the network architecture. If you want to tell a neural network how to do physics, you can't actually tell it how to do physics. You have to give it a bunch of training examples and hope it learns physics.
So the reason we like probabilistic programs is because this is a natural way of combining both the structure and the probability into one framework. And that, we hope, can explain the range of human intelligence and human thought. So the guiding principle for how we think about what we need in these probabilistic programs to explain human cognition is the probabilistic language of thought hypothesis.
And basically what this says is that concepts have a language-like compositionality and encode probabilistic knowledge. These features allow them to be extended productively to new situations and support flexible reasoning and learning by probabilistic inference. And so I'm going to pull out three concepts there-- compositionality, productivity, and probabilistic inference-- because we think of these as the three building blocks of intelligent programs.
So what do I mean by compositionality? Well, this is taking smaller concepts and putting them together into newer, larger concepts. So if I said to you, a giraffe with a blue ribbon around its neck eats ice cream, you guys might have an image that looks something like this.
Now, I'm guessing, unless you've seen this talk before, you've never seen this picture or heard the sentence. But because you know something about giraffes, the color blue, ribbons, necks, what eating and what ice cream are, you can combine those into this concept that looks a lot like this. So that ability to build up into more complex concepts is key.
The other thing is productivity. And we mean this in the Humboldtian sense of the infinite use of finite means. So he was using this to describe language, where you have a limited set of letters or phonemes and a fixed set of rules for the grammar, but from this, you can express a basically infinite range of thoughts-- things like giraffes with blue ribbons around their necks eating ice cream. So the ability to go from a limited and definable set of primitives up to the basically infinite range of thoughts that we can have is necessary to explain human cognition.
And then the final thing is probabilistic inference. So when I talked about things like the intuitive physics engine, I described it as a working model of the world, where you can run forward from cause to effect. And that's not too difficult.
If I asked, where will the ball land? You have this ball, here. You can run it forward through your stochastic physics engine, and you get a range of possible outcomes of where it might go. Now, this is something you could do with most non-probabilistic programming languages.
But a lot of interesting questions about human cognition don't go from cause to effect. They don't involve reasoning in the forward direction. They involve reasoning backwards from effect to cause. So you see the ball down here. Where did the ball come from? Or in the game of tug-of-war, you see the outcome of a game. What do you know about the people that caused this?
And this, again, comes pretty naturally to people. So I'm going to have this hand go across the top of this Plinko machine, here. And I just want you guys to clap when it crosses where you think the ball actually came from. All right?
Yeah, most people thought it came from somewhere on the right. Now, in non-probabilistic programming languages, you tend to have to use special machinery to work backwards to get there. You can't just take the forward model and invert it. And that's a beautiful thing about probabilistic programming languages and why we like to express these sorts of models in them. Because as you'll see later and do yourself, once you've defined the forward model-- once you've defined how the world works from cause to effect-- it's trivially easy to code it up so that you can go backwards from effect to cause. Since we can do this so naturally, we want languages that allow us to do it as well.
So the final thing I'm going to talk about before we get into the do-it-yourself portion is answering a question that I invariably get asked every single year I've taught this course, which is how do you implement these sorts of programming languages in the brain? And now, if you are asking this question of, how do neurons give rise to the primitives of these probabilistic programming languages? My answer is, I don't know. And anybody that tells you that they do know is either lying or about to become really, really famous.
The thing is, for the purposes of what we're doing, this is kind of an irrelevant question. That's not to say it's not important. I think there should be a lot of work trying to figure out how you can get this sort of symbolic logic out of these neurons. In fact, if you were at Leslie Valiant's talk the other night, that was exactly what he was talking about.
But when we talk about understanding intelligence, we don't think this is the right level of representation. Instead, what we're trying to study is how do you go from these sorts of primitives that we believe are required, this limited set of primitives that are required to understand cognition, to the wide range of behaviors that we see people do, from constructing objects to understanding the world to negotiating and cooperating with other social agents? And for this, we think the right level of description is probabilistic programming languages.
Just like if you're trying to write a computer game, for instance, you're not going to do it by switching wires on transistors. You're going to think about it in terms of a higher-level programming language like C or Python. We think going back to neurons to try to explain this behavior is not the right way of thinking about it. Instead, let's look at this intermediate representation and use that. Yeah?
AUDIENCE: [INAUDIBLE] the second example?
KEVIN SMITH: Sorry, in which example?
AUDIENCE: [INAUDIBLE] you go from primitives to behavior.
KEVIN SMITH: Yes. So we'll talk about some of the core functions, at least in WebPPL, and things like exchangeable random primitives or stochastic items that are needed to build this out.
AUDIENCE: This is a computational [INAUDIBLE].
KEVIN SMITH: This is a computational-level model, yes. If you're thinking about this in terms of Marr's levels, this is all at the computational level of intelligence. Are there any other questions?
All right, we're going to move into the do-it-yourself side of things now. The first thing to talk about here is why we're using WebPPL as opposed to other probabilistic programming languages. There are a wide variety of PPLs out there. But we think WebPPL is the best one to use because it's the most expressive. It's the most general probabilistic programming language-- it captures the full range of lambda calculus and the full flexibility of human cognition.
Now, because of this flexibility, there's some ways that it's slower than other PPLs, just because we can't make certain assumptions that simplify things. But we think this is the only way that you can have the same sort of compositionality and productivity that's required for human intelligence. Other PPLs include a wide range of things that typically have less expressive models. You can't define the same range of things in them.
But because of that, they're a lot faster. So if you're using a probabilistic programming language for things like data analysis, typically you're going to want to use one of those. If you're using it for understanding human cognition, WebPPL or some sort of lambda-calculus-based language is probably the best choice.
So what we're going to do in this section is go through three parts. We're first just going to learn about the basic syntax of WebPPL. Then we're going to build some generative models, and then we're going to invert those generative models with inference. Between each of these, we're going to take about 10 to 15 minutes, depending on timing, so that you guys can sit down and do this yourself and run some exercises.
So if you're at your computer, you should go to this website, here. I think Chris linked it in the Slack channel earlier for anybody else. And then open up two things. One, go into the Notes folder and open the first one, the WebPPL basics markdown document. The other, go to this website, webppl.org.
There should be a link at the top of this document because this is going to describe the stuff I'll talk about. This is where you can input all the code and do it yourself. So I'll give you guys a moment to do that.
AUDIENCE: [INAUDIBLE] mathematics is [INAUDIBLE]
KEVIN SMITH: So there's kind of two levels to describe it at, one of which is this is just a way of doing Bayesian conditioning and Bayesian inference over very expressive sorts of languages. The more detailed way is-- I'm just going to very briefly go into this, and you can ask me some questions about it later. It builds out a tree structure of the sorts of ways that a program might evolve, including randomness, and then runs inference over that.
So the math behind this is very complex. There's a number of papers. And then I'll give you some pointers to them at the very end of this. But one of the beautiful things about what we're doing is you don't need to know the details of the math and the hard details of the inference algorithm. You can get that for free, and that's why we like these programs.
Hopefully, everyone's there. We're going to get into the basics, here. And basically all we're going to do here is talk about the syntax of WebPPL-- how to declare and store variables, how to do conditionals and build functions.
So we're just going to start out with this question of how do you store a variable? And the way you do this is, if you remember from the programming tutorial, you want to be able to keep information around. And so the way we tell WebPPL that we want to keep information is we use the var command. So we say var, and then we name our variable.
So for instance, strength. And we say is equal to and maybe 2.5, right? Then if I want to display that, I just say display. I put the variable name in there. And it will tell me I put 2.5 in this. Easy enough.
So we can store things like numbers. We can also store things like strings. So we put those in single or double quotes, and then type display, and you get out what was stored in those quotes.
And we can also store things like Booleans. And here, we can store either true or false. False-- I need to type correctly. False. These are special ways of basically saying is this statement true or false? But these are also items you can store in variables.
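The variable declarations being narrated here look like the following. Since WebPPL shares JavaScript's syntax, aliasing `display` (which is built into WebPPL) to `console.log` lets this sketch run in plain JavaScript too:

```javascript
// display is built into WebPPL; this alias lets the sketch run in plain JavaScript.
var display = console.log;

// Numbers
var strength = 2.5;
display(strength);   // 2.5

// Strings, in single or double quotes
var firstName = "Kevin";
display(firstName);  // Kevin

// Booleans
var lazy = false;
display(lazy);       // false
```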
However, oftentimes we don't just want to store single things, like single numbers or single strings or single Booleans. Things out there in the world are collections of properties. So if I wanted to define something about myself, I might want to know things like my first name, my last name, how strong I am, how lazy I am. And we can do that by defining these as objects. The way this works is you again start with var. You name your object, you say equals, and then you open these curly brackets, here.
And then what we can do is give it a list of attributes and their associated values. So for instance, I could say, first name, colon, Kevin. So this means that the first name attribute of this object is Kevin. Put a comma to go to the next one. Last name-- Smith. My strength might be 2.5. And lazy, I'm just going to say false. Yeah, you can call me out on that lie as much as you want, but that's what it is.
And then we can pull those attributes back out in one of two ways. So if I say display, Kevin dot strength, and press run, then it's just going to give me back what I stored in this strength attribute. You can do the same thing for any of the other ones-- Kevin dot lazy, Kevin dot first name, Kevin dot last name.
Another way to access this that we won't really be using in this course, but if you want to define the attribute you're looking at dynamically, you can give it a string within these brackets. So here, you see I've got these brackets. And I give it the string saying lazy, and that's again going to say I want to pull out the lazy attribute. Any questions about objects?
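Put together, the object and both access styles look like this (again a sketch in WebPPL-compatible syntax, with `display` aliased so it runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

var kevin = {
  firstName: "Kevin",
  lastName: "Smith",
  strength: 2.5,
  lazy: false
};

display(kevin.strength);  // dot notation: 2.5
display(kevin["lazy"]);   // bracket notation with a string, for dynamic access: false
```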
Another important data structure is the array. This is just a sequence of data, typically all of the same kind. So say I wanted to make an array of instructors.
I say var instructors is equal to. And then I have these square brackets, here. I'm just pressing enter to make it easier to see. And then I give it, say, two strings, Kelsey and Kevin, with a comma between them. And then when I say display instructors, it's going to give me back both Kelsey and Kevin.
Now, the nice thing about this is we can pull things out of these arrays as well by index. And this is, as most programming languages are, a zero-indexed language, which means that if you want to pull out the first item of the array, say, Kelsey, that is index 0. And then Kevin is at index 1.
So if I say display instructors 0 and run that, you're going to get Kelsey back. And instructors one gets Kevin back. Any questions there? Yeah?
KEVIN SMITH: Sorry?
KEVIN SMITH: Yes, so once you've defined it, you can redefine things as well if you wanted. So you could make a new array, where you pulled everything but one thing out and stored something new in. And that's perfectly allowable. But once you've defined an object in WebPPL, it's what it is. Yeah. All right?
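The array example being described looks like this in code (WebPPL-compatible syntax, `display` aliased so it runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

var instructors = ["Kelsey", "Kevin"];
display(instructors);     // the whole array
display(instructors[0]);  // Kelsey -- arrays are zero-indexed
display(instructors[1]);  // Kevin
```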
So some other stuff we might want to do is control the program flow as well. So we want to say let's check something out, and if that's true, then we'll do one thing. If it's not true, then we'll do something else.
So we can pull out, say, my definition of me, for instance. And then we can use if/then statements, or in this case, if/else statements to define what we want. And the way this works is you start with the word if, and then in parentheses, you have to give it a conditional that evaluates to true or false.
So for instance, you'd say if Kevin dot strength is greater than 2.5. So that gives us that conditional. And then we open up these curly brackets, and we say here's what, if this evaluates to true, I want to do. Oh, sorry. I'm going to say is greater than 5, not 2.5. So I will say display Kevin is strong.
Now, if I run this, nothing's going to happen. Why is that? Because Kevin dot strength is not greater than 5. The other thing you can do, though, is put an else statement in here.
So again, you type else, and you put these curly brackets in. And then you say what you want to happen if this evaluates to false. So in this case, Kevin is not strong. So now when we run this, because 2.5 is less than 5, it's going to actually display something.
We can chain these together as well with else/if statements. So for instance, if we wanted to say a strength value of exactly 5 was just so-so, we could change the else to an else/if: if Kevin dot strength is less than 5, display Kevin is not strong. And then the final else displays Kevin is so-so. Now, if I run this, again, I'm going to get not strong.
But if I change the strength value to 5, it's going to type in so-so because what happens is it first goes here, and it says is Kevin dot strength greater than 5? Nope. Move on to the next one. Is Kevin's strength less than 5?
Well, in the first case, it was, so it stopped there. But in this case, nope, it's exactly equal. So then it goes on to the final else statement. So you work through these sequentially. Make sense?
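The full if/else-if/else chain being walked through looks like this (WebPPL-compatible syntax, `display` aliased so the sketch runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

var kevin = { firstName: "Kevin", strength: 5 };

// Conditions are checked in order; the first true one wins.
if (kevin.strength > 5) {
  display("Kevin is strong");
} else if (kevin.strength < 5) {
  display("Kevin is not strong");
} else {
  display("Kevin is so-so");  // strength is exactly 5, so this branch runs
}
```

With strength set back to 2.5, the second branch would fire and print "Kevin is not strong" instead.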
There's also a shorthand for if/else called the ternary operator. So I could just say Kevin dot lazy. And the reason why I don't have to say equal to or anything like that is because this is storing a Boolean. It already evaluates to either true or false.
Then I say question mark, and I can say display Kevin is lazy. So this is the part of the argument that gets evaluated if this turns out to be true. Then you put in a colon, and then you have the part of the argument that gets evaluated if it's false.
So instead of having to write if Kevin dot lazy bracket display Kevin is lazy, else bracket display Kevin is not lazy, this is just a much shorter way of writing that. And it gets used a lot in the sorts of models that you'll typically see in the WebPPL documentation. Any questions, there? All right.
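As a sketch, the ternary shorthand looks like this (WebPPL-compatible syntax, `display` aliased so it runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

var kevin = { firstName: "Kevin", lazy: false };

// condition ? value-if-true : value-if-false
var message = kevin.lazy ? "Kevin is lazy" : "Kevin is not lazy";
display(message);  // Kevin is not lazy
```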
The next thing we need to do is define some functions. Yeah?
KEVIN SMITH: Yes, yeah. So as you might remember from your programming tutorial, functions are blocks of code that take in some arguments, do some stuff with them, and then return something. To define functions in WebPPL, we again say we need to store them somewhere. So for instance, if I wanted to define a function that returned the winner between two people playing tug-of-war, I might say var return winner. So I'm now going to make a function called return winner.
Now, I say equal. I want to assign something to it. And then I use the function command. And the function, the way you write this, you say function. You open up some brackets or parentheses, and then you give it a set of arguments. And these are the things that it takes in. So I can call person A and person B. And then you use the curly brackets.
So what this says is I'm going to make a function called return winner that takes two arguments, person A and person B, and then I'm going to run some code that's in these curly brackets on them. And so maybe I'll just make it very simple, which is just to say if person A is stronger than person B, person A is going to win. And I will return the first name. If not, person B is going to win, and I'll return their first name.
So I can write that as if person A dot strength is greater than person B dot strength, then return person A dot first name. And what this return statement says is once I get here, stop the function and output whatever is here. Then else return person B dot first name.
Now, if I then take, say, me and Kelsey, where Kelsey has a strength of 10 and I have a strength of 2.5, and I ask who's going to win, well, then I could say return winner Kevin Kelsey. And it gives you back Kelsey because that's the first name of the object that has the greater strength value. Does that syntax make sense?
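The function being dictated here can be written out like this (WebPPL-compatible syntax, `display` aliased so the sketch runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

var returnWinner = function(personA, personB) {
  if (personA.strength > personB.strength) {
    return personA.firstName;  // return stops the function and outputs this value
  } else {
    return personB.firstName;
  }
};

var kevin = { firstName: "Kevin", strength: 2.5 };
var kelsey = { firstName: "Kelsey", strength: 10 };
display(returnWinner(kevin, kelsey));  // Kelsey
```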
Now, the other thing you can do with functions is make functions that don't take any arguments. So for instance, I might say var say hi is equal to function. And now, I give it no arguments, but I say display hello.
And now, if I run say hi, and give it no arguments whatsoever, I will get back out hello. Now, this may seem like a silly thing to do, but once we start getting into programs that have stochasticity in them-- so that when you run them over and over, you get different outputs-- this becomes pretty important.
And then the final thing in this section is talking about higher-order functions because this becomes very important for WebPPL. And a higher-order function is a function that either takes in as input or returns as outputs another function.
And one of the most crucial of these is called map. And what the map function does is it takes two arguments: one is a function, one is an array. And it runs that function on every element of the array.
So for instance, if I say var my square is equal to function x, and then I just return x times x, then if I did my square 5, I would get out 25. Easy enough.
But if I then said var some numbers is equal to 1, 2, 3, 4-- so I have this array of numbers-- I don't want to have to call my square some numbers 0, my square some numbers 1, and so on and so forth. It's much easier to say display map. And it takes in, again, two arguments, the first one being my square, the second one being some numbers. And it should give me back 1, 4, 9, 16.
So how is this working? Well, I think the easiest way to see this-- this is a concept that a lot of people sometimes find challenging, so I like to visualize it, which is you have this, again, this my square function that squares numbers. You have some numbers that are in this array, so this is the input.
When we run map of my square on some numbers, what the map function is doing is it's first looking at the first number, and it's saying, I will run this my square on this. And so 1 squared is 1. That's easy enough.
Then it moves on to the next one, and 2 squared is 4. And it moves on to the next one, and 3 squared is 9. It moves on to the next one, and 4 squared is 16. So this is a way of taking this function my square and applying it to every element of some numbers. And this sort of operation is important for many operations in WebPPL.
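Here is the whole map example as code. Note that `map(fn, array)` and `display` are built into WebPPL; the two shims at the top let this sketch run in plain JavaScript as well:

```javascript
var display = console.log;                              // built into WebPPL
var map = function(fn, arr) { return arr.map(fn); };    // built into WebPPL with this signature

var mySquare = function(x) { return x * x; };
var someNumbers = [1, 2, 3, 4];

display(mySquare(5));                 // 25
display(map(mySquare, someNumbers));  // [ 1, 4, 9, 16 ]
```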
One thing you might be used to from other languages is for loops. You can't do that in WebPPL. Part of the reason you can do this inference is that it requires breaking down your code in certain ways that are incompatible with for loops. But you can fake it in some ways. So instead of writing a for loop, you use map to apply a function across an array. Or instead of using while statements, you can use recursion, which we'll talk about in the next section.
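To preview what replacing a loop with recursion looks like, here is a small hypothetical example-- not from the tutorial-- that sums the numbers 1 through n without any loop (WebPPL-compatible syntax, `display` aliased so it runs in plain JavaScript):

```javascript
var display = console.log;  // built into WebPPL; aliased for plain JavaScript

// A recursive function calls itself, shrinking the problem each time,
// until a base case stops it -- playing the role of a loop's exit condition.
var sumTo = function(n) {
  if (n === 0) {
    return 0;                 // base case: stop recursing
  } else {
    return n + sumTo(n - 1);  // recursive case: 4 + 3 + 2 + 1 + 0
  }
};

display(sumTo(4));  // 10
```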
So with that, that's the end of Section 1. And now, you guys will get to do this yourself. So there are these four practice problems at the very bottom of the markdown document that you should have gone to. You should go through them. You should try to answer them and write the code yourself.
If you get absolutely stuck and want to see the answer, you can click on this raw button at the top and go to the very bottom. We have the answers in the code. But it's better if you try to do it yourself first. And while you're doing that, we'll take about, I'd say, 10 minutes. And Kelsey and I will be walking around, giving you guys help.
Now that we have some sense for the syntax of WebPPL, let's start building some of these working models of the world. In order to figure out how to build these, we're going to talk about a few concepts: how do you sample from random primitives to build these models? How do those samples turn into a probability distribution that we can run inference over? Persistent randomness, or memoization, which I'll get into in a little bit. And how to write recursive functions, which are a way of letting a process go on and on by calling a function from within itself.
So the thing to keep in mind here is that if we're treating WebPPL as a way of building these models of how the world works, we need to capture our uncertainty in some way. And the way we can do this is basically just assume the stuff that we don't know to be true for sure, we can capture by some sort of random choice.
If we do this, and we build it in WebPPL, this language is actually universal. It's a Turing complete language. It's Lambda calculus. You can express any sort of computable process you want with this. And so when we're doing this, the important things to consider is how do we define this causal process? How do we build a program that goes from inputs, these causes, to outputs, or these effects?
So to do this, we're going to start with the flip function. And flip is what's called an exchangeable random primitive, which is just a way of saying here's some way that we're introducing something unknown or something probabilistic into our program. So we can write something like-- as soon as I highlight this-- flip, and run it, and we will get out, well, in this case, true or false.
Good. That worked on the second try. I had a 50-50 chance of that. What flip does is it basically, with a 50-50 chance, will return either true or false.
Now remember, I mentioned there are these functions that don't take any arguments. This is where that's useful. We can repeat this process a large number of times by saying repeat. So this is another higher-order function.
It takes in a number first, and then a function second. So in this case, if I wanted to run flip over and over and over, 1,000 times, I would say repeat 1000 flip. And I can run this, and it's going to give me this long list of things that's kind of incomprehensible.
So if we wanted to make this easier to visualize, we can use the vis function which is built into WebPPL. Wow, that's a lot of spaces. And what this will do is, depending on the input you give it, it will give you back some sort of graph summarizing that input.
So in this case, because we have these two things that could happen, false and true, this gives you a probability distribution over the outcome of this function. You can see it's pretty close to 50-50, because if we flip a fair coin 1,000 times, we should get pretty close to 50-50 heads or tails. Make sense?
Now, flip can also take an argument. So if I said flip 0.7, and I ran this, most of the time-- actually, exactly 70% of the time-- I'm going to get true. So I can run this over and over and over, and now, 70% of the outcomes will be true.
So if we want to visualize this, well, now we have to define a function here. So we'll call this biasedFlip. We're going to just make this an argument-less function that returns flip 0.7. And then vis repeat 1,000 biasedFlip-- not "biased Philip." When we do this, we get something that says about 70% of the time, we're going to get true. Now, does anybody know why we couldn't have just written repeat 1,000 flip 0.7? Yeah?
AUDIENCE: Would it be initializing the same [INAUDIBLE]?
KEVIN SMITH: Yeah, basically. It's not a function. In that case, it evaluates that. It says that's true. And WebPPL doesn't like it when you try to give it not a function to repeat. So it will actually throw an error here. That's exactly right.
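To make that distinction concrete, here is a plain-JavaScript sketch with hand-rolled stand-ins for WebPPL's built-in flip and repeat (the real built-ins are more sophisticated; these stubs are just for illustration):

```javascript
// Hand-rolled stand-ins for WebPPL's flip and repeat, to show why
// repeat needs a function rather than a value.
var flip = function(p) {
  if (p === undefined) { p = 0.5; }   // no argument: a fair coin
  return Math.random() < p;
};

var repeat = function(n, fn) {
  var results = [];
  for (var i = 0; i < n; i++) {
    results.push(fn());   // fn must be a zero-argument function
  }
  return results;
};

// This works: flip is a function, so repeat can call it 1,000 times.
var fairFlips = repeat(1000, flip);

// repeat(1000, flip(0.7)) would fail: flip(0.7) evaluates right away to
// a single true/false value, so repeat would be handed a boolean, not a
// function. Wrapping the call in an argument-less function fixes that.
var biasedFlip = function() { return flip(0.7); };
var biasedFlips = repeat(1000, biasedFlip);
```

In WebPPL itself, repeat takes its arguments in the same order, and vis(repeat(1000, biasedFlip)) draws the bar graph described above.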
So now that we have this notion of a random primitive, we can start building up models from it. So for instance, we might take three variables-- a, b, and c-- each of which has a 30% chance of coming up true. Now, when you sum up these Booleans, it's actually like summing up 1s and 0s. So if I return a plus b plus c, this is basically asking, how many of these came up true?
So I might run this once, and a turns into 1, b turns into 0, c turns into 1, and the sum of that is going to be 2. Or 0, 0, 0, so that sums to 0. Or 0, 0, 1, which sums to 1. And I can do this a whole host of times and visualize it and get this sort of probability distribution.
Now, the nice thing about the way that WebPPL works is that this sort of function, sampling from these random functions, specifically defines a probability distribution. So in this case, this is the binomial distribution with three draws. But in one case, we have this generative model, this random generative model, that we can sample from. In another, we have this mathematical distribution.
But-- and this is the nice thing-- any distribution we want can be represented by the right probabilistic program, sampled enough times. So we can do a lot of the things that would otherwise require doing math on the distribution, just by having these sorts of programs. We don't have to define the math of the distribution, and we can still run things like inference and conditioning.
So if we want to do that ourselves, we can build this function-- say, we'll define it as flippingAway. And this is going to be a function. We say var a equals flip 0.3, var b equals flip 0.3, var c equals flip 0.3. And var d is equal to a plus b plus c, return d.
Now we can vis-- not zip. Why did that come out? vis repeat 1,000 of this flippingAway. And you'll see the output-- ah, it's because it says "flippling" away. And now you see the sort of distribution I showed you in the last slide. So just by writing this simple program, we can get out this distribution. Make sense?
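The flippingAway model can be sketched the same way, again with a Math.random stand-in for flip. In JavaScript, summing booleans coerces true to 1 and false to 0, which is what makes a + b + c count the heads:

```javascript
// The flippingAway model: three weighted coin flips, summed.
var flip = function(p) { return Math.random() < p; };

var flippingAway = function() {
  var a = flip(0.3);
  var b = flip(0.3);
  var c = flip(0.3);
  var d = a + b + c;   // booleans coerce: true + true + false === 2
  return d;
};

// Repeated samples approximate a Binomial(3, 0.3) distribution.
var flipSamples = [];
for (var i = 0; i < 1000; i++) { flipSamples.push(flippingAway()); }
```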
So flip is the most basic sort of randomness that we can add to WebPPL. But there's a lot of other sorts of built-in probability distributions that we can sample from as well. So for instance, if we wanted to sample from a Gaussian distribution-- like people's strength, for instance, in tug-of-war, is probably not all or nothing.
We can instead say maybe strength is equal to Gaussian 50, 10. What this says is I'm going to pull from a Gaussian distribution with a mean of 50 and a standard deviation of 10. And then I also have to display strength. If I do that a number of times, I'm going to get these continuous values out.
We don't just have to use Gaussian distributions. There's a lot of other built-in ones. Some important ones are like the exponential distribution, so we can do the same thing and get out a number of draws from the exponential distribution.
And we don't even have to be limited to these basic sorts of probability distributions. We can take-- I don't even want to write this out because it's very complicated-- this function that is called whatIsThis. Because, really, what is this?
It's the absolute value of a Gaussian distribution raised to the power of a uniform times an exponential. This makes no sense. I have no clue what this would look like a priori.
But it doesn't matter, because once we've written that model, we can repeat it a number of times, run it, and get out this sort of distribution. So again, before, when we ran vis on repeated discrete choices, we saw these bar graphs. When we run it on continuous choices, we get the probability density function of that distribution. Any questions, there?
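A sketch of that whatIsThis idea, with hand-rolled samplers standing in for WebPPL's built-in gaussian, uniform, and exponential (the exact parameters here are my guesses for illustration, not necessarily the ones on the slide):

```javascript
// Hand-rolled samplers standing in for WebPPL built-ins.
var gaussian = function(mu, sigma) {
  // Box-Muller transform: two uniforms in, one normal draw out
  var u1 = Math.random(), u2 = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
};
var uniform = function(a, b) { return a + (b - a) * Math.random(); };
var exponential = function(rate) { return -Math.log(1 - Math.random()) / rate; };

// An arbitrary composed distribution with no standard name: the absolute
// value of a Gaussian, raised to a uniform power, times an exponential.
var whatIsThis = function() {
  return Math.pow(Math.abs(gaussian(0, 1)), uniform(0, 2)) * exponential(1);
};

// We can't write down this density a priori, but we can still sample it.
var weirdSamples = [];
for (var i = 0; i < 1000; i++) { weirdSamples.push(whatIsThis()); }
```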
The next thing we're going to talk about is called persistent randomness, or memoization. Imagine I make a function called eye color, which takes in a person. And what it does is it just returns a uniform draw from one of three colors-- blue, green, or brown. Now, if I display the eye color of Kevin, this will get me out green-- not quite right, but you know what? We'll approximate it.
However, what happens if I do this a lot of times? Now when I run this, it turns out my eye color changes between each function call. That's not great. There are some features of the world that, once they've been observed, we expect to stay fixed. If we took a random person, we don't know their eye color a priori. But once we've observed their eye color, we expect it to be constant, right?
So there is a way of making sure this happens in WebPPL, and it's called memoization. And the way you do this is with the mem function. This is a higher-order function that just wraps a function. So you say mem, and you surround this function with it.
So you see, the input to mem is this function that's defined here. And what it gives you back is another function, that now, every single time you call it with the same arguments, it's going to return the exact same thing. So if I run this again, I will always get the same thing.
Now, if I restart the program and run it again, I can get something different. But once I've set it within this program, it's going to give you the same thing. And then you can do this for other people, though. So let's see what Kelsey's eye color is like.
So now, there should be three of my eye color, three of Kelsey's eye color. If I had typed it right, there would have been three of my eye color and three of Kelsey's eye color. There. Now we see something different.
So again, what this lets us do is say, I'm not really sure what a person's eye color is like, or some definition or attribute of a person. But once I've set it, that's there to stay. Any questions, there?
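The behavior of mem can be sketched in a few lines of plain JavaScript: wrap the function, and cache the first result per argument so later calls with the same argument return the same draw. (WebPPL's real mem also handles multiple arguments and plays nicely with inference; this stub just shows the caching idea.)

```javascript
// A toy version of WebPPL's mem: one cached result per argument.
var mem = function(fn) {
  var cache = {};
  return function(arg) {
    var key = JSON.stringify(arg);
    if (!(key in cache)) {
      cache[key] = fn(arg);   // the random draw happens only once per key
    }
    return cache[key];
  };
};

var uniformDraw = function(items) {
  return items[Math.floor(Math.random() * items.length)];
};

var eyeColor = mem(function(person) {
  return uniformDraw(['blue', 'green', 'brown']);
});

// Persistent within a run: every call for the same person agrees.
var consistent = eyeColor('kevin') === eyeColor('kevin');   // always true
```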
The final part of this section is going to be about recursion. Recursion is a very deep concept with a very simple implementation. And that implementation is basically: you just call a function from within itself.
So for instance, we can take this function called counting down. I'm just going to copy this over and go through it because it's easier than typing it out by hand. What this does is this takes in as an input some number. It displays the number, and then at the very end, it says, OK, is this number equal to 0? If so, I'm going to display done with counting. If not, I'm going to call this exact same function with that number minus 1.
So the important things about recursion are: you need to define a way of calling that function within the function, and you also need a stopping condition. If I didn't have this number-equals-0 conditional here-- if all I had was countingDown of number minus 1-- it would say 5, 4, 3, 2, 1, 0, negative 1, negative 2, and just keep going on and on and on until it crashes your computer.
So you need to say, when should I end? But if I don't end, what should I do on the next one? So if I run this, I will get 4, 3, 2, 1, 0, and then it says done with counting.
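Here is that countingDown function as runnable JavaScript. The only liberty taken is collecting the displayed values into an array (a log parameter I've added) instead of printing them, so the output can be inspected:

```javascript
// Recursive countdown with an explicit stopping condition.
// The log array stands in for WebPPL's display.
var countingDown = function(number, log) {
  log.push(number);
  if (number === 0) {
    log.push('done with counting');   // stopping condition reached
  } else {
    countingDown(number - 1, log);    // recurse on number minus 1
  }
  return log;
};

var countdown = countingDown(4, []);
// countdown is [4, 3, 2, 1, 0, 'done with counting']
```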
Does that make sense to everyone? So this is a good way of replacing while loops, or running something over and over and over when you're not really sure when it's going to stop. Because another thing you can do is put some stochasticity in here.
And so with that, we're going to move on to the exercises for this section. Again, let's give ourselves about, I'd say, seven minutes, maybe eight, just to avoid running over, where we define some of these distributions. And hopefully at the end, some of you will get to the recursion problem. So again, Kelsey and I will be walking around.
So hopefully, you guys have made some progress on this. If you haven't gotten to the last problem yet, don't worry. It's marked bonus. That's OK. Again, if you've got any questions, the answers are there in the code. And you can ask Kelsey and me afterwards.
So the final thing, and the real meat of probabilistic programming languages, is this ability to do inference. So remember, the motivating example of this is: if we have this forward model of physics-- where will the ball land?-- we can go backwards from where the ball ended up to where it came from, just knowing the forward direction of cause to effect.
So how do we do this? Well, one way of doing this-- the simplest way, which we can do by hand-- is called rejection sampling for this sort of conditioning. So what we can do is run this function forward-- a, b, and c-- and add them all together.
But we can ask questions like, well, what is the distribution of the sum of these three variables going to be like, given that I know that the sum of the first two is 1? So either a came up heads or b came up heads. That's all I know. But I also know how the world works.
Well, what you can do is you can go through these things. You'd get these conditionals out. You know, it's true in this case. It's true in this case. It's false in these. And you just axe those.
And then you count up what is this number down here for all of the ones remaining in kind of a Sherlock Holmes sort of way, which is you exclude the impossible, the things you didn't observe. And then whatever remains is the truth. So the reason this is called rejection sampling is because you run these things forward a number of times, you reject the things that don't mesh with your understanding of the world, and that's the way you can condition on your observations. Make sense?
So it's very easy to run this model in WebPPL. We can write it like this. I'm just going to copy and paste for now. But this is the same a, b, c flips with a 30% chance. We sum them up in d.
And now here, we use a recursive function to say, if d is actually less than 2-- or sorry, greater than or equal to 2-- then I'm going to return a. So this is saying, conditioned on the fact that the sum of these is 2 or more, what did I get for a? If d is not big enough, I throw away that sample, and I run this again.
So remember when I said your recursion stopping point could be stochastic? This kind of is. This is saying there is a probability-- we don't actually know what it is-- that the condition on d will hold. And if that's the case, then I'm going to stop and return a. Otherwise, I'm going to keep going. And I can repeat that process a number of times and get this distribution out. Does that make sense?
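That hand-rolled rejection sampler looks like this in plain JavaScript, with the Math.random stand-in for flip, and taking the condition to be d >= 2 (matching the condition statement used later with Infer):

```javascript
// Hand-rolled rejection sampling: run the model forward; keep a only
// when the condition holds, otherwise reject and try again.
var flip = function(p) { return Math.random() < p; };

var takeSample = function() {
  var a = flip(0.3);
  var b = flip(0.3);
  var c = flip(0.3);
  var d = a + b + c;
  if (d >= 2) {
    return a;            // condition met: keep this sample
  } else {
    return takeSample(); // condition failed: reject and retry
  }
};

// Each accepted sample is a draw from P(a | a + b + c >= 2).
var rejSamples = [];
for (var i = 0; i < 1000; i++) { rejSamples.push(takeSample()); }
```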
So a quick check-- if instead of a, b, and c, I had a, b, c, d, e, f, so and so forth-- I had 20 of these items that had a 30% chance of coming up heads, and I want to condition on the fact that 19 or more of them are true, why would I not want to run this?
AUDIENCE: It would take forever.
KEVIN SMITH: Yeah, it would take basically forever. Remember, this kind of stops stochastically. And if it doesn't stop, it just tries it again. So in this case, the probability of getting more than 2 is fairly reasonable. You'll be throwing away a lot of samples, but it's not a huge deal.
The probability of getting 19 or more heads on a coin that comes up heads only 30% of the time is infinitesimal. And so you'll be throwing away a ton of those samples. And you'll have to run this over and over and over, and that's bad. But this is the simplest way to do conditioning.
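To put a number on "infinitesimal," a quick binomial calculation (my own arithmetic, not from the talk) shows why conditioning on 19-of-20 heads with a 0.3 coin is hopeless for rejection sampling:

```javascript
// P(exactly k heads in n flips of a p-weighted coin).
var binomialPMF = function(n, k, p) {
  // n choose k, computed iteratively to stay in floating-point range
  var choose = 1;
  for (var i = 0; i < k; i++) { choose = choose * (n - i) / (i + 1); }
  return choose * Math.pow(p, k) * Math.pow(1 - p, n - k);
};

// Probability of 19 or more heads out of 20 flips at p = 0.3:
var pAtLeast19 = binomialPMF(20, 19, 0.3) + binomialPMF(20, 20, 0.3);
// roughly 1.7e-9 -- so rejection sampling would throw away on the order
// of half a billion runs for every sample it keeps.
```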
Now, here we had to write this sampler ourselves. But the nice thing about WebPPL is that this is exactly what's built in. WebPPL has an inference procedure using the condition function.
And the way it works is you write a model similar to this: a equals flip, b equals flip, c equals flip, d is equal to a plus b plus c. So this is our working model of the world. We have to define the forward model this way, using this code.
You then use the condition function, and you give it a Boolean argument that effectively says, this is what I've observed in the world. So forward model, conditioning, and then you just return what you're interested in. In this case, it would be a. So you could rewrite this as condition d is greater than or equal to 2, return a.
Now once we've done this, we can't just repeat it a number of times. Now, we actually have to tell WebPPL we've got a model that is capable of inference-- one that has the generation, has the conditioning, has the query. And we have to give it some options, some information about what to do.
So we're just going to call this options. And we have to give it-- one argument is always method. So I'm going to just say method is rejection. So this is going to do rejection sampling exactly the way we wrote something before. Well, this-- there's a little bit more to it. And then say samples-- how many times do we want to run this and get cases where the condition is true?
But once we've done this, we can just run this function, infer. We give it those options, here. So we have to tell it how we want to infer it, and then we give it the function where we've defined it as conditional. And we can run it, and we're going to get something nearly identical.
So this is why it's so easy to go backwards in WebPPL. You've defined your forward model. You've defined your observations. And then just using these lines will invert it and will tell you what are the causes of the effects I've observed. Make sense?
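What Infer with method "rejection" is doing can be sketched as a toy implementation (not WebPPL's actual code, which is far more general and supports many inference methods over arbitrary programs):

```javascript
// A toy rejection-sampling version of WebPPL's Infer + condition.
var flip = function(p) { return Math.random() < p; };

var Rejected = {};   // sentinel thrown when a condition fails

var condition = function(bool) {
  if (!bool) { throw Rejected; }
};

var infer = function(options, model) {
  var counts = {};
  var kept = 0;
  while (kept < options.samples) {
    try {
      var value = model();           // run the forward model
      counts[value] = (counts[value] || 0) + 1;
      kept++;                        // condition held: keep this run
    } catch (e) {
      if (e !== Rejected) { throw e; }   // condition failed: retry
    }
  }
  var dist = {};                     // normalize counts to probabilities
  for (var key in counts) { dist[key] = counts[key] / options.samples; }
  return dist;
};

var model = function() {
  var a = flip(0.3), b = flip(0.3), c = flip(0.3);
  var d = a + b + c;
  condition(d >= 2);   // what we've observed about the world
  return a;            // what we want to infer
};

var posterior = infer({ method: 'rejection', samples: 1000 }, model);
```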
Now, when we do these sort of conditioning and returning, we don't necessarily need to give it these variables in the exact way. We can give it relatively arbitrary things.
Remember, in the initial example I showed you, I'd said, well, given that either a or b is 1-- so a plus b is equal to 1-- return the sum of all of those. I could write that very, very easily. So I want to condition on the fact that a plus b is equal to 1. And I want to return a plus b plus c. And when I run that, I'm going to get the exact same distribution that I showed you in the slides. So you can condition on anything that evaluates to a Boolean here-- any arbitrary expression or combination, composition, of parts of your model-- and likewise, you can return any of those.
AUDIENCE: I have a question.
KEVIN SMITH: Yeah?
AUDIENCE: When you say [INAUDIBLE] so the inference [INAUDIBLE]?
KEVIN SMITH: Yeah. So because I've specifically said method rejection, this is telling WebPPL do rejection sampling, which is run this. If the condition is true, then I'm going to keep the thing that I said return. If the condition is not true, I'm going to throw that away and try it again. And I'm going to do that until I've gotten 1,000 samples where the condition is true. Make sense?
AUDIENCE: Do you have other methods?
KEVIN SMITH: Oh, yeah. There's a ton. This is the worst way to do it. You almost never actually want to use rejection sampling. Oftentimes, for relatively simple arbitrary models, you can use MCMC. I think I may have to give it another argument here-- you also have to say kernel MH, for Metropolis-Hastings. But it's going to do the exact same thing. And this is going to be much more efficient than rejection sampling in the vast majority of cases.
There's a ton of other methods under the hood. I'd encourage you to look at the documentation if you wanted to know more about those. But the nice thing is once you have some sense of what's the right sort of general method to use for the model and the data that you do have, all you've got to do is give it the options. And WebPPL will do the heavy lifting for you. All right?
So the takeaway from this is we can easily condition on variables by using this condition statement. We can build these forward models and our observations in and run inference without really having to worry about it. And that's the beauty of these probabilistic programming languages, is as long as we can define these rich models, these rich forward models, we can run inference and figure out what sorts of patterns emerge from the model and the data without needing to worry about the guts of inference. It comes for free because we're using this probabilistic programming language.
So with that, I think we're going to-- sorry, one question?
AUDIENCE: Can you [INAUDIBLE] or somehow the [INAUDIBLE] 1,000 [INAUDIBLE]. What I'm saying is that [INAUDIBLE]
KEVIN SMITH: Yeah.
AUDIENCE: Could you, for example, [INAUDIBLE]?
KEVIN SMITH: Yes. I mean, again, when you do this sort of sampling or infer, you need to have a function with no arguments. But you could have another higher-order function that says, like, make flipping away, with a parameter in it. And that returns a function that looks like this, but with an arbitrary number in there that you gave to the higher-order function.
And that will return a function that you can then run inference on. So that's another part of the power, the productivity, the compositionality, is because you can do this higher-order functional stuff. That's a very easy thing to write in WebPPL. OK?
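The pattern just described-- a higher-order function that takes a parameter and hands back an argument-less model-- is easy to sketch (makeFlippingAway is a hypothetical name for illustration, again with a Math.random stand-in for flip):

```javascript
var flip = function(p) { return Math.random() < p; };

// A higher-order function: takes a weight, returns an argument-less
// model of three flips at that weight -- the shape Infer expects.
var makeFlippingAway = function(weight) {
  return function() {
    var a = flip(weight);
    var b = flip(weight);
    var c = flip(weight);
    return a + b + c;
  };
};

var flippingAway30 = makeFlippingAway(0.3);   // heads 30% of the time
var flippingAway90 = makeFlippingAway(0.9);   // heads 90% of the time
```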
All right, so I'm going to say take about five minutes to look through the problem of inference in the tug-of-war model. Again, Kelsey and I will be walking around. I will try to end quickly after that and get you out of here by around 6:20. Again, though, if you've got any questions, we'll be around. If you want to work on this further and really understand the model, that's great.
All right, I'm sorry if anybody has not had a chance to get through that exercise. If you're still interested, I very much encourage you to keep working on this. If you want to check your answers yourself, you can just click on this raw button at the top of the GitHub markdown. Go down to the bottom, and we have the answers in the code as well. Or find me and Kelsey-- always an option.
So just to quickly end, hopefully you've seen today the way we can instantiate these concepts of things like compositionality, productivity, and probabilistic inference using these probabilistic programming languages-- the fact that strength and laziness turn into the force of pulling, the fact that we can reason about a wide number of different teams, a wide number of questions, with this sort of conditioning, and the way that once we've written this forward model, we can easily invert it because of the strength of WebPPL.
This is about the simplest model, though, that you'd actually want to run an experiment on. But WebPPL has been used for a wide variety of other tasks-- WebPPL or its precursor, Church. So this is going to look a little unfamiliar, because this is Church code.
But for instance, learning about physical properties of different objects. So if you see these objects bouncing around, different balls, different colored patches, you can learn things like red balls attract each other, and this green patch tends to slow things down, from some primitives like what objects are and what forces are. And then building those up into more complex things, and using that to say conditioning on what I've observed, what must be true about these individual object types or patches?
If anybody's heard of the Rational Speech Act model, made popular by Noah Goodman and Mike Frank, this is another place where probabilistic programs have been used pretty extensively-- this idea that when you're trying to think about pragmatic inferences, when the speaker is thinking about what to say to a listener, to follow Grice's Maxims of be informative, but not overly much so, there's certain things that they can assume the listener will assume about their speech. And likewise, the listener can assume the speaker is going to speak in a way that's informative.
So for instance, if you have this listener over here and a speaker that's looking at two apples that are red, and one in a bag, and says some of the apples are red. Well, if the listener knows that there's this unknown piece here, it's reasonable to think that a third apple would be red as well. And so that's a reasonable inference. But on the other hand, if the speaker had access to all of these apples, it's not a reasonable inference. And this sort of model, using probabilistic programming languages, can capture the pattern of people's behavior.
So if you're interested in this sort of modeling and going further with it, there's a lot of resources on the theory. I'm not going to leave this up for too long because you have access to the slides, and a lot of resources for practice. We've got a lot more examples, a lot more instances of how this sort of model can be used across a wide variety of domains, including physics, social cognition, decision-making, et cetera. So with that, I want to thank you for your time. I apologize for going a bit over. And if you've got any questions, come see me.