Perceptual Organization From a Bayesian Point of View
Date Posted: April 24, 2017
Date Recorded: April 21, 2017
Speaker(s): Jacob Feldman
Description:
Jacob Feldman - Rutgers University
Abstract: Perceptual organization is the process by which the visual system groups the visual image into distinct clusters or units. In this talk I'll sketch a Bayesian approach to grouping, formulating it as an inverse inference problem in which the goal is to estimate the organization that best explains the observed configuration of visual elements. We frame the problem as an instance of mixture estimation, in which the image configuration is assumed to have been generated by a set of distinct data-generating components or sources ("objects"), whose structure, locations, and number we seek to estimate. I'll show how the approach works in a variety of classic problems of perceptual organization, including clustering, contour integration, figure/ground estimation, shape representation, part decomposition, object detection, and shape similarity. Because the Bayesian framework unifies a diverse array of grouping rules under a single principle, namely maximization of the Bayesian posterior---or, equivalently, minimization of descriptive complexity---I'll argue that it provides a useful formalization of the somewhat vague Gestalt notion of Prägnanz (simplicity or "good form").
Joint work with Manish Singh, Erica Briscoe, Vicky Froyen, John Wilder and Seha Kim.
JOSH: So today's speaker is Jacob Feldman over here, who I'm very pleased to have both for our center as a whole but also personally. Jacob was a graduate of our department. And he was just finishing up his PhD thesis when I started here, actually kind of in the summer before school. And we spent a lot of time together. I helped out a little bit on some of his research on blickets of the day. And I think we wound up having the same PhD adviser. And I think in many ways, Jacob set the ideal example for me of what it was to think hard about an important problem as a student in this place. And he has done that, and continued to do that, on many of the same problems and an expanding array of problems for his entire career since then.
He has received a number of honors. He received the Troland Award I think, right? Kind of about the highest honor you can get for doing formal models of cognition. He has published papers in Nature and a number of other great places, pretty much done everything you can do.
But in terms of the deep stuff of his research, I think one of the things that I like most about his research and find inspiring, if you look across cognitive science, whether you're doing formal computational models or not, there's often a divide between people who think about representation and people who think about process or algorithms or inference. And Jacob has really taken both of those seriously and deeply and thought about deep questions of representation, how does the mind represent the world, and what are the principles of inference by which-- what makes a good representation and how does the mind figure that out. And just whether he's been working in perception or concept learning or the questions of what's an object or what is an agent, he's consistently and in some of the deepest and really illuminating ways tried to get at the intersection of inference and representation that's really at the heart of the big questions of cognitive science.
And I think from what he's going to talk about here, we're going to get to see at least a sample that in his work on perceptual organization. So let me hand it over to Jacob. And welcome.
[APPLAUSE]
JACOB FELDMAN: Well, thanks, Josh, for the incredibly flattering introduction. My computer is not waking up. It's incredibly nice to be here. Again, as Josh said, I was a grad student in the BCS department back in apparently the first year that it was called BCS. I didn't know this when I came, but it was called Psychology up to the minute I got there.
And of course when I say "it," it was several buildings ago. So the whole environment has changed tremendously. BCS was in E10, which the younger folk here probably don't even know what that is. But it's I think a parking lot or something now over on Amherst Street.
JOSH: No. Now it's this beautiful media lab.
JACOB FELDMAN: Of course. Well--
JOSH: A new building.
JACOB FELDMAN: New building, OK. I'm not surprised. So they've carefully erased every trace of my presence on campus.
So I'm going to talk to you today about perceptual organization, which is one of my favorite problems. As Josh said, I've worked on a lot of other things too. But this is, in some sense, the problem that I care about the most.

Perceptual organization is a problem that has a whole history, a tiny bit of which I will tell you about. Before I get to what I'm personally going to tell you, let me just mention my collaborators on this work, especially Manish Singh, but also Vicky Froyen-- these are students-- Seha Kim, John Wilder, Erica Briscoe, and a bunch of others. But most of what I'm going to be talking about today is joint work with Manish as my colleague at Rutgers.

So perceptual organization is the process by which we put the world together into objects. I'm going to be talking about it in the visual domain, though the term is broader than that. The world doesn't arrive at our senses in nicely packaged groups. Instead, we have to do the grouping. And this is an old problem. But it's sort of fundamental to how we see the world.
So this is kind of a trivial example. Here I'm showing you, I think, 98 dots. But most people subjectively see them as being in two groups or two clusters or something like that. And it's pretty intuitive, I hope, that the grouping isn't really given-- meaning that no matter how obvious the grouping is, it's not inherent in the data. It's a mental organization that we subjectively place over the data. In this case, a particularly reasonable one. But you might ask, and that's what I'm going to ask, what exactly are the principles that make it reasonable?

So it's not just clustering. There are other more subtle aspects of the problem. Like, here we have one simple blue-- or green on my screen-- region. But it can be interpreted a number of ways as having a number of distinct objects. So here you might see it as being some hands. Is it one hand, or is it two hands? Is it one blob, or is it multiple blobs?
You can see that it's not just about grouping things. It's about finding the most coherent or reasonable interpretation of, in this case, the bounding contour, where the bounding contour seems to sort of organize itself into a second hand. And that's intuitively obvious. But it's not obvious what principles we use or what computational mechanisms we use to figure that out.
So what I'm trying to say is that grouping is an essential aspect of subjective experience, meaning we see the world as being made of objects. And the question is, why? Because the world is not actually literally made of objects. I'll talk about this a little more later. One might think that there are sort of physical bases for the objects that we see. But it's essentially a subjective aspect of organization. At least, that's part of the case that I'll make for you.
Grouping is also sort of important in the more ordinary senses-- like, it influences other aspects of perception. So there's a huge amount of literature in the history of perception on studying simple local processes, like motion and color, and some non-local processes, like attention. But it turns out that perceptual grouping influences how those things work in profound ways, to the point where in many cases you can't really understand even a simple thing like what color something is perceived to be. I mean, Ted Adelson has done work here that illustrates this very strongly. You can't even understand what color something is perceived to be unless you understand something about the perceptual organization of the surface upon which the color appears, et cetera. And it goes without saying that our unitization of the world influences later cognitive processes.
But organization is basically subjective. And by subjective I don't mean that it's arbitrary. I just mean that it's imposed by the mind. So here, this is just a bit of art that illustrates that point kind of vividly. You have lots and lots of red, white, and blue dots. But of course, most people are going to see this as two human figures. And it's obvious, if I can use the pointer here, that some of these blue dots over here are really part of the same person as the blue dots over here. And whether or not you actually recognize the original photograph, it's really not obvious why those blue dots appear to group with the other ones. It's not literally proximity, it's not contiguity, or any sort of obvious superficial principle. It's something deeper.
So on the point about this being subjective, let me just introduce an expert witness, Einstein, who made this remark. He of course was not an expert psychologist, but this always struck me as amazingly relevant. He said, "Out of the multitude of our sense experiences we take, mentally and arbitrarily, certain repeatedly occurring complexes of sense impressions-- and we correlate to them a concept, the concept of the bodily object. Considered logically this concept is not identical with the totality of sense impressions referred to, but it is a free creation of the human mind."
So without dwelling on the philosophy of this, which is a huge topic, what I want to talk about for the rest of the talk is, in what way does the human mind create that concept, if you want to call it? We wouldn't use the word concept nowadays, but in what way does the brain create that concept?
OK. So the traditional answer is these Gestalt laws, which were invented in the 1910s by the Gestalt psychologists-- most people have heard this story-- and have recurred in endless textbooks since then, often in simplified form, to the point where it looks as if they were written on a stone and brought down from Mount Sinai. So the Gestalt laws were principles for organizing elements into wholes. And the word "Gestalt" is supposed to emphasize the idea of whole, which the Gestaltists thought was important, above and beyond local operations, as we would call them now.
The Gestalt laws are often reduced to, like, three or four simple principles, like proximity and similarity, common fate. I mean, most textbooks, if you read them literally, just give these laws as, you know, these are the Gestalt laws. Of course, if you actually read the original Gestalt writings, it's not nearly that simple, and there aren't clearly denumerable laws. And it's also really unclear exactly why there would be just a few laws, where the laws come from, whose idea were the laws, what are the underlying principles.
Of course, the Gestaltists didn't have the computational principles or the neuroscience that we have. They did have some amazing insights. But the question of how these subtle aspects of subjective organization get crystallized down to a few simple rules was mostly mysterious if you read their writings.
In some versions of the Gestalt principles, there are as many as 114 separate laws. I haven't read all the original Gestalt writings, but I once picked up the paper that actually counted the 114, which quoted some of them. And some of them-- of course, they're all written in German in the original. And if you read them in English, it's really difficult to understand exactly what sentiment is being expressed by these prose versions of the laws. And of course some of them sound very similar to each other. It's hard to imagine there really are 114 distinct principles.
And the Gestaltists themselves, of course, thought that also. So some of them tried to articulate a single underlying law that sort of was the uber-law over all of them, which was sometimes referred to as Prägnanz. So Prägnanz is one of these words that's not usually translated. It means something like "good form," although I'm sure there's a German speaker here who can give me a better translation.

But it's deliberately not translated, because it doesn't really have a clear correlate in terms of a definable meaning, certainly not a clear computational meaning. It's sort of generally the idea that some organizations, like the one that induces you to see the two kissing figures or the one that induces you to see the two hands, some configurations are more appealing to the mind, more coherent, or perhaps more simple. So simplicity is an idea that's often associated with this. Sometimes simplicity is given as almost a synonym for Prägnanz. But simplicity itself is very hard to define. Or at least it was until relatively recently, when as many of you know, a lot of modern advances in the theory of simplicity, some of which I'll mention later, have crystallized these ideas a little bit.
But the point is that all of these laws are a little bit unmotivated. So again, the Gestaltists were very insightful, but you really can read in vain for a clear articulation of any of the laws, much less a clear computational version of them or a clear motivation. So that brings me to Bayesian inference, which probably needs very little introduction to people around here, which is sort of the epicenter of Bayesian inference in cognitive science.
But briefly, Bayesian inference is a provably optimal, in a certain sense, method for determining belief under conditions of uncertainty, which perception kind of generally is, and perceptual organization really, really is. It's extremely uncertain what organization to adopt or to perceive. And Bayesian inference is kind of a general method, if only you knew how to apply it to figure out what interpretation of data is the most reasonable.
So Bayesian inference involves generative models that define likelihood functions. I'm going to say all this fast, because I assume people have seen all this stuff before. Prior probabilities, which assign degrees of belief to each hypothesis before considering the data, such as the visual image. And from those, you derive, by Bayes' theorem, a posterior probability, which is supposed to assign a degree of belief to every possible interpretation of the data-- that is, every possible model that might explain to some degree the data.

So, the general claim of Bayesian theory, as applied to human cognition, is that we ought to believe each hypothesis or each model-- each perceptual grouping, for example, which is where I'm going with this-- in proportion to its posterior probability, according to Bayes' rule. So this is the math of that. And again, this kind of stuff has become very familiar as it's kind of-- well, it's become very popular in cognitive science very broadly.
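(For reference, a compact statement of the rule being described here, with H standing for a candidate interpretation or grouping and D for the observed image data; the notation is mine, not taken from the slides:)

$$ p(H \mid D) \;=\; \frac{p(D \mid H)\, p(H)}{p(D)} \;\propto\; p(D \mid H)\, p(H) $$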
And specifically, it's become extremely popular in perception. So there have been many Bayesian models in the last 15 or 20 years of specific areas of perception. Motion-- and I'm just throwing up a few references on the screen here, apologies to anybody that I've left out-- color, stereo, surface shape, which is a little less local. Most of these examples are relatively local perceptual qualities that are the type of thing that were traditionally studied by people other than the Gestaltists, meaning people who wanted to see, how do we perceive a particular color at a particular location?
It has been applied to perceptual organization relatively little. And the problem is that it's very hard to figure out, if we're going to be doing a Bayesian model of perceptual organization, what exactly are we estimating? Meaning that we're trying to estimate what the objects are or what the groups are. But if you don't believe that the objects have definite quality, that some things are definitely part of the same object, which is a whole philosophical can of worms, you're just trying to estimate something like subjective organization, it's very hard to formalize the nature of the estimation problem.
So are we trying to estimate real physical objects? A lot of people in perception do think about it that way. And I don't mean to dismiss that. But, I mean, to me, it's not really a meaningful question whether two dots are actually part of the same contour. So two dots can appear to be part of the same contour if you have a certain model of the contour, and two edges in the actual physical world, you could argue a little more tightly, are part of the same object. But even that is very questionable. Again, it's kind of a long story, which I don't want to completely dismiss. But that, to me, is the wrong way of thinking about the estimation problem here.
In a sense, what we really want to estimate is something like subjective organization. But we need to formalize that somehow, because we don't want it to be just some inchoate thing. So I'm more or less, for the rest of the talk, going to talk about one particular way of concretizing what we mean by subjective organization, which is that we have object generating models. These are generative functions that generate data. And those are what we mean by objects, and we're trying to estimate the best model of the data. This will be more clear when we go on. Specifically, I'm going to talk about mixtures.
So mixtures are a particular kind of formalism in probability theory which, again, has become very popular. And I don't mean to suggest that this is a completely new idea. But mixtures are a very natural way to think about perceptual organization. So, a mixture is a probability distribution which is a combination of a set of components or sources. So here I'm going to give you what's a two-dimensional array of data with two Gaussian source components. And this is a very familiar situation in stats textbooks.
So basically, we have one source, G1, and another source, G2. And G1 is a little circular Gaussian-- this is just an example-- which has some mean, mu-1, and it has some standard deviation, sigma-1. And G2 is another source which has a mean, mu-2, and a standard deviation, sigma-2. And the idea is that each one of those generates data. So you might have some samples that are drawn from G1 and some samples that are drawn from G2.
And the idea of a mixture generally is that data is generated from some combination of these sources. But then, of course, it's not labeled when the observer actually observes it. So here I took away the color-- so here the dots are all black-- to just to indicate that when the observer observes them, the observer doesn't know a priori which dots or which crosses came from which source. And so the problem of mixture estimation is to figure out which dots came from which source, but without knowing what the sources are. So you have to figure out what the sources are, how many there are, what are the parameters of the sources, and that's mixture estimation.
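(A minimal sketch of the kind of generative setup being described, written in Python; the particular means, variances, and sample sizes here are made-up values for illustration, not anything from the talk:)

```python
# Generate unlabeled data from a two-component Gaussian mixture. The
# estimation problem is to recover, from the unlabeled points alone, which
# source generated which point and what the sources' parameters are.
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources ("objects"), G1 and G2
mu1, sigma1 = np.array([0.0, 0.0]), 1.0
mu2, sigma2 = np.array([6.0, 1.0]), 1.5

x1 = rng.normal(mu1, sigma1, size=(50, 2))  # samples drawn from G1
x2 = rng.normal(mu2, sigma2, size=(50, 2))  # samples drawn from G2

data = np.vstack([x1, x2])
rng.shuffle(data)  # the observer never sees the source labels
```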
So in this example, the way I portrayed it to you, it almost looks like a perceptual grouping problem, whereas not every mixture problem looks like that, because the data doesn't have to be visual, doesn't have to be perceptual. But you can get the idea of everything I'm going to say for the rest of the talk from this example. The idea is to model perceptual organization as essentially the solution to a special kind of mixture estimation problem.
So estimating a mixture-- this is sort of what I just said, but in general, we don't know the number of components. So you have some data. You have maybe some ideas of what the components might look like, what the formal models are. Like, you might have thought they were Gaussians, but you don't know how many there are. So you can fit it with a bunch of Gaussians-- I'll give you examples in a few minutes-- or you could fit it with just one big Gaussian that explains all the data, but maybe doesn't explain it as well. But maybe it's preferable to do it with fewer components than more components, et cetera. You can imagine all of the kind of standard statistical estimation dilemmas that one is faced with.
Specifically, in addition to the number of components, you don't know which component generated which data. I said that already, but let me just restate that in terms that are familiar terminology to perceptual grouping. You don't know the ownership. I'll talk about border ownership later. Border ownership is actually a term that's used in perceptual grouping to refer to how particular pieces of contours in images are perceived to be, quote unquote, owned-- I'll explain what that means later-- by one object or another object. Usually they're understood as being owned by the figure that they're on the boundary of, as opposed to the ground that they're on the boundary of. I'll talk about that later.
But the terminology is very suggestive because the kind of ownership or labeling that you get in a mixture estimation problem is very similar to the ownership that you got in one of those perceptual grouping problems, like which dot belongs to which cluster in the example, or which dot belonged to which kissing figure in the other image that I showed you.
So generally, estimating mixture means simultaneously estimating the parameters of the sources, which I'll notate theta-1, although I'm not really going to use that notation later on in the talk-- and the ownership of each datum. And the problem is that those two estimation problems interact with each other. You can't just, like, estimate one and then estimate the other. If you change your model of what the generating sources are, you're going to change your inferences about which data points belong with which source, et cetera. And so there's a kind of a cycle of simultaneous estimation, which is very tricky. And that's what makes mixtures a very important and difficult statistical problem.
So in a tiny bit more detail, this is essentially the same example. So here we have the same kind of thing, a bunch of dots generated by one source and a bunch of dots generated by another source. Now imagine some hypothetical new data point, x, which appears. So here now, you can think of this as a mixture problem or think of it as a perceptual grouping problem. The idea is that x is better explained by G1 than by G2 if the likelihood p of x given G1 is higher than the likelihood p of x given G2. So in other words, if this ratio is greater than 1, then it seems more plausible that x was generated by G1 than that it was generated by G2.
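(A sketch of that comparison in code, continuing the same kind of toy setup; the specific means, variances, and test point are mine:)

```python
# Which source better explains a new point x? Compare the likelihoods.
import numpy as np
from scipy.stats import multivariate_normal

mu1, sigma1 = np.array([0.0, 0.0]), 1.0   # source G1
mu2, sigma2 = np.array([6.0, 1.0]), 1.5   # source G2

x = np.array([2.5, 0.5])                  # hypothetical new data point
p_x_given_G1 = multivariate_normal(mean=mu1, cov=sigma1**2).pdf(x)
p_x_given_G2 = multivariate_normal(mean=mu2, cov=sigma2**2).pdf(x)

# A likelihood ratio greater than 1 means x is more plausibly "owned" by G1.
ratio = p_x_given_G1 / p_x_given_G2
```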
And that's a simple way of asking the natural grouping question-- meaning, what does this belong to, what group does this belong to-- in a probabilistic way. And this kind of simple probabilistic formulation of the problem is more or less what I'll talk about for the rest of the talk. The math, as I'm going to present it to you today, is really very simple-- almost a vanilla application of Bayesian inference to a data comprehension problem, a data interpretation problem, as long as one understands that the models are objects.
So basically, that's the idea. The image is imagined to contain data from a variety of distinct sources. And that's what we're going to call objects. There are different kinds of objects, and I'll talk about several kinds in the talk today. But basically, we think of those objects as stochastic data sources generating data somewhat randomly, but under some kind of parametric control in the image. And then our problem is to estimate the parameters of the objects-- that is, things like their shape, form, depending on what kind of object it is-- and the assignment of visual items to objects. That's literally the grouping part of the perceptual grouping problem.
So that's the frame for everything I'm going to say for the rest of the talk. And so now I'm going to just give you a little bit more formalism to illustrate some particular kinds of object [AUDIO OUT] and illustrate a bunch of applications of this idea in different domains of perceptual grouping. And I'm going to say all this really fast, meaning I'm not going to give you details of experiments, et cetera, but just kind of whiz through a bunch of different examples to illustrate the idea of the approach, which I think is very broadly applicable to a lot of perceptual grouping problems.

OK. So here's a more detailed version of something I said before. So the various interpretations that one can place on the data are different groupings. And that doesn't include just the grouping that everybody subjectively sees, but of course, the whole point is to understand why you see that as opposed to other ones. So you might, with these same data from the previous slide, see all of them as having been generated by a single source. And in a sense, that's a perfectly reasonable interpretation of these data. I mean, the colors here are just for your guidance. Those are not part of the data. So you might perfectly well think that it's just one big cluster, and in a certain sense, it is just one big cluster.
And there, the posterior would be something like [AUDIO OUT] cluster times the product of the likelihoods, which in this case there's only one of, right? But then if you have other interpretations-- like, here you interpret it as two clusters, which of course is what you probably see. But put that aside for the moment. This is just another possible competing interpretation. And that one has a similar equation, but now there's a prior over two clusters, which, as you can imagine from standard arguments, is going to be a lower prior, because there's more parameters, each of which needs a prior. And they get multiplied by each other.
But very broadly speaking, there's going to be some prior for two, which is going to be lower. But then the likelihood function tells you how the data actually fit that model, which is going to fit somewhat better. You're coming closer to fitting the data accurately. And similarly, you could do some kind of crazy thing, where here I think there are seven sources. You can see there's, like, three of them on this side and four of them on this side. Of course, they don't have to subdivide the way the two source interpretation did. I'm just showing it that way for clarity. But then you're going to have a prior on seven sources and this huge likelihood function, which multiplies over all seven.
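(To make the comparison concrete, here is a rough scoring sketch in Python. The per-component prior penalty and the spherical-Gaussian fit are my own simplifications for illustration, not the speaker's actual model:)

```python
# Score a candidate interpretation: a log prior that penalizes each extra
# component, plus the log likelihood of each point under the Gaussian
# fitted to the cluster it's assigned to.
import numpy as np
from scipy.stats import multivariate_normal

def log_posterior_score(data, labels, log_prior_per_component=-5.0):
    score = 0.0
    for k in np.unique(labels):
        pts = data[labels == k]
        mu = pts.mean(axis=0)
        var = max(pts.var(), 1e-2)  # variance floor, an ad hoc choice here
        score += log_prior_per_component                      # cost of positing a source
        score += multivariate_normal(mu, var).logpdf(pts).sum()  # fit to the owned data
    return score

# Comparing the "one big cluster" labeling against the "two clusters"
# labeling on the same data, the higher score is the more believable one.
```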
So the point is that there are many interpretations. And you can see that some of these are more reasonable than others. And Bayes' rule-- Bayesian inference-- is designed to choose among them-- or not so much to choose among them as to assign belief to each one in proportion to its posterior. So that's the spirit of it. There are multiple interpretations. Each of them involves grouping the world into different units, maybe of different sizes, and there [AUDIO OUT] different posteriors, according to this sort of reasoning.
So here's the story. We're going to have some data. I mean, again, I'm sort of repeating myself, but this is the big picture story. You're going to have some data, which can be points or edges. In some of the examples that I'm going to show you they're points. In others, it's edges. But that's not really a fundamental issue. It's just different [AUDIO OUT] that are generated by different sort of generative models.
And then you have various kinds of generative models, which we can think of as object classes. There isn't just one kind of object in the world, although it would be nice to have a single central object-generating class that sort of covers all types of cases if it's suitably flexible. And in a sense, that's what I'm going to show you. But really, you can imagine many different types of sources. Like, you could have a likelihood function for people and a likelihood function for tools and a likelihood function for cats if you wanted, although that's not really the spirit of the examples that I'm going to show you.
But really, the project is larger than the examples I'm going to show you. It's about, you have some object class types that generate data, and then you're going to try to make the most reasonable, highest posterior interpretation given those assumptions. Basically, this is what I said. Each interpretation is going to be believed in proportion to its posterior. This is essentially Bayes' rule. I'm leaving out the denominator and just indicating proportionality.
I will [AUDIO OUT] in Bayesian theory, we're often coached not to just use the maximum posterior, which is sort of what I've been implying. And you should, in fact, use the full posterior. That means using all the possible interpretations that are possible in your language and believing each one in proportion to its posterior. But in perceptual grouping, there is a long tradition and a lot of debate about supporting the idea that we basically just see one interpretation. So I see a bunch of people and a bunch of chairs. I don't really see other crazy interpretations where there are 300 half-people or quarter-people, or whatever it is. I am not really subjectively aware of those interpretations.
So I'm going to sort of emphasize the MAP, meaning the maximum posterior interpretation. But for the Bayesians in the crowd, which I know is not everybody, don't take that too seriously, because the full posterior distribution is very desirable in many situations. It's just that in particular examples it's often convenient to focus on just one and [AUDIO OUT] what we're subjectively aware of at any one time in the perceptual grouping problem.
OK. So let me just say one general point about the history of perceptual grouping theory. So traditionally there is this idea of the simplicity principle and the likelihood principle. Those are traditional terms in perceptual grouping. I should warn any real hardcore Bayesian in the crowd, the psychological likelihood principle is not the same as the likelihood principle that Bayesians talk about, which is a completely different idea.
But the simplicity principle is just the idea that we see the simplest interpretation, which was traditionally believed by the Gestaltists and people associated with them. And the likelihood principle, often attributed to Helmholtz in the 19th century, is the idea that we see the interpretation that is most likely to be true. And until relatively recently, nobody really had any idea how to formalize that, until Bayesian theory came into the field.

But this is a kind of standard argument now, a conventional argument. But Shannon in 1948 showed-- and I'm going to summarize all of information theory here in two bullet points-- that when you have a set of models, or messages as he put it, and you want to encode them as briefly as possible, you would encode each message with a code length proportional to the minus log of the probability of that message. And that is a long story, which I know is not self-explanatory in those two bullets. But as a result, the minus log probability is often described as the description length, because it's the description length in an optimal encoding system.

But what that means-- and this is a fairly standard argument now-- is that maximizing the posterior is the same as minimizing this expression, which is just the minus log of the posterior. And if you break that down, it's really the description length of the data given the model plus the description length of the model. So in other words, how simple is my model of the data, meaning of the observations, plus how simple are the data conditioned on the model. If I know that model, how complicated or surprising are the data with respect to that model?

So the point of this is that you can see that in this way of thinking about things, the quote unquote likelihood principle, which says believe the interpretation that is most likely, is really the same as the simplicity principle. [AUDIO OUT] this observation in perception is due to Nick Chater. But the math of this preexisted him and is basically a fairly standard argument.
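(Spelling out the standard argument being summarized, in my notation: taking negative logs of Bayes' rule turns maximizing the posterior into minimizing a two-part description length,

$$ -\log p(H \mid D) \;=\; \underbrace{-\log p(D \mid H)}_{\mathrm{DL}(D \mid H)} \;+\; \underbrace{-\log p(H)}_{\mathrm{DL}(H)} \;+\; \text{const}, $$

so the maximum-posterior interpretation is exactly the one with the shortest total code: model plus data-given-model.)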
So the reason I mention all that before getting into specifics is that you can think about everything I'm about to tell you as about Bayesian inference in the usual sense. We're trying to believe [AUDIO OUT] according to its posterior. Or you could just think of it as a version of the simplicity principle. We're trying to come up with the simplest model of the data in front of us.
OK. So what are the specific data generating models? So, I'm going to tell you about two. The first is for contours. So with contours, we just have to make a few probabilistic assumptions to say what we mean by a contour. What does that mean? What does it mean for something to be a contour?
And the assumption is simply that we have a series of edges-- I mean, like this long curve. And when you want to pick the next edge, you're going to choose its orientation in a way that is centered on being collinear with that edge. And so this graph is supposed to show you the orientation of the next one relative to the previous one. It's centered on zero, and then it has a roughly bell-shaped or roughly Gaussian distribution. In orientation, one typically uses the von Mises distribution, which is very similar to a Gaussian. It's the circular version of a Gaussian. But it looks like this. This is actually a von Mises, not a Gaussian. But they're hard to even tell the difference.

But basically what it says is that contours generally continue straight. And with some lower probability they turn left or turn right. And with extremely low probability, they turn sharp left or sharp right. But generally they continue straight. And that's all that says. And you can get a lot from that.
So what that means is that you then choose a series of edges. It's a Markov chain of orientations, meaning that each one is chosen independently, conditioned only on the orientation of the previous one. So that's not a great model of contours, for those of you who study contours in detail. It's a very simplified model of contours. But it's surprisingly fruitful. And the point is, it puts some probabilities on the form of the contour in a very reasonable way-- meaning in about the most vanilla way that you could.
So, what that means specifically is that we can assign a likelihood to each contour. So these alphas are supposed to be the so-called turning angles. That's the angle of each edge relative to the previous edge. So alpha equal 0 is straight. And the likelihood of a contour is proportional to this expression, which is just the exponentiation of the sum of the squared turning angles. And that's just what you get from the math, again, in the simplest possible way.
So that means that if you have a series of edges-- so here I'm doing it as edges. You could also do this as dots, where every two dots defines an orientation-- then you can say what the likelihood of those edges are under the contour model. Of course, there may be other competing models-- I'll show an example in a second-- of those edges. But that's the likelihood you get for that model.
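(A minimal sketch of that likelihood in Python. The von Mises concentration parameter kappa is a made-up value, and the units are nats; note that for small angles, kappa*cos(alpha) is approximately kappa*(1 - alpha^2/2), which is where the "sum of squared turning angles" expression on the slide comes from:)

```python
# Contour likelihood under the Markov turning-angle model: each turning
# angle is an independent von Mises deviate centered on zero ("contours
# tend to continue straight"). Complexity (description length) is just
# the negative log likelihood.
import numpy as np
from scipy.special import i0  # modified Bessel function: von Mises normalizer

def contour_log_likelihood(turning_angles, kappa=4.0):
    a = np.asarray(turning_angles)  # turning angles in radians
    return np.sum(kappa * np.cos(a) - np.log(2 * np.pi * i0(kappa)))

def contour_dl(turning_angles, kappa=4.0):
    return -contour_log_likelihood(turning_angles, kappa)

# A straight contour (all zero turning angles) gets the highest likelihood,
# i.e., the lowest description length; a wiggly one is penalized.
straight_dl = contour_dl([0.0, 0.0, 0.0])
wiggly_dl = contour_dl([0.6, -0.8, 0.5])
```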
And that allows us, in turn, to assign posteriors to different ways of grouping the edges, which is one of the standard problems in the perceptual grouping literature. So here's just one quick example of an experiment that I did a really long time ago. This work goes back into the mid-90s, just to illustrate the scope of time. So in this experiment, I asked people to just look at dots like this and say, is that one contour or two contours? Those are the only choices.

So with the example that you're looking at, it looks more like two contours. But it's obviously a very subjective or very unclear decision. But if I showed you-- I don't think I have it on the slide, but if I showed you six in a row, then you would say it looked like one contour. But with three in one group and three in another group, it looks like two contours. And that's what subjects would typically say in that configuration.

So measuring the angles here in this weird way, which I did back then, you end up with likelihood functions that look like this. So the likelihood function for dividing the dots up this way, in a group of three and another group of three, looks like this. And this likelihood function then refers to those sums of squared turning angles that I showed you in the previous slide. You could also group it into two in several other ways-- like five and one, or one and five. Or you can group them all into one contour. And then it's one big likelihood function. The way it's written here, likelihoods are computed from [AUDIO OUT] of all of them, because you want to get all of those consecutive alphas, each of which is from three dots, or a series of overlapping windows of three dots at a time, if you can visualize what I'm saying.

But the point is, you actually get numbers out of all of this. And you can actually figure out what the likelihood of the two-contour interpretation is, or the likelihood of the one-contour interpretation, for each of these cases. And again, to make a long story short, it fits what people's judgments are very, very closely. So these are just these plots I'm just going to flash up at you. After you compute the posterior and compare it to the human data, you can see it captures some actually surprising aspects of the data-- like, for example, that these curves go up at the tails, which is something I did not subjectively anticipate at all before doing this modeling. But the point is, it fits human intuitions pretty well. Go ahead, Sam.
AUDIENCE: [INAUDIBLE].
JACOB FELDMAN: Well, objects are much more complicated than individual contours.
AUDIENCE: But the point about complexity is, the point about the contour generation model that seems fundamentally at odds with our object knowledge because many objects have [AUDIO OUT] contours. But in fact, if you're biased towards a line, then you're [INAUDIBLE] long straight line, right?
JACOB FELDMAN: Yeah. That's right.
AUDIENCE: And likewise, objects tend to terminate with high probability-- object contours can terminate with high probability.
JACOB FELDMAN: Well. OK, let me answer that question in two ways. So first of all, there's a long way between this model and a realistic model of real-world objects, which is certainly far more complicated. But on the narrow question of closed versus open contours, I mean, you can make an alternative model of closed contours. This one specifically is for open contours, in which case, the maximum likelihood case really is that they go on forever straight. So in a few slides, I'll show you an alternate model that gives you closed contours.
AUDIENCE: But when I'm asking is when people look at these, [INAUDIBLE] Or are they drawing upon this kind of other set of prior expectations about contour [INAUDIBLE] fantastic open contours [INAUDIBLE]?
JACOB FELDMAN: Well, I don't know what you mean by the real world, exactly. But it's true that they are pulling up models that are appropriate for the experimental situation that they're in. And I wouldn't claim that those generalize terribly well to, quote unquote, real world objects. But I mean, that's just a comment about the limitation of the generative classes, which are very-- I mean, I'm going to show you a few more complicated than this, but they're very limited. So I take your point.
OK. So let me just show you a few other things you can do with this model. So one thing that's very natural is, instead of looking at individual turning angles, you can integrate the likelihood over the whole model-- or, taking logs, it's essentially the complexity of the entire contour. So if you simply-- this is supposed to be a summation over C, meaning C is a contour-- add up, over the length of the contour, the minus log of the probability, you'll end up with a measure of the complexity of the contour [AUDIO OUT] useful in a lot of situations.
And for example, this is what it looks like. So this is sort of what Sam was saying before. A straight contour is the best case, and a very curved contour is a bad case. Obviously other models would give other predictions. But this is a pretty useful measure of contour complexity, which is something that didn't have a nice, neat quantitative definition, as far as I know, before this. But it falls completely naturally as a side effect of the formalism here.
So here's a use we put it to experimentally. This is work by John Wilder. So there's actually a contour hidden in this noise, which you probably can barely see. And John pointed out at one point that existing computer vision methods actually can't even find the contour. But our subjects could.
So here we asked-- I know you can barely see it, but it's up there in the middle somewhere. This is actually exaggerated contrast compared to the stimuli we actually used. Subjects were simply asked to find the contour. And there's just a bunch of black and white pixels. The contour is simply a contiguous region of [AUDIO OUT]
And subjects could do this after some practice. And basically you can model the complexity of this contour, if you can see it there, by the series of turning angles that make it up. And then you can look at the likelihood of that. And the minus log likelihood of that is the complexity of that. And for mathematical reasons that I'm not going to explain in detail, but you can imagine if you're familiar with Bayesian theory, there is a prediction that subjects should be better able to see contours that are simpler. Basically, the simpler they are, the easier they are to differentiate from noise. So noise has very high complexity and essentially blends into the background. So if the turning angles zigged and zagged all over the place, it would be extremely difficult to tell the contour from the background noise. And so that's essentially the idea behind the math that backs it up.

And so that's a simple empirical prediction. You can see it's basically true. So as the contour DL increases, people's performance-- this is a two-interval forced choice task-- decreases with complexity. There's a ton of noise in this data because the task is very hard. But just to make it a tiny bit more impressive, here are individual subjects. And you can see the trend in almost all of the subjects.
So again, there's a million details I could go into. But basically the idea is that patterns, by which I mean objects, are easier to detect the simpler they are. And you can quantify their simplicity with respect to their minus log probability in a kind of vanilla Bayesian theory. You could improve it tremendously with a better model of the objects. And this is a really bad model of contours, especially closed contours. But I'll get back to closed contours in a few minutes.
So another much more sophisticated but still pretty limited contour-generating model is axial models, or as I'll call them, skeletal models. So here, this is a very famous idea by Blum: you can think of summarizing the bounding geometry of an object by this thing that is sometimes called the medial axis or the skeleton. And this is an example of Blum's traditional medial axis transform, where you get this thing that sort of looks like the skeleton of the dog.

The idea that this is important is related to David Marr's famous idea, again from MIT, from many, many years ago, that a lot of what is mentally represented about object shape is in the axes. It is in the structure of what is essentially the skeleton-- not literally the bones of the animal, but something actually pretty close to the bones.
So he made these pipe cleaner animals that people may have seen pictures of, where the idea is that you could easily recognize it. Say, this is a giraffe or this is a deer or this is a dog. And you can easily recognize them, even though from almost any normal vision sense, they don't look anything like the animals they're supposed to depict. They don't have the same color, the same texture, the same bounding contour even. All they have is the axial structure. And so it's a really desirable feature to put into a shape theory, and it's basically a simple generalization of the contour model that I've shown you already to do that.
So the idea is that we have a generative model of axial shapes, which is built on the same type of von Mises structure of the sequence of turning angles. But it's going to be making the skeleton, not the contour. So the math is almost the same, but the use is a little different.
So this is a skeleton. And the idea is that from the skeleton, sprout these random deviates that go to both sides of each skeletal piece. Those are not pieces of the skeleton. They're just random deviations in the same way, like the Gaussian error in a dot cluster. And so the structure of the skeleton is that. And then once you produce these random deviates, you get an edge born at the end of each arrow, so to speak, at the edge of each random deviate, and that's a shape. And that's a stochastic model for shape. That's [AUDIO OUT] a model for shape, which gives you the likelihood model-- the probability of a shape conditioned on a skeleton.
So we're going to have a prior on skeletons-- a prior on the structure of the skeleton itself-- and then a likelihood model for shapes being generated from skeletons. And that is going to allow us to get a posterior probability of a skeleton as a model for [AUDIO OUT], meaning for the contour data that you've observed.
And again, just throwing some terminology at you, what we call the MAP skeleton is the maximum posterior skeleton, that is the skeleton that has the highest posterior, meaning the highest product of the prior and the likelihood, as an explanation of that data. So for example, that might be the MAP skeleton for these data. I'm showing it to you in the forward direction. But the main question is in the inverse direction. If you just get the outer boundary, what skeleton do you think generated it?
I don't literally think that skeletons generated the skin. So sometimes people ask me, how literally do you take the model? I mean, that's not actually how animals are made. I learned the birds and the bees a few years ago. And that's not actually how the process works. But it is a probabilistic model of how bounded contours are made from an imaginary skeleton.
And similarly, as I said before, you can think of this in terms of complexity. And in my lab, we always talk about these things in terms of DLs, not in terms of probabilities, because they're easier to work with because when you take a log, you get smaller numbers. But conceptually there's really almost no difference. You take the log, and everything is in terms of the simplest model. You don't take the log, and you think about everything in terms of probabilities. It's the same idea.
So, with a few of the mathematical terms thrown on, I've already really told you the main idea. We are going to give a prior for each skeleton, which is based on the idea that the-- C here stands for the contour-- that's a bad term-- for the axis itself, the structure of the axis. So the p of each axis is that same sum or product of von Mises that I showed you before. And then there is a certain probability for each axis to be born. So in other words, there's a cost. Once you take the minus log, there's a cost for each axis to be born.
And so that gives you this nice prior that has this form. If a skeleton has N axes, then it's N times the log of the prior on a single piece, a single axis [AUDIO OUT] sum of all the log probabilities of the component angles within each axis. And just as before, you can think of that as, once you take the logs, as dividing up nicely into the branching complexity and the summed axial complexity.
So like I said, this is a generalization of the contour model. With a single contour, it's just the second term. With the branching version, you get the first term too. And it's very intuitive. Like, it just falls out of the math, but it's what you sort of expect. And you get something that looks like this. So a single axis-- again, think of this as a skeleton now, not a contour-- has the highest probability, the highest prior, the lowest complexity. And as you add branches or curve the component axes, you get more and more complicated skeletons. And again, intuitively, everybody I think would agree that this is a more complicated skeleton than this, and that just falls out of, again, this sort of vanilla math.
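(A sketch of that skeleton complexity in Python, reusing the turning-angle description length from the contour sketch above; the per-axis cost and kappa are made-up values for illustration:)

```python
# Skeleton description length = branching complexity (a fixed cost per
# axis) + axial complexity (summed turning-angle DLs along each axis).
import numpy as np
from scipy.special import i0

def axis_dl(turning_angles, kappa=4.0):
    a = np.asarray(turning_angles)
    return -np.sum(kappa * np.cos(a) - np.log(2 * np.pi * i0(kappa)))

def skeleton_dl(axes, cost_per_axis=3.0, kappa=4.0):
    """axes: a list of arrays of turning angles, one array per skeletal axis."""
    branching = cost_per_axis * len(axes)
    axial = sum(axis_dl(a, kappa) for a in axes)
    return branching + axial

# A single straight axis is the simplest skeleton; adding branches or
# curving the component axes raises the description length.
one_straight_axis = skeleton_dl([np.zeros(10)])
branched_skeleton = skeleton_dl([np.zeros(10), np.full(5, 0.4), np.full(5, -0.4)])
```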
And the generative model for a shape conditioned on a skeleton-- [AUDIO OUT] I showed you a cartoon of this a few slides ago. But again, just to give some of the math, you start with an axis or a set of axes. And then once you have the skeleton, you generate these random deviates. And there are a few assumptions we make about the probability distributions governing exactly where the edge falls and exactly what angle each of these random deviates [AUDIO OUT]. In other words, they don't just go out perpendicularly. They go out at an angle that is centered on perpendicular but has some von Mises uncertainty around it, et cetera. And there's a million other details I'm not telling you which are in the paper and, to be honest, quite a few details that are not in the paper. But then you get a shape from that. So that's the forward model.
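(A heavily simplified forward-model sketch, my own construction rather than the model in the paper: given an axis as a polyline, sprout "rib" endpoints on both sides at roughly perpendicular angles with von Mises noise, give the rib lengths some random variation, and connect the endpoints to get a bounding contour:)

```python
import numpy as np

rng = np.random.default_rng(1)

def shape_from_axis(axis_pts, kappa=20.0, mean_len=1.0, len_sd=0.15):
    """Sample a rough boundary from a skeletal axis (a polyline of points)."""
    axis_pts = np.asarray(axis_pts, dtype=float)
    tangents = np.diff(axis_pts, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    left, right = [], []
    for p, t in zip(axis_pts[:-1], tangents):
        theta = np.arctan2(t[1], t[0])
        for side, pts in ((+1, left), (-1, right)):
            # rib direction: perpendicular to the axis, plus von Mises noise
            ang = theta + side * np.pi / 2 + rng.vonmises(0.0, kappa)
            length = mean_len * np.exp(rng.normal(0.0, len_sd))
            pts.append(p + length * np.array([np.cos(ang), np.sin(ang)]))
    # boundary: up one side of the axis and back down the other
    return np.vstack(left + right[::-1])

axis = np.column_stack([np.linspace(0, 5, 12),
                        0.3 * np.sin(np.linspace(0, 3, 12))])
boundary = shape_from_axis(axis)
```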
And what we want to do is figure out the most probable posterior-- that is, the highest posterior skeleton. And I will illustrate that with the story of, what is it, Goldilocks. So this is a bear. And this bear is being explained by a single axis, which is too simple, right? One axis doesn't give good fit to the data. It's underfit the data, if people are familiar with that term, which I imagine is something [INAUDIBLE].
And here is a skeleton that has many pieces-- too many pieces. And it has overfit the data. And somewhere in the middle is a skeleton that is just right, right? And that's the skeleton that has just one axis per part, which is certainly not perfectly true in the examples I'll show you. But you can see it's approximately true here. Each limb-- the color-coding here indicates the different axes. Those are different in the model, right? They're separate axes in the model. So the fact that each of the limbs is one uniform color here, different color in each limb, illustrates that it's basically getting the bear right.
And that's a feature of this way of decomposing shapes into parts, which is generally not true of the traditional medial axis, which, for those of you who have ever worked with it, has lots and lots of spurious axes that don't correspond to parts as soon as there's any noise on the data, noise on the contour. Think of the traditional medial axis as overfitting the shape, except that when Blum invented it, nobody was thinking about these things probabilistically. But it's very natural, I think, to think about it probabilistically and say, if your skeleton mirrors every wrinkle in the skin of the animal here in the contour, then you're overfitting. But somewhere in the middle you [AUDIO OUT] and Bayes' rule is supposed to tell you where that perfect fit is. So that's the MAP skeleton. It is, quote unquote, the best explanation of the shape.
And you can see-- I'll just say this briefly-- that this gives very reasonable intuitions for part decomposition within shapes. So there are many traditional principles of part decomposition which is a whole literature, lots of controversy about exactly which rule applies. The most famous rule by far, invented here at BCS, back before it was called BCS, was the minima rule by Hoffman and Richards, which basically says that these deep concavities are the points at which you tend to divide shapes into parts. That's a principle that seems about right, but nobody ever really understood it because it's sort of nicely motivated, but it's very hard to do computationally because you have to pair up one concavity with another concavity in order to make a so-called part cut and divide the finger off from the hand. But exactly how to pair them is not obvious and not really entailed by the theory.
But here you see, if you get the skeleton, which is the MAP skeleton anyway, [AUDIO OUT] it up into approximately the right parts, the fingers [INAUDIBLE]. It's not perfect, especially because we cut off the hand in an arbitrary way. But it's intuitively approximately right. And all of these other principles, like the minima rule, and short cuts, which is a principle invented by my collaborator Manish Singh-- that is, that the cuts, where you cut one part off from the main shape, should be as short as possible-- you don't need any of those principles. They all fall out as side effects of the one golden rule, which is Bayes' rule.
So here are just some examples which I'll say briefly. So here's a cat, which has been decomposed into approximately a reasonable [AUDIO OUT]. And when I say reasonable, again, I mean that each of the limbs is its own color. It's not really perfect because it does group the head all the way into one long piece all the way up to the tail, which you may or may not agree with, but it's not as unreasonable as some other interpretations. And here are some other examples. Again, you can see it does pretty well. [AUDIO OUT] is available on the web.
OK. So back to the main story, though. You want to use this generative model, meaning axial models, as a piece of the larger story of how we group things. So here is a kind of axial shape, same one I showed you before. And here is another axial shape. And you might have a MAP skeleton for that that looks like that. But if you're applying estimation processes to the entire image, there's also axial structure in between the shapes. So you don't a priori know where the figure is and where the ground is. So if you see axial structure in between, that also can explain some of the contours but, so to speak, from the wrong side, as we know subjectively. But how does the theory know that?
So one way to think about this is as an approach to the figure/ground problem. So, similar to what I said about a single dot being clustered in one cluster or the other cluster, you can take an edge here and ask, is it better explained by this axial structure to its left-- or to our left from our point of view-- or the other axial structure to the right? A priori, you don't know which one is figure and which one is ground. But the question is, which one owns this shape, this edge? Meaning, which one explains it better?
And the general idea is that given a whole contour, a whole boundary, whichever side explains it better is the side that's interpreted as figural. So again, this kind of falls out as a side effect of the approach that the figure ground problem falls out. I hate to say that because it sounds so grandiose. There are many aspects of the figure ground problem that this doesn't even touch. But the assignment of an edge to one side or the other, to the figure side or ground side, sort of follows that whichever side explains it better is interpreted as figure. And again, that's what's traditionally referred to in the figure ground literature as border ownership. And that's almost the same terminology that's used in the mixture estimation literature. It's data ownership.
So here is, again, very briefly an experiment kind of demonstrating this. So Seha Kim and I did this experiment where we put a little probe down at a boundary between two objects, two differently colored objects. The probe, if you want details, is a little shimmering vibrating thing which we then asked the subject-- it's kind of clever and kind of amazing that this works-- this thing, these two frames alternate very rapidly, and you see a little vibrating divot on the border.
And you simply ask the subject, what color is vibrating? And they answer whatever is figure, because, quote unquote, figure owns the border. So if you think it's the blue region, then they think that blue is vibrating, though of course physically, both [AUDIO OUT] are vibrating equally. The figure itself is symmetric. The divot itself is symmetric.
So we had an axial figure on one side and a symmetric figure on the other side, which we made by just reflecting the boundary around an imaginary symmetry axis. So one side is always symmetric, which is a clue to figure. And the other side is axial [AUDIO OUT], which we manipulated. And you know, again, the idea is, with these very simple equations, you can basically say the difference in these two description lengths tells you which side is better explained. In other words, there's a skeleton on both sides, and whichever skeleton explains it better, that's the side that they're going to see as figure.
You know, it basically worked. I mean, this data is incredibly noisy because there are a number of other factors influencing the figure ground determination. But if you can believe this data, which I mostly do, then you can see that as the DL increases on one side, on the side we're manipulating, the probability of the subject saying that that side is figure decreases according to the magic formula. So again, I'm not going to go into any more details of the experiment. But basically it sort of works.
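For intuition, here is one hedged guess at the kind of relationship being described: if description length is minus log posterior, the probability of seeing a side as figure comes out as a logistic function of the description-length difference. The exact formula used in the experiment may well differ; this is just the shape of the argument.

```python
# Sketch (my paraphrase, not necessarily the exact formula from the paper):
# probability that a side is seen as figure, as a function of the DL
# difference between the two sides' skeletal explanations.

def p_figure(dl_this_side, dl_other_side):
    """If 2**(-DL) is proportional to each explanation's posterior, then the
    probability that 'this side' owns the border is a logistic function of
    the DL difference (in bits)."""
    return 1.0 / (1.0 + 2.0 ** (dl_this_side - dl_other_side))

# as the manipulated side's DL grows, the probability it is seen as figure drops
for extra_bits in [0, 1, 2, 4, 8]:
    print(extra_bits, round(p_figure(10 + extra_bits, 10), 3))
```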
So the idea is that you can group an entire image this way. Algorithmically, this goes way beyond what we've actually done. But just to illustrate the idea: if you have several skeletal objects in the image, both there at once and projecting to some set of edges, you can basically try to start figuring out which skeletal pieces explain the data. And one way to do this-- not the only algorithmic way-- is to start peeling off [AUDIO OUT] well explained by one axis. So you peel off this piece, which has the strongest axial posterior, and it's interpreted as an object, which then, ipso facto, is in front, because you're seeing it as figure. In other words, it's explained without any occlusion.
So it's interpreted as in front. And that leaves the rest of the data to be explained by some subsequent axial model, which is then interpreted as behind. So you can see that you get a rudimentary relative surface depth-- you don't get any actual depth or anything, just relative surface depth. There are lots of limitations to this. For instance, here the skeletons are all two-dimensional; there are no three-dimensional skeletons at this stage of development. But imagine how this allows you to not just group things but also sort of figure out where the surfaces are and how they lie relative to each other.
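Here is a rough algorithmic sketch of that peeling idea. The two placeholder functions stand in for whatever skeletal model you assume, and the bit threshold is an assumption; this is one possible procedure under those assumptions, not the algorithm actually implemented in this work.

```python
# Sketch: greedily peel off the contour points best explained by the axis
# with the strongest axial posterior (lowest total description length);
# earlier layers are interpreted as in front.

def peel_image(points, fit_candidate_axes, dl_under_axis, max_bits=8.0):
    """points: hashable contour points (e.g. (x, y) tuples).
    fit_candidate_axes(points) -> list of candidate axes (placeholder).
    dl_under_axis(point, axis) -> description length in bits (placeholder).
    Points costing more than max_bits under the winning axis are left for a
    later (farther) layer."""
    layers, remaining = [], list(points)
    while remaining:
        candidates = fit_candidate_axes(remaining)
        if not candidates:
            break
        # the axis with the lowest total DL has the strongest axial posterior
        best = min(candidates,
                   key=lambda ax: sum(dl_under_axis(p, ax) for p in remaining))
        explained = [p for p in remaining if dl_under_axis(p, best) <= max_bits]
        if not explained:
            break
        layers.append((best, explained))   # earlier layers are seen as in front
        remaining = [p for p in remaining if p not in explained]
    return layers                          # a rudimentary relative depth ordering

# trivial demo with made-up stand-ins: two horizontal rows of dots,
# each candidate "axis" is just a y-value here
rows = [(x, 0.0) for x in range(5)] + [(x, 3.0) for x in range(5)]
axes = lambda pts: [0.0, 3.0]
cost = lambda p, ax: abs(p[1] - ax) * 10            # bits grow with distance from the axis
print([len(pts) for _, pts in peel_image(rows, axes, cost)])   # -> [5, 5]
```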
So, here's a corollary of what I said before about the description length of individual contours: just as you can compute that for individual contours, you can compute a shape complexity measure for whole closed shapes. And so we did. So again, John Wilder and I did a very similar experiment to the one I showed you before, but now the stimuli, instead of being little contours embedded in noise, were whole shapes embedded in noise. And again, I'm sure people probably can't see the shape. Anybody see the shape? What is it?
AUDIENCE: Cat.
JACOB FELDMAN: Yeah, it's the same cat I showed you before. I like that cat. So there's the cat. And these shapes are very hard to see, but people can see them. And it gets a little complicated: we now have two complexity measures. This relates to Sam's question from before.
So one is the simple contour complexity, as if the shape were simply seen as a long open contour that happens to close-- but the closure isn't part of the model. And the other is the axial or skeletal shape complexity measure, which you get from the skeletal model that I just showed you. And those give you two complexity measures which are not statistically independent of each other, but they are kind of conceptually independent. And they do give different numbers; they're somewhat correlated.
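As an illustration of the first of those two measures, here is a minimal sketch of a contour-complexity computation: the summed surprisal (up to a discretization constant) of a closed polygon's turning angles under a prior that favors straight continuation. The von Mises prior, its concentration parameter, and the polygonal discretization are assumptions; the skeletal complexity measure would come from the axial model sketched earlier, not from this.

```python
# Sketch: contour complexity of a closed polygonal shape as the summed
# surprisal of its turning angles under a von Mises prior centered on
# straight continuation.  Parameters are illustrative assumptions.

import numpy as np
from scipy.stats import vonmises

def contour_complexity(points, kappa=4.0):
    """Approximate description length (bits) of a closed polygon: sum of
    -log2 p(turning angle) at each vertex, with angles expected near zero."""
    pts = np.asarray(points, dtype=float)
    diffs = np.roll(pts, -1, axis=0) - pts                 # edge vectors (closed)
    headings = np.arctan2(diffs[:, 1], diffs[:, 0])
    turns = np.angle(np.exp(1j * (np.roll(headings, -1) - headings)))  # wrap to (-pi, pi]
    return float(np.sum(-np.log2(vonmises.pdf(turns, kappa))))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
wiggly = [(0, 0), (1, 0.4), (2, -0.3), (3, 0.5), (3, 1), (2, 0.6), (1, 1.3), (0, 1)]
# the more irregular outline should come out more complex (more bits)
print(contour_complexity(square), contour_complexity(wiggly))
```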
So again, I'll just show you some data very briefly. And again, it's a very noisy task. But you can see-- it doesn't look super impressive from this graph-- that both of these turn out to be, quote unquote, significant, meaning that they are both statistically meaningful factors in people's judgments. The more complex [AUDIO OUT], both in the contour sense and in the skeletal sense-- meaning in the global shape sense-- the harder it is to detect. And again, that comes straight out of the theory.
So, you know, now it's getting kind of repetitive. I guess I've given the main idea. How much time? I know it's five [INAUDIBLE]. Yeah. I can do most of what I was going to do in that.
OK. So this is work by Vicky Froyen. And this is a nod towards something more like a performance theory. What I've been giving you so far is essentially the competence theory, meaning I'm sort of stating it in the abstract without any algorithm to actually compute these things. Of course, all the examples were computed by various algorithms. But more recently, Vicky figured out a fairly tractable way to actually compute some of this stuff.
So it's based on a technique called Bayesian hierarchical clustering, due to [AUDIO OUT]-- who was a BCS student when I was here-- and Katherine Heller-- [INAUDIBLE]-- where the theory is essentially the same. But let me just skip ahead to the algorithmic part.
So basically, this is actually their model, but ours is the same for perceptual grouping. Basically you're trying to cluster some data, and you build a tree based on a kind of Bayesian criterion for whether things are joined together into a cluster in the tree. And you end up with this hierarchical tree, which you can then analyze for how it groups the data points into clusters in a hierarchical sense. So they have this thing called tree slices.
So tree slices are basically what level [AUDIO OUT] tree. So this is the overall tree that you might have gotten from these data. Slicing it at the top, you get one cluster; slice a bit lower and you get two; slice lower still and you get three, because one of the clusters has subdivided into two. And again, it's based on a finer and finer modeling of the data. And if you slice it further down, you get more, et cetera.
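Here is a drastically simplified sketch in the spirit of Bayesian hierarchical clustering, not the actual algorithm being referred to: clusters are merged greedily whenever one shared Gaussian source explains their union better than two separate sources, and the "slice" is wherever merging stops paying. The noise scale, prior scale, and zero prior mean are all assumptions.

```python
# Simplified Bayesian agglomeration sketch (in the spirit of, but not
# identical to, Bayesian hierarchical clustering).  Parameters are
# illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def log_marginal(points, sigma=0.5, tau=5.0):
    """Log marginal likelihood of points under one Gaussian source with an
    unknown mean (prior sd tau, prior mean 0) and known noise sd sigma;
    coordinates are treated independently."""
    x = np.atleast_2d(np.asarray(points, dtype=float))
    n = x.shape[0]
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return sum(multivariate_normal.logpdf(x[:, d], mean=np.zeros(n), cov=cov)
               for d in range(x.shape[1]))

def bayes_cluster(points, sigma=0.5, tau=5.0):
    """Greedily merge the pair of clusters with the largest Bayesian merge
    score; stop when no merge is favored.  The stopping point plays the role
    of a 'tree slice'."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > 1:
        best_score, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                score = (log_marginal(merged, sigma, tau)
                         - log_marginal(clusters[i], sigma, tau)
                         - log_marginal(clusters[j], sigma, tau))
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < 0:            # no merge improves the explanation
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# two well-separated dot clouds should come out as two clusters
rng = np.random.default_rng(0)
dots = np.vstack([rng.normal([0, 0], 0.5, (10, 2)), rng.normal([8, 8], 0.5, (10, 2))])
print(len(bayes_cluster(dots)))       # -> 2 (under these assumed parameters)
```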
So we apply this idea basically in the perceptual grouping context. You can see on the side here-- if you can see where I'm pointing-- these are dot figures similar to the ones I showed you from the experiment before, where you can see this as one smooth contour, or maybe as several pieces, or several pieces with a corner, et cetera. And if you can see the tree here, the tree basically gets the decomposition right, in the sense that if you take a slice at the middle level-- at the maximum posterior level-- you get the subjective division into two different pieces that you actually probably perceive here, like the red and the blue section. So it kind of fits intuitions. At least this is a bit of [INAUDIBLE] psychophysics.
And on the data that I showed you before, it gives this kind of impressive fit to the human judgments. The model predicts a posterior for every decomposition, and people's judgments are in proportion-- a nice linear proportion-- to that posterior. Here's a slightly richer problem, but it's exactly the same technique: dividing this set of dots into some sort of skeletal model. Here the skeleton has a slightly different representation from what I showed you before-- it can have separate pieces. But basically you have all these dots, which define a shape.
And then at various levels of the tree you can slice it into one shape or two-- one whole shape with just one part, or various ways of subdividing it into multiple parts. And here the color coding is supposed to indicate the various subdivisions the model estimated for these different shapes. And you can see they're highly intuitive. Now, these are super-simplified shapes-- these are not realistic shapes in natural images or anything like that. But at the level of a gross theory, it gives a sort of sanity check for the idea that if you have a reasonably appropriate probabilistic model of, in this case, the parts of a shape, you get reasonably plausible intuitions out of the posterior.
Another thing you get sort of for free with the Bayesian theory is the posterior predictive distribution, which predicts missing data. Predicting missing data is a huge topic in machine learning and statistical learning. But what it means in perceptual grouping is what's called shape completion. So if you have, for example, one of these stimuli down here, where there's one shape occluding another, and you use this BHC technique to make a model of each shape, it entails a predictive distribution for the data that is missing-- that is, the data that's occluded behind the shape.
And here, again, I'm just throwing examples at you. This model correctly predicts that you're going to sort of complete the contour over here-- this is the missing stuff. And so, in other words, when you see a blue shape completing behind the red bar, that's like saying that your posterior predictive thinks there is data back there that just sort of completes this straight. But it doesn't just complete everything straight, like a contour model would; it's based on the implied skeleton. And there are various examples showing that the completion isn't always what would be predicted by a smooth continuation of the contour. So it gives a solution to that problem. And there's a whole literature on perceptual completion, which doesn't even connect very closely to the figure ground literature or the clustering literature or various other literatures. But here you can see them all as side effects of the same basic formalism.
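Here is a toy illustration of the posterior-predictive idea behind completion, using a conjugate Gaussian update on a hypothetical mean "rib length"; the real model is the skeletal one described above, and these distributions and parameters are assumptions for illustration only.

```python
# Toy illustration: the visible part of a contour gives a posterior over the
# model's parameters (here, just a mean rib length), and the posterior
# predictive for unseen ribs says where the occluded contour should be.

import numpy as np

def posterior_predictive(visible_ribs, sigma=0.2, mu0=1.0, tau=1.0):
    """Conjugate Gaussian update: known noise sd sigma, prior N(mu0, tau^2)
    on the mean rib length.  Returns the predictive mean and sd for an
    unseen (occluded) rib."""
    x = np.asarray(visible_ribs, dtype=float)
    n, xbar = len(x), x.mean()
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau**2 + n * xbar / sigma**2)
    return post_mean, np.sqrt(post_var + sigma**2)

# ribs measured on the visible side of an occluder
visible = [1.9, 2.1, 2.0, 2.05, 1.95]
mean, sd = posterior_predictive(visible)
print(round(mean, 2), round(sd, 2))   # occluded contour predicted ~2 units from the axis
```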
So that's all I have time to explain in terms of specific areas of application. Let me just briefly mention a few other important areas of application I haven't mentioned. One is 3D surface shape. Seha Kim, who was mentioned on a previous slide-- her PhD thesis from a few years ago extended this to sort of predict the surface implied by a line drawing. This is one of those problems that modern computer vision doesn't pay a lot of attention to, because we're just so used to working with full-color, full-texture, naturalistic images. But to me, it's an embarrassment for cognitive science that in 2017 we have no idea how people infer 3D shape from a simple line drawing like this.
And the idea is, something like Prägnanz: this contour implies a 3D shape that seems to have certain surface points closer and certain surface points farther away. It's not just a silhouette. And you don't need all the texture and shading cues-- all the things that have been studied, some of them here at MIT, in this literature. Just based on the outer [INAUDIBLE] contour and a few important, informative internal contours, you can see a 3D shape. And basically we have no idea how that works. But a very complicated extension of the model that I showed you, which I don't have time to explain, gives a prediction for the surface normals people see, which tends to fit human judgments for very simple shapes pretty well. So ask me more about that, if you want.
And another really important application is shape similarity. So if you have a skeletal model of one shape and a skeletal model of another shape, a very natural [AUDIO OUT]-- OK, how similar are the shapes? That's another problem that we don't really have good models of. Similarity is another one of these things that is very obvious but also kind of subjective, so it's very hard to model. There's, again, a huge literature on similarity-- not just in shape, where shape matching in computer vision is a whole field, but also in other areas of cognitive [AUDIO OUT] similarity is a very important concept. Josh has given some very important ideas related to this.
So, long story short-- and I actually do have slides about this, if people have questions-- you can estimate similarity using an extension of the same probabilistic argument-- in other words, without a lot of new ideas. Just, basically, how similar are [AUDIO OUT], in the sense that a Bayesian would answer that question. And you can make a prediction. And so here I'm showing you one graph's worth of data comparing predicted similarity to judged similarity for human subjects.
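As one hedged illustration of how a Bayesian might cash out similarity, here is a sketch in which a crude isotropic Gaussian over contour points stands in for a real skeletal model, and similarity is the symmetrized log likelihood of each shape under the other shape's fitted model. This is not the measure used in the work described; it only shows the shape of the argument.

```python
# Sketch: similarity as symmetrized cross likelihood under each shape's
# fitted generative model.  An isotropic Gaussian stands in for a real
# skeletal model; everything here is an illustrative assumption.

import numpy as np

def fit_model(points):
    """Maximum-likelihood Gaussian stand-in for a skeletal model."""
    x = np.asarray(points, dtype=float)
    return x.mean(axis=0), x.var(axis=0).mean() + 1e-9      # (mean, isotropic variance)

def log_lik(points, model):
    mu, var = model
    x = np.asarray(points, dtype=float)
    n, d = x.shape
    return float(-0.5 * np.sum((x - mu) ** 2) / var
                 - 0.5 * n * d * np.log(2 * np.pi * var))

def similarity(shape_a, shape_b):
    """Symmetrized cross log likelihood per point; higher = more similar."""
    m_a, m_b = fit_model(shape_a), fit_model(shape_b)
    return 0.5 * (log_lik(shape_b, m_a) / len(shape_b)
                  + log_lik(shape_a, m_b) / len(shape_a))

ts = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = [(np.cos(t), np.sin(t)) for t in ts]
near_circle = [(1.1 * np.cos(t), 0.9 * np.sin(t)) for t in ts]
far_blob = [(np.cos(t) + 5, np.sin(t)) for t in ts]
print(similarity(circle, near_circle) > similarity(circle, far_blob))   # -> True
```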
So in conclusion, perceptual grouping can be thought of as a Bayesian estimation problem. It's an estimate of organization, as long as you can quantify exactly what we mean by organization. And what I tried to convince you of is that essentially it's like a mixture model. There are some weird aspects to the mixture component definitions that are very different in the perceptual grouping problem from what you might see in a mixture modeling statistics book. But broadly, it's the same kind of problem.
You have to make some assumptions about the generative models, and the assumptions I gave as examples are very, very simple. A natural way to extend the theory is to give better generative models. Basically, it's a sanity check that, at least for simple examples, it gives the right intuitions. And the general idea is that the [AUDIO OUT], which was formerly defined by these vague, hand-waving Gestalt principles, essentially corresponds to the maximum posterior-- or, if you want to put it that way, the minimum-complexity decomposition into source components. So that's fancy terminology for the basic idea that we want to explain the data in front of us in the simplest possible way.
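To spell out that equivalence in symbols (standard Bayes/MDL bookkeeping, stated in generic notation rather than notation from the slides): if the description length of the data D given an organization H is -log2 p(D|H), and the description length of the organization itself is -log2 p(H), then

\[
\arg\max_{H} \, p(H \mid D)
\;=\; \arg\max_{H} \, p(D \mid H)\, p(H)
\;=\; \arg\min_{H} \, \bigl[\, -\log_2 p(D \mid H) \;-\; \log_2 p(H) \,\bigr],
\]

so the maximum-posterior organization and the minimum-total-description-length organization are one and the same.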
And again, the idea is that this unifies all these Gestalt principles, as well as a lot of other principles from other areas of perceptual organization that the Gestaltists weren't even as interested in. These are literatures that are dizzying because of the number of new principles that get introduced-- every paper is a new principle. And the different types of perceptual grouping often don't even interact with each other enough to even see [AUDIO OUT] principles sound kind of similar.
So I know that it's an oversimplification to say it's all Bayes' rule. And I know people at MIT are probably tired of hearing the assertion that everything is Bayesian. But the point is-- I mean, what is attractive to me is the simplicity of the models. You have basically generative models of contours that are just about the simplest possible thing you could say about a contour, and generative models [AUDIO OUT] that are almost the simplest thing you could say about whole multipart shapes. But then you get approximately the right intuitions out of-- well, [INAUDIBLE].
And it predicts some data. I know we went over the data very quickly. But it also predicts a lot of things that we haven't tested yet. This is a huge project, which I'll probably be doing [INAUDIBLE]. Thanks.
[APPLAUSE]