Statistical learning in human sensorimotor control
Date Posted:
December 11, 2023
Date Recorded:
December 5, 2023
Speaker(s):
Daniel Wolpert, Columbia University
Brains, Minds and Machines Seminar Series
Description:
Abstract: Humans spend a lifetime learning, storing and refining a repertoire of motor memories appropriate for the multitude of tasks we perform. However, it is unknown what principle underlies the way our continuous stream of sensorimotor experience is segmented into separate memories and how we adapt and use this growing repertoire. I will review our recent work on how humans learn to make skilled movements, focusing on how statistical learning can lead to multi-modal object representations, how we represent the dynamics of objects, the role of context in the expression, updating and creation of motor memories, and how families of objects are learned.
Bio: Daniel Wolpert FMedSci FRS. Daniel qualified as a medical doctor in 1989. He worked with John Stein and Chris Miall in the Physiology Department of Oxford University, where he received his D.Phil. in 1992. He worked as a postdoctoral fellow in the Department of Brain and Cognitive Sciences at MIT in Mike Jordan's group and in 1995 joined the Sobell Department of Motor Neuroscience, Institute of Neurology, as a Lecturer. In 2005 he moved to the University of Cambridge, where he was Professor of Engineering (1875) and a fellow of Trinity College, and from 2013 held the Royal Society Noreen Murray Research Professorship in Neurobiology. In 2018 Daniel joined the Zuckerman Mind Brain and Behavior Institute at Columbia University as Professor of Neuroscience and is vice-chair of the Department of Neuroscience. Daniel retains a part-time position as Director of Research at the Department of Engineering, University of Cambridge.
He was elected a Fellow of the Academy of Medical Sciences in 2004 and a Fellow of the Royal Society in 2012.
He was awarded the Royal Society Francis Crick Prize Lecture (2005), the Minerva Foundation Golden Brain Award (2010), the Royal Society Ferrier Medal (2020) and gave the Fred Kavli Distinguished International Scientist Lecture at the Society for Neuroscience (2009).
PRESENTER: We have the pleasure of having Daniel Wolpert with us today. I'll say a few words too as a way of introduction. Daniel did, I believe, his graduate work at Oxford. Then, he was a postdoc here at MIT, and apparently, he's having trouble finding where he was when he was a postdoc at MIT, but he was somewhere. The buildings have changed.
Then, he became a lecturer at University College London, back in the UK, and from there, a professor at the University of Cambridge. After that, he became a Wellcome Trust investigator, and now, he's back in the US. We're waiting for the time he will go back again to the UK, but for now, he's in the US, in Zuckerman, at Columbia.
He has many accolades and awards, among them the one that I just managed to find is Fulbright Scholar. He's a fellow of the Academy of Medical Sciences and a fellow of the Royal Society. He has won the Crick prize and the Golden Brain prize from the Minerva foundation. And I personally am an admirer, because he brings together a blend of work at the intersection of brain, behavior, and neuroscience that is just wonderful. Bringing together very careful experiments, computational tools, and also engineering tools to really gain insight into what the underlying computations are, in his case, mostly focused on the motor system.
And he has made some amazing contributions over the years. Some of the papers are golden papers that I have grown up as a neuroscientist with. Along the lines of how does the brain make, for example, simulations to predict the outcomes of what you plan to do? How does the brain combine noisy sensory information with its prior knowledge to be able to generate behaviors that are appropriately optimized for the goals and many things like that?
And I will leave you with, probably, I think what is Daniel's most important contribution, which is why do we have a brain? I heard that he's not going to tell you the answer, so I will tell you on his part. We have a brain only-- and only-- because you need to move, and that's why he studies movement. Without further ado.
[APPLAUSE]
DANIEL WOLPERT: OK. It's always daunting to come and talk somewhere you were a postdoc or a graduate student. It still feels nerve wracking to be in this room. I'm going to talk to you a bit about motor control today. I know that you have almost no faculty in motor control.
You now have one junior faculty member, apart from some senior ones. So please, ask questions, interrupt. I would like this to be a discussion. I don't have to get to the end of it.
But what I'm going to talk today about is really statistical learning, and I'm interested at the moment in how do we learn about objects and build up a repertoire of those objects? And I'm going to tell you some of the studies that took us on to this question. I'm going to talk about what is an object? How do we even learn what an object is in the first place? And I'll talk about statistical learning of objects.
I'll then talk about how do you learn about the dynamics of a single object, and the idea of control points will become important. And then for the bulk of my talk, I'll talk about the COIN model. It's a model we've developed to deal with how do you learn a repertoire of skills over your life, and it deals with when should you create a memory, how should you express the memories you've got, and how should you update your motor memories? And finally, if there's time, I'll talk about categorical motor learning. How do we learn about families versus individuals in object manipulation?
So I want to talk about a study, which was a fun collaboration between a visual neuroscientist, Jozsef Fiser, a computational neuroscientist, Mate Lengyel, and myself. Our three groups got together and thought, how can we work out what an object is in the first place? So if you look up the definition of an object, it's a material thing that can be seen and touched, and if I turn that into something a bit more neuroscientific, it's a consistent set of sensory properties and physical affordances. That's what an object is.
And so the question is, how do we learn about them? So if you look at this, I think you'll all agree this is one object here. This is one object, and I think most of you will say that's two objects. You've learned something about objects in your life.
And you would have some feel of what it was like to interact with this object, what it would be like to break this object apart or break this one apart or separate these two. You'd have a feel of how hard or easy that would be. So in some sense, an object links its visual properties to its haptic properties to create a single object.
And the question is how, from this cluttered world we work in, do we ever extract those objects in the first place? Now, if you're a visual neuroscientist, you know the answer. You know there are specialized cues. There's edges and boundaries, and if you work in computer vision, you'll use those cues, and you'll build them in to separate or segment objects.
But our hypothesis is that these specialized cues are just examples of a much more general principle that leads to objects, and that is consistent statistical properties, be they visual or haptic. OK? So those statistical properties, these may be really expressed very strongly here, but maybe you don't need these specialized cues to learn about what an object is.
And so what we're going to do is have an experiment. We're going to create visual or haptic objects defined solely by statistics, with no specialized cues. And then we're going to examine whether you can learn what an object is within that modality, and more importantly, having learned in one domain, like the visual, will it generalize into the haptic domain and vice versa? And that will tell you it really has an object-like representation.
So I'm going to pretend you're the subject in the experiment, and I'm going to go through the entire experiment as though you're the subject. And then I'll explain to you what's behind the experiment. So we sit you at a computer, and we say we want you to pay attention to scenes we're going to show you. We're going to show you 444 of them, and all we want you to do is pay attention. We may ask you questions later.
And so you see these scenes go by every second or so. Each one is a gray square with six symbols on it, and the symbols move around and change. OK? So you watch that, incredibly boring.
Great. Having seen that, we then after that show you two symbols on a gray background and another two, and we say, which one is more familiar? So you have to do a forced-choice experiment to say which of those two is more familiar, and that's all you do. And then we sit you at one of our robots, and we have these robots-- these were actually designed at MIT by Neville Hogan and used extensively in Emilio Bizzi's lab-- where you hold onto these robots. They can track the movement and generate forces.
And we show you four symbols on a gray background with little clamps, so you can't break this object apart. And we ask people to pull on the robots, and they feel the force as they pull to the level they think would break the objects apart. And we tell them how the objects will break. If you pull horizontally, it's going to break along this tear line. If you pull vertically, it will break along that tear line.
And before we start the experiment, we train them on clearly segmented objects. So before they do the experiment, we show them objects which are red and blue. And when you put them in this vertical direction, it's easier to separate, because they're two separate objects. So it breaks at 7 and 1/2 newtons.
On the other hand, when they pull in this horizontal direction, because you're having to break the blue and the red objects apart, it breaks at 22 newtons. So they get an idea of what it means to break objects apart.
OK. So what's behind the experiment? The idea is that all these scenes are made up out of true pairs. That is, this symbol always appears on the right of this symbol, and this one always appears above that symbol. OK?
Although I've color coded them here, what we do is we concatenate three of these true pairs to make this object. But there are no boundaries ever to be seen between any of these true pairs. OK? They're all in the gray background. So the only way you can learn about statistical properties is to learn, for example, that this is consistently associated with this. Therefore, it has more object-like properties.
So then when we go on to this stage, we show them one true pair and a chimera made up of a symbol from two other pairs. And so if I said to you, which of these two objects is more familiar? I would hope you would choose this one rather than this chimera, because it doesn't appear so natural in nature.
And in fact, I was very pleased, because when we were looking for an example of a chimera to put into the paper, I typed in "Unusual chimera," and this is a very famous Bavarian chimera, the wolpertinger. So this is a chimera, which is a mythical beast in Bavaria, in Germany, which has got the face of a rabbit. It's got the horns of a reindeer. It's got wings, and if you really want to go on eBay, you can buy stuffed versions of this, where people actually take them from different animals and create this wolpertinger. Anyway, so the idea is that, if they've worked out what the true pair is, they should respond to this one, because that's more object like, because it's not a chimera.
Similarly, when we come to this stage, we put two true pairs together. And so if they've worked out what objects are, they should pull harder in the horizontal direction for this object than the vertical, because it's easier to separate in the vertical direction than the horizontal. So that's a very simple visual example of statistical learning, where we can test within modality and then a cross modality.
Before I show you any results, I'm going to show you the opposite experiment, which is a bit more complicated. In the opposite experiment, we get haptic experience only. So you see a 2 by 2 symbols like this, and you're asked to break them apart.
So I'm not showing you the robots, but they pull on the object, and then the object comes apart at some force level. And then they pull in the opposite direction, at some force level, it'll come apart. And the force level varies from trial to trial, based on the scene. OK? So they do 96 of these trials.
Critically here, we want to make sure they cannot learn what an object is based on vision alone. We want to make sure it's only available to them haptically to work out what an object is, and to do that, we have to have true pairs and pseudo pairs. This object often appears on the right of this symbol, as often as this symbol appears on the right of that symbol. But unlike true pairs, pseudo pairs are easy to break apart. So they break apart easily.
So we can create these objects out of mixtures of true and pseudo pairs. So just based on vision alone, you could never work out the difference between a true and a pseudo pair. You can only work it out through haptic experience.
So for example, in this case, it's easy to break the object apart, 7 and 1/2 newtons, because you're breaking it across the two pseudo pairs. Here, for example, pulling vertically, it's easy as well, because you're separating two true objects from each other. Here, it's the hardest. You're breaking open two true objects, 22, and here it's intermediate, because you're breaking a true object and a pseudo object apart. OK? And again we go on to test visual familiarity and the haptic pulling.
So the way we quantify learning is-- what proportion of times do you choose the true object over the chimera? And in the pulling, we calculate the correlation between the pulling force and the breaking force at which it would actually have come apart. OK? So you don't get any feedback during the testing phase.
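A minimal sketch of how these two measures could be computed (the data values and variable names are illustrative, not from the study):

```python
import numpy as np

# Hypothetical data for one participant (values are made up for illustration).
chose_true_pair = np.array([1, 1, 0, 1, 1, 1, 0, 1])   # 1 = true pair judged more familiar than the chimera
pulling_force   = np.array([8.1, 20.5, 7.9, 19.0])     # peak force the subject pulls with (N)
breaking_force  = np.array([7.5, 22.0, 7.5, 22.0])     # force at which the object would actually break (N)

# Visual familiarity score: proportion of trials where the true pair beats the chimera.
familiarity = chose_true_pair.mean()

# Haptic score: correlation between the pulling force and the true breaking force.
haptic_corr = np.corrcoef(pulling_force, breaking_force)[0, 1]

print(f"familiarity = {familiarity:.2f}, haptic correlation = {haptic_corr:.2f}")
```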
So the first thing to say is that, within modality, we see strong learning. This has already been shown before. The visual task has been done before, that when you visually expose people, they're about 0.75 correct. So not perfect. It's a hard task.
When we train people haptically, then they show a strong, significant correlation between the pulling force and the actual force they break apart. But the critical thing for an object is that it generalizes across modalities, and we see that, having learned visually, it generalizes significantly to haptically. Having learned haptically, it generalizes significantly to visual, and in fact, we see quite varied performance in subjects.
But the critical thing for us is learning means you're further to the right or further up here. The more you learn in one modality, the better you are on the other, in general. So in general, people can generate a multimodal representation purely from statistical contingencies, linking the visual and the haptic.
Now, it doesn't mean that children learn based on pure statistics, because all those specialized cues you have are very informative. But it just means that you can learn based on that. Any questions on this, before I move on?
OK. So I've talked about how you even identify an object. One thing we're very interested in is how you learn about the properties of objects. If you're an engineer, you would learn this object via F equals ma. You'd learn it as a whole thing and how it responds to the forces you apply.
But actually, when you think about what humans do with objects, and if you're Gibsonian, the things you do with objects really matter. And that's true in terms of dynamics. So for example, we can think of the lip of the cup as being particularly important. When you drink from the lip of a cup, what's important is that soft, compliant relationship between the lip of the cup and your own lip.
When you're putting the cup down on the table, now what's important is the frictional behavior between the bottom of the cup and the table. And so the question is, if you're focusing on different parts of the cup, such as the bottom or the top, could you learn different dynamics for the same object, or do you learn the object holistically as one thing? And to address that we have to generate objects you've not seen before, because if we give you an object you've seen before, we're just looking at recall.
We want to look at de novo learning, and so to do that, we use these robotic devices, which can generate forces on you. And so for example, we can have you move with the robot turned off. That's called P0. There's no perturbation, but then we can apply a force field to you.
And a typical force field, actually developed in Emilio's lab, is this force field, which is what's called a curl viscous force field. The force you get is proportional to speed. So the faster you go, the bigger the force, and the force always acts at right angles to your current direction of motion.
And in this case, it's a clockwise field, which we'll call P+, but you can flip the gain here, have a negative sign, in which case you get the field in the opposite direction. We call that P-, and these are going to appear a number of times during the talk. So P0 is no force field. P+ and P- are opposing force fields.
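A minimal sketch of a curl viscous field (the gain value here is illustrative, not taken from the experiments):

```python
import numpy as np

def curl_field_force(velocity, gain=15.0):
    """Curl viscous field: the force is proportional to hand speed and always at
    right angles to the current direction of motion. gain > 0 gives one rotational
    direction (P+), gain < 0 the opposite (P-), and gain = 0 is the null field (P0)."""
    rotate_90 = np.array([[0.0, 1.0],
                          [-1.0, 0.0]])
    return gain * rotate_90 @ velocity

# Moving straight ahead at 0.3 m/s produces a purely sideways force.
v = np.array([0.0, 0.3])
print(curl_field_force(v, gain=15.0))    # P+ pushes to one side
print(curl_field_force(v, gain=-15.0))   # P- pushes to the other side
```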
And the way we measure learning is not by looking at how far you're perturbed away from a straight line, because that can be affected by cocontraction, stiffening your muscles, and it's hard to turn that deviation into a measure of how much you've learned. So rather than do that, occasionally, every 10 trials or so, rather than apply a force field, we apply a stiff channel. So basically we apply a stiff spring in one dimension, which makes the hand go along a channel. And you can measure the forces into the wall of the channel as a marker of the predictive belief about the force.
So for example, on one of these channel trials, you might measure this force into the wall of the channel that the subject generates, and this is the ideal force which would compensate for the force field. We can regress them and say adaptation's at 0.3, for example. And over the course of an experiment, if we just apply one force field, this is the learning curve you get. As you learn it, the adaptation goes up to about 70% or so.
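A rough sketch of that adaptation measure, regressing the measured channel force against the ideal compensatory force (the force profiles below are simulated for illustration):

```python
import numpy as np

def adaptation_index(measured_force, ideal_force):
    """Regress (no intercept) the force pushed into the channel wall against the
    force that would perfectly compensate the field; the slope is the adaptation
    level (0 = no compensation, 1 = full compensation)."""
    measured = np.asarray(measured_force, dtype=float)
    ideal = np.asarray(ideal_force, dtype=float)
    return float(np.dot(ideal, measured) / np.dot(ideal, ideal))

# Hypothetical force profiles (N) sampled along a channel-trial movement.
ideal = np.array([0.0, 2.0, 4.5, 6.0, 4.5, 2.0, 0.0])
measured = 0.3 * ideal + np.random.normal(0, 0.2, size=ideal.shape)
print(adaptation_index(measured, ideal))   # roughly 0.3
```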
So getting back to objects, it's been known for a long time, it's very hard to learn opposing force fields with sensory cues. If you give people a red color and a blue color, and you effectively give them opposite force fields associated with them, they can't learn that. They can't switch based on that cue, and in fact, it's very hard to switch between opposing force fields based on cues.
So here we have an object, which the subject is going to control. You can actually see the hand. You can see the handle. This is actually what they see.
They're looking through a semi-silvered mirror. This object, this rectangle is going to move with the hand. OK? And on it are several control points we can ask them to control.
So the first thing we ask them to do is to control the central control point to move to a central target, and they can do that either under a force field which is clockwise or under a counterclockwise force field. And which force field they're going to get depends on whether this little target lights up on the left or the right. So if it lights up on the left, you'll get one force field. Lights up on the right, you get another.
But their task is to move the middle control point to this, and when they do that, they can't learn anything, if we interleave those. OK? So they aren't able to make use of this sensory cue here, and by controlling the same point in the tool for the two force fields, they simply can't learn anything at all. But we can change the experiment just very minorly.
We can ask them instead now to move the left control point to the left target or the right control point to the right target and associate which control point they think they're controlling, effectively, with the force fields. OK? The movements they make here are identical to here. They're just straight ahead movements. The only thing which is different is they're being told to control the left or right part of the tool.
And when we ask them to do that, we get really dramatic learning. So they have no problem associating different dynamics with different ends of the tool, as though they're not learning the tool necessarily as a holistic thing. And in fact, we can show through generalization, when we move the tool around, it's really the tool that matters to them and the point on the tool.
Now, in this case, we've given them an explicit instruction to focus on the control point. You may think this is a bizarre thing. This never happens in reality. So let me give you the reality version.
Imagine you're using a broom to sweep in a room, and there's a skirting board on the side, a wall, which you might hit. So you're going to focus on the left side of the broom, because you want to make sure it doesn't hit it. But if it does hit it, it's going to spin clockwise in your hand. On the other hand, if it was on the other side, the skirting board, and you were going to focus on this side of the broom, it would spin the other way. So the dynamics of the perturbation they get depend on which side of the broom you're focusing on, and you may learn the different dynamics for that.
So we have a sweeping task. So now, there's no explicit control point. Subjects are told, use this broom to sweep up this dirt, but don't hit the skirting board. And in one situation, we give one field, and with the dust on the other side, we give the other field.
We have a control condition, where again, they control the central control point. Although they're still sweeping right and left, it's just not part of their task anymore. And what we find is in this condition, they show really very pronounced learning, and in this condition, they can't learn anything at all. So it suggests to us that, effectively, implicitly, you control control points, and you can associate those with different dynamics.
So this gave an idea about the sorts of sensory cues which might play into the ability to learn multiple memories at the same time, and that really led us on to the idea of how do you build up that repertoire of skills? I think the way we study motor control, over the last 20 years, is wrong. What we do is we bring people into the lab, we give them a task to do, and then make up a story of how they learn that task.
And I think that misses the point of human experience. We're born. We have a stream of experience throughout our life, and from that stream, we have to build up a whole bunch of skills, motor skills, and then we have to somehow maintain those skills in some sort of repertoire. And that leads to some interesting computational problems.
So what I want to talk about is give you a little introduction to context-dependent learning in the lab and in the wild, and that will lead on to the COIN model, which is a model of motor repertoire learning, which deals with creation, expression, and updating. It'll make a distinction between proper and apparent learning, which I think is underrepresented in the field, and it's pretty important in motor control. And I'll then talk about experimental tests of the model, and please, stop me at any point.
We've written a review recently to try and explain some of the concepts. People talk about context-dependent learning a lot, but people use different words. So I'm going to tell you the words I'm going to use, and you can disagree with the words, but that doesn't really matter. I'm going to just explain to you how I'm going to use them and try to have a common vernacular.
So context-dependent learning has been known for ages. If you put an animal in and condition it to a tone, to a shock, and it shows fear conditioning, you put it in a different chamber and extinguish it, it'll show no more fear conditioning. If you bring it back into the original, it will re-express it. So it's linked that memory to a context.
And the players in this are there are sensory cues. So there's the visual appearance of the box is the sensory cue. There are states of the world, and the relevant state here is, is there a tone or not? In many tasks, there are actions. There aren't actions in conditioning, and then there's feedback. Do I get a shock or not, or what's my error of my movement, or do I get money or punishment? OK?
And the key thing is, at any one time, there's a contingency which links these things together. So there's a statistical relationship between the sensory cues, the states and feedback, and how they depend on action. And the idea is that context, in our terms, is an indicator variable for a contingency.
So in context 1, there's one set of contingencies. In context 2, there's another set of contingencies. So context is thus a discrete variable which specifies which contingency you're in.
So here, for example, context 1 would be a conditioning context, where you get a shock to a tone, and here's context 2, where you don't. And if you're interested in episodic memory, people have talked about similar things. So you might have to learn a series of items under one context, which might be a sensory cue, background image and another under a different background image, and you're better at recalling it under the same background image, later on. And if you're into neuroeconomics, again, you can have different payoff matrices for different states, in different button presses, and the context basically tells you the payoff matrix. And if you're a motor control person, holding an object out to the right or left could be different contexts, if they perturb you differently.
So in the lab, we feel we have a handle on context, and we set the context, and we say the animal or human has learned the context. And I think that's really misleading, because in the wild, it's just not like that. In the wild, context estimation is uncertain and hard.
So if you're a chef, and you've learned how to cut an apple and learned how to cut a tomato, and you come to a persimmon for the first time, should you use your memory for the apple or for the tomato? Should you mix them? Should you create a new one? It's just totally up to you.
If you're a mouse which has a scary experience in the forest, what is the context for that fear? Is it the sound of the birds in the tree, the location, the route you took to get there? It's totally up to the animal to construct it.
So I'm going to argue context is totally in the eye of the beholder. What you decide is a context and I decide is a context can be completely different. There's no right answer, in general, but context can be useful to tag memories. OK?
So that's what the use is. By tagging memories with context and having multiple memories which you can basically associate with different contexts, you can solve the computational problems of which memory should I express, which should I update, and when should I create a new memory? And the claim we're going to make is that that's all governed by contextual inference.
So I'm going to tell you about a very simple model. It's like a one-dimensional adaptation model of motor control, which basically does context-dependent learning, and I'll show you it can explain a large set of data.
So here's the model, developed by James Heald, a very talented PhD student and now a postdoc at the Gatsby, in collaboration with Mate Lengyel, and I'm going to tell you about the generative model. This is a Bayesian normative model. I'm going to explain how the model believes data is generated. Once you've decided how data is generated, everything else is fixed. You just do inference under that data generation.
So you can be in different contexts over time. You could be in the P0 force field, then P0 again, then P+, then P+, then P-, so contexts transition over time, set by some experimenter in this case. OK? And the way the generative model works is we have some sort of Markovian process with a transition matrix here. So we transition through contexts over time, and so in the inference, we want to work out at any point, what's the current context I'm in? And because we're going to be Bayesian, we're going to estimate what's the probability of all possible contexts I could be in at each point in time?
And then, I'd want to know how many contexts have I seen, and what governs how contexts transition? So I want to learn all that from observations. Contexts can emit sensory cues. For example, in this case, a sensory cue could be I'm controlling the left or the right control point, or I'm looking at some visual object. And so you want to learn, for example, the cue emission probabilities, the probability of seeing this cue given this context.
And crucially, each context is associated with a state. So this would be, let's say, the strength of the P+ force field, which can change over time, and we just model this as a linear dynamical system, which can change. Here's the state of, let's say, the P- force field. This might be the state of the P0 force field, which can all change. So now, you want to estimate, what's the current state of every context, and you want to also say, learn the state transition dynamics as well, which can be different for each context.
And finally, what you get to experience when you make a movement is state feedback, which is basically the state of the active context. So if I'm in the P- context, then I experience the strength of the P- force field only, and I don't experience this, although it continues to exist in the outside world. And so what you get is this bottom row here. All you get to observe is sensory cues and state feedback, and you have to estimate everything up here, which is all these things here. OK?
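A minimal sketch of this kind of generative process (heavily simplified relative to the actual COIN model, with made-up parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)

n_contexts, n_cues, n_trials = 3, 2, 200
transition = np.array([[0.95, 0.04, 0.01],    # contexts follow a Markov chain
                       [0.04, 0.95, 0.01],
                       [0.05, 0.05, 0.90]])
cue_probs = np.array([[0.9, 0.1],             # each context emits sensory cues
                      [0.1, 0.9],
                      [0.5, 0.5]])
a, drift_sd, obs_sd = 0.99, 0.02, 0.1         # each context's state drifts as a linear-Gaussian process

states = np.zeros(n_contexts)                 # e.g. strengths of the P0, P+ and P- fields
context = 0
observations = []
for t in range(n_trials):
    context = rng.choice(n_contexts, p=transition[context])    # which context is active now
    cue = rng.choice(n_cues, p=cue_probs[context])              # emitted sensory cue
    states = a * states + rng.normal(0, drift_sd, n_contexts)   # all states drift, whether active or not
    feedback = states[context] + rng.normal(0, obs_sd)          # you only feel the active context's state
    observations.append((cue, feedback))                        # the learner sees only this pair
```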
I'm not going to explain how we put priors over this. To make this work, we have to put complex priors over the model. I'm going to skip that to try and give you the principles. Instead, I'm going to try to explain to you how the inference process works.
So this is done on a trial-by-trial basis. I come into a trial in my experiment, and I've got an estimate from the previous trial of what context I was in. And I might have a new sensory cue, and from those, I want to estimate how probable each of the contexts I've seen before is. And maybe, at this stage, I've only seen two contexts, and at this moment, I estimate I'm most likely to be in the red context, context 2. And there's some probability I'm in a novel context as well.
I have to also represent for every context its state, and being good Bayesians, we represent the posterior here over field strength. So for the blue context, this is my belief about its strength, and it's got a high field strength. The red context has a low field strength, and I also always represent a novel context, which just has a very uniform, wide prior.
And so the right thing to do, before you move, is to express your memories in proportion to how much you think they're going to be active for the trial. So this is the weightings you give to these distributions, which generates this output here. That's your belief about what's going to happen on this current trial. And if you're in this motor control domain in one dimension, the simple thing to do is take the mean of that, shown here, apply that, and then get feedback. OK?
And now you have more information. You've made the movement. You've got sensory feedback. So we're going to update our beliefs.
So having had our beliefs before, we can now update them, and in this particular case, the field you experience, or the sensory feedback, is most consistent with something novel. And in our model, you generate a new context in proportion to how tall this novel context is. And so in this case, you generate a novel context, this orange one, and now you have to represent that orange one.
So you create a memory. You effectively create the memory based on your experience, and now you also update all your memories based on how much you think they were relevant. So you're going to update the red memory the most, because effectively, it had the highest responsibility, and the blue memory the least, here. OK?
And then you recurse. This becomes the new prior, and then you keep going. So it's basically a way of updating your beliefs about the context you're in and the forces.
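A single-hypothesis sketch of that trial-by-trial loop (the actual COIN model does principled Bayesian inference over an open-ended set of contexts; the thresholded new-context rule and all numbers below are simplified, illustrative stand-ins):

```python
import numpy as np

means, variances, counts = [0.0], [1.0], [1.0]   # start with one known memory; a novel context is always implicit
novel_var, obs_var, new_thresh = 25.0, 0.04, 0.5

def trial(feedback):
    prior = np.array(counts + [1.0])             # crude context prior; last slot is the novel context
    prior = prior / prior.sum()
    motor_output = float(np.dot(prior[:-1], means))   # express memories in proportion to the prior

    # Likelihood of the feedback under each known context and under "something novel".
    all_means = np.array(means + [0.0])
    all_vars = np.array(variances + [novel_var]) + obs_var
    lik = np.exp(-0.5 * (feedback - all_means) ** 2 / all_vars) / np.sqrt(all_vars)
    resp = prior * lik
    resp = resp / resp.sum()                     # responsibilities: posterior over contexts

    if resp[-1] > new_thresh:                    # feedback best explained by a novel context: create a memory
        means.append(0.0); variances.append(novel_var); counts.append(0.0)
    else:
        resp = resp[:-1]                         # no new memory this trial

    for j in range(len(means)):                  # update every memory in proportion to its responsibility
        k = variances[j] / (variances[j] + obs_var)             # Kalman gain
        means[j] += resp[j] * k * (feedback - means[j])
        variances[j] = (1 - resp[j] * k) * variances[j] + 1e-3  # a little diffusion keeps memories plastic
        counts[j] += resp[j]
    return motor_output

# Example schedule: a baseline block followed by an abrupt perturbation.
for f in [0.0] * 20 + [1.0] * 40:
    out = trial(f + np.random.normal(0, 0.05))
```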
OK. So let me give you actual real examples from data of how we can use this model. So one thing that's been known for a long time, if you experience an abrupt perturbation which goes off abruptly, you deadapt much quicker than the same perturbation that comes on slowly and goes off abruptly. OK? So under these two protocols-- and here's some real data-- when you learn something abruptly, and it turns off, you decay very quickly back to baseline. If it came on slowly, you decay away slowly.
So we can take our model and just simulate it. This is feedforward; we're not going to even try to fit it to this. Actually, the parameters just come from other experiments. This is what the model predicts.
The model says, yes, if you learn it abruptly, it goes away abruptly. If you learn it slowly, it goes away slowly, even though in this case, the slower one, the slow learning wasn't even as high as that. So they cross over. And so now, we can actually look inside the model to understand why this would be the case.
So I need to explain to you what these are. You're going to see a number of these. So I showed you before, here, where the states are shown as these distributions. What I've done is rotate these vertically and plot them over time. That's what's shown here. So these are little Gaussians, effectively-- or approximately Gaussians-- plotted over time. This is the full posterior.
So what happens is, when you experience this perturbation, which is abrupt, you generate a new memory for the perturbation, because it's a big error. OK? So you have two memories. What I'm showing you here is these mixing proportions over time. You start off fully expressing the blue memory, the zero memory, here.
Then, when the perturbation comes in, you switch to express the red memory and turn off the blue memory, so as to fully express this. But crucially, when you deadapt, what happens is you very quickly switch back to the original memory, because you've maintained it. OK? So you've had that blue memory for a long time, and the deadaptation, in this case, is driven by fast changes in your predicted probabilities of which context is relevant. OK? And that leads to this fast deadaptation.
When you bring a perturbation in gradually, you never get a big error. OK? So what happens in the model is, rather than generating a new memory, you now adapt your existing memory for the perturbation, because the errors don't get big. But yet when you turn the perturbation off, that leads to a very big error. So in the model, what happens is you create a new memory now for the baseline condition.
But crucially, this is a very new memory, and what the model learns about is how probable the memories and contexts are in general, because that tells you the transition matrix and the global probabilities. And because this memory hasn't been around for a long time, when you go back, when you want to transition to the memory, it takes more time to turn this red memory on, because you're not sure it's going to hang around, effectively. And so that leads to a rather slow deadaptation.
So we can explain how large errors here are important for memory creation. Small errors don't create new memories. And abrupt turning off creates new memories, but because they're a recent memory you don't transition back to them. So it's a very simple explanation of that.
But-- there's just one thing I should point out from this slide-- what's important here is that deadaptation, for example, is not learning. It's actually just switching between two learned things over here. You're just changing your mixing proportions, and I want to make that clear, that that's not proper learning. We're going to call proper learning changes that happen here. Changes which are caused by these things changing, we're going to call apparent learning, and I'm going to give you three examples here.
This is three simulations of the model, which look identical, but they have fundamentally different learning processes going on. So the perturbation here was a step change, this black thing. The blue is the motor command of the model, which does a good job of learning that. But there's no way you could tell from these which is proper and apparent learning. So it turns out, the left-hand one is proper learning.
In this case, the model came in with one memory for 0. When it got the perturbation, it just adapted that memory, and it always fully expressed it. That's really proper learning, what we tend to think of learning. You've changed some internal representation in the brain, which now represents the strength or weight of this object.
In contrast, the middle one has no proper learning at all. It's purely apparent learning. So in this case, it just so happened, the model came in with two memories. It had a memory for 0, and it had a memory for 1, but it was fully expressing the memory for 0. So that's why there's a 0 here.
And when it experienced the perturbation of 1, all it did was turn on that context, or express that context, and reduce the expression of the other context, leading to, basically, expressing the red and not expressing the blue, leading to this learning curve. So in this case, there is no proper learning at all. It's all apparent learning.
And here's an example, which is a mixture of the two. You started with one memory, you generate a new memory, and this learning here is proper learning. But also, you change the mixing proportions, which is apparent learning.
So it's a mixture of the two. So I think it's important, when we see learning curves or curves where performance gets better in animals and humans, it does not mean there's learning going on at all. It can be purely inference and a mixing of previous learned things.
So we need slightly more complicated paradigms to really test the models. I'm going to tell you about a spontaneous recovery paradigm. This is a very old paradigm from the literature. If you fear condition an animal, and then you extinguish it in the same cage, it'll no longer show fear conditioning. If you just wait some time, it often will re-express fear conditioning. It spontaneously re-expresses that fear conditioning. OK?
So this only made it to the motor control literature like 15 years ago. So here's the motor control version of spontaneous recovery. Subjects come in, they experience P0, no perturbation, just to get used to the robot.
We then give them P+ for a certain amount of time, and they adapt to that. And then you give them P- for a short amount of time. And the idea here is you're trying to adapt them up to here, then deadapt them back down to there. So this is like the conditioning and the extinction.
So now, they're back down at this point here. You might expect that they would now generate no force for eternity. They're back down at the 0. And the way it's assessed-- and Maurice Smith at Harvard was the first to do this-- you put them in a long sequence of channel trials, and what you find is now they re-express the forces for the P+, spontaneously.
And the way this has been explained-- by Maurice Smith, on this data-- is with what's called a dual-rate model. So this is not a context-dependent model. It's a model which was created particularly for this data. It says the motor output at any time is a mixture of a fast and a slow learner, which compete to learn. And the fast learner learns quickly and forgets quickly, and the slow learner learns slowly and forgets slowly.
So here's what happens. The fast learner in green starts to learn, but then the red slow learner takes over. So the red's learned most of it by here. Now, you deadapt, it's the fast learner which learns the deadaptation.
So by this stage, although the motor output is 0, the slow learner is positive, and the fast learner is equally negative. And what happens when you go into a channel trial, there's nothing to learn, so you just decay away. But the fast learner decays away quickly, revealing the slow component. OK?
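A minimal sketch of this dual-rate idea (in the spirit of the model; the parameter values are typical-looking choices, not fits):

```python
import numpy as np

def dual_rate(schedule, A_f=0.92, B_f=0.20, A_s=0.996, B_s=0.02):
    """Two-state model: a fast process that learns and forgets quickly plus a slow
    process that learns and forgets slowly. Motor output is their sum. Entries of
    `schedule` are field strengths; None marks a channel trial (no error signal)."""
    x_fast = x_slow = 0.0
    output = []
    for p in schedule:
        error = 0.0 if p is None else p - (x_fast + x_slow)
        x_fast = A_f * x_fast + B_f * error   # fast state: big learning rate, strong forgetting
        x_slow = A_s * x_slow + B_s * error   # slow state: small learning rate, good retention
        output.append(x_fast + x_slow)
    return np.array(output)

# Spontaneous-recovery schedule: long P+, brief P-, then a block of channel trials.
out = dual_rate([1.0] * 300 + [-1.0] * 15 + [None] * 100)
# In the channel block the (negative) fast state decays first, transiently revealing
# the positive slow state: spontaneous recovery.
```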
Crucially, in this model, the fast or slow states are always fully expressed. There's no concept of apparent learning, and I'm going to show you how our model can explain this data and more. So here is the COIN model applied to this task.
So what we put in, there's no cues here. We just put in a noisy version of the perturbation. It starts with one memory, it creates a second memory for P+, and a third for P-.
When you go to the channel trial phase, these just generally decay away. There's not much going on there. But critically, although you express the blue, then the red, and then the orange memory, what happens in this phase is, because P+ has been experienced the most, and it's the most common of the perturbations, in the absence of any evidence, you're going to transition back to believing that P+ comes back.
And that's exactly what happens in the model. It's learned that P+ is most common. It's not getting any observations here. So it basically increases the probability of P+, and that leads to the re-expression of P+.
AUDIENCE: Can you treat the channel as a new context or--
DANIEL WOLPERT: So OK, that's an interesting question. We don't model it as a new context. We model it as non-observations. You can model it as observations, such that any force you produce into the wall of the channel is what you're experiencing. It just makes everything more complicated to simulate, but it doesn't fundamentally change the qualitative patterns. So at the moment, we treat it as non-observations.
So if context is important for this expression, well, we should get a much stronger expression if you give really strong evidence that that context has returned. So we can change this paradigm by two trials, by adding in two P+ trials just before you get to the channel trial phase. So for the COIN model, that's really strong evidence that P+ is back, and so when we simulate this, what happens now is, rather than having this rather slow increase in P+ being expressed, you have this very rapid increase, which leads to what we call evoked recovery. Rather than having this knee-like thing, you basically jump up and then decay away. OK?
Now, that will work for the COIN model, because the COIN model can basically show these effects through apparent learning, but in the dual-rate model, it can only do it through proper learning. And so when we ran these experiments, here's spontaneous recovery. It looks weird, because we're only showing you channel trials.
These are all channel trials. Here, there's only 1 in 10 channel trials, but you can see the spontaneous recovery here. You can see what we call evoked recovery here.
The dual-rate model can't explain-- I mean, it can explain this, because it was designed for that. It has a qualitative mismatch here, because to get from here to here in two trials requires the fast learner, but the fast learner then decays away very quickly. In contrast, the COIN model can explain that very well and at the BIC level as well. So we think context estimation controls memory creation expression in evoked and spontaneous recovery.
So far, we've talked about creation, and we've talked about expression. The most interesting thing is-- sorry, one more thing first. The reviewer said to us, if statistics really matter, you should run many more P+ trials, because if you run more, you should get more spontaneous recovery, and that's true.
So if you simulate these two, this is the standard-- this is overtraining. When you do much more P+, you get much higher spontaneous recovery. We were depressed that we were going to have to run this experiment, but luckily, Opher Donchin had run it already, and it's on bioRxiv. Sadly, it's still on bioRxiv and unpublished, but effectively, they showed that if you overlearn the P+, you get much more spontaneous recovery, so again, suggesting that you're sensitive to the statistics of experience, in terms of when you're doing context estimation.
OK, so memory updating. In typical state space models, you update your estimate of a force field as the previous estimate, plus the error times some learning rate, the Kalman gain. And that's nice and easy, but when you come to these sorts of switching state-space models, which have multiple contexts, life becomes a bit more complicated.
The right thing to do is to update the state of the jth memory based on what it was before, on the error it had and its Kalman gain, but now also on how responsible it was for that movement. OK? So having made the movement, you might think you've got a bunch of memories which are responsible. The right thing to do is to update all the memories in proportion to how responsible they are. So we're going to argue that the right thing to do is not to update a single memory, but you've got to update all your memories at all times. OK?
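In symbols (the notation here is mine, not from the slides), this responsibility-weighted version of the standard update would look something like:

```latex
% Single context (standard state-space update of the estimate \hat{x} with Kalman gain k):
%   \hat{x}_{t+1} = \hat{x}_t + k \,(y_t - \hat{x}_t)
% Multiple contexts: weight the update of the j-th memory by its responsibility r_{j,t},
% the posterior probability that context j was responsible for the feedback on trial t:
\hat{x}_{j,t+1} = \hat{x}_{j,t} + r_{j,t}\, k_j \,\bigl(y_t - \hat{x}_{j,t}\bigr)
```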
Now, it may be that, in real life, typically, one context probability is at 1. Everything else is 0. In which case, you just update one memory. But we're going to create a situation, at least in the lab, where you're going to be confused with which context you're in to see whether you update multiple memories. And this is the most complicated slide, but I'll give it a go. OK.
What we do is we want to create two memories in our subject's head, two motor memories, and we want to associate one with P+ and one with P-, opposing force fields, and each also with a sensory cue. So I'm going to pretend it's color. It isn't color, but just it's easy to describe as color. We're going to give sensory cue the blue, and if they see the blue, they're going to get P+. If they see red, they're going to get P-.
So the idea is that we basically have got now a memory for P+ for cue 1 and P- for cue 2. And we can interleave these randomly, and subjects will perform perfectly. They'll switch between them trial by trial, based on the cue they see. So they can learn them. They've got two memories.
What we want to know now is how much do we update each memory for different experiences? So we want to say, if I were to give you a P+ trial with cue 1, how much would you update a memory? And if I were to give you a P- trial with cue 2, how much would you update?
Under these two, you're going to be pretty sure what context you're in, but to make you a little less certain, we can show you cue 2, the red cue, but then give you the plus force field. Or show you cue 1 and give you the minus force field. This is a conflict now. You haven't seen these before, so you should be uncertain about which context you've been in.
So the way we actually do this is we have to measure what's in a memory before and after this exposure, and that's a measure of single trial learning. The way we do that is we use one of our channel trials. There's nothing to learn here, but we give a cue. We give the blue cue, so you're going to express what's in your memory for context 1.
We then give you one of these trials, and then we, again, assess what's in your memory for context 1. And the change in that is the single trial learning, how much you've learnt for context 1. OK? So this will tell you how much you've learned, and there's a little wrinkle that we actually wash people out a bit before we do this, to make sure they're not saturated. So there's something still to learn, but that's a technical detail.
So let me give you the intuition of what should happen. Before you move, you see the cue. If you see cue 1-- this is a subscript-- you should have a high belief, a high prior you're going to be in context 1, because that's what's associated with context 1, and a low belief if you see cue 2. Having moved, you experience the force field.
So now, we can calculate the likelihood: if you experience the plus force field, you should have a high belief that you had just been in context 1. If you see the minus, you're going to have a low belief, and being good Bayesians, you should combine these likelihoods and priors to form the posterior. And actually, what we do now is we actually fit the model, so what I'm showing is the fit of the model.
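As a toy illustration of that prior-times-likelihood combination (the probabilities are made up; the real analysis fits the full model):

```python
import numpy as np

def posterior(prior, likelihood):
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

# Prior over [context 1, context 2] from the cue, likelihood from the felt force field.
prior_cue1 = np.array([0.9, 0.1])        # cue 1 makes context 1 likely a priori
prior_cue2 = np.array([0.1, 0.9])        # cue 2 makes context 2 likely a priori
lik_plus   = np.array([0.9, 0.1])        # feeling P+ is most likely under context 1

# Consistent trial (cue 1 + P+): context 1 gets nearly all the responsibility.
print(posterior(prior_cue1, lik_plus))   # ~[0.99, 0.01]
# Conflict trial (cue 2 + P+): prior and likelihood disagree, responsibility is intermediate.
print(posterior(prior_cue2, lik_plus))   # [0.5, 0.5]
```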
This is what the model predicts should happen, that you should get the most learning when the cue and the feedback are consistent with context 1, the least when they're both different, and intermediate levels for the cue conflict. There's a very good reason this is lower than this, which I can go into. But it's technical, and it's not that important, but it's expected to be lower.
When we look at the data, we see this lovely gradation. That, effectively, the amount you update your memory in cue 1 basically shows this lovely gradation. The experiment is completely symmetric. So had we done cue 2, we'd get just the opposite effect, because everything is counterbalanced. There's nothing special about cue 1 and cue 2.
Now, one thing I can't tell you is this happens on every trial. I can't tell you this is what you update. We have to average many trials to get this.
So it's certainly the case that, over many trials, you update according to the probabilities of the context. But we can't know whether on each trial maybe you update only one memory, but the memory is probabilistically updated, depending on that. That's something for the future, but certainly, memories are updated in proportion to their responsibilities.
So we've run this memory updating, the spontaneous recovery, and the evoked recovery. So we have all the parameters for individual subjects, I think 40 of them. We're going to take those parameters and just fit classical data now with that-- sorry, not fit, simulate classical data. So there's no fitting here at all. We're just going to say, what would happen if our 40 subjects had done these experiments? OK?
And a very famous experiment is savings. If you learn something once, go away from the rig, come back a week later, you'll be faster to learn it the second time. OK? So that's savings.
So here's an example. You learn a force field P+. We washed it out. This is actually a spontaneous recovery paradigm done twice.
Then, you come back and do it again. You're faster on the second day to learn the perturbation than the first day. OK? If we just simulate our 40 subjects, average that data, that's what we get from the model as well.
Now, the question is, why are you a faster learner the second day? Is it that you've maintained a memory of that? Is it that you've got a higher learning rate?
And it turns out that neither of those are true in the model. The Kalman gain, which is our learning rate, is identical on the first exposure and the second. In fact, the representation of the state, because it decays away in between, is identical on the first and the second, pretty much. You haven't maintained that memory really.
What's really different in this case is your belief that there's a new context that you're going to transition to. Having experienced that new context, you're more willing to switch to it the second time. So we see the probability of the new context goes up much faster the second time you see it, and so this is driven by apparent learning. So this is not proper learning driving this. This is all apparent learning.
A similar thing happens for anterograde interference. The longer you learn P+, the harder it is to learn P-. So this is just a paradigm where you experience P+ for different amounts of time, before you go to P-.
This shows you the same data plot from the start of P-. If you go straight to P-, you learn it really quickly. If you're in the pink group, where you've learnt P+ for ages, it's very slow to learn P-. OK? And we recapitulate that data, but again, there's no difference in the learning rate. OK?
And more importantly, the speed at which you actually learn P- is identical in the model. So this shows you the representation of P-. It's identical. Somewhere in your brain, you've learnt P- just as well.
What's different is your belief that P- is going to hang around, that effectively, the longer you experience P+, the less you believe you're going to stay in P-. P+ has become commoner and commoner, and therefore, you're not willing to express your P-, and that's shown here. That effectively, if you just learned P-, you express it immediately, but if you learned P+ for a long time, it takes you a long time to turn on the P-, even though somehow it's represented in your brain. I know this is all correlative, but this is how we're trying to explain the data.
And finally, in terms of the COIN model, it makes a nice distinction between the different sorts of volatility you can see in the world. OK? So you can have volatility at the top level. I could flip P- and P+ really quickly, or I could keep P+ on for a long time and switch it slowly to P-. OK?
So that's between context volatility, and that's very different from within context volatility. I could be within a context, where P+ is now drifting in time in terms of its strength, and that's within context volatility. And often, these are not really separated in studies of volatility, whether you think it's within or between contexts.
And so I want to show you this data set, which was interpreted in the paper in a very different way, and show you how we interpret it as between-context volatility. So this was a study published a number of years ago, where they put people in very stable or very volatile environments. So you switch between P+ and P- either only very infrequently or very frequently. So this is the probability of staying.
And then they asked, having experienced this volatility, how does it affect your single-trial learning in the same way I described it before? So this is what they found, that if you're in a stable environment, the amount you learn on a single trial from a P+ trial goes up and up. Whereas, if you're in a very volatile environment, it goes down and down, and they argued that what you're doing is changing the learning rate, that there's some mechanistic model which changes the learning rate as a function of experience. OK?
When we run the COIN model on this-- and this is not a fit. Again, it's just a feedforward simulation from our 40 participants-- we recapitulate this behavior, but importantly, the learning rate doesn't change at all between the conditions or over time. And the amount you learn on any trial doesn't change between the conditions at all. The critical thing here is that, having experienced this P+, I have to assess it on the next trial.
So remember, we bracket the P+ trial between two channel trials. So how much you're going to express on the next trial depends on your between-context volatility. If I'm in a very stable world, anything I learned now I should express immediately, because the world's stable. It's going to hang around. If I'm in a very volatile world, anything I learn I shouldn't bother expressing on the next trial, because it's likely to have gone.
And so this is exactly that. This is the predicted probability of how much I should express things, and it goes up for the stable environment and down for the volatile environment. So we interpret this simply as apparent changes in learning being driven by this contextual inference. So all these effects are really apparent learning effects, not proper learning effects.
We are not oblivious to all the other papers dealing with multiple-context models, of which I have one myself. There are many others. We go to a lot of effort in the paper to describe what the differences are, where they fail, what the positives are. I'm not going to go through that. It's reviewed in that and also in a couple of review papers.
But the COIN model I think is a key computation. It's principled. It's Bayesian. It's comprehensive. In a sense, it's got all the signals. You can make the model much more complicated, that's for sure.
It unifies a bunch of disparate data sets. It makes a really clear distinction between apparent and proper learning. I think it's relevant in conditioning, episodic memory, and economic decision making.
We've argued that in a couple of review papers this year, where we think that, actually, having this common vernacular, you can explain many features in other domains using a very similar model. And I think I will skip my last bit, since we're at five to. Let me see and get to my last slide, if I can find it. So just a summary, I've only dealt with the top three of these. I'm sorry, I skipped the categorical learning.
What that tells us is we have to modify the COIN model. That effectively, you don't learn things as individuals, that contexts are really hierarchical in their families, and that's something which we'll take into account in the future. And I'd like to thank my collaborators and my funding, and I'm happy to take questions. Thank you.
[APPLAUSE]