VICTOR WINARSKY: Good afternoon, everyone. So we can start now. My name is Victor Winarsky. I'm a postdoc in Josh McDermott's lab for computational audition.
And I would like to welcome you all here for what we hope will become the first in a series of talks by speakers invited by the CBMM graduate student and postdoc council, or, for short, the trainee council, of the Center for Brains, Minds, and Machines. So it is my great pleasure and honor to introduce the first speaker of that series, Professor Ilya Nemenman from Emory University. Ilya got his undergraduate and master's degrees in theoretical physics and math from Belarusian State, Santa Clara, and San Francisco State Universities.
Then he moved on to do his PhD at Princeton, working with Bill Bialek on information-theoretic approaches to learning. It's important to note that his main PhD paper, on predictability, complexity, and learning, became a quite important and influential piece of work in information theory and theoretical biology. After completing his doctorate, he held postdoc and research scientist positions at several institutions: the NEC Research Institute, the Kavli Institute for Theoretical Physics, Columbia University, and Los Alamos National Laboratory.
Since 2009, he has been a professor of both physics and biology at Emory University in Atlanta. What I can say about Ilya is that his interests are definitely very broad. They encompass a whole range of theoretical approaches to information processing in living systems.
In his work, he goes all the way from abstract theoretical considerations to precise and detailed experimental predictions and data analysis methods, which I think is quite a unique combination. One fun fact is that on his lab's website, he hosts a section called "Smart Quotes from Smart People." And I was told that one's choice of quotes tells you something about a person's personality and worldview. So I allowed myself to pick two from this section of his website.
The first one is a very dramatic quote from John Hopfield: "What is physics? The idea that the world is understandable." And the second one, perhaps more trivial in content but equally impactful, is "Do not model bulldozers with quarks." Ladies and gentlemen, without further ado, Professor Ilya Nemenman.
ILYA NEMENMAN: Thank you very much for having me here. It's a great honor to be opening a series and also to be invited by students and postdocs. I think it's the first time I'm doing something like that. So thank you very much for having me, and I hope I won't disappoint.
So today I'll talk mostly about birds, about spike timing in birds. Not that scary, you know, since they are much smaller and much more beautiful creatures. But, as always, I'm going to run out of time.
So let me start with the most important things, right? Who actually did the work that I'm going to be talking about? There are multiple labs involved.
A couple of people in my group did this work: Caroline Holmes, Baohua Zhou, and David Hoffman, though I will probably talk only about the work that Caroline has done. Sam Sober's lab is where the [INAUDIBLE] behavioral experiments on the Bengalese finch are done, with multiple graduate students and postdocs involved. And then Coen Elemans's lab in Denmark does in vitro work on muscles, so we'll talk about that work as well. And then there are the funding sources, which are also important.
Victor asked me to start with a general introduction to what my lab does, just give you a couple of examples, and then go to the neuroscience. So what do we do? We study basically two questions.
The first question is how biology processes information and learns, right? Abstractly, you can ask the same question about neurons, about individual cells, about brains, about populations, right? An evolving population [INAUDIBLE] learns about the environment and accumulates information about the environment.
What we're interested in figuring out is what the limits to information processing are in all of these systems, and what computational and biophysical mechanisms biological systems use to learn from the environment. Here are a couple of examples. This is an image of a mammary duct terminal end bud.
The duct goes from left to right. When the mammary gland grows, this bud propagates through the entire gland, and the concentrations of molecules there are so small that the cells cannot easily figure out which way to move, right? Left or right.
The concentrations and gradients are basically stochastic, almost zero in all directions, and the cells are sensitive to differences of one molecule across the size of a cell. So we model this system as an information relay channel, in much the same way that communication theorists model information [INAUDIBLE]. We can show that it is this system's integration of information over long times and over large spatial scales that allows it to decide which way the cells should move.
Similarly, we study things like the [INAUDIBLE], which is an immune inflammatory signaling pathway. In that system, we can try to build very specific biochemical models of how information is transduced. Or we can just try to understand who the important players in the signal transduction are, right?
Where is information lost? Where does information merge? What is the topology of this network?
Is information processed along a linear pathway or through an interconnected topology, and things like that? This was a paper in Science a few years ago with Andre Levchenko, who is at Yale nowadays.
We also do abstract things, for example, modeling an immune receptor and asking whether you can transmit multiple signals through the same receptor. That's a nontrivial problem: there are a lot more molecules in the environment than there are different types of receptors.
So can the system sense many different molecules, accurately, with just a small number of different receptors? We have a paper in press that calculates the limits in these types of systems. So this is another example in this direction.
We also study learning in worms. The worm is heated with a thermal laser, so some of the directions the worm can go in are warmer and some are colder.
The worm learns the association between the presence of food and the temperature in the environment. We develop models of this; on the vertical axis is the worm's thermotactic index under various conditions, such as whether the worm is starved or not. The model that we develop fits the data reasonably well, we would say. The curves match.
This is a paper just about to be submitted with [INAUDIBLE]. And Will is actually giving a talk two days from now, on Thursday, in biophysics at Harvard, and he is going to be talking about this. So if somebody is interested, it would be a good idea to go and see how worms learn.
Here is another project on birds, which is not what I'm going to talk about today. This is the Bengalese finch. These birds learn.
They adapt over the course of their adult lifetime. You can force them to change their pitch by shifting the pitch that they hear: they sing, and you shift what they hear through headphones in their ears, so they hear themselves singing too high or too low.
Then they try to compensate for that. You can see the amount of compensation here; this is experimental data, averaged over about 10 birds per curve. And we have a model, a Bayesian nonlinear filter model, that seems to predict this data quite well, along with many other features of the data. That work is again with Sam Sober's lab at Emory.
Another part of what we do is trying to figure out what matters in biological systems. The idea is that if I try to simulate or model a biological system molecule by molecule, I can't, right? That's never going to happen. Trying to model bulldozers with quarks is not necessarily a good idea.
So we need to figure out which effective, phenomenological features of the description actually matter, and build models using just those features. We have a few papers in which we show how one would learn these types of dynamical models at just the right scale. For example, somebody shows you time series data: can you figure out the laws that describe how these data were generated?
Can you learn, roughly speaking, Newton's laws from the trajectories of falling apples and moving planets and things like that? If we didn't have Newton, would we be able to learn those laws? And we can do something like that.
This is from a talk I'm giving tomorrow in physics here at MIT. The solid lines are the trajectories of planets that we haven't learned from, and the dotted lines are our predictions of where those planets would actually be. This is learned from about 100 observed trajectories, without knowing anything about the laws of nature besides that they are temporally local. So we can learn these types of things from data.
So this is a general overview. I hope it piques your interest, and maybe some of you would be interested in coming to this talk tomorrow as well. But with that, let's move to the only story I will talk about today, which is precise spike timing in motor control.
I have to apologize a bit. I had an accident with my laptop; my talk crashed earlier today. So I was trying to reassemble it, and there might be some glitches: some slides will show and some slides won't. So please don't get angry if that happens.
So what is the question that we're trying to answer, and will be answering for the remaining 40 minutes or so of the talk? This is a classic question in neuroscience, right? Does the timing of individual spikes matter? Or, to what extent does the timing of individual spikes matter?
Of course, this will depend on the system. These questions were first asked in sensory neuroscience. People showed that there is temporal encoding of information in pretty much any sensory system that you can look at, right?
Vision, hearing, somatosensation, taste: there are too many publications to list here, tons and tons of papers. There are entire books. For example, Bill Bialek, whom I worked with, co-authored the book Spikes, which is devoted to the idea that individual spikes [INAUDIBLE], their individual times, matter.
So what is the problem with this type of analysis? Today, there is still no understanding. There is still no proof.
There is still no demonstration that timing downstream from sensing actually matters, right? Whenever you try to publish a paper about timing in a sensory system, at least one of the reviewers immediately comes back with the question: fine, yes, there is information in the precise timing of individual spikes in the sensory system, but is it ever used? Maybe it is there in the brain, but the brain doesn't use that information to actually control behavior.
So what we would like to show is that the information that exists in the brain in the form of precise spike timing in the sensory system can be used in the motor system, and actually is used there. On top of that, we'd like to show not just that it can be used, meaning that there are some correlations between the timing and the behavior,
but that it actually is used, right? That there is a causal relation between the timing of individual spikes of individual neurons and the behavior that follows. And I want to emphasize an important difference here: when versus what.
There are a lot of papers, including some from the group of Michael [INAUDIBLE], showing that in motor systems, for example in bird song, if you take the onset of what are called gestures, the sounds that the birds make, and correlate the onset of each sound with the timing of the spikes that preceded it, there is a correlation. So in this sense, the brain encodes the timing of what is going to happen in the timing of spikes.
And that's maybe not extremely surprising. Suppose all of the neurons in a certain part of the brain are controlling a certain type of behavior, and suddenly all of them spike one millisecond earlier. The system is causal, right?
So the induced behavior would have to happen one millisecond earlier. So the fact that timing is transduced into timing is not the most surprising thing. What I would like to show is that the timing of the spikes in the spike trains is actually transduced into different behaviors, right? Not just into the onset of the same behavior, but into different behaviors.
So you move a single spike somewhere, and the animal suddenly sings something somewhat different, right? Not just the same thing at a different time. I hope that's clear. And if it's not, please ask; by the way, if you have any questions, interrupt me, which also gives me some time to breathe and get some water, all right?
As far as I know, up to when we started working on this system, there was not a single paper trying to understand the importance of precise timing in a motor system where timing controls the what of the behavior. So that's what we tried to do. The reason this is the case, why people didn't really focus on this question (and maybe I'm exaggerating a bit, but please stay with me as I build this strawman), is that the almost exclusive paradigm in the motor control literature is that muscles integrate on time scales of at least tens, if not hundreds, of milliseconds, right?
So if you move a single spike by a few milliseconds, it cannot have a dramatic effect on the behavior that follows, because muscles are slow. They're low-pass filters, right? So this high-frequency structure is not going to be important.
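To make the strawman concrete, here is a minimal sketch of a "slow muscle" under stated assumptions: force is a linear sum of identical alpha-function twitches, one per spike, with a time constant tau_ms = 50 ms chosen to match "tens if not hundreds of milliseconds". The twitch shape, the time constant, and the spike times are all hypothetical, not measured properties of bird muscle. The sketch compares two four-spike trains that differ only in one spike being moved by 3 ms.

```python
import numpy as np

def twitch(s_ms, tau_ms):
    """Alpha-function twitch: zero for s < 0, rises slowly, peaks at 1 when s = tau_ms."""
    s = np.clip(s_ms, 0.0, None)
    return (s / tau_ms) * np.exp(1.0 - s / tau_ms)

def muscle_force(spike_times_ms, t_ms, tau_ms=50.0):
    """Toy slow muscle (assumption): linear sum of identical twitches, one per spike."""
    force = np.zeros_like(t_ms)
    for ts in spike_times_ms:
        force += twitch(t_ms - ts, tau_ms)
    return force

t = np.arange(0.0, 300.0, 0.1)            # time axis in ms
pattern_a = [10.0, 20.0, 30.0, 40.0]
pattern_b = [10.0, 20.0, 33.0, 40.0]      # same spike count, one spike moved by 3 ms
force_a = muscle_force(pattern_a, t)
force_b = muscle_force(pattern_b, t)

# Largest pointwise difference between the two force traces, relative to peak force.
rel_diff = np.max(np.abs(force_a - force_b)) / np.max(force_a)
print(f"largest force difference: {100 * rel_diff:.1f}% of peak force")
```

Under these assumptions the two force traces differ by only a few percent of the peak force, which is the intuition behind dismissing millisecond timing in motor control. It is exactly this assumption that the talk sets out to test.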
Let's see if this is indeed the case. We're going to do something which is basically monkey see, monkey do: exactly what people have done in the sensory literature, repeating the same analysis in a different system. So let me walk you through at least one paper, our paper from 2008, to show how an analysis of the importance of precise spike timing is done in the sensory literature. So this is the animal, the fly.
I worked with Bill Bialek and Rob de Ruyter; this is the animal that they studied. You can record from a neuron in the lobula plate, which responds to the animal's self-motion through the world, right?
So if the animal turns right, the world in front of the animal effectively turns left. These are individual spikes. And of course, all spikes are stereotyped, so only the number of spikes or their actual positions can be encoding the information.
So what you start doing is something like this. You look at the stimulus, which is the angular velocity with which the animal rotates. These are the spikes that happen.
And you can look at the response discretized into different words, right? So at, say, 32 millisecond resolution, a word would be the 11 spikes that happened in this window, and then the 9 spikes that happened in the next window, right?
As you make the discretization finer and finer, this 11 will become 4, 3, 3, 1, right? And eventually it will become 1, 1, 0, 1, and so on and so forth: a binary word where each letter says whether a spike is present or not. What I can then calculate is the correlation between the sensory stimulus and the words describing the response at a coarse resolution, where they really just count spikes, that is, the rate, versus the correlation between the stimulus and the words describing the spike train at a very fine resolution, and see which of the correlations is higher.
The correlations are going to be measured in terms of information, because the correlations here are clearly nonlinear, and a [INAUDIBLE] linear correlation coefficient or something like that is not a good way to measure nonlinear correlations. So we measure the correlation between the stimulus and the response in terms of mutual information: the average of the log of the joint distribution of stimulus and response divided by the product of the two marginals.
If that information increases as you make the discretization of the spike train finer and finer, then there is information at fine resolution, in the precise timing of spikes, in addition to the information in the spike counts, right? So from that paper, this is the information in fly vision about the fly's angular velocity. If you're looking at 20 to 30 millisecond resolution, the neuron carries about 80 bits per second about how quickly the fly is moving through the world.
And if you're looking at 200 microsecond resolution, it carries about 250 bits per second about the fly's velocity through the outside world. So clearly rate matters, but in this particular system timing provides about as much information as the rate does, right?
Again, this correlation is measured in terms of mutual information, which is a generalization of correlation to non-Gaussian signals, and it's measured in bits. If somebody doesn't know what information is, please ask me right now, because we're going to be talking about it later on. I never know what audience I'm talking to.
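Written out, the mutual information described above (the average log-ratio of the joint distribution to the product of the marginals) is the standard definition below. Here S is the stimulus and R_Δt is the spike word at resolution Δt; this notation is ours, not the talk's. Using log base 2 gives bits, and for a continuous stimulus the sum over s becomes an integral.

$$
I(S; R_{\Delta t}) \;=\; \sum_{s,\,r} p(s, r)\,\log_2 \frac{p(s, r)}{p(s)\,p(r)}
\;=\; \left\langle \log_2 \frac{p(s, r)}{p(s)\,p(r)} \right\rangle_{p(s, r)} \;\ge\; 0 .
$$

If the coarse bins are unions of fine bins (resolution kΔt for an integer k), the coarse word is a deterministic function of the fine word, so the data-processing inequality gives

$$
I(S; R_{k\Delta t}) \;\le\; I(S; R_{\Delta t}).
$$

Refining the resolution can therefore never lose information; the empirical question is whether the increase is substantial, which is what signals a timing code.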
So, yeah, nobody's asking. So now, how do we do the same analysis in a behaving animal, right? What we're interested in is the code by which the animal controls its behavior. This is the spike train, the cortical activity, let's say, of this animal. And this is the behavior.
The behavior is this: [INAUDIBLE] singing of the [INAUDIBLE].
What we're going to try to distinguish are two extremes of what is, of course, not really a dichotomy but a continuum of codes. In one, the information about what the animal is going to sing is carried by the number of spikes. So maybe these three sequences encode the same note that the animal is going to sing, and these sequences encode a different note, right? This is the rate code.
The number of spikes codes for the behavior. Or maybe we have these sequences, which all have four spikes, but in some they come in pairs, two and two, and in others they don't. Maybe this one corresponds to one note and this one to a different note, in which case the number of spikes is the same.
But if the bird sings a different note, it means that timing is important, right? That's what we'd like to distinguish. So we focus on this particular syllable and try to measure its pitch.
And this is going to be our behavior. The pitch is what the animal produces. We're going to record in a certain part of the animal's brain and try to correlate activity in that part of the brain with the pitch.
Notice that the pitch is actually pretty variable: it goes from about 3 to 3.6 kilohertz. And the pitch is controllable. When the animal is singing to a female, the pitch has a much narrower distribution than when it's just rehearsing on its own. So this is a behavior which is important to the animal and controllable by the animal, and maybe we can figure out exactly how it's controlled, that is, the code by which this behavior is controlled.
The recordings are done in area RA, which is, I think, about two synapses away from the muscles. It's part of the motor cortex and synapses onto motor neurons. If you lesion RA, there is no more song, right?
And if you stimulate RA, the song changes about 15 milliseconds later. So this sets the relevant time scales in this problem. This is what the experiment looks like.
This is the syllable that we're going to be looking at. We measure the pitch, the fundamental frequency of the syllable, 40 milliseconds after the onset of the syllable, which is the point where the pitch of this syllable is easiest to measure. Then we classify the pitches into just two behaviors: low pitch versus high pitch, right?
So the behavior is just binary: zero, low pitch, or one, high pitch. That's it. So the maximum amount of information I can have about the behavior in this system is one bit, all right? Even if you predicted the behavior perfectly, you would recover only one bit of information about it.
We're going to look at the 40 milliseconds preceding the pitch measurement in each rendition of the syllable, and at all of the different renditions of the same syllable, stacked vertically over here. There are 200 different renditions of the syllable.
Sometimes you have four spikes. Sometimes you have three spikes. Sometimes you have two spikes.
Clearly there is some structure. There are a lot more spikes in the middle, and then maybe there's a second column over there, right? So we're going to try to figure out how informative these 40 milliseconds of spiking are about what is going to happen next.
Is the pitch going to be above or below the median line? We're going to do exactly the same thing as in the fly. We're going to look at the 40 milliseconds as just one number, the spike count, and calculate the information between the number of spikes and the pitch. Then we're going to partition the window into smaller windows, 5 milliseconds, 2 milliseconds, all the way down to 1 millisecond, and try to figure out whether there is more information about the upcoming pitch in words at the 1 millisecond resolution than at the 40 millisecond resolution, right? At which point, you're on the recordings [INAUDIBLE].
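A minimal sketch of this procedure, assuming hypothetical toy data rather than real recordings: each trial has exactly two spikes in the 40 ms window, so the spike count is uninformative, and only the spacing of the two spikes differs between "high" and "low" pitch trials. The helper names spike_word and plugin_mi_bits are illustrative. The sketch uses the naive plug-in estimator, which is strongly biased upward when the number of possible words is large compared with the number of trials; that is the sampling problem raised in the talk, and this is not the estimator the talk uses.

```python
import numpy as np
from collections import Counter

def spike_word(spike_times_ms, window_ms=40.0, dt_ms=1.0):
    """Spike counts in consecutive bins of width dt_ms covering [0, window_ms]."""
    n_bins = int(round(window_ms / dt_ms))
    edges = np.linspace(0.0, window_ms, n_bins + 1)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return tuple(int(c) for c in counts)

def plugin_mi_bits(words, labels):
    """Naive plug-in estimate of I(word; label) in bits from paired samples."""
    n = len(words)
    joint = Counter(zip(words, labels))
    word_counts, label_counts = Counter(words), Counter(labels)
    mi = 0.0
    for (w, y), c in joint.items():
        # p(w,y) * log2[ p(w,y) / (p(w) p(y)) ], written with raw counts
        mi += (c / n) * np.log2(c * n / (word_counts[w] * label_counts[y]))
    return mi

# Hypothetical toy data (not recordings): two spikes per trial, so the count is
# uninformative; only the gap between the spikes differs between pitch classes.
trials, high_pitch = [], []
for k in range(50):
    onset = 5.0 + 0.4 * (k % 10)            # deterministic jitter of the first spike
    trials.append([onset, onset + 2.0])     # tight pair  -> high pitch (label 1)
    high_pitch.append(1)
    trials.append([onset, onset + 6.0])     # spread pair -> low pitch (label 0)
    high_pitch.append(0)

for dt in (40.0, 10.0, 5.0, 2.0, 1.0):
    words = [spike_word(tr, 40.0, dt) for tr in trials]
    print(f"dt = {dt:4.1f} ms: I = {plugin_mi_bits(words, high_pitch):.2f} bits")
```

With these toy trials the estimate is zero at 40 ms, where every trial is the same word (2,), takes an intermediate value at 10 and 5 ms, where some tight pairs still share a word with the spread pairs, and reaches the full 1 bit at 2 and 1 ms.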
So, again, we're going to convert spikes to binary words at different discretizations delta t, directly estimate the information between those words and the high/low pitch, and explore how that number depends on the resolution delta t. The biggest problem here is that estimating information from data is actually a hard problem. I casually told you that we're looking at 40 milliseconds at a resolution of up to about 1 millisecond.
So there are 40 binary letters involved, right? The length of this word is 40. The total number of possible words is 2 to the 40, which is about a million squared, right?
So if I wanted to fully sample the distribution of all possible words that the animal produces, I would need, ballpark, a trillion renditions of the song. That's just never going to happen, right? At most, we have about 1,000 renditions of each individual syllable from one neuron.
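The arithmetic behind these numbers, as a small sketch. It assumes, as in the talk, binary words (spike or no spike in each bin), which ignores the possibility of two spikes landing in the same bin; the function name n_possible_words is illustrative.

```python
def n_possible_words(window_ms: int, dt_ms: int) -> int:
    """Number of distinct binary words (spike / no spike per bin) in the window."""
    n_bins = window_ms // dt_ms
    return 2 ** n_bins

for dt_ms in (40, 10, 5, 2, 1):
    n_bins = 40 // dt_ms
    print(f"dt = {dt_ms:>2} ms: {n_bins:>2} bins, {n_possible_words(40, dt_ms):.2e} possible words")
```

At 1 ms this gives 2^40, about 1.1 × 10^12 words, and at 2 ms it gives 2^20 = 1,048,576, compared with roughly 1,000 recorded renditions per neuron.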
It turns out that there are methods for estimating mutual information and entropy from data which require a lot less data. This is what I developed some number of years ago. And the idea behind the method is actually rather simple. It's based on the birthday problem, right?
In this room, we have about, what, 40 people, so the probability that at least two of us share a birthday is about 90%. And that's even though the number of days in the year is 365, so the total entropy of birthdays across a year is log of 365.
If I wanted to sample the entire distribution of birthdays over a year, I would need to have a lot more than 365 people in the room just to see that the distribution is uniform, and so on and so forth. Or what I could do is keep on asking you about your birthdays, and when I started to get coincidences, I could square the number of people I had asked.
That would give me a reasonable ballpark estimate of the number of days in a year. And if I take the log of that, [AUDIO OUT] would give me an estimate of the entropy of birthdays across a year. So we can estimate entropies from much smaller data sets, roughly the square root of the size that a simple, naive estimate of entropy requires.
There are a lot of things swept under the rug here, but the code is there; you can download it. It's been implemented by us and by many other groups. And to date, I don't think there is any other method that works with smaller data sets than ours.
So it's probably reaching the limit, where you cannot really do much better than this type of estimate. Still, even in this case, we typically have about 200 repeats, at most maybe 800 or 1,000 or so, and for words of length 40 binary [AUDIO OUT], you would still require much, much larger data sets than we can produce.
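The birthday intuition can be sketched in a few lines. The first function computes the exact probability of a shared birthday; the second is a simple coincidence-counting estimate of the number K of equally likely states (so the entropy is log2 K), using the fact that each of the n(n-1)/2 pairs of samples coincides with probability 1/K. This only illustrates why coincidences need about sqrt(K) samples: it assumes a uniform distribution and is not the Bayesian estimator described in the talk. The value K = 2**16 and the sample size are arbitrary choices for the demo.

```python
import math
import random
from collections import Counter

def prob_shared_birthday(n_people: int, n_days: int = 365) -> float:
    """Exact probability that at least two of n_people share a birthday (uniform days)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (n_days - k) / n_days
    return 1.0 - p_all_distinct

def coincidence_estimate(samples) -> float:
    """Estimate the number K of equally likely states from pairwise coincidences.

    n samples form n(n-1)/2 pairs; each pair coincides with probability 1/K,
    so K is estimated as n(n-1) / (2 * number_of_coinciding_pairs).
    """
    n = len(samples)
    pairs = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return math.inf if pairs == 0 else n * (n - 1) / (2 * pairs)

print(f"P(shared birthday among 40 people) = {prob_shared_birthday(40):.3f}")

rng = random.Random(0)
K = 2 ** 16                                        # hypothetical number of equally likely words
samples = [rng.randrange(K) for _ in range(4000)]  # far fewer samples than states
K_hat = coincidence_estimate(samples)
print(f"true entropy {math.log2(K):.1f} bits, estimate {math.log2(K_hat):.2f} bits")

# Coincidences start to appear after roughly sqrt(K) samples; for 40 one-ms bins:
print(f"sqrt(2**40) = {math.isqrt(2 ** 40):,} samples")
```

With 4,000 samples from 65,536 states there are only on the order of a hundred coinciding pairs, yet the estimate lands close to 16 bits. The last line shows why even this square-root saving is not enough for 40 one-millisecond bins: coincidences would only start to appear after about a million renditions.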
So in fact, what we do is average over multiple animals. The data that I'm going to show you next is not the amount of information that one particular neuron carries about the upcoming song, but an average over many different neurons from many different animals that we have recorded. If you calculate the amount of information that, on average over about 40 neurons in the RA area of the bird, a neuron carries about the pitch of the upcoming [AUDIO OUT] gesture, using just the spike count in the 40 millisecond window, this information is barely distinguishable from zero, right?
It's very, very small, a fraction of a tenth of a bit: 0.03 bits or something like that.
If we had a rate code, this number wouldn't change as we increased the temporal resolution. Instead, what happens is that at about two milliseconds or one millisecond, the information starts increasing and goes to about 0.2 [INAUDIBLE]. I remind you that the maximum it could have been is one bit, right?
So it means that by looking at a single neuron, I have about 30% of the maximum possible amount of information about whether the pitch is going to be high or low.
And of course, there are thousands of neurons there. So it's not surprising that if you add thousands of neurons, each carrying this much information, you will actually control the entire pitch [AUDIO OUT]. This is for pitch, but we get very similar results for other acoustic features: the sound amplitude, the spectral entropy, and every other feature that we analyzed.
And you see very similar behavior: almost no information at coarse resolution, and then, at high temporal resolution of about one to two milliseconds, the information saturates anywhere between 0.1 and 0.25 bits, depending on which specific feature you are looking at. So what this means is that there is information in the precise spike timing of RA neurons about the upcoming behavior. So we're moving one step closer to answering the question: is the information actually used, right?
We already knew there is information in the sensory system. Now it seems that there is information about the upcoming behavior in the motor system as well. The next thing one can ask is: can we actually decode this information? I'm telling you there is information, but which specific words actually carry it, right?
What are the key words, such that if you observe a certain word, it tells you the animal is going to sing at a high pitch, and if you observe some other word, the animal is going to sing at a low pitch? This is again a very hard problem, because every neuron is unique. Every neuron is going to have its own dictionary, its own mapping between fine-resolution code words and the behavior.
And we have hundreds of renditions recorded per neuron. We have to explore a space of 2 to the 40 words at 1 millisecond resolution [INAUDIBLE], or 2 to the 20, about 1 million, if we're looking at the spike train at 2 millisecond resolution. And then there is an additional problem: spike words are not independent. If you have, for example, a word with three spikes that is correlated [AUDIO OUT] high pitch, then every subword of this word, these two spikes or these two or these two, is also going to be correlated with high pitch. And so is any superword, right?
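A small sketch of this redundancy, with spike words represented as sets of bin indices. The word (3, 7, 12) and the trials are hypothetical. Any trial that contains the three-spike word also contains each of its two-spike subwords, so a naive dictionary would list all of them as predictive of the same behavior. The sketch only illustrates the problem; it does not implement the decoding method the talk goes on to describe.

```python
from itertools import combinations

def sub_patterns(spike_bins, order):
    """All patterns of `order` spikes contained in a word given as spike bin indices."""
    return list(combinations(sorted(spike_bins), order))

def contains(word_bins, pattern):
    """True if every spike of `pattern` also occurs in the word."""
    return set(pattern) <= set(word_bins)

triple = (3, 7, 12)          # hypothetical three-spike word (bin indices, 1 ms bins)
pairs = sub_patterns(triple, 2)
print(pairs)                 # [(3, 7), (3, 12), (7, 12)]

# Hypothetical trials: one exact match, one superword, and two partial overlaps.
trials = [(3, 7, 12), (1, 3, 7, 12, 30), (3, 7), (7, 12, 20)]
for pattern in [triple] + pairs:
    print(pattern, [contains(trial, pattern) for trial in trials])
```

A k-spike word has 2^k - 1 non-empty subwords, so this redundancy grows quickly with word length.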
So if you want a non-redundant dictionary, you have to work somewhat hard to solve this redundancy problem. We have a method which works, again, in the very severely undersampled regime, and the main idea of the method is that it is easier to estimate which spike words matter than how they matter, right?
Typically, when one tries to decode spike trains, one builds things like generalized linear models, which estimate a coefficient [INAUDIBLE] that tells you how important a specific spike pattern is to the upcoming behavior, or how important it is in coding the sensory signal. That's a real-valued number that you estimate for each specific pattern. But suppose we don't care about its value, and only ask which specific coefficients are non-zero,
and therefore whether a given word encodes the behavior or doesn't. Then, not surprisingly, we need a lot less data to do that.
There is an algorithm; I am going to skip the next slide, judging by the time. This is what we get.
These are two different neurons: the left column is neuron number one, the right column is neuron number two. We can look at, for example, which specific words encode a high pitch or a low pitch.
In the background, you can see the firing rate of this neuron [INAUDIBLE]. This is for pitch. You see that in this neuron, this particular combination codes for high pitch.
In the other neuron, a very, very different combination codes for high pitch. This is low pitch: again, very different combinations. You can also look at the amplitude of the signal, how loudly the bird [INAUDIBLE] sings, or at the spectral entropy.
You can see, first of all, that the neurons are very diverse. And what's also very important is that most of the patterns that we see that predict the pitch, the amplitude, or the entropy are not single-spike patterns. They are combinations of two spikes.
And now we have a bit [AUDIO OUT]. Sometimes it's even three-spike patterns that are predictive of behavior. There are very, very few single-spike patterns that are predictive of essentially anything that's going to happen.
So the code is not a simple code where a spike means a higher or a lower pitch. The actual structure of a pattern [AUDIO OUT] of spikes matters, right? A specific spike is a letter, but what that letter means, whether it predicts high pitch or low pitch, depends on which other letters follow. It's a combinatorial code rather than a simple linear additive code, right?
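An idealized toy example, not data from the talk, of why a combinatorial code can defeat single-spike analyses. Assume two bins, a and b, with all four spike patterns equally likely, and suppose high pitch follows exactly when one, but not both, of the two spikes occurs (an XOR rule). Each spike alone then carries zero information about pitch, while the pair carries the full bit. The helper mi_bits is illustrative.

```python
import math
from collections import Counter
from itertools import product

def mi_bits(xs, ys):
    """Mutual information (bits) between two equally weighted lists of outcomes."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

# Hypothetical two-letter code: all four spike patterns (a, b) equally likely.
patterns = list(product((0, 1), repeat=2))
pitch = [a ^ b for a, b in patterns]      # high pitch iff exactly one spike fires

print("I(a; pitch)      =", mi_bits([a for a, _ in patterns], pitch))
print("I(b; pitch)      =", mi_bits([b for _, b in patterns], pitch))
print("I((a, b); pitch) =", mi_bits(patterns, pitch))
```

Because XOR is not linearly separable, no weighted sum of a and b followed by a threshold reproduces this rule, which is the sense in which such a code is not linear and additive.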
And again, to my knowledge, this is the first example, or at least the first attempt, of a massive reconstruction [INAUDIBLE]: not a reconstruction of a few specific spike words that predict behavior or encode a stimulus, but a reconstruction of the entire dictionary, in the [INAUDIBLE] system or in a sensory system, where we do not assume that codes are linear with individual spikes mattering independently, but actually construct entire combinatorial words, right?
This is a paper in preparation with my post-doc Damien Hernandez. OK, so to recap up to now: I told you that spike trains are informative about the behavioral output at high frequency, and that this suggests there is a spike timing code in the pre-motor area RA. And we can use this method, which we called the Bayesian Ising Approximation (because that's what some other people have called it previously), to decode timing codes even in individual neurons.
Unfortunately, this hasn't yet answered the question I told you I would answer. I have shown that precise spike timing is correlated with motor behavior, and that precise timing is important not just in the sensory system, where this has been shown many times. But I haven't yet shown you that this is causation rather than correlation.
I haven't shown you that if you actually change the spike train by a few milliseconds here or there, the animal is going to do different things: not just at a different time, but different things. So let's try to do this. To show that timing is used to control behavior, we have to do [AUDIO OUT] things.
First of all, we have to focus on motor units, not on some upstream area of the brain, because even if you record in the motor cortex, people will keep asking: does it really matter at the level of the muscles? So you have to actually record from the motor units.
Then you have to show that the spike trains have structure at high temporal resolution, which could in principle be used to control behavior. Then you have to show that this structure is correlated with, or informative about, the behavior in naturally behaving animals. Then you have to show that the muscles can decode these precisely timed spikes into different forces.
And then you have to stimulate neurons with the specific patterns you have identified, the ones the muscles can decode, and show that this causes the animal to do very different things. And since these are vertebrate animals, we cannot stimulate the same identified neuron in different animals.
It means that we have to do everything within individual animals; we cannot average this reconstruction across multiple animals. So we have to record from a system where we can collect a lot of spikes. It's not going to be birdsong.
So we need to be able to record and stimulate, and we need to get many spikes. We need a behavior that is rather slow, so that it is not already obvious that spike timing is going to matter.
But at the same time, it should at least be plausible that controlling this behavior at high temporal resolution is useful. What we decided to focus on was the control of quiet breathing in an anesthetized Bengalese finch, because we know that, at least when they are singing (anesthetized ones, of course, don't sing), these birds can control behavior on scales of about 10 to 20 milliseconds. At the same time, the behavior itself, the breathing cycle, is about half a second long in this animal.
And because the animal is anesthetized, we can record from it for hours and hours, and it clearly keeps on breathing; we don't have to make it breathe. So we can collect a lot of data. Yes?
AUDIENCE: I'd like to ask you a question.
ILLYA NEMENMAN: [INAUDIBLE] this one?
AUDIENCE: Yes, this [INAUDIBLE]. How do you determine whether the change in motor output produced by changes in spike timing is actually an important part of the motor [INAUDIBLE]? If you change the spike timing, you could produce a change in the behavior, but that change in the behavior might [INAUDIBLE]
ILLYA NEMENMAN: That's a great question, and there are two answers. The first is that I'd like to show first that changing the timing can control, or does control, the behavior. Then, in later work, we will figure out whether it's important or not by some definition.
The second response is that here we have reason to believe that control of behavior, at least on a scale of 10 to 20 milliseconds, is an important [INAUDIBLE], because the animals can slow down or restart their breathing when they're singing, roughly on those time scales. So it's at least plausible that the behavior we're going to study should be controlled on those scales. Whether this is an important feature of the control or not, I'm going to leave to the next iteration of this work, after we have shown that timing in the spike trains actually translates into differences in behavior.
ILLYA NEMENMAN: I agree. Yes, I agree with you [INAUDIBLE]. So next time I give this talk, I'm going to add that point here.
So this is what's been done. This is an anesthetized bird. This is an array which is put on a respiratory muscle in this bird.
We put a pressure sensor in the bird's air sac. A pressure cycle lasts from 400 milliseconds to one second. By the way, if you have periodic breathing, it means you are going to be dead pretty soon.
Breathing is actually very variable. We can record up to about 30,000 breathing cycles on this electrode per bird, and 300 spikes per bird. These are spikes from individual motor units. How exactly do we know these are individual motor units? I know because I trust Sam, and he has an entire supplementary [AUDIO OUT] of the paper explaining that this is indeed a single recorded motor unit.
What happens is something like this. The black line is the average pressure trace in the bird, averaged over about 20 different pressure cycles, and the red trace is an individual cycle: this is what the pressure actually was.
There are hundreds, potentially thousands, of neurons controlling this muscle. It's clear that an individual neuron, an individual motor unit, is not going to be able to drive this entire cycle. But it's plausible that an individual motor unit could [AUDIO OUT] that tiny difference over here.
So when I talk about pressure, what I mean is the pressure residual: the difference between the average behavior and the actual behavior observed on an individual cycle. And this is what the spiking in this system looks like. This is an expiratory, exhaling neuron.
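The residual used from here on is each cycle's trace minus the average trace. A minimal sketch, assuming the cycles have already been cut out, aligned to a common phase, and resampled to equal length (the talk does not describe that alignment step); the three short cycles below are made-up numbers:

```python
from statistics import fmean


def average_trace(cycles):
    """Point-by-point mean over aligned, equal-length pressure cycles."""
    return [fmean(values) for values in zip(*cycles)]


def pressure_residuals(cycles):
    """Residual of each cycle: its trace minus the across-cycle average trace."""
    avg = average_trace(cycles)
    return [[p - a for p, a in zip(cycle, avg)] for cycle in cycles]


# Three hypothetical, already-aligned cycles sampled at the same phases.
cycles = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.2, 2.1, 0.9],
    [0.0, 0.8, 1.9, 1.1],
]
residuals = pressure_residuals(cycles)
```

By construction the residuals average to zero at every phase, so any systematic deviation that follows a spike pattern stands out against that baseline.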
It fires at [INAUDIBLE] peak and slightly after the peak of the pressure cycle. It never fires over here or over here, only at the very top. OK, is there a question?
AUDIENCE: Well, you talk about the three large spikes and the small spikes.
ILLYA NEMENMAN: Right now, we're looking at this system. I believe, at least (Sam still doesn't believe me), that I can get additional motor units out of these small spikes. But [AUDIO OUT] only looking at these large spikes. We're just thresholding everything and only looking at the large spikes.
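Thresholding of this kind can be sketched as detecting upward crossings of a fixed level, with a short dead time so that one large spike is not counted twice. The function name, the threshold, and the dead time below are placeholders for illustration, not values or code from the study:

```python
def threshold_crossings(signal, threshold, dt_ms, dead_time_ms=1.0):
    """Times (ms) at which `signal` crosses `threshold` from below.

    Crossings closer than `dead_time_ms` to the previous accepted crossing are
    ignored, so a single large spike is not counted twice. Small spikes that
    never reach the threshold are simply dropped.
    """
    times = []
    last = None
    for i in range(1, len(signal)):
        if signal[i - 1] < threshold <= signal[i]:
            t = i * dt_ms
            if last is None or t - last >= dead_time_ms:
                times.append(t)
                last = t
    return times


# Hypothetical trace sampled every 0.1 ms: two large spikes and one small one.
trace = [0.0] * 100
trace[10:13] = [5.0, 8.0, 4.0]   # large spike
trace[40:42] = [1.5, 1.0]        # small spike, stays below threshold
trace[70:73] = [6.0, 9.0, 3.0]   # large spike
spike_times = threshold_crossings(trace, threshold=3.0, dt_ms=0.1)
```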
So let's go through my list. The first thing I said I needed to show is that there is millisecond-level structure in the neural code. So how are we going to do this?
We're going to look at the spike trains and calculate whether the next interspike interval is predictable from the previous interspike interval. If it's predictable, then of course there is structure there; a purely Poisson spike train has no such predictability.
But here's what's interesting: I can now jitter my spikes. Here they are jittered by one millisecond, and here by 10 milliseconds. And I can see how much predictability I lose when I jitter at a given time scale. What is plotted is the fraction that remains, where 100% is the information in the original, unjittered spike train.
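The jitter analysis can be sketched in a few lines. The talk does not say how the information between consecutive interspike intervals was estimated, so this sketch assumes the simplest option, a plug-in estimate on equal-occupancy bins, and runs it on a made-up spike train whose intervals alternate between about 5 and 15 ms, plus a Poisson train as the no-structure reference:

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def intervals(spikes):
    """Interspike intervals of a spike train (spikes are re-sorted first)."""
    s = sorted(spikes)
    return [b - a for a, b in zip(s, s[1:])]


def consecutive_isi_information(spikes, n_bins=4):
    """Plug-in mutual information (nats) between each ISI and the next one.

    ISIs are discretized into `n_bins` equal-occupancy (quantile) bins. For
    independent ISIs this estimator is biased upward by roughly
    (n_bins - 1)**2 / (2 * n_pairs) nats.
    """
    isi = intervals(spikes)
    ordered = sorted(isi)
    edges = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    labels = [bisect_right(edges, v) for v in isi]
    pairs = list(zip(labels, labels[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(x for x, _ in pairs)
    second = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (first[x] * second[y]))
               for (x, y), c in joint.items())


def jitter(spikes, width_ms, rng):
    """Move every spike independently by a Uniform(-width_ms, +width_ms) offset."""
    return [t + rng.uniform(-width_ms, width_ms) for t in spikes]


def poisson_train(n, rate_per_ms, rng):
    """Spike train with independent exponential ISIs (no ISI-to-ISI structure)."""
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_per_ms)
        out.append(t)
    return out


def alternating_train(n, rng):
    """Hypothetical structured train: ISIs alternate between ~5 ms and ~15 ms."""
    t, out = 0.0, []
    for k in range(n):
        t += rng.uniform(4.0, 6.0) if k % 2 == 0 else rng.uniform(14.0, 16.0)
        out.append(t)
    return out


rng = random.Random(0)
structured = alternating_train(5000, rng)
baseline = consecutive_isi_information(structured)
retained = {w: consecutive_isi_information(jitter(structured, w, rng)) / baseline
            for w in (0.25, 1.0, 20.0)}
poisson_info = consecutive_isi_information(poisson_train(5000, 0.1, rng))
```

In this toy train the structure lives at the 10 ms scale, so small jitter leaves it intact and large jitter erases it; in the recordings described in the talk, half of the predictability was already gone at about 2 ms of jitter.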
In every bird we tried, there is some information, some predictability. The values are very different, but there is some [AUDIO OUT]. Once you jitter by about 2 milliseconds, you have lost half of the predictability between neighboring interspike intervals. In the rest of the talk, wherever I show a solid line, it is our bird number one, which had the most spikes.
That's the only reason we called it number one: the birds are sorted by the number of spikes recorded. The bands are the full range over all eight birds in this particular study. So what the slide shows is that there is structure in the spike train, and that the structure is on the scale of about one millisecond [AUDIO OUT].
So the next question is: does the structure actually predict behavior? We're going to look at pressure residuals. First, a stretch of spike train [AUDIO OUT] of about 20 milliseconds, and then the 100 milliseconds of pressure residuals that follow that spike sequence.
We are going to calculate the mutual information first between the number of spikes and the subsequent pressure residuals, and then between the timing of spikes and the subsequent pressure residuals, with no discretization: just the real-valued spike times. The correlation time of the pressure [INAUDIBLE] in this system is about 15 milliseconds or so, so we are going to represent these 100 milliseconds of pressure as 11 real values.
Every 10 milliseconds we record a value, and that is good enough: we're not losing much structure, because the correlation time is longer than that. So what this really means is that we are trying to calculate the mutual information between an 11-dimensional real vector and a potentially three-dimensional real vector, if there are three spikes in this 20 millisecond window.
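The bookkeeping behind that dimension count can be sketched as follows. The 1 ms sampling of the residual, the convention that the pressure window starts where the 20 ms spike window ends, and the function names are all assumptions made for illustration:

```python
def spike_timing_features(spikes_ms, window_start_ms, window_ms=20.0):
    """Real-valued spike times (ms, relative to the window start) inside the spike window."""
    return [t - window_start_ms for t in sorted(spikes_ms)
            if window_start_ms <= t < window_start_ms + window_ms]


def pressure_features(residual, start_ms, dt_ms=1.0, horizon_ms=100, step_ms=10):
    """Residual sampled every `step_ms` from `start_ms` to `start_ms + horizon_ms` inclusive.

    With a 100 ms horizon and 10 ms steps this gives 11 values; sampling more
    finely adds little, since the talk puts the residual's correlation time at
    about 15 ms, longer than the step.
    """
    first = round(start_ms / dt_ms)
    stride = round(step_ms / dt_ms)
    return [residual[first + k * stride] for k in range(horizon_ms // step_ms + 1)]


# Hypothetical example: residual sampled every 1 ms, three spikes in the window.
residual = [0.01 * i for i in range(300)]
spikes = [102.0, 110.0, 118.0, 130.0]
timing = spike_timing_features(spikes, window_start_ms=100.0)
pressure = pressure_features(residual, start_ms=100.0 + 20.0)
joint_dimension = len(timing) + len(pressure)   # 3 spike times + 11 pressure samples
```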
Estimating anything in 14 dimensions is a hard problem, and we have solved it; you can look at six or seven different figures in the supplementary materials showing that we're not lying. This was, by the way, done by an undergraduate student, Caroline Holmes, who has just recently been admitted to the graduate program in physics here at MIT. So maybe one day she will end up in one of the labs here.
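The talk does not say which estimator was used for these 14-dimensional quantities, so the sketch below swaps in a standard choice for continuous variables, the k-nearest-neighbour estimator of Kraskov, Stögbauer and Grassberger (their algorithm 1). It is a brute-force O(N²) illustration, not the study's code, and it omits the bias checks described in the supplement. The correlated-Gaussian pairs are only a sanity check with a known answer:

```python
import math
import random
from functools import lru_cache

EULER_GAMMA = 0.5772156649015329


@lru_cache(maxsize=None)
def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    return -EULER_GAMMA + math.fsum(1.0 / m for m in range(1, n))


def chebyshev(a, b):
    """Max-norm distance between two equal-length tuples."""
    return max(abs(u - v) for u, v in zip(a, b))


def ksg_mutual_information(xs, ys, k=3):
    """KSG (algorithm 1) estimate, in nats, of I(X; Y) from paired samples.

    xs[i] and ys[i] are tuples of reals, e.g. spike times and pressure samples.
    """
    n = len(xs)
    acc = 0.0
    for i in range(n):
        dx = [chebyshev(xs[i], xs[j]) for j in range(n)]
        dy = [chebyshev(ys[i], ys[j]) for j in range(n)]
        # Distance to the k-th nearest neighbour in the joint (max-norm) space.
        eps = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)[k - 1]
        # Neighbours strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


def correlated_gaussians(n, rho, rng):
    """Toy check data: pairs with known I = -0.5 * log(1 - rho**2) nats."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append((x,))
        ys.append((y,))
    return xs, ys


xs, ys = correlated_gaussians(300, 0.8, random.Random(1))
estimate = ksg_mutual_information(xs, ys)
exact = -0.5 * math.log(1 - 0.8 ** 2)   # about 0.51 nats
```

For the analysis in the talk, each `xs[i]` would be a tuple of spike times and each `ys[i]` an 11-value pressure vector; the max-norm construction works unchanged in higher dimensions.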
So when we do this analysis, and convince ourselves that there is no bias and that the estimates are done correctly, this is the result. There are eight birds. What [AUDIO OUT] in blue is the amount of information that the rate, the number of spikes, gives us about the upcoming pressure, the entire pressure trace. The red is what the timing of spikes provides, with error bars, as you can see.
In all birds except this one, there is a substantial amount of information between the timing and the pressure. The open brackets mark the places where we believe our estimate of mutual information is biased downward, so we're underestimating the total amount of information. We're allowed to make that mistake because the only thing we want to show is that this value is nonzero.
So the statement is that in essentially every case except one, there is a substantial amount of information in spike timing about the upcoming real-valued pressure trace. So we're back to the same situation we were in with birdsong rather than breathing: there is information there. The next thing we're going to do is try to find the specific spike words that actually carry this information.
Remember, our main idea is to show that timing matters, not just rate. So we're going to look for combinations of spikes that have the same rate of spikes but [AUDIO OUT] timing of spikes. The simplest combinations look like this.
You have three spikes, discretized at 2 milliseconds. The two outer spikes, the first and the last, are 20 milliseconds apart, and the central one is 10 milliseconds from the first and 10 from the last.
The second pattern has gaps of 12 and 8 milliseconds. Both have the same rate, three spikes per 20 milliseconds, but they differ by two milliseconds in the timing of the central spike. That's it. This is [INAUDIBLE].
How did we choose these specific patterns? Among three-spike patterns, the ones whose outer spikes are about 20 milliseconds apart are the most common, at least in bird [INAUDIBLE], and the 10, 10 pattern is the most common of those. So we're not picking something very specific or very [INAUDIBLE]; we're choosing the most common patterns.
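Finding the two words in a recorded spike train can be sketched as below: discretize at 2 ms, write each pattern as a binary word, and look for exact matches. Requiring the bins between the three spikes to be empty is an assumption of this sketch (the talk does not spell out the matching rule), and the spike times are made up:

```python
def discretize(spikes_ms, bin_ms=2.0):
    """Indices of the 2 ms bins that contain at least one spike."""
    return {int(t // bin_ms) for t in spikes_ms}


def pattern_word(gaps_ms, bin_ms=2.0):
    """Binary word for a three-spike pattern with the given two gaps, e.g. (10, 10)."""
    g1, g2 = (round(g / bin_ms) for g in gaps_ms)
    word = [0] * (g1 + g2 + 1)
    word[0] = word[g1] = word[g1 + g2] = 1
    return word


def find_pattern(spikes_ms, gaps_ms, bin_ms=2.0):
    """Onset times (ms) where the discretized train matches the word exactly,
    i.e. the three spikes are present and the bins between them are empty."""
    occupied = discretize(spikes_ms, bin_ms)
    word = pattern_word(gaps_ms, bin_ms)
    return [b * bin_ms for b in sorted(occupied)
            if all((b + i in occupied) == bool(bit) for i, bit in enumerate(word))]


# Both words have three spikes over 20 ms (same rate); only the middle spike moves.
even, shifted = pattern_word((10, 10)), pattern_word((12, 8))

spikes = [0.5, 10.5, 20.5, 50.3, 62.1, 70.9]   # hypothetical spike times (ms)
```

Here `find_pattern(spikes, (10, 10))` picks out the occurrence starting at 0 ms and `find_pattern(spikes, (12, 8))` the one starting at 50 ms.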
Now let's do the following: calculate the pattern-triggered average of the pressure residuals, conditioned on this pattern or on this one. These are the pattern-triggered averages. The width of each line is the standard error of the mean in all of the plots I'm going to show.
Even if it doesn't look like it, when the lines are very, very [INAUDIBLE], it is always the standard error of the mean. It's pretty clear that these two sequences are followed by very different pressure residuals, very different pressure traces. We can quantify this. Sorry, different slide.
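A pattern-triggered average with its standard error can be sketched as follows, assuming the residual is a uniformly sampled list and the pattern onsets are already known as sample indices (for example, from a matcher like the one sketched above); the residual here is a toy ramp:

```python
import math
from statistics import fmean, stdev


def pattern_triggered_average(residual, onset_indices, length):
    """Mean residual and its standard error at each lag after the pattern onsets.

    residual: pressure residual sampled at a fixed rate.
    onset_indices: sample indices where the spike pattern started.
    length: number of lags to average over.
    """
    segments = [residual[i:i + length] for i in onset_indices
                if 0 <= i and i + length <= len(residual)]
    n = len(segments)
    if n < 2:
        raise ValueError("need at least two pattern occurrences for a standard error")
    columns = list(zip(*segments))
    average = [fmean(col) for col in columns]
    sem = [stdev(col) / math.sqrt(n) for col in columns]
    return average, sem


residual = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # toy residual
avg, sem = pattern_triggered_average(residual, onset_indices=[0, 2], length=3)
```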
The recordings are actually quite different across birds. This shows the whole variability of our [INAUDIBLE] birds for which we have enough statistics, and this is what the 10, 10 pattern causes in six different birds.
You can see that these pattern-triggered averages are very, very different across the birds. This top green line over here is this one; note that the scales on the two plots are different.
So what this shows is that different motor units actually produce very different forces. And yet, here is what happens when I calculate the d prime between the blue and the red lines, that is, how distinguishable the two [INAUDIBLE] pattern-triggered averages are.
You get this really nice black line. The reshuffling control is the brown line, so we're not cheating.
These are real statistics: the patterns are distinguishable. And when you look at six different birds, you see that the d primes of the pattern-triggered averages are very close to each other. That's interesting.
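The lag-by-lag d prime and its reshuffling control can be sketched as below. The pooled-standard-deviation form of d prime and the label-shuffling control are common choices assumed here; the talk does not give the exact formula. The segments are synthetic, with pattern A shifting the residual by one noise standard deviation:

```python
import math
import random
from statistics import fmean, variance


def d_prime(segments_a, segments_b):
    """Lag-by-lag d' between two sets of pattern-triggered residual segments:
    difference of means divided by the root-mean-square of the two standard deviations."""
    result = []
    for col_a, col_b in zip(zip(*segments_a), zip(*segments_b)):
        pooled_sd = math.sqrt((variance(col_a) + variance(col_b)) / 2.0)
        result.append((fmean(col_a) - fmean(col_b)) / pooled_sd)
    return result


def shuffled_d_prime(segments_a, segments_b, rng):
    """Control: randomly reassign segments to the two labels, keeping group sizes."""
    pooled = list(segments_a) + list(segments_b)
    rng.shuffle(pooled)
    return d_prime(pooled[:len(segments_a)], pooled[len(segments_a):])


# Hypothetical data: pattern A shifts the residual by +1 at every lag, with unit noise.
rng = random.Random(3)
seg_a = [[1.0 + rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
seg_b = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
real = d_prime(seg_a, seg_b)
control = shuffled_d_prime(seg_a, seg_b, rng)
```

Because d prime is normalized by the spread, two birds whose triggered averages differ a lot in absolute pressure can still produce overlapping d prime curves.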
So what happens is that different motor units generate very different effects on the behavior. But a tiny change, a two millisecond shift in the timing of an individual spike, has the same effect in all of these neurons. The baseline behavior is very different, but the change from shifting this spike by two milliseconds is basically the same across all neurons. That's kind of surprising; I was very surprised to see how neatly these plots map on top of each other.
OK, so I've shown you that there is timing information, timing structure, and that the structure carries information about the pressure. Now let's see whether that information can be transduced by muscles into different forces. So now we go to Denmark.
We take those muscles out of the bird and stimulate them with three pulses with exactly the same structure, 10, 10 versus 12, 8 milliseconds. This is the d prime in the forces these muscles generate. These are different muscle bundles with slightly different numbers of fibers, which is why the variability is so different.
But notice that the time scales almost perfectly match those of the pattern-triggered averages from the statistical analysis: the peak is at about 20 milliseconds, and then there is [INAUDIBLE], at least in some of the animals, at about 40 milliseconds or so. So those same patterns, 10, 10 and 12, 8, can generate different forces in individual fibers.
Well, that's not the same as generating different behavior, because a bird is a big animal, and an animal [INAUDIBLE] is big. It has large inertia, a large moment of inertia. Try to move your hand very, very quickly; it's not going to happen.
So the bird is effectively a low-pass filter. The physics, the mechanics of the bird, act as a low-pass filter on its muscles and the [AUDIO OUT] that the muscles generate. So now the question is: does this timing also produce different pressures, different behavior, in a real bird? Let's test that.
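The low-pass picture can be made concrete with the simplest possible filter, a first-order (exponential) smoother. The 20 ms time constant is a placeholder, not a measured property of the bird; the point is only that fast wiggles in the input force are strongly attenuated while slow changes pass through:

```python
import math


def first_order_low_pass(force, dt_ms, tau_ms):
    """Discrete first-order low-pass filter: y += alpha * (x - y) each step,
    with alpha = dt / (tau + dt)."""
    alpha = dt_ms / (tau_ms + dt_ms)
    y, out = 0.0, []
    for x in force:
        y += alpha * (x - y)
        out.append(y)
    return out


def peak_to_peak(values):
    return max(values) - min(values)


dt, tau = 1.0, 20.0                                              # ms; tau is hypothetical
fast = [1.0 if (i // 2) % 2 == 0 else -1.0 for i in range(400)]  # square wave, 4 ms period
slow = [math.sin(2 * math.pi * i / 400.0) for i in range(400)]   # sine, 400 ms period
fast_out = first_order_low_pass(fast, dt, tau)
slow_out = first_order_low_pass(slow, dt, tau)
```

Both inputs swing between -1 and +1, but only the slow one survives the filter with most of its amplitude, which is why a difference in muscle force does not automatically imply a difference in pressure.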
Remember, I told you that this neuron fires only in this part of the pressure cycle. So what we'll do is stimulate the nerve in a different part of the pressure cycle, where there are typically no spikes. As I told you before, there are hundreds or thousands of neurons that drive the dynamics here.
We cannot be sure we're stimulating only one, but we're not stimulating more than a few. And if you stimulate a few muscle fibers, the bird will not suddenly start exhaling too early; it's just not enough neurons.
But we might create a tiny blip somewhere in the pressure, so let's see what the blip is going to be. What we're going [AUDIO OUT] again: we're going to stimulate with 10, 10 versus 12, 8. This is the d prime you get in, I think, six different birds.
And I remind you that this is what the d prime was in a different part of the cycle in a freely behaving animal. It's the same: at 20 milliseconds, you have a peak; at 40 milliseconds, you [AUDIO OUT].
It is very similar to the behavior we recorded in the animal, which we then saw the muscles can reproduce. And now we can trigger the animal to produce this behavior by changing the timing of a single stimulation pulse, delivered to a handful of neurons, by just two milliseconds. In fact, we can do the same with one millisecond.
In that case, the amount of data we have makes four out of eight birds significant rather than six out of eight. You can go even further. With this setup, we can look at all of these three-spike patterns, from 2, 18 millisecond gaps to 18, [AUDIO OUT] millisecond gaps, and see which pressures they generate in the bird.
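With the outer spikes fixed 20 ms apart on a 2 ms grid, the family of patterns can be listed as follows, assuming it runs from gaps of 2, 18 ms through 18, 2 ms. Every member has the same spike count and differs only in where the middle spike sits:

```python
def three_spike_patterns(span_ms=20, bin_ms=2):
    """All (gap1, gap2) pairs with gap1 + gap2 == span_ms on a bin_ms grid, both gaps > 0."""
    return [(g, span_ms - g) for g in range(bin_ms, span_ms, bin_ms)]


patterns = three_spike_patterns()
```

That gives nine patterns, including the 10, 10 and 12, 8 pair used above.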
And again, the width of every line is the [INAUDIBLE] standard error of the mean. They're all distinguishable from each other. So moving a single spike by a few milliseconds produces very different pressures in the live bird. That's, to me, [AUDIO OUT].
Why? How does it happen? How can a muscle that integrates on scales of about 10 to 20 milliseconds be affected by millisecond-scale changes in firing?
This is a complicated slide, so let me walk through it. Suppose I have measured the impulse response to an individual spike. Over here there is a single spike, and here is the impulse response to it: the pressure residual generated in the bird if I stimulate at a certain point. Now I'm going to stimulate with two spikes separated by an interspike interval of 2 milliseconds.
AUDIENCE: Just [INAUDIBLE]
ILLYA NEMENMAN: No, it's just a single time. We stimulate once per breathing cycle, and sometimes we don't stimulate at all, so that we have control cycles to check that the bird hasn't adapted and nothing else has changed. And typically, when we do this, we intermingle the conditions.
Sometimes we stimulate with one spike, sometimes with this particular combination, sometimes with that one, sometimes with nothing, and we completely reshuffle things. It's once per cycle. Yes?
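The interleaving described here, one condition per breathing cycle with unstimulated cycles mixed in, can be sketched as a randomized schedule. Block randomization and the condition labels are assumptions of this sketch; the talk says only that the conditions are completely reshuffled:

```python
import random

CONDITIONS = ("none", "single", "10-10", "12-8")   # hypothetical condition labels


def stimulation_schedule(n_cycles, rng, conditions=CONDITIONS):
    """One condition per breathing cycle, randomly interleaved.

    Block randomization: each consecutive block of len(conditions) cycles
    contains every condition once, in shuffled order. 'none' cycles are the
    unstimulated controls used to check that the bird has not adapted.
    """
    schedule = []
    while len(schedule) < n_cycles:
        block = list(conditions)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_cycles]


schedule = stimulation_schedule(400, random.Random(4))
```

Blocking keeps the conditions balanced over any stretch of the recording, so slow drifts in the preparation affect all conditions roughly equally.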
AUDIENCE: Yes, so the dark trace, the small trace--
ILLYA NEMENMAN: Yes.
AUDIENCE: --that seemed to be centered around [INAUDIBLE] milliseconds.
ILLYA NEMENMAN: Yep, this one.
AUDIENCE: That's the response to a--
ILLYA NEMENMAN: --to a single spike; well, not a spike, but a single impulse. Yes, to the first stimulus, which happens at zero.
AUDIENCE: [INAUDIBLE] the muscle integrates over 20 milliseconds.
ILLYA NEMENMAN: That's the standard lore in the field, that muscles integrate over anywhere from 10 to 20 milliseconds. What probably happens is that people confuse the [INAUDIBLE] time, the fall time, and things like that with the fact that the maximum force is achieved at about 20 milliseconds, or close to that. There are multiple time scales, and people typically measure different ones.
AUDIENCE: Right, but if we were to [INAUDIBLE] Gaussian, a Gaussian would have a sigma of about two milliseconds, right?
ILLYA NEMENMAN: Sigma here--
AUDIENCE: If you go down one sigma on each side--
ILLYA NEMENMAN: No, I would say it's about four milliseconds. This is 10 milliseconds, right? This whole width is 10 milliseconds.
AUDIENCE: It's pretty small, right?
ILLYA NEMENMAN: It is.
AUDIENCE: Down there in the [INAUDIBLE] milliseconds--
ILLYA NEMENMAN: Yes, I agree. It just means that people have to go and measure what actually happens in an individual animal rather than just talk about [AUDIO OUT] based on the order-of-magnitude estimates we have. There are multiple time scales here: a [INAUDIBLE] time, the lowering time, the time to the maximum, and so on.
And they are all different from each other. So it's maybe not that surprising that you can't figure out from a single number how multiple spikes are transduced into different behavior. That's what I'm going to show in a second.
So if I have this single impulse response curve and two spikes two milliseconds apart, I can convolve the spike train with the single impulse response, and that would give the [AUDIO OUT]. In reality, when I have two spikes two milliseconds apart, I get the red curve.
It's extremely non-additive: there's a huge synergy in the system. If, on the other hand, the spikes are 20 milliseconds apart, then my single-pulse [AUDIO OUT] response [INAUDIBLE] with this spike train gives me the blue curve. In reality you observe the red curve, and now the two are very close to each other, [INAUDIBLE] that distinguishable from each other.
So what I can plot here is the synergy: the unsigned area between the red and the blue curves, that is, the absolute value of their difference integrated over the entire time. And this is what you see.
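The comparison on this slide can be sketched in a few lines: the linear prediction superposes a copy of the single-impulse response at each stimulus time (a discrete convolution), and the synergy is the unsigned area between observed and predicted responses. The response values below are toy numbers, not measurements:

```python
def linear_prediction(impulse_response, stimulus_bins, length):
    """Additive prediction: sum of copies of the single-impulse response, one per stimulus."""
    predicted = [0.0] * length
    for s in stimulus_bins:
        for lag, h in enumerate(impulse_response):
            if 0 <= s + lag < length:
                predicted[s + lag] += h
    return predicted


def synergy(observed, predicted, dt_ms):
    """Unsigned area between the observed and the linearly predicted responses."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) * dt_ms


h = [0.0, 1.0, 2.0, 1.0, 0.0]                          # toy single-impulse response
pred = linear_prediction(h, stimulus_bins=[0, 2], length=8)
observed = [0.0, 1.0, 3.0, 5.0, 3.0, 1.0, 0.0, 0.0]    # toy supra-additive response
```

A purely additive system would give zero synergy; the closely spaced pair in the talk gives a large value, and pairs about 20 ms apart give almost none.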
At about 20 milliseconds, the spikes... and again, this is not a muscle but breathing, the response of the whole animal. At about 20 milliseconds, things become [AUDIO OUT].
The interesting thing is that the highest slope, the midpoint of this curve, is at about 10 milliseconds. So if you wanted to control your muscles as strongly as possible while changing the spike train the least [AUDIO OUT], you would put your interspike intervals at about 10 milliseconds, because moving spikes near 10 milliseconds a couple of milliseconds forward or backward produces the largest effect in the system: the highest slope, the midpoint of this curve.
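Locating the most timing-sensitive interval amounts to finding where the sampled synergy curve is steepest. The sketch uses central differences on a hypothetical sigmoidal curve whose midpoint is placed at 10 ms to mimic the slide; the curve is illustrative, not data:

```python
import math


def steepest_isi(isis_ms, synergy_values):
    """ISI at which the sampled synergy curve changes fastest (central differences)."""
    best_isi, best_slope = None, -1.0
    for i in range(1, len(isis_ms) - 1):
        slope = abs((synergy_values[i + 1] - synergy_values[i - 1]) /
                    (isis_ms[i + 1] - isis_ms[i - 1]))
        if slope > best_slope:
            best_isi, best_slope = isis_ms[i], slope
    return best_isi


# Hypothetical sigmoid: large synergy for short ISIs, near zero by about 20 ms.
isis = list(range(2, 31, 2))
curve = [1.0 / (1.0 + math.exp((isi - 10.0) / 2.0)) for isi in isis]
```

At the steepest point, a 2 ms shift in one interval (10 to 8 or 12) moves the response the most per millisecond of change.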
So probably what happens is that the animal somehow knows the response time scale, the structure of its muscles, how they respond. And at 10 milliseconds, the muscles are the most sensitive.
The nonlinearities in the muscles are most sensitive to spike timing there, and that's why we observe everywhere in these animals that roughly 10 milliseconds is the most common interspike [AUDIO OUT]. You go from 10 to 8 or 12, and you get the largest effect. With that, I am going to end, because my time is up.
So the conclusion is that precise spike timing is used in the bird to drive behavior. Whether this is an important behavior or not, we will figure out sometime later, I hope. Precise spike timing carries information about the behavior that is not present in the spike rate.
Specific precise timing patterns correspond to different behaviors, not just [INAUDIBLE] sets of behaviors: a different what, rather than a different when. In vitro, precise spike timing patterns can be decoded [AUDIO OUT] muscles in [INAUDIBLE].
They actually cause different pressures. So one thing to think about when designing brain-machine interfaces, neuroprosthetics, et cetera, is this: should we just focus on spike rates, or should we start thinking about the individual timing of spikes?
And with that, I'm going to show you one more thing: this is not just birds. This is data recorded in Lee Miller's lab at Northwestern, in Chicago.
We're doing essentially the same analysis. We have a bit less data, and it's at five-millisecond discretization rather than two-millisecond discretization.
We're looking at spike trains from a monkey doing a standard reaching task, and we're trying to figure out whether the timing of spikes in the monkey's brain is predictive not of where the monkey is going to go, not of these very coarse features, but of the actual velocities, the real-valued velocity traces.
Here you have, I think, a 100 millisecond spike train with three spikes: a one, many, many zeros, and then one and one. Again, this is the most common pattern you see in this monkey. And then it's shifted by five milliseconds.
The d prime value [AUDIO OUT] velocity distributions you get in this system is this much, compared to the reshuffled controls. So it seems that at least in some motor systems, including this monkey, very similar features are probably going to be seen: at a [INAUDIBLE] resolution of at least a few milliseconds (maybe not one or two, but at least five), the timing actually controls which specific behavior, which trajectory, the animal takes. And with that, I'm going to end.