Building a state space for song learning
Date Posted:
December 8, 2021
Date Recorded:
December 7, 2021
CBMM Speaker(s):
Michale Fee
Brains, Minds and Machines Seminar Series
Description:
Songbird vocalizations are produced by a sparse sequence of spike bursts in a motor circuit that controls the vocal output on a fast (10 ms) timescale. This sparse sequence is also transmitted to song learning circuits, presumably to control the temporal specificity of vocal learning, a process thought to proceed by mechanisms similar to reinforcement learning (RL). Electrophysiological recordings in young birds have revealed that such sequences do not exist at the earliest stages of learning, and emerge only gradually during song acquisition. How does this sparse temporal basis, or state space, emerge during development? Songbirds learn their vocalizations by imitating the song of an adult bird, suggesting that the auditory memory of the tutor song may play a role in setting up sequences in the motor system, creating a state space custom built for a given tutor song. I will describe a model for how the temporal sequences that support RL of this complex behavioral pattern may be constructed in the brain, and will propose a hypothesis for how the auditory system could shape these sequences to align with a memory of the tutor song, thus facilitating song evaluation.
PRESENTER: It's a real honor to be able to introduce Michale. I've known Michale a long time. And just going through his stated bio, Michale is interesting because his background is in physics. I mean, he got his PhD in physics from Stanford. He worked as a sort of physicist/engineer when he was at Bell Labs.
And so, when he came here, he was one of, I would say, a select cohort of scientist-engineers who I think fit particularly well in the department-- someone who was able to bring an experimental preparation, the study of the songbird and songbird learning, to general questions of computation.
That is, how are songs learned? How is sequence learning engaged? And how can it be probed in a system that arguably is even more challenging than the rat, and perhaps the mouse-- that is, the songbird, which is challenging for in vivo electrophysiology in general, but specifically for in vivo electrophysiology in a behaving animal that has to engage in ecologically appropriate behavior like singing? And so that, I think, tapped all the training and resources that Michale developed as a physicist and engineer.
And I think what really distinguishes his work for me is both the insight and the dedication to the development of technologies. He has developed incredible novel methods to allow him to interrogate the system. We collaborated with him as well, trying to squeeze what little tidbits we can to apply to our own research. In applying these methodologies to record from the system, he's able to use his background in physics to think about and develop computational models, which can then be tested using the engineering-leveraged experimental preparation of the songbird.
So all together, it really is a powerful but also unique opportunity to ask fundamental questions about not just learning, but specifically about sequence learning, and as he's been looking at more recently, the role of reinforcement and reinforcement learning in both the development of song and how it may also relate to our general learning of sequences, motor sequences, speech.
And I think I covered just about everything, except that in addition to his capacity as-- and I want to get this right --the Glen V. and Phyllis F. Dorflinger Professor of Brain and Cognitive Sciences, he's also recently become our new department head. And he is our department head at a time when we are making a transition to a greater embrace, through the College of Computing, of the integration of artificial intelligence, machine learning, and computation in general throughout MIT.
And in his earlier capacity as the associate department head, Michale was instrumental in bringing together our new EECS-BCS major-- this refers to the 6-9 major, which has dramatically changed the undergraduate culture --which also served to create an environment that bridges, and will soon literally bridge, the College of Computing and Brain and Cognitive Sciences. So it's ushering in this new era of computationally focused-- let's say, sort of model-inspired --biology and neuroscience. And that's a perfect fit for CBMM, and for this talk. And so he's going to give us his insights into the latest work on temporal state space.
MICHALE FEE: Something. Something or other.
PRESENTER: Michale?
MICHALE FEE: Thank you, Matt. Thank you. Yeah, it's great to be here. It's great to be a part of CBMM. It's a very exciting mission that resonates very much with my own research interests and my own passion. So it's great to be a part of this community. OK, so our lab is interested in understanding how the brain generates and learns complex sequential behaviors.
These are behaviors that most of us think are uniquely human, like speech and language and music and athletic performance. These are examples of behaviors where the brain has to generate a very precise sequence of motor gestures. Those gestures have to be produced with very precise timing and temporal ordering for those behaviors to be successful.
Those behaviors are also learned through a lot of practice, thousands of repetitions of those behaviors we have to undergo. We try out different things. We discard the things that don't work well. And we keep the things that work well. Those behaviors are also learned largely by imitation. We watch other people do those things and we store a model of what looks right in those behaviors. And those models, those internal models, guide our practice.
So our lab works to figure out how those things work at the level of neural circuitry in the brain, using the songbird as a model system. The animal that we study is the zebra finch. Zebra finches are sort of the lab rat of the songbird field-- their brains and their behavior have been studied extensively. Zebra finches produce a characteristic song that sounds like this.
[BIRD SINGING]
So they have a brief introductory note that they produce a few of, sort of clearing their throat, then they produce a song motif that has three to seven distinct song syllables. And that motif gets repeated over and over again. The motif itself lasts about a second. The syllables last about a tenth of a second. And there are distinct notes within those syllables that are very precisely reproduced.
Each one of those syllables is separated by a brief gap in which the bird takes a 30 millisecond long breath so that they can keep singing continuously. OK, let me just play a couple more examples of zebra finch songs.
[BIRD SINGING]
They've got a repetitive--
[BIRD SINGING]
--motif structure. This one is particularly jazzy.
[BIRD SINGING]
OK, so zebra finches learn their song through a series of stages that in some ways resembles the process by which humans learn speech. There's an early sensory stage-- the young bird here, a 35-day-old male zebra finch, is listening to the male tutor song in the background. They listen to that tutor song over the course of a few weeks and form a memory of what it sounds like, after which they don't need to actually be around the parent or the tutor anymore.
They begin to babble, producing highly variable vocalizations that I'll play for you in a second. By listening to themselves sing, they gradually refine their vocalizations until, in the end, they can produce a pretty good copy of the tutor song. So here is an example of the progression of the vocalizations of one young bird after it hears its tutor song. So let me just play these for you.
So here's the tutor song that this young bird hears.
[BIRD SINGING]
OK. At an early age, 35 to 40 days of age, this young bird produces babbling, so this is called subsong.
[BIRD SINGING]
So that's a baby bird babbling. Then after a few days you start seeing a rhythmic, repetitive structure within the song-- those repeated elements are called protosyllables, and this is called the protosyllable stage. So you'll hear this bup, bup, bup, bup, this sort of 10 Hertz rhythm that emerges.
[BIRD SINGING]
And then a few days later, you start seeing--
[BIRD SINGING]
--distinct syllable types emerge. And then-- the bird is practicing thousands of times a day during this period --a few weeks later, the bird--
[BIRD SINGING]
--produces an adult song that can sound--
[BIRD SINGING]
--a lot like the tutor's. All right. So how does all this happen? One of the most prominent ideas in the field about how song learning occurs in songbirds is the idea of reinforcement learning. So the idea is that there's a song motor system that drives a vocalization at the output.
So here's a babbling song. Each time the bird sings, that song has to be a little bit different, because the bird is trying out different things. And each time the bird sings, it compares its own song, using auditory feedback, to the stored tutor memory-- we don't know exactly yet where that memory is in the brain. That comparison produces a reinforcement or an error signal that goes back and modifies the song motor system, so that the bird eventually produces a good copy.
Now one of the key challenges of reinforcement learning is building what's called the state space of the problem. For example, if the outcome of an action depends on where you are or when that action occurs, then the brain has to have a representation of space or time in order to engage reinforcement learning successfully. And in the songbird, the relevant state space that needs to be represented is time within the song.
So here's what I'm going to talk about today. I'm going to start by briefly describing the neural sequences that underlie song production. The state space-- time within the song --is represented by a small population of neurons in a particular area, the vocal premotor cortex of the songbird.
So I'll describe the neural sequences that form that state space for the reinforcement learning problem and actually drive the vocal output. I'll then turn to our hypothesis, or working model, for how the brain implements reinforcement learning. And then finally, I'll describe how those sequences that drive the vocal output, and that represent time within the song, emerge as the bird develops through that learning process.
So the brain areas that are involved in producing the song are outlined here. This is the motor pathway, as it's referred to. The vocal organ is down here; it has six or seven muscles on each side. Those muscles are innervated by 3,000 or so motor neurons in the brain stem-- just like motor neurons in our spinal cord. Those brain stem neurons get direct input from a chain of forebrain nuclei in a part of the avian brain that's analogous, or maybe homologous-- there's still some uncertainty about the relation --to mammalian neocortex. So we think of these as the song motor cortex in the bird.
There is a feedback pathway from this layer 5-like area of the motor cortex down to the midbrain, to the thalamus, and back to HVC. I'm not going to spend a whole lot of time talking about this loop today, but it's been shown to be important for song production. So I'll start by showing you some recordings of neurons in these brain areas. Songbirds are very small, and singing in adults is a mating behavior, so the birds have to be pretty happy in order to sing. So we developed this very small, lightweight microdrive that's motorized, so you can turn a dial and find neurons essentially by remote control.
So if you record from neurons in this layer 5-like output part of motor cortex, you find neurons that generate 8 to 10 or so brief bursts of spikes. Each time the bird sings its song motif, that pattern of spikes is very precisely reproduced, and each neuron produces a distinct pattern of bursts that recurs precisely on each repetition of the song. These are the bursts of activity that project down and control the motor neurons in the brain stem. So there's a complex sequence of bursts.
Now we wanted to ask where does that complex pattern of bursts come from? So we started going upstream to record from neurons in HVC that project down to RA to ask how are those bursts driven? And we found these really interesting neurons that generate a single burst of spikes on each repetition of that song motif.
So you can see three repetitions of the motif, and that neuron generates three distinct high-frequency bursts of spikes, each of them about 6 milliseconds long. If you record from a whole bunch of those neurons in the same bird, you can put them together in a pseudopopulation, using the song as a temporal alignment. And you can see that each one of those neurons generates a single burst of spikes at one moment of the song, and different neurons are active at different times in the song.
You might ask, is the song completely tiled with these neurons? Is every moment in the song represented by a population of those neurons? In some recent work, Tatsuo Okubo and Galen Lynch collected and analyzed a very large data set of these neurons, looking at them as a population. This shows the bursts of individual HVC neurons sorted in time, in three separate birds, and you can see that the song is essentially completely tiled in time. There are some fluctuations in the density, but those are consistent with random placement of those bursts in time. So the song is completely tiled.
And so our basic model for how this works is very simple. These neurons are active at a single moment, and as a population, they completely cover the song. There's no evidence for edge-to-edge alignment; it's just convenient to draw it that way in the diagram. Those neurons then activate, downstream in RA, a complex pattern of bursts. And at each moment, that pattern of bursts converges onto the output neurons in the brain stem-- the motor neurons --to drive some time-varying level of tension in those muscles. So it's very mechanical-- it's sort of a music box view of how this motor circuit might work.
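To make that music box picture concrete, here is a minimal sketch of the feedforward readout described above-- all sizes, sparseness levels, and weights are illustrative assumptions, not measurements:

```python
import numpy as np

n_time = 100      # ~10 ms bins in a ~1 s motif (assumed discretization)
n_ra = 800        # RA premotor neurons (illustrative size)
n_muscles = 12    # ~6 syringeal muscles per side (from the talk)

rng = np.random.default_rng(0)
# Which RA neurons burst at each moment (sparse, random pattern; illustrative):
ra_bursts = rng.random((n_ra, n_time)) < 0.05
# Convergence of RA activity onto muscle tensions:
w_ra_to_muscle = rng.normal(size=(n_muscles, n_ra)) / np.sqrt(n_ra)

def muscle_tensions(t):
    """HVC 'clock' -> RA burst pattern -> time-varying muscle tensions."""
    hvc = np.zeros(n_time)
    hvc[t] = 1.0                 # one HVC population active at each moment, tiling the song
    ra = ra_bursts @ hvc         # the complex, time-specific pattern of RA bursts
    return w_ra_to_muscle @ ra   # converges onto a tension level in each muscle

song = np.stack([muscle_tensions(t) for t in range(n_time)])
print(song.shape)                # (100, 12): each muscle's tension across the motif
```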
Now where does that sequence come from in HVC? One simple hypothesis is that those neurons activate each other in a sequence, like a chain of dominoes. We refer to this as a synaptically-connected chain; this was proposed by a number of labs. There are other hypotheses, but experiments have suggested those are very unlikely. We actually don't think it's one continuous chain in HVC, though. We think that this feedback loop-- essentially, the end of one syllable feeds back and activates the next syllable in HVC through that feedback loop --also serves to synchronize the two hemispheres of the brain, because that projection from DM to Uva is actually bilateral. That's the only bilateral interaction in this circuit.
So the idea is that there are these little modules in HVC, each of which generates a single syllable, and those modules are essentially selected and activated by this thalamic input. So that's our basic hypothesis, the basic conceptual framework by which we think the song is produced. Any questions about that? Because I'll then turn to learning.
Yes. There's a lot of lateralization in these circuits, is that what you're asking about? So some birds actually dominantly sing their song through the control of the descending circuitry on the left side of the brain. Other birds actually sing different parts of the song using the left side of the vocal organ and other parts of the song with the right side of the vocal organ. So it's quite complex. There are actually two separate sound sources in the vocal organ of the bird. So they're actually using two separate instruments simultaneously.
And one really direct way of trying to see whether there are these synaptically-connected chains in HVC is to do dense electron microscopic reconstruction of the circuitry within HVC to look for evidence of these synaptically-connected chains. And we're collaborating with Joergen Kornfeld and Winfried Denk at the Max Planck Institute, together with [INAUDIBLE] at Google who does the data analysis, the image processing, and Mike Long at NYU to test this hypothesis.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Please.
AUDIENCE: What would that [INAUDIBLE] look like? I mean, it'd be like a bunch of the [INAUDIBLE] connected to that one, connected to that one?
MICHALE FEE: That's right.
AUDIENCE: Without knowing the function of each, how would you know?
MICHALE FEE: Yeah. So for each one of these blobs-- if you just look at the number of that type of neuron in HVC and the number of distinct, independent time slots, we estimate that there are about 200 of those neurons co-active at each moment in HVC. They're randomly distributed, as far as we can tell, within HVC.
And so the idea is you would basically have to image those neurons using a functional indicator during behavior. And that's the part of this project that Michael Long is doing. And then the idea would be to take that sample of tissue, do a dense connectomic reconstruction of it, find those neurons, and, using the times at which they're activated, test the hypothesis that neurons that are neighboring in time are synaptically connected with each other. That's right.
So now let me talk a little bit about how one might go about learning that sequential output. Let me just come back to this picture real quick, because I forgot to say something. In this model, you can see that HVC is simply a sequence, and you can produce any song you want. If you have a one-second-long sequence, you can produce any one-second-long vocal output you want-- you just have to connect the HVC neurons to the right RA neurons to produce the right pattern of tension in the muscles.
So the score of the birdsong is really in this matrix of synaptic weights from HVC to RA. And that's the learning problem from the perspective of reinforcement learning: you have to learn how to specify what output happens at each time in the song. So there's the state space, and the mapping from state to action occurs through these synaptic weights. Yes.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Yes. So zebra finches have one song. Other birds have more songs and they have separate sequences in HVC for each song.
AUDIENCE: Separate--
MICHALE FEE: Separate--
AUDIENCE: Time code neurons.
MICHALE FEE: Yeah, exactly.
AUDIENCE: And allocated to each separate song, right?
MICHALE FEE: Exactly. OK? OK. All right. So how does this reinforcement learning happen? So remember, the bird is trying to learn the synapses from HVC to RA to produce the vocal output. It turns out that in addition to the motor pathway, the bird has a whole basal ganglia circuit-- a learning circuit known as the anterior forebrain pathway --that's involved in learning.
And this was shown by Sarah Bottjer and Constance Scharff and others, who showed that if you lesion any of these brain areas in an adult bird, it sings just fine. But if you lesion those brain areas in a young bird, before it's learned its song, the bird can no longer learn-- it's stuck at the point it had reached when you did the lesion.
And so Sarah proposed that this learning circuit somehow programs up, or guides, plasticity in the motor pathway to drive that system toward a good imitation of the tutor song. Before we turn to that question of learning, what I'm going to show you first is evidence about this LMAN circuit-- let me just briefly say what the elements here are.
So LMAN is a cortical component of this, and Area X is a basal ganglia circuit-- beautiful work from David Perkel's lab has shown that Area X is very much homologous to the mammalian basal ganglia. And this forms a cortico-basal ganglia-thalamocortical loop that's very similar to cortico-basal ganglia loops in the mammalian brain that are involved in motor learning, habit formation, all kinds of different aspects of complex behaviors.
What I'm going to show you briefly is that LMAN plays a very important function in that circuit and that is that it plays a key role in variability generation. In fact, we've proposed that LMAN is actually a circuit-- a dedicated circuit that generates variability in this behavior for the purpose of exploring motor space for reinforcement learning.
So here's one little piece of evidence. If you take a young bird that is singing this highly variable song-- this is at the border of the subsong and protosyllable stages; you can see it's highly variable, but there are some protosyllables in there --and you lesion LMAN, all of that variability goes away and the bird is left singing highly stereotyped protosyllables.
On the other hand, if you lesion Area X, you don't see any change in the variability of the song. If you record from neurons in LMAN, you see bursts of activity during the subsong stage that actually occur at the onsets of subsong syllables. In red here you can see those bursts of spikes, and in black you can see the song amplitude. So we think that LMAN somehow initiates subsong syllables, and it does so in a highly random, variable way.
LMAN is a variability generator that projects to RA. HVC is a sequence generator that projects to RA. And the idea is that young birds start out with the motor system being driven by the variability generator, and then that gradually transitions-- the effect of the LMAN input gets weaker, the effect of the sequential input gets stronger, and the song gradually becomes more and more stereotyped [INAUDIBLE].
AUDIENCE: Michale, what is the timescale of those subsong snippets that were--
[INTERPOSING VOICES]
MICHALE FEE: The LMAN-driven syllable duration distributions are exponential. So it's a Poisson process-- once you start a syllable, when it ends is a Poisson process and-- yep. And the average duration is 100 milliseconds, which is about the typical duration of adult syllables.
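A minimal numerical sketch of those duration statistics-- the 100 ms mean is from the talk; everything else is illustrative. A constant moment-to-moment chance of ending (a Poisson termination process) gives exactly this exponential distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
mean_ms = 100.0                                    # average subsong syllable duration (from the talk)
durations = rng.exponential(mean_ms, size=10_000)  # constant hazard of ending => exponential durations
print(durations.mean())                            # ~100 ms
# The exponential is memoryless: however long a syllable has already lasted,
# the chance that it ends in the next millisecond stays the same.
```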
AUDIENCE: [INAUDIBLE] talking about HVC driving RA sequences, but if LMAN is just doing a moment by moment variability, like it looks like LMAN is driving the whole song [INAUDIBLE].
MICHALE FEE: It is. At that very early stage--
AUDIENCE: At that stage--
MICHALE FEE: At that subsong stage, at the very earliest stage-- that was before the LMAN lesion I showed you. If you lesion LMAN then, the bird doesn't sing at all. If you lesion LMAN after you start seeing some of those protosyllables, then you just see very stereotyped protosyllables. OK?
So let me show you what happens if you inactivate LMAN at a stage where HVC and LMAN are converging to produce a variable but somewhat repeatable song, when you can start seeing distinct syllable types. This shows an example of a song with LMAN intact, and an example of a song with LMAN inactivated with muscimol or TTX. Take this particular harmonic stack syllable-- if you extract those renditions and plot them up here, here's LMAN intact, and you can see those very large pitch fluctuations.
With LMAN inactivated, you can see that those fluctuations in the pitch are gone. And here is the residual pitch. So you can see these very large fluctuations with LMAN intact, and most of that variability goes away when you inactivate LMAN, OK? Nancy.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Yes. These are separate. These are these things extracted out and plotted next to each other.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Yes. Inactivating LMAN also increases the regularity of the sequence as well. Good eye. OK. All right. So while we have this hypothesis that this circuit is involved in programming up the motor pathway, what I've shown you is pretty convincing evidence that this circuit is also generating variability that drives fluctuations in the motor output.
So let me now show you some evidence that there's actually an instructive signal as part of that output. So I've shown you evidence that that circuit injects variability. Now let me show you evidence that projection to the motor pathway also contains an instructive signal that tells the motor system which way to go to make the song better.
So here's the experiment we did. We built a little device that allows us to monitor the birdsong. There's a little hearing aid microphone on the bird's head. You send that signal to a digital signal processor that makes a decision about that song, and then sends sound back into the bird's head through a little hearing aid speaker that's implanted into the air sac in the head, which has access to the inside of the eardrum. So it's like earbuds, but on the inside of the eardrum.
So here's what we do. The microphone is picking up the bird's song. We monitor the pitch of a harmonic stack syllable in real time, and when that pitch crosses a threshold, we play a noise burst back to the bird. If the pitch trajectory stays below the threshold, we don't play a noise burst. And we place the threshold in the middle of the distribution of pitch trajectories, in a position such that the bird gets this feedback noise on about half the trials.
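A minimal sketch of that trigger logic (all names and values here are hypothetical; the real system ran online on a dedicated digital signal processor):

```python
def noise_burst_triggered(pitch_hz: float, threshold_hz: float) -> bool:
    """Conditional auditory feedback: play a noise burst whenever the pitch of the
    targeted harmonic stack crosses the threshold; stay silent otherwise."""
    return pitch_hz > threshold_hz

# The threshold sits in the middle of the bird's pitch distribution, so roughly
# half of the renditions trigger noise, e.g.:
# if noise_burst_triggered(estimate_pitch(mic_frame), threshold_hz=600.0):  # estimate_pitch, 600 Hz: hypothetical
#     play_noise_burst()                                                    # hypothetical speaker call
```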
So you could imagine that, exposed to this type of auditory feedback, the bird would change its song in a way that avoids those noise bursts. Now you might ask, well, maybe that noise is just aversive-- maybe they can't stand the sound.
Jesse Goldberg, who was a postdoc in the lab, did this cool experiment after he later left the lab. He had birds in a box with a perch on either side, and noise bursts played on one side-- thinking, well, maybe the birds would avoid that side. In fact, they spent most of their time on the side with the noise bursts. So it's not behaviorally aversive. But what does it do to the song?
So here's what happens. Each dot here is the average pitch of one rendition of this harmonic stack syllable. You can see that there's some variability in that pitch, but over the course of a few hours, the average pitch of that syllable drops to avoid the threshold. If you look at the individual pitch trajectories, again, you can see that they drop down as a whole, so that the bird very rarely crosses the threshold.
Now, the question is, what role does LMAN have in that change in pitch? You can see that in the presence of this noise-- which only gets applied to some renditions of a targeted syllable --the pitch changes to avoid that threshold. And the question is, what is the role of LMAN in that learned pitch change?
So Aaron Andalman did this beautiful experiment where he drove this pitch change day after day in birds and asked: if we drive that learning in a bird and then inactivate LMAN after four hours of learning, what happens to the pitch change? What happens is that the pitch reverts back to what it was in the morning, on average.
So whatever LMAN is doing, it's driving a change in the distribution of pitch fluctuations in the direction of improved performance-- avoiding errors. And if you inactivate LMAN, the song reverts back to what it was before. Now there's a lot more of this story to tell, and I don't have time to give you the full picture, but I want to give you a summary of what these results point to.
So here's motor parameter space. If this is time, then the motor parameters are going through some trajectory in time. And on a given trial, let's say that at one moment those motor parameters have this pair of values-- tension one versus tension two. LMAN injects variations so that each time the bird sings, those motor parameters are a little different. So that's the variation, the noise, that LMAN is injecting.
In the presence of an error gradient, where these fluctuations produce differences in error, the fluctuations get shifted in the direction of improved performance. And we've done experiments that suggest that the motor pathway stays where it is for a day, but then the next day, the motor pathway-- meaning the HVC-to-RA connections --has been dragged in the direction of improved performance. So there's this initial change in the fluctuations that LMAN generates, and then that gets consolidated into the motor pathway over the course of the next 24 hours. And if there continues to be a gradient in that direction, it keeps dragging the motor pathway in the direction of improved performance. Does that make sense? Yes.
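Here is a toy sketch of that two-timescale picture-- fast, reward-biased variability followed by overnight consolidation into the motor pathway. The single motor parameter, learning rates, and performance function are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
motor_mean = 0.0             # consolidated HVC->RA contribution to one motor parameter
bias = 0.0                   # LMAN's reward-biased component (lost if LMAN is inactivated)
sigma = 1.0                  # scale of LMAN-driven exploration

def performance(x):
    return -abs(x - 3.0)     # hypothetical: performance improves as x approaches 3

for day in range(5):
    for _ in range(200):                          # renditions within a day
        noise = rng.normal(0.0, sigma)            # LMAN injects a fluctuation
        gain = performance(motor_mean + bias + noise) - performance(motor_mean + bias)
        bias += 0.01 * gain * noise               # reactivate fluctuations that helped (fast)
    motor_mean += bias                            # overnight consolidation into HVC->RA weights
    bias = 0.0                                    # after consolidation, LMAN inactivation reverts nothing
    print(f"day {day}: motor_mean = {motor_mean:.2f}")
```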
AUDIENCE: In the direction of improved performance, because the noise doesn't sound like the tutor song, but it's drifting in pitch, then the pitch is different from the tutor song. So it's not perfect? Right?
MICHALE FEE: Yeah. So let me just-- let me go back to this picture here. So the idea is that LMAN is driving fluctuations in the output. This circuit is figuring out which one of those fluctuations made the song better and it's reactivating the good fluctuations more often.
AUDIENCE: [INAUDIBLE] good before meant similar to my stored model. And now good means avoiding creating this exogenous error.
MICHALE FEE: Yes, but creating this noise burst is a big deviation from the model.
AUDIENCE: That's a worse song than having the wrong pitch.
MICHALE FEE: Yes. And this bird-- it's a young bird, but it's pretty much learned its song. So all of its pitch fluctuations are way closer to the model than this noise burst. Sorry. Does that make sense? So the idea is that this circuit is somehow figuring out which direction in this motor control space makes the song better. Remember, that noise burst is targeted to a particular syllable. So the idea is that, at that moment, this circuit figures out which direction in motor command state space makes the song better-- or less bad, in that case.
And then the song motor system learns-- it undergoes plasticity in that direction. But remember, what this thing is doing is activating neurons at a particular time. And so simple Hebbian learning can do that consolidation process, because this input is pointing out which direction, in this very high-dimensional control space, makes the song better. Yes.
AUDIENCE: You just mentioned high dimensional control space. Is there an estimation on what is the dimensionality of the control space that can be controlled by this learning?
MICHALE FEE: Yeah. So there are about six muscles on each side of the vocal organ. Then there's a question of the resolution of that control space-- of the tension parameter, let's say, on one muscle. So there might be 100 effective resolution elements in that analog space, and the temporal resolution is maybe 5 to 10 milliseconds. So those are the relevant numbers-- some product of those numbers. Does that seem like a reasonable calculation?
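As a rough back-of-the-envelope using the numbers quoted here (all approximate): 6 muscles per side times 2 sides gives ~12 tension parameters per moment; at 10 ms resolution, a 1-second motif has ~100 independent time slots, so on the order of 12 × 100 = 1,200 independently controllable muscle-by-time parameters, each with ~100 distinguishable tension levels.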
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Yeah. So how would LMAN figure out which fluctuations make the song better and which fluctuations make the song worse? It turns out that every axon that goes from LMAN to RA has a collateral that goes to the basal ganglia-- to Area X. So for every axon that drives a fluctuation-- and there are about 6,000 of those neurons in LMAN --there's a collateral that goes to Area X. So Area X has an image of what fluctuation LMAN is driving into the motor output at every moment in the song.
Now if only Area X had a signal that said whether the song was good or bad-- whether that fluctuation made the song better or worse --then X could figure out which fluctuations make the song better and which make it worse. Well, it turns out that there is a pathway from auditory cortex to Area X that goes through the ventral tegmental area, a classic area that is thought to be involved in conveying reward or reward prediction error signals.
So we discovered a pathway from auditory cortex, through this layer 5-like part of the avian brain, down to VTA. And David Perkel has shown that there is a population of VTA neurons that projects to Area X. So we wondered whether neurons in this pathway carry error signals. We recorded neurons in this area called AIV that show robust responses to those noise bursts during singing. And Jesse Goldberg recorded from X-projecting VTA neurons, using antidromic identification, to characterize the error responses of those dopaminergic neurons. And here's what they found.
So again, these are antidromically identified, using stimulation in Area X. If you record from those VTA neurons while the bird is singing, and on half the trials you play a noise burst targeted to one syllable-- these experiments were done after the birds had been exposed to this random, 50% probability of noise bursts targeted to one syllable for a day or so --you see that those dopaminergic neurons exhibit a suppression of spiking on trials where you play the noise burst, and an excess of spiking on trials where the noise burst is missing. So they show a decreased response to a song that's worse than average, and an increased response to a song that's better than average.
And that's exactly the kind of behavior that Wolfram Schultz and others have identified in dopaminergic neurons in primates: you get a burst of activity when the outcome is better than expected and a suppression of activity when the outcome is worse. So there is a dopaminergic signal that carries performance information-- not just performance, but performance prediction error signals. So how would this actually work?
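A minimal sketch of that prediction-error logic under the 50% noise protocol (the encoding of outcomes as 0 and 1 is an illustrative assumption, not a measured firing rate):

```python
P_NOISE = 0.5                             # the bird has learned noise arrives on ~half of renditions

def dopamine_response(noise_played: bool) -> float:
    """Performance prediction error: outcome minus expected outcome.
    Encode 'no noise' (better than average) as 1 and 'noise' (worse) as 0."""
    expected = 1.0 - P_NOISE              # = 0.5 under the 50% protocol
    outcome = 0.0 if noise_played else 1.0
    return outcome - expected             # noise -> -0.5 (spiking suppressed); no noise -> +0.5 (excess spiking)

print(dopamine_response(True), dopamine_response(False))   # -0.5 0.5
```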
So here's our grand hypothesis for how reinforcement learning works in the songbird. You have a song motor pathway that generates a sequence-- some activity in RA --that drives the song. LMAN injects variability into the motor pathway. A copy of that variability signal goes to Area X, the basal ganglia. Area X also receives a performance evaluation signal. And now Area X can figure out which fluctuation made the song better and which made it worse.
Now, it has to do that calculation at every moment of the song independently, because a fluctuation might make the song better at time 1, but the same fluctuation might make the song worse at time 10. So it turns out that HVC sends to Area X a projection, from a population of neurons, that carries the same kind of timing information that HVC sends to RA.
So now Area X can figure out which fluctuation at each time makes the song better. The idea is that this timing information can then be used to reactivate, in LMAN, the fluctuations at each moment that made the song better. And that biased variability then activates the appropriate RA neurons at each moment to make the song better, which strengthens the appropriate synapses between HVC and RA. So that's the big-picture hypothesis. Yes.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: Great question. So there's a lot of work being done on that in our lab and other labs, and I will try to get back to that at the end. The really interesting question is, when the bird hears its song, does auditory cortex need to know what time it is in the song, so that it's comparing the vocal output the bird is trying to sing with the appropriate point in the auditory memory of the song?
And so at the end I will try to come back to this connection. There exists a bidirectional anatomical connection here that we think is important for reading out the auditory memory at the right moment in the song.
AUDIENCE: [INAUDIBLE] bird, you won't get the VTA response.
MICHALE FEE: No. None of those signals are present when you just play a song to the bird. They all require that the bird actually be singing. OK. Yes.
AUDIENCE: This may be a question better reserved for the end. So naively, I would imagine, for such a sequence-learning problem where there is a tutor song, that the bird would use something more like the mammalian cerebellar system to do supervised learning. So can you shed some light on why the bird apparently chose the reinforcement learning system, which is perhaps less data efficient, and for which it's harder to learn higher-dimensional sequences, compared to a supervised learning system?
MICHALE FEE: So there are people thinking about the possible role of the cerebellum in this. There's-- to my view, the evidence is not terribly compelling at this point. The anatomical connections are barely there. And there's no evidence that I'm aware of that lesions of any part of the cerebellum lead to song learning deficits that can't be explained by more global behavioral deficits.
So the question of why supervised learning couldn't be used-- I mean, there are some interesting ideas about this, but I don't know the answer to the bigger question of why supervised learning would be a less suitable solution here. But it actually sounds like an interesting conversation.
So we have developed a very specific circuit- and synaptic-level model for how that learning could occur. The idea is that LMAN is active at random times, HVC neurons are sequentially active, and VTA is active whenever LMAN activity at a particular time leads to a better outcome. And the idea is that you strengthen HVC synapses after a coincidence of LMAN, HVC, and dopamine inputs. For example, if LMAN activity at time point 3 leads to a better outcome, then you strengthen this synapse at time point 3, and the next time the bird sings, HVC neuron 3 activates this feedback loop and reactivates LMAN at time point 3.
And this model actually makes some very interesting synaptic-level predictions about how HVC and LMAN synapses interact at the level of medium spiny neuron dendrites. And we're actually doing dense EM reconstruction of basal ganglia circuitry-- Area X circuitry --in the songbird to specifically test those predictions. Namely, the HVC inputs should land on dendritic spines of MSN dendrites, whereas LMAN inputs should land preferentially on shafts. And it turns out-- I made that prediction before we did the reconstruction --that's exactly what happens. It's pretty interesting.
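Here is a toy sketch of that three-factor rule-- coincidence of HVC timing, LMAN activity, and dopamine --with invented sizes and a random stand-in for the dopamine signal:

```python
import numpy as np

n_lman, n_time = 20, 50
rng = np.random.default_rng(3)
W = np.zeros((n_lman, n_time))   # strength of HVC(time t) -> reactivation of LMAN unit i (via Area X)

for rendition in range(100):
    lman = rng.random((n_lman, n_time)) < 0.1   # LMAN fluctuations, random in time
    dopamine = rng.normal(0.0, 1.0, n_time)     # per-moment performance prediction error (stand-in)
    # Three-factor coincidence: the HVC factor is the time index itself (one HVC
    # population per column), so strengthen W[i, t] when LMAN unit i fired at
    # time t AND that moment's outcome was better than expected.
    W += 0.01 * lman * dopamine[None, :]

# On the next rendition, HVC activity at time t reactivates, through this loop,
# the LMAN fluctuations that previously made the song better at time t.
```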
So now let me spend a few minutes talking about how those sequences emerge during development. Because, remember, in order for that reinforcement learning circuit to work, you need these temporal sequences not only to drive the song-- you also need those HVC sequences to tell the learning circuit where in the song you are, in order to run an independent reinforcement learning process at each time.
That's why this state space is so important. So how does that state space emerge during development? Remember, here's the development of the song through different stages-- subsong, protosyllable, multisyllable, and adult. And what I'll do is show you recordings in HVC at different stages of this song development, to see if we get a picture of what the development of those sequences looks like.
So let's start with subsong, and we'll try to understand how those sequences emerge. Tatsuo Okubo, when he was a graduate student in the lab, recorded from HVC neurons during subsong using this 3D-printed, very lightweight, motorized microdrive. And what he found is that most of these RA-projecting HVC neurons are locked to subsong syllable onsets, just like LMAN.
And if you make a raster plot of when those spikes occur on different subsong syllables-- each one of these rows shows the spikes during a different syllable rendition, sorted by syllable duration --you can see that those neurons generate a burst prior to syllable onset. So you can write down the time at which each neuron is active prior to syllable onset.
You record a bunch of these neurons and plot a distribution of when those neurons are active. And on average, they're active before syllable onsets. The HVC neurons burst before the subsong syllable onset.
So now let's go to the next stage-- the protosyllable stage --and record some HVC neurons there. So Tatsuo recorded a bunch of those neurons. And what he found is that those neurons, even at this very early protosyllable stage, generate a rhythmic sequence of bursts.
So even though those syllables don't all look the same, there's a single rhythm in the neural activity, at 10 Hertz, as the bird is singing at that early stage. And different neurons are active at different time points within that rhythm. So there's a single 10 Hertz rhythm and all the neurons are entrained to that rhythm. All right. Interesting.
Now what happens at this multisyllable stage? What's interesting is that the way birds develop new syllables was actually highlighted by [INAUDIBLE]-- I forgot to put his name down here. [INAUDIBLE] showed that at this early stage, going from the protosyllable stage, where birds have this protosyllable, to distinct syllable types, those protosyllables actually bifurcate-- alternate versions of the protosyllable start looking increasingly different.
So here is an acoustic parameter space, and each one of those dots is a syllable. You can see that initially there's this big blob of acoustic features, and then, as the bird develops, you start seeing two distinct syllable types emerging out of that blob. Now what's going on at the neural level? So here's what we found.
What we found at that early stage is really interesting-- two different kinds of neurons. You've got neurons that are active on both of the two emerging syllable types; we call those shared neurons. And you have neurons that are specific for one of the new syllable types. So shared neurons and specific neurons. And the earlier you go, the more shared neurons there are; the later you go, the fewer. So early on, most of those neurons are shared, and then increasingly more and more neurons become selective for one of those two emerging syllable types.
And that's consistent with the idea that early on, that protosyllable was produced by one big fat chain-- every neuron is active at a different point in that chain, and the chain kind of loops around: the end connects to the beginning and it keeps going. So those neurons are active on every repetition of the protosyllable's 10 Hertz rhythm. And if you take that big fat chain and start cutting it down the middle, disconnecting one half of the chain from the other half, you'll have neurons that start becoming selective for one chain or the other, and you'll still have some shared neurons.
But the further you go in splitting this chain in half, the fewer and fewer shared neurons you'll have. OK, so that's the picture from the neural data. And let me just make one other point: early on, in the subsong stage, most of those HVC neurons are active prior to syllable onset. But the further the bird develops, the more of those neurons are active after syllable onset, until in the later stages they're more uniformly distributed across the syllables.
So this led us to the idea that there is a kind of growth of a chain, and then a splitting of that chain into different chains for each syllable. And we wondered whether we could actually develop a model by which that simple process could work-- whether that word model makes any sense from the perspective of model neurons connected to each other with model synapses.
So we did this modeling project. Remember, early on, LMAN drives RA to produce subsong. And remember that feedback loop from RA to the midbrain and back to HVC? We think that's the pathway by which HVC neurons get activated prior to syllable onsets in the subsong stage. Now, with a simple Hebbian learning rule with synaptic competition-- which [INAUDIBLE] actually proposed in a paper in collaboration with Richard Hahnloser --you can activate a subset of neurons in a randomly connected network, and they grow a chain.
And then the idea is that you simply connect the end of that chain-- remember, that loop projects back to HVC --so the end of the chain can reactivate the beginning and make a 10 Hertz rhythm. So now, how do you actually make multiple syllable types? Well, the idea is that if you have different populations of neurons, in the thalamus or in another part of the brain that connects to HVC, that act as those seed neurons-- I forgot to mention, we call these seed neurons --then those inputs can shape the way the splitting process occurs in HVC, and you can actually get splitting of that chain.
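Here is a toy sketch of that chain-growth process-- not the published model, just a simplified Hebbian rule plus winner-take-k competition on a small random network, to show the flavor:

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 60, 5                          # network size; neurons active per link of the chain
W = 0.01 * rng.random((N, N))         # weak random recurrent weights; W[j, i] is the synapse i -> j
seeds = np.arange(k)                  # 'seed' neurons driven at each protosyllable onset

for epoch in range(200):
    x = np.zeros(N)
    x[seeds] = 1.0                    # external input ignites the seed group
    for step in range(8):
        drive = W @ x + 1e-3 * rng.random(N)   # recurrent drive; noise breaks ties early on
        drive[x > 0] = -np.inf                 # crude refractoriness: just-active neurons sit out
        y = np.zeros(N)
        y[np.argsort(drive)[-k:]] = 1.0        # competition: only the k most-driven neurons fire
        W += 0.1 * np.outer(y, x)              # Hebbian: strengthen connections that were just used
        W *= 0.999                             # slow decay keeps the weights bounded
        x = y

# After training, the seed group reliably ignites the same sparse chain of groups.
# Feeding the end of the chain back onto the seeds closes the loop and gives the
# ~10 Hz protosyllable rhythm; driving two separate seed populations is the kind
# of manipulation that, in the full model, splits this chain in two.
```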
And let me just show you what this looks like in a movie-- a model that was developed by Emily [INAUDIBLE] as a project at Woods Hole, in the Methods in Computational Neuroscience course. So here we activate those seed neurons at syllable onset, and you can see that the chain grows.
AUDIENCE: [INAUDIBLE]
MICHALE FEE: What's that?
AUDIENCE: Why 10 Hertz and not 5 Hertz or 2 Hertz?
MICHALE FEE: That's just the frequency they sing at-- most of them are between 5 and 10 Hertz. Why, I don't know. Other birds sing much faster syllables: grasshopper sparrows sing syllables at 30 Hertz; for white-crowned sparrows, it's more like 1 Hertz.
So then what happens? Now what we do is take those seed neurons and activate them in two separate populations. And watch this-- the exact same learning rule that drove the growth of that chain... OK, that's very sad, because it's a really beautiful movie.
Basically what happens is that these synaptic connections here gradually get broken, and you get a splitting of this chain that runs right down the middle, so that you now have two independent chains that can serve as the temporal basis for learning two separate song syllables. And we have experimental evidence that you can get hierarchical splitting, where a chain splits off from the initial protosyllable chain, and then one or both of those can split into subsequent chains for even more syllables.
All right. So I will just very briefly turn to the last topic, which I'll touch on quickly because we're out of time. Basically, the question is, what is it that controls that splitting process? Remember, when birds are born, they don't know exactly what song they're going to try to imitate. They don't know how many syllables they need to imitate.
So the growth of those chains, and the splitting of those chains, needs input-- it needs to be controlled by the auditory experience that the bird has. And it turns out that there are very strong interactions between auditory cortex and HVC that we and others have hypothesized are involved in shaping the formation of those chains-- Mooney's lab and Todd Roberts and others have identified very clear anatomical interactions between auditory cortex and HVC.
And we are recording in HVC during tutor exposure to try to understand how this auditory input could shape HVC during development. We're doing things like functional imaging in HVC--
[BIRD SINGING]
The movies aren't working. It's really--
[BIRD SINGING]
--a pretty picture of neurons flashing while the bird is singing --to try to understand the process by which that auditory exposure shapes the emergence of sequences in HVC. And I think what I should do is stop there and thank you for your attention. Thank you.