Attention (1:16:35)
Date Posted:
August 20, 2018
Date Recorded:
July 26, 2018
CBMM Speaker(s):
Robert Desimone All Captioned Videos Brains, Minds and Machines Summer Course 2018
Description:
Bob Desimone, MIT
Overview of the neural basis of attention in primate vision, evidence for its computational role in enhancing the neural encoding of attended objects in areas V4 and IT, models of the underlying neural circuitry, implementation of attention through neural synchrony within and across cortical areas, role of the frontal eye fields and ventral prearcuate region of prefrontal cortex in top-down feature-based attentional control.
PRESENTER: Well, I'm so sorry I can't be here with you. But I'm here virtually, and I'll be taking questions in the later Skype session. I will be giving you some background on the neural basis of attention and our work, in particular. And I want to start off with some examples of the profound effect that attention has on our perceptual systems. So in this example, and I apologize if you've seen some of these examples before, but in the example you're going to see, you're going to see a complex slide flashing on and off.
And all I want you to do is just mentally note when you notice something changing from flash to flash in the scene. So we'll start that. Are you ready? OK. I know it's early here, but has anyone got it yet?
Well, I can't see you, but I'm assuming that at least some person is raising their hand by now. But, for those of you who haven't gotten it, I'll just point out here, the airplane engine that's going on and off. You see that there? Can you see that now?
Well, some of you might say, well, that's a little unfair. I wasn't really paying attention to the airplane engine. And that's exactly the point that, without specifically paying attention to the airplane engine, you actually don't see it. And I can assure you that, if we recorded your eye movements while watching this scene, we would have seen that your eyes passed right over the airplane engine. So it wasn't as though your eyes weren't seeing it at all.
It was stimulating your retina, but it wasn't getting the image to your soul. I'll give you another example in the upcoming image, and this one is going to be a flashing-- sorry, this is going to be a scene where there are going to be what we call mud splashes appearing on the scene from time to time. And then you simply note where something's changing in the scene in spite of the mud splashes.
Has anyone got this one? Again, I can't see you, but I'm assuming that someone has gotten it by now. And I will point out the line on the street here going on and off. Again, if it's something that you weren't specifically attending to, you may not have seen it at all.
And I'll give you the last example, in case you think there's some trick involved with flashing, and this comes from Aude Oliva. And it's a scene that just gradually changes. No flashing. Again, I want you to mentally note if you notice something changing in the scene. We'll start that.
So did you notice anything? If not, I'll point them out to you. So here's all the things that were changing in the scene. Doors were appearing. People, signs were changing, and so on.
There were dramatic differences from the beginning to the endpoint of the scene. And, most likely, you didn't notice them, or certainly didn't notice all of them. This is because they were changing slowly. There wasn't a strong temporal transient to attract your attention. And that's what we did in the other two examples where I had the scenes flashing and the mud splashes, and so on. This was to mask the temporal transient.
So, if something changes dramatically in a scene in time, the sharpness in time of the change will attract your attention, and that would cause you to see it. But if you mask the temporal transient, you just don't see these things. And so we know that attention can have a profound effect on our visual perception. And we'll get to why that is and how that is, what are some of the neural mechanisms that explain these perceptual phenomena.
Now, all of our work takes place in the visual system of the primate. And we're talking today about our work in the monkey cortex; we also do some work in human cortex. And the primate visual system, as I know you know from some of your other lectures, in the cortex, at least, starts with visual area V1 here in the back of the brain. And then it goes through this pathway for object recognition, the occipitotemporal system for object recognition that includes areas V2, V4, and IT cortex. Some of these areas, in turn, get projections through the pulvinar.
Now, as you progress along this pathway into the ventral stream, the processing of visual objects gets more and more complex. And I'm sure you're going to be hearing more about this throughout the talk. But in V1, you have very simple stimulus features. You have [INAUDIBLE] type responses from cells. Cells responding to simple lines and edges passing through their small receptive fields. And then, as you go through this hierarchy of areas in V2, V4, and so on, stimulus processing becomes more complex.
You see more complex features being analyzed by cells. And by the time you get to the inferior temporal cortex, you can find cells that are selective for the image of even a face. And I took this image here from our very first paper on face-selective cells that shows the actual action potential train, because that was an era, a very long time ago, when you would typically illustrate your results with the actual action potential trains so that you could demonstrate to people that you could actually record an action potential. And so there's this ever increasing complexity as you go into the ventral stream. And there's another change as well, and that is there's this progressive increase in receptive field size.
So as you go from V1 with sort of pixel-size receptive fields, then through V2, V4, or TEO, and then IT cortex, there's an increase in size until you get to IT cortex, and the receptive field of a cell could include this entire room. And it's thought that one of the reasons for having these large receptive fields is it helps promote or mediate the recognition of objects in spite of their retinal location. So you recognize an object, whether it's here on your retina or here on your retina and so on. And the large receptive fields in IT cortex are thought to contribute to that.
Likewise, changes in size can occur within these large IT receptive fields. And we've found through many years of studies of this system that there's a progressive filtering out of information that is not attended to. And I'm going to be explaining more of that in a moment. But, by the time you get to the IT cortex, you have largely a representation of the attended item. So, in this case, the student which is attending to the task at-- I didn't mean to do that.
How can I get back? There. You have the student attending to the task at-hand, and everything else gets filtered out except the book that she's attending to. So I told you that we would talk a bit about why we have this attentional system. What's the computational purpose of us having this attentional system?
And the basic idea we have is based on a notion that we've called biased competition, which is that objects are represented in the cortex by the activity of large populations of cells. So, for example, if you have this car here, you can imagine this being represented by this neural population, represented here by this vector of neural activity, different cells responding at different firing rates according to the preferred feature of the cell. And you have another object, like the tomato here, and it's going to evoke a different pattern of activity across the same population. And that's the code, in this case, for the tomato. But the problem comes when you have multiple objects in the scene, which is our usual situation.
Our scenes are very cluttered with lots of objects. And, now, since the same population has to represent different objects, you're going to get some very complex combination of activity according to the features in the different objects as you see here. And that's going to degrade the code for any individual object. But the idea is that, with attention, top-down signals to our visual system specifying what it is that we plan to attend to will bias the population activity in favor of just the features of the attended object. And, therefore, the population code now becomes more pure of the features of the attended object.
That's the basic idea. So that's why we have this system. We have the large receptive fields to mediate invariance. But then we need a system for selecting objects within those large fields. So that's the idea, but we did an actual experiment to test this quantitatively in collaboration with Tomaso Poggio and Ethan Meyers. We recorded from neurons in IT cortex and recorded the activity of cells. And then we used a pattern classifier to relate the activity of cells to the image that the monkey was observing on the screen.
So here, the monkey's observing the kiwi, and it evokes a certain pattern of activity that goes through a classifier, which learns to associate the image with this pattern, and we can do this for different images. And then, once we train the classifier, we can make predictions about what it is the animal's seeing on the screen based on that neural pattern of activity.
So if we see one pattern of activity, well, it should be the kiwi, another should be the face, or whatever. And so the predictions can be right or wrong, and then we can test the coding ability from the pattern of cells.
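To make the train-then-predict idea concrete, here is a minimal sketch in Python. It is an illustration only, not the analysis used in the study: the population size, the Poisson firing model, and the nearest-centroid classifier are all assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_cells, n_objects, n_trials = 50, 4, 40

# Assumption: each object evokes its own mean firing-rate vector across cells.
mean_rates = rng.uniform(2.0, 20.0, size=(n_objects, n_cells))

def simulate_trials(n):
    """Simulated Poisson spike counts with shape (n_objects, n, n_cells)."""
    return rng.poisson(mean_rates[:, None, :], size=(n_objects, n, n_cells))

train = simulate_trials(n_trials)
test = simulate_trials(n_trials)

# "Training": store the average population pattern for each object.
centroids = train.mean(axis=1)  # shape (n_objects, n_cells)

def decode(pattern):
    """Predict the object whose stored pattern is closest to this trial."""
    distances = np.linalg.norm(centroids - pattern, axis=1)
    return int(np.argmin(distances))

# Test on held-out trials and compare predictions with the true labels.
predictions = np.array([[decode(p) for p in test[k]] for k in range(n_objects)])
labels = np.arange(n_objects)[:, None]
accuracy = float((predictions == labels).mean())
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / n_objects:.2f})")
```

With one object at a time and clean simulated data, decoding is close to perfect; the interesting question in the experiment is what happens to this kind of readout when several objects share the receptive field.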
So the actual experimental design is illustrated here. We had the animal fixating and maintaining fixation on a spot, and then we would place three stimuli inside the receptive field of the IT cell. Whoops.
Three stimuli inside the receptive field, and the sequence of events in the task was as follows. The monkey would start fixating, the array of three stimuli would flash on, stay on, then the monkey would get a very dim line segment pointing towards one of the objects. And that would tell the monkey that it should pay attention to that object because, at some random point in time in the future, that object would change color very slightly.
And then at that point, the animal should make an eye movement to it. And if one of the other unattended objects were to change color, the monkey would not make eye movements to that. And then we could run a-- Sorry. This thing is great, but it's a little flaky. And then we could run a decoder at different points of time along the trial to get the decoding performance.
Now, the stimuli that we used were from different classes of objects. We had cars, fruits and vegetables, faces, furniture, and as you know, these are the four major classes of objects in the universe. And then we paired them in different ways, and as I said, we did our analysis in a sliding window. So at all points during the trial, we can compute the decoding performance of the network. And we express this as an area under the ROC curve where one would be perfect classification, and 0.5 would be chance classification.
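For readers who have not met the measure before, the area under the ROC curve can be computed directly as the probability that a score from the correct class exceeds a score from the other class. The sketch below is a generic Python illustration; the function name `auroc` and the example scores are made up here and are not from the study.

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a random
    positive score exceeds a random negative score (ties count as one half)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Perfectly separated classifier outputs give 1.0.
perfect = auroc([3.0, 4.0], [1.0, 2.0])

# Identical score distributions give 0.5, which is chance.
chance = auroc([1.0, 2.0], [1.0, 2.0])

print(perfect, chance)
```

In a sliding-window analysis, a value like this would be recomputed from the classifier outputs in each time window along the trial.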
So let's see what happens in an experiment like this. Here the monkey's fixating. Here, the car appears on the screen, and then over here, you see the decoding performance, which, before the object appears, is at chance and then pops up to be pretty good. Over 80% correct when the object's on the screen, and in this case, there's only one object on the screen. Presumably the monkey's paying attention to that one. There's no competing clutter.
But now, if we go to the case where there are three objects on the screen, now we see a much more complex pattern of results. The purple line shows the response to that object when it was shown in isolation, and the red and green lines show the decoding performance for the object that we've cued the animal to attend to and the object that the animal is not attending to. And the cue is appearing here at 500 milliseconds.
So you can see until the cue appeared, the decoding performance is pretty poor. But then, as soon as the cue appears, now the decoding performance starts rising for the attended object and declines for the unattended object.
Now, it never reaches the decoding performance for the object in isolation, and that is consistent with what we know from behavior, which is that by attending to something in clutter, you can improve recognition. But you don't get it up to the point you'd have if the object is presented on a blank screen.
Now, that's in terms of decoding, if you want to know. But what actually happens to firing rates of cells? The answer is the firing rates of cells are going up and down with attention. It's really the code that improves.
And this shows you the average firing rate of cells to what we consider their best stimulus. So if you test each cell separately, and find out which stimulus does it respond best to, and then which one does it respond worst to, we're just going to track the responses to the best and worst stimulus across the population, with best and worst determined for each cell. The red line here shows the response to the best stimulus in isolation, and now the dark purple line is the response to the worst stimulus in isolation.
And now, you can see here in the middle are the responses to attend to best and attend to worst. The pink line is the response to the best stimulus when you attend to it, and this lighter line is the response to the worst stimulus when you attend to it. And you can see that even though the stimulus on the screen isn't changing at all, the response is changing dramatically.
So if the animal's attending to the best stimulus for any given cell, the response is going up. And if it's attending to the worst stimulus for that cell, the response will be driven down in the direction of the response you would have had in isolation.
We also noticed that the output of the decoder gave us some insight into what the animal was doing internally. I mentioned to you that the unattended objects could also change color during the trial, and the animal was supposed to ignore those. But in fact, the output of the decoder suggests that when the unattended object changed color, the animal switched its attention to it briefly and then switched it back to the attended object. And you see this here where we're tracking the classification accuracy of the decoder for the attended object.
And here at time zero is where the distracting object changed color. You can see that all of a sudden, the output of the decoder drops way down but then pops back up in a couple hundred milliseconds. Attention was switched away from it and back, whereas if we track the decoding performance for the unattended object, at the same time, it's popping up as though now the population code is reflecting the properties of the previously unattended object. And it's being processed briefly and then shifts back down again. And so the animal has now switched its attention back to the attended object.
And based on our studies of the firing rates of cells, we believe that this kind of process is going on at every stage along the hierarchy in these receptive fields that are ever increasing in size. So it's a multi-stage filtering operation.
So if we go back earlier in the system to see how this filtering works there, we can go back to an experiment even earlier in time where we just were tracking changes in firing rates back in visual area V4, I think, which gives you a sense of what happens in these areas with the smaller receptive fields. The V4 receptive fields are medium in size. They're not like V1 pixels. They're not whole rooms like IT cortex.
So the situation we have here is, this box represents the V4 receptive field that's mapped for a cell. The animal's fixating a spot, and then we find, for any given cell, its preferred good stimulus. In this case, it was the horizontal red line, and we find a poor stimulus for the cell. In this case, it's the yellow bar over here. And then in another situation, we pair these together, the red and the angled yellow.
So now we'll look at the response to these different conditions. The red line shows the response to the good stimulus in isolation. The solid yellow line is the poor stimulus in isolation, and then when we pair them without attention, with the animal doing a task at fixation, you can see the response is intermediate.
But when the animal attends to one of the stimuli, the response changes. Here we have attention to the poor stimulus. So now the pair is there, but the animal's attending to the poor stimulus. And you can see the response is driven down close to what it would have been if the poor stimulus had been presented in isolation. Whereas on a different trial, if the animal's cued to attend to the preferred stimulus, now the response to the good stimulus pops up. And it's more similar to what you would have had had the good stimulus been presented in isolation.
So again, the idea is even in these areas with smaller receptive fields that without attention, cells are doing something like averaging. It's this mixed code that's not specific to one individual object. But with attention, now the code becomes more specific for the attended object, whether it happens to be the preferred or non-preferred stimulus for a cell, different cells are going to be preferring different stimuli. But the code coming out will be specific for that object.
And one metaphor that we used many years ago was it's as though with attention, the receptive field shrinks around the attended object. So it no longer includes all the other unattended clutter. And our thinking has evolved since that time, but it's still not a bad initial way to think about it.
So what else happens when an animal attends to an object? One thing we've noticed is that if there's just a single object in the cell's receptive field, so there's no competition, there's no cluttered code, then you can still see effects of attention on the responses to that stimulus. And that's shown here.
Here we have the same cell I showed you previously, but in this case, now we've moved the competing stimulus outside the receptive field. You can't do that in IT cortex because the receptive fields are too big, but in these areas with smaller fields, you can do this experiment.
And what you can see here is that when the stimulus comes on, the stimulus is on during the time of this horizontal bar, there is an increase in response when that stimulus is attended compared to when it's ignored. And in addition, you can see that even prior to the stimulus coming on, when the animal knows that the stimulus is going to appear in the receptive field and it should be attending to it, you have higher baseline firing rates of cells in the attended versus unattended condition, as though the system has become more sensitive at that location to stimuli.
And, indeed, we see evidence for that if we plot the contrast response function of a cell with and without attention. Here, we're putting a stimulus in the receptive field, which varies in contrast from 0, invisible, up to 100%. And we plot the response as a function of contrast. This is averaged across the population. The yellow line shows the contrast response function in the ignored condition, the red line in the attended condition.
And you can see that in the attended condition, it's as though the contrast response function has been shifted to the left, which means it's as though the stimulus is higher contrast. The cell has become more sensitive to contrast with attention, so it's become more sensitive.
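One way to picture a leftward shift of the contrast response function is with a saturating curve whose half-saturation contrast is lowered by attention. The Python sketch below assumes a Naka-Rushton form and made-up parameter values (`r_max`, `c50`, `n`); the talk does not specify a functional form, so treat this purely as an illustration.

```python
import numpy as np

def contrast_response(c, r_max=50.0, c50=30.0, n=2.0):
    """Assumed Naka-Rushton contrast response function.
    c is contrast in percent; c50 is the contrast giving half of r_max."""
    c = np.asarray(c, dtype=float)
    return r_max * c**n / (c**n + c50**n)

contrasts = np.array([0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 100.0])

# Hypothetical assumption: attention lowers c50, shifting the curve leftward.
ignored = contrast_response(contrasts, c50=30.0)
attended = contrast_response(contrasts, c50=20.0)

for c, r_ign, r_att in zip(contrasts, ignored, attended):
    print(f"contrast {c:5.1f}%  ignored {r_ign:5.1f}  attended {r_att:5.1f}")
```

Under this assumption, the attended response to a given contrast equals the ignored response to a contrast 1.5 times higher, which is one way of saying the stimulus behaves as though it were higher contrast.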
Now, what's a model that could explain these kinds of results? Well, we came up with a model with John Reynolds quite a few years ago. A simple model that would explain this is a normalization model in which the target cell, in this case Y here, gets inputs from different subpopulations of cells at an earlier level.
So for example, Y could be a cell in IT cortex, and it gets inputs from populations of cells in area V4. And the V4 cells have the small receptive fields. The IT cells have the larger receptive fields. The V4 cells have different receptive field locations within that larger IT receptive field.
And the target cell, the IT cell, is simply summing the excitatory and inhibitory weights from the subpopulations that are supplying inputs, and it's averaging those in a simple normalization equation that we had developed back then. And there's been other variants of this over time.
And the attention model is simply that when the animal would attend to a stimulus, the weights from the population carrying that information about the attended stimulus would increase. Quantitatively, this would mimic the effects that we find on the cell's firing rate to the different stimuli in the receptive field just by increasing the weights or the sensitivity to inputs from the attended stimulus.
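The weighted-average idea can be sketched in a few lines of Python. This is a simplified illustration in the spirit of the model described above, not the published equations: the steady-state form, the weights, the constants `B` and `A`, and the attentional gain of 5 are all assumptions made here.

```python
import numpy as np

def target_response(x, w_exc, w_inh, attn_gain, B=1.0, A=0.2):
    """Steady-state response of a target cell that sums excitatory drive and
    normalizes by total (excitatory + inhibitory) drive plus a constant A.
    attn_gain scales the weights of each input population."""
    x, w_exc, w_inh, g = map(np.asarray, (x, w_exc, w_inh, attn_gain))
    E = np.sum(g * w_exc * x)   # total excitatory input
    I = np.sum(g * w_inh * x)   # total inhibitory input
    return float(B * E / (E + I + A))

# Hypothetical weights: input 0 carries the good stimulus, input 1 the poor one.
w_exc = [1.0, 0.2]
w_inh = [0.2, 1.0]

good_alone = target_response([1, 0], w_exc, w_inh, [1, 1])
poor_alone = target_response([0, 1], w_exc, w_inh, [1, 1])
pair = target_response([1, 1], w_exc, w_inh, [1, 1])
attend_good = target_response([1, 1], w_exc, w_inh, [5, 1])
attend_poor = target_response([1, 1], w_exc, w_inh, [1, 5])

print(poor_alone, attend_poor, pair, attend_good, good_alone)
```

With these made-up numbers, the unattended pair gives an intermediate response, and boosting the weights of one input pulls the response toward what that stimulus would have evoked alone, which is the qualitative pattern described for the V4 cell.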
Now, more recently, John Reynolds and David Heeger have developed a more elaborate version of this model in which they take into account not just the receptive field input, the inputs to a cell from the receptive field, but also stimuli from outside the receptive field, the spatial parameters of the stimuli, as well as the size of the focus of attention. These are all built into their models, and they can predict the response of cells to stimuli inside, outside the receptive field, big stimuli, small stimuli, stimuli extending into the surround, and so on. And it's quite a comprehensive model that I urge you to look at.
So now this is all based on just increasing the sensitivity of cells to inputs, but what's the actual mechanism for doing that? And one of the things that we've become interested in over the years is the possibility that there's not just the changes in firing rates of cells that are causing these changes, but there might be some changes in the synchronous activity of cells, the timing of the action potentials that could have an impact on how cells communicate with each other.
Just to give you an idea of what this would mean, cells have a certain temporal integration time for inputs. So if spikes arrive at a cell within a certain temporal window, they will sum and more likely change the cell's membrane potential, giving an action potential. But if they're too spread out in time, they don't sum, and so you have less of an effect on the postsynaptic cell.
And an example here is from an experiment by Jeff Magee, in which he was imaging the dendrites of cells and stimulating the dendrites with caged glutamate. And in doing that, he could stimulate different spatial points along the dendrites, as well as different times in different locations. And what we have plotted is the membrane response to these spikes on the input as a function of the number of those inputs.
And the upper line is when those inputs occur within a narrow window of time, whereas the lower lying curve there shows you the inputs that are more separated in time. And you can see that for any given number of inputs, the membrane response is larger if the synaptic inputs are closer in time.
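The effect of input timing can be illustrated with a toy leaky integrator in Python. This is a sketch under strong assumptions, not a model of the dendritic experiment: inputs are summed linearly, each one decays exponentially, and the time constant and input amplitude are made-up values.

```python
import numpy as np

def peak_depolarization(spike_times_ms, tau_ms=10.0, epsp_mv=1.0,
                        dt_ms=0.1, t_max_ms=100.0):
    """Peak of a membrane trace built by linearly summing exponentially
    decaying inputs (assumed time constant tau_ms, amplitude epsp_mv)."""
    t = np.arange(0.0, t_max_ms, dt_ms)
    v = np.zeros_like(t)
    for s in spike_times_ms:
        elapsed = np.clip(t - s, 0.0, None)          # time since this input
        v += np.where(t >= s, epsp_mv * np.exp(-elapsed / tau_ms), 0.0)
    return float(v.max())

n_inputs = 10
synchronous = np.linspace(10.0, 12.0, n_inputs)   # all inputs within 2 ms
dispersed = np.linspace(10.0, 60.0, n_inputs)     # same inputs over 50 ms

peak_sync = peak_depolarization(synchronous)
peak_disp = peak_depolarization(dispersed)
print(f"peak, synchronous inputs: {peak_sync:.2f} mV")
print(f"peak, dispersed inputs:   {peak_disp:.2f} mV")
```

The same number of inputs produces a much larger peak when they arrive close together, simply because earlier inputs have not yet decayed when later ones arrive.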
And that would suggest that if activity is synchronized so that spikes coming from the different inputs into cells are clustered in time, they'll have a bigger synaptic effect. You can see an example of how this occurs in an experiment, again, that we did some time ago, in which we recorded from area V4 in an animal doing an attention task.
And here, I'll show you a little bit more about how an animal is doing an attention task. What are the task parameters to get the animal to do that?
So in this experiment, the animal's fixating a spot in the center of the screen, and we map the receptive fields of the cell. We place one stimulus inside the receptive field, other stimuli outside. In this case, they're all colored gratings, and the color of the fixation stimulus tells the animal which one it should pay attention to.
So the stimuli appear on the screen much like in that earlier experiment I showed you, and at some point later in time, the fixation spot changes color. In this case, it turns red. It tells the animal pay attention to the red stimulus because at some random moment in time, that red stimulus changes color slightly. And the animal releases a bar, and it gets a juice reward. And if one of the distractors changes color, it ignores those.
And then we instruct the animal from trial to trial, shift the animal's attention from one location to another, and we randomize the locations of all the stimuli, and so on.
So when you do that, you find that first of all, when the animal's attending to the stimulus in the receptive field, you can get this increase in firing rate like you see here. The red line shows the response to the attended stimulus, the blue line to that same stimulus when it's ignored, so there's some increased sensitivity of the cell.
Over here on the right, then we look at the coherent activity across the population in the attended and unattended conditions. What we do here is we have multiple electrodes in the cortex, and we measure not just the action potential trains on each electrode, but also the local field potential.
Now you've probably heard about the local field potential in the course. The local field potential is a slow potential that is reflecting the integrated activity in some small volume of neuropil, on the order probably of a couple hundred microns or so. And when cells tend to fire together, it's going to cause currents in the cortex that will cause the local field potential to go down to lower values. And when they're deep, they're hyperpolarized. They go back to the other end.
What you see is that the local field potential is oscillating back and forth, and it is some measure of temporal fluctuations in the pattern of activity in that population. And one of the things we do, we use that as a time base, and then we look at the timing of spikes in relationship to the phase of these oscillations in the local field potential, which we can do at different frequencies.
And if we plot that, we can compute a measure of coherence, which reflects the phase synchrony of the spikes. What we find is that there's an increase in phase synchrony in this frequency range from 40 to 70 or 80 Hertz or so, which is known as the gamma frequency range.
And this is a frequency range that puts spikes into a temporal window of 20 milliseconds or so, which is thought to be the temporal integration time of cells for their different synaptic inputs.
And so if you think that temporal synchrony plays a role in the potential modulation of cells, then this is positive evidence for that: an increase of spikes in this narrow temporal window.
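To give a feel for what measuring spike timing against the phase of a field oscillation involves, here is a small Python simulation. It uses a phase-locking value rather than the spike-field coherence measure used in the experiment, and the oscillation frequency, firing rate, and locking strengths are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 1000.0                              # sampling rate in Hz (assumed)
t = np.arange(0.0, 20.0, 1.0 / fs)       # 20 s of simulated time
f_gamma = 50.0                           # assumed gamma-band frequency in Hz
lfp_phase = (2.0 * np.pi * f_gamma * t) % (2.0 * np.pi)

def simulate_spikes(locking, base_rate_hz=20.0):
    """Spikes whose probability is modulated by the LFP phase.
    locking = 0 means no phase preference; 1 means strong preference."""
    rate = base_rate_hz * (1.0 + locking * np.cos(lfp_phase))
    return rng.random(t.size) < rate / fs

def phase_locking(spikes):
    """Length of the mean unit vector of LFP phases at spike times:
    near 0 for random phases, near 1 for perfectly locked spikes."""
    return float(np.abs(np.mean(np.exp(1j * lfp_phase[spikes]))))

# Hypothetical locking strengths for the two conditions.
plv_attended = phase_locking(simulate_spikes(locking=0.8))
plv_ignored = phase_locking(simulate_spikes(locking=0.2))
print(f"attended: {plv_attended:.2f}, ignored: {plv_ignored:.2f}")
```

In real data the phase would have to be estimated from the recorded field potential in each frequency band, rather than known exactly as it is in this simulation.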
Now you'll also see that here in the low frequency range there is an opposite effect, in the theta-alpha range. There's actually less synchrony with attention, and this is another whole story in itself. When the spike-field relationship is changing in this low frequency domain, this, we believe, is outside of the normal temporal integration window of those cells.
And basically, this is bad for a decoder when the cells are fluctuating in phase at low frequencies in a way that's unrelated to the stimulus. And the cells are not adding up these spikes. What the system would like to do is average across these fluctuations, and when you have more of these low frequency fluctuations, they're not good.
This has been referred to as the noise correlations that are not related to the stimulus, and there's another whole line of work by John Reynolds and Marlene Cohen and others showing that these noise correlations go down with attention, which is a good thing. And that's what we're seeing here. There's less fluctuations of activity with attention that are unrelated to the stimulus.
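Why shared fluctuations hurt averaging can be seen from a standard result for equally correlated variables: the variance of the average of N cells, each with variance sigma squared and pairwise correlation rho, is sigma squared times (1 + (N - 1) rho) / N. The Python sketch below just evaluates that formula; the cell count and correlation values are made up for illustration.

```python
def variance_of_population_mean(n_cells, sigma, rho):
    """Variance of the average response of n_cells units that each have
    standard deviation sigma and a common pairwise noise correlation rho."""
    return sigma**2 * (1.0 + (n_cells - 1) * rho) / n_cells

# Hypothetical values: 100 cells, unit variance, three correlation levels.
independent = variance_of_population_mean(100, 1.0, 0.0)
weakly_correlated = variance_of_population_mean(100, 1.0, 0.05)
more_correlated = variance_of_population_mean(100, 1.0, 0.10)

print(independent, weakly_correlated, more_correlated)
```

Even a small common correlation keeps the averaged signal noisy no matter how many cells are pooled, so a reduction in these correlations makes averaging across the population more effective.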
So now what happens as you go through successive stages of the pathway? As we go from V4 to IT cortex, we see the following results. I've already shown you that IT cells are modulated by attention. Here, we have placed, again, multiple stimuli within an IT receptive field. We're simultaneously recording from V4, and so we have stimuli that could occur within a V4 receptive field and then the larger IT receptive field that contains it.
If you just look at the response in the attended versus unattended condition, the V4 cells are showing a greater response when the animal attends to the stimulus than when it's ignored. The IT cell, if it's a preferred stimulus, does the same. I've already shown you examples of this.
And in addition, since we're doing a simultaneous recording, we can also look at the coherent activity between the IT cells and the V4 cells. And that's shown here, the V4 IT coherence, and you can see that here in this gamma frequency range, there's far greater coherence between V4 and IT with attention than without attention.
But this coherent activity has shifted in time, so that these IT cells are firing coherently with the V4 cells but slightly later, a few milliseconds later, which we think accounts for the time delays for spikes to get from V4 to IT cortex.
You can also calculate a statistical measure that some of you may have heard of called Granger causality, which is simply a measure of how well activity in one population predicts activity in another. And if we compute the Granger causality for V4 to IT, you can see that in this gamma frequency range, there's a greater causal influence of V4 on IT with attention than without.
Whereas it turns out there's not much of an effect of IT back on V4 in either case. So it's a predominantly feed-forward system with the V4 cells firing more with attention and in a more synchronized state with attention, and it's having a larger impact on the IT cells with attention.
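The logic of Granger causality can be sketched with a simple time-domain version in Python: ask whether adding the past of one signal reduces the error in predicting another signal beyond what that signal's own past achieves. This is not the frequency-resolved analysis used in the experiment, and the simulated signals named `v4` and `it`, their coefficients, and the model order are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def lagged(series, order):
    """Matrix whose columns are series[t-1], ..., series[t-order]."""
    n = len(series)
    return np.column_stack([series[order - k : n - k] for k in range(1, order + 1)])

def granger(source, target, order=2):
    """Log ratio of residual variances: target predicted from its own past
    versus from its own past plus the past of source."""
    y = target[order:]
    own = lagged(target, order)
    both = np.hstack([own, lagged(source, order)])

    def residual_variance(X):
        X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return np.var(y - X1 @ beta)

    return float(np.log(residual_variance(own) / residual_variance(both)))

# Simulated feed-forward pair: "v4" drives "it" with a one-step delay.
n = 5000
v4 = np.zeros(n)
it = np.zeros(n)
for k in range(1, n):
    v4[k] = 0.5 * v4[k - 1] + rng.standard_normal()
    it[k] = 0.5 * it[k - 1] + 0.4 * v4[k - 1] + rng.standard_normal()

gc_feedforward = granger(v4, it)
gc_feedback = granger(it, v4)
print(f"v4 -> it: {gc_feedforward:.3f}, it -> v4: {gc_feedback:.3f}")
```

In this toy example the past of the driving signal clearly improves prediction of the driven one, while the reverse direction adds essentially nothing, which is the asymmetry the measure is designed to detect.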
Now, when we talk about coherent activity, the question's always raised of whether these correlations we see of coherent activity, with attention, and so on, whether they actually have any kind of causal role in the system. Are they actually functional? Do they do something? Or is it some sort of epiphenomenon? Is this basically a bug or is this a feature of visual processing?
And from our point of view, given that cells have limited temporal integration times, synchronous activity, as far as we could tell, must be having an effect on postsynaptic cells. And then it's a question of understanding how the network works. Just the way we have to understand what the changes in firing rates mean for network function as well. These are all things that are happening within the network. We need to understand them, and they basically come about through the anatomy of the system.
People have talked about this in different ways. Pascal Fries has talked about communication through coherence. Pascal suggested that cells might change their phase relationship of activity with each other, and that could regulate communication from one group to another. We haven't seen evidence of this phase shifting yet, but in principle, this could happen in networks.
Earl Miller has found evidence for phase coding, where different stimuli might be encoded at different phases of the local network oscillations. But ultimately, these correlations that we see between spiking activity and behavior, the timing of activity and behavior, are going to have to be tested by some kind of actual causal manipulation where we go in there and manipulate the activity of cells. And the methods for doing that are now finally becoming available.
Just to give you an example of just how anatomical connectivity-- as a result of anatomical connectivity, oscillatory activity in one group of cells will naturally affect oscillatory activity in another. And again, this is just a metaphor, but I think it's a good one, is just looking at the actions of some coupled oscillators.
So what you're going to see in this video are some metronomes. I've done this myself. You get these metronomes off of Amazon, and you have the metronomes that are all oscillating at a common frequency but random phase when they're sitting on this board. And then you'll see what happens when this person puts the board on a couple of soda cans, so that it's now possible for the oscillations in one metronome to physically influence the other metronomes along the same board. And you can just see what happens.
So here, the metronomes are all going off randomly. There's no synchrony, but now-- What happened? It shouldn't have stopped.
And now, when he puts the board on the cans, now when one metronome's going it can jiggle the board a little bit. And that will start to influence the other metronomes, and you can see that eventually, they will all go into synchrony.
Again, it's a metaphor for how anatomical connectivity between cells that are oscillating in any way is going to set up synchrony in the network.
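The metronome demonstration can be imitated in a few lines of code. What follows is only a sketch: it uses the standard Kuramoto model of coupled phase oscillators, which is a stand-in chosen here and not something used in the talk. The number of oscillators, the common frequency, the coupling strength, and all function names are illustrative assumptions. With zero coupling (board on the table) the phases stay scattered; with positive coupling (board on the cans) they pull together.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return r in [0, 1]: near 0 for scattered phases, 1 for perfect synchrony."""
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


def simulate(coupling, n=5, freq_hz=2.0, dt=0.001, seconds=20.0, seed=0):
    """Euler-integrate n Kuramoto oscillators with a common frequency.

    All values are assumptions for illustration. Returns the order
    parameter at the start and at the end of the run.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]  # random phase
    omega = 2.0 * math.pi * freq_hz                               # common frequency
    start_r = order_parameter(phases)
    for _ in range(int(seconds / dt)):
        mean_vector = sum(cmath.exp(1j * p) for p in phases) / n
        r, psi = abs(mean_vector), cmath.phase(mean_vector)
        # Mean-field form of the Kuramoto update: each oscillator is pulled
        # toward the average phase psi with strength coupling * r.
        phases = [p + dt * (omega + coupling * r * math.sin(psi - p)) for p in phases]
    return start_r, order_parameter(phases)


for coupling in (0.0, 2.0):
    start_r, end_r = simulate(coupling)
    print(f"coupling={coupling}: synchrony {start_r:.2f} -> {end_r:.2f}")
```

The order parameter plays the role of watching the metronomes: it stays where it started when the oscillators cannot influence each other, and it climbs toward 1 once they can.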
So now we've seen that within the ventral stream, there's these changes in firing rates and synchrony with attention. And the question is, how do they come about?
There's many lines of evidence that the frontal eye field plays an important role in spatially-directed attention. This is the type of attention I've been talking about so far, when you attend to one location in the scene and not to others. And we have evidence from our own lab that FEF may induce the changes in firing rates in V4 cells and would induce the synchronous activity in V4 cells as well.
And to give you an example from an experiment where we've now simultaneously recorded from V4 cells and from frontal eye field cells that have overlapping receptive fields. If we look at the changes in firing rate of the cells with attention, that's shown up here at the top, and what you can see here is the frontal eye field cells are giving a much larger response to a stimulus when it's attended than when it's ignored.
We see the same thing in V4, but now we can compare the latency of these attentional effects in the two areas. And what that's showing is that the frontal eye field effects are happening at about 80 milliseconds, whereas those in V4 are later, about 120 milliseconds. So the FEF is showing these changes first and is in a position, at least temporally, to have caused the effects in V4.
Likewise, if we look at coherent interactions between the frontal eye fields and V4, we see-- and again, in this gamma frequency range, there's this enhancement of coherence. So the cells in the frontal eye fields are becoming more coherent with those in V4. And if we look at the time shift in those oscillations between the frontal eye fields and V4, they show that across frequencies, there was a constant time shift of about 10 milliseconds or so, with the frontal eye fields leading activity in V4 in these coupled oscillations. Again, all consistent with the idea that the frontal eye fields are causing these effects.
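One way to read the point about a constant time shift across frequencies: a fixed delay shows up as a phase lag that grows in proportion to frequency, which is the signature of a true time lead as opposed to a fixed phase offset. A minimal sketch, assuming a pure delay of about 10 milliseconds; the function name and the example frequencies are illustrative and are not taken from the experiment.

```python
def phase_lag_degrees(freq_hz, delay_s):
    """Phase lag produced by a pure time delay at one frequency.

    A delay of delay_s seconds equals freq_hz * delay_s cycles, and each
    cycle is 360 degrees. The result is not wrapped into [0, 360).
    """
    return 360.0 * freq_hz * delay_s


delay_s = 0.010  # roughly the 10 ms FEF lead over V4 mentioned in the talk
for freq_hz in (40, 50, 60):  # assumed example frequencies in the gamma range
    lag = phase_lag_degrees(freq_hz, delay_s)
    print(f"{freq_hz} Hz: {lag:.0f} degrees")
```

Under this assumption the lag is 144 degrees at 40 Hz and 216 degrees at 60 Hz, so the phase difference changes with frequency while the time difference does not.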
But of course, again, we want to do a causal manipulation to more conclusively test this idea. So what we did in one experiment was we took out, we lesioned the entire lateral prefrontal cortex in one hemisphere of monkeys doing an attention task.
And we both looked at the animal's behavior with this removal of the frontal eye fields in one hemifield. And here we also cut the connections between the two hemispheres, so that they were operating independently. So we can compare the results in the hemifield affected by the lesion versus a control hemifield in the same animals.
And at the same time, we could record the activity of cells in V4 in the lesion versus control hemisphere and look at the effects on the feedback, the attentional feedback, on V4 cells.
Now behaviorally, when you do this lesion of the lateral prefrontal cortex, the ability of animals to attend to a stimulus in one hemifield versus another is impaired in the lesion hemifield. And when you look at the V4 cells, if you plot an index of the effects of attention on responses, and then plot those indexes cumulatively across a population, what you find is that there are larger attentional effects in the control hemifield than in the lesion hemifield. If you plot the latency of these attentional effects, you find that these attentional effects occur at a longer latency in the lesion hemifield compared to the control.
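To make the idea of an attention index plotted cumulatively concrete, here is a small sketch. The talk does not give the formula for the index, so the contrast ratio used below is an assumption (it is a common choice for this kind of measure), and the firing rates are invented purely for illustration.

```python
def attention_index(attended, unattended):
    """Contrast index in [-1, 1]; positive means a larger attended response.

    Assumed formula, not confirmed by the talk.
    """
    return (attended - unattended) / (attended + unattended)


def cumulative_fraction(values):
    """Return the sorted values and the fraction of cells at or below each one."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


# Hypothetical (attended, unattended) firing rates in spikes/s, one pair per cell.
control = [(30, 20), (25, 20), (40, 25), (18, 15)]
lesion = [(24, 20), (21, 20), (30, 25), (16, 15)]

control_idx = [attention_index(a, u) for a, u in control]
lesion_idx = [attention_index(a, u) for a, u in lesion]

for name, idx in (("control", control_idx), ("lesion", lesion_idx)):
    xs, fractions = cumulative_fraction(idx)
    print(name, [round(x, 2) for x in xs], fractions)
```

In a plot of fraction against index, a population with larger attentional effects has its curve shifted to the right, which is how the control and lesion hemifields would be compared.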
So what does that tell us? It tells us that the frontal eye field does appear to have a top-down influence on V4 cells. So if you take the frontal eye field cells out, the effects of attention in V4 are reduced. However, they are not eliminated. So the frontal eye fields can't be the only source of top-down input on V4.
But the fact that the V4 cells are showing these effects later in time suggests that whatever these other inputs are coming into V4, modulating the responses for attention, they take longer to work. Apparently, the most efficient way to modulate activity back in V4 is through this FEF pathway.
Now, we haven't identified what these other pathways are, these ones that have longer latencies. We've tested for the pulvinar. It turns out it's not the pulvinar. Probably the most likely source would be the parietal cortex, but those experiments have yet to be done.
So now I want to switch gears because we've been talking about spatially-directed attention, and I will now talk about feature attention, attention to features irrespective of location.
Now going back to our classroom scene, if I ask you to find the girl in the pink shirt, presumably, you can all do that. And you can do that even though I didn't tell you where she is. I didn't say attend to the left-hand corner or whatever.
But you must have used your knowledge of pink, and girls, and so on to direct your attention to the girl, and we know that you don't solve these problems by scanning every pixel in the scene. Your search is much more efficient than that. Usually within one or two items, you're right over it and looking at the target object in the scene.
And we must have a way of using knowledge of objects and the task instructions for modulating activity in a way that may be analogous to how we do it for spatial attention but is more mysterious. Because you think about attending to locations in space, you have areas like the frontal eye fields that are spatial maps. They have a map of space. You want to attend to the upper-right hand corner? Increase the activity in the upper-right hand corner of the map.
But you think about, how does it work for, say, girls in pink shirts? Where's the map of that? And that is really mysterious to think about: how does that work, and where do those signals come from? How do they affect cells back in the ventral stream?
So that's been motivating a lot of our experimentation as well, and I'll tell you what we know about that so far. And in this case, we've been doing simultaneous recordings in the frontal eye fields, and in an area of the prefrontal cortex that is in front of and ventral to the frontal eye fields, known as the ventral prearcuate area, as well as a control region in the sulcus principalis, which is labeled here as VPS.
All of these areas have projections back into the ventral stream, and I'll be talking in more detail about that later. And any of these areas could potentially be relaying these feature-related signals back to the ventral stream.
So to study this feature-based attention, we used a variant of visual search like finding the girl in the scene. In this case, we had the monkey start off fixating, and then we presented a complex object at fixation. And that was the cue to the animal to find this complex object. In this case, you see here it's a shoe. Again, this is like finding the girl in the pink shirt.
And that tells you, well, find the shoe in the scene that's going to appear in a moment. And then there's a delay period, and then after the delay, you can see there's an array of objects. And the animal's task is to move its eyes around until it finds the object that matches the target and just hold fixation there. In this case, it would hold fixation on the shoe.
And then we also had some detection trials where just a single object appeared, and the animal makes an eye movement to it. And that we can use to just probe the basic selectivity of the cells.
So how do we extract signals related to feature and spatial attention in this task? For spatial attention, it's really simple. We can look at the response to an object in the receptive field in one case when the animal's making an eye movement to it versus another case where the animal's making an eye movement to someplace else. And we know that the animals attend to where they're going to move their eyes, so we can just look at how much does that spatial attention modulate the cell's response?
For feature attention, we want to separate that from spatial attention, so we look at the response to the receptive field stimulus in those moments of time when the animal's planning an eye movement someplace else. So spatial attention is directed someplace else in the scene. So that's not a factor here.
But we compare the response to that object in the field when it is the thing the monkey's searching for. Let's say there's a shoe in the receptive field, and the monkey's searching for a shoe, versus an epoch in time when the animal's searching for something different. Say there's a shoe in the receptive field, but the animal is searching for a banana.
So we can see what is the effect of the animal's top-down feature attention, attention to shoes versus bananas, on the response to that shoe independent of where the animal's spatial attention is directed?
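The two comparisons just described can be written out as a small sketch. Everything here is hypothetical: the trial fields, the firing rates, and the exact way trials are grouped are assumptions made to illustrate the logic, not the lab's actual analysis. The spatial effect contrasts eye movements into the receptive field with eye movements elsewhere; the feature effect uses only the eye-movement-elsewhere trials and contrasts whether the object in the field is the search target.

```python
from statistics import mean


def attention_effects(trials, rf_object):
    """Return (spatial_effect, feature_effect) for rf_object in the receptive field.

    Each trial is a dict with hypothetical keys:
      'rf_stimulus'    object shown in the receptive field
      'search_target'  object the animal is searching for
      'saccade_to_rf'  True if the upcoming eye movement goes into the field
      'rate'           response in spikes/s
    """
    shown = [t for t in trials if t["rf_stimulus"] == rf_object]
    toward = [t["rate"] for t in shown if t["saccade_to_rf"]]
    away = [t for t in shown if not t["saccade_to_rf"]]
    # Spatial attention: same object, eye movement into the field vs elsewhere.
    spatial = mean(toward) - mean([t["rate"] for t in away])
    # Feature attention: eye movement elsewhere in both cases, so only the
    # search target differs (searching for this object vs something else).
    is_target = [t["rate"] for t in away if t["search_target"] == rf_object]
    not_target = [t["rate"] for t in away if t["search_target"] != rf_object]
    feature = mean(is_target) - mean(not_target)
    return spatial, feature


# Invented example trials for one cell.
trials = [
    {"rf_stimulus": "shoe", "search_target": "shoe", "saccade_to_rf": True, "rate": 40},
    {"rf_stimulus": "shoe", "search_target": "banana", "saccade_to_rf": True, "rate": 36},
    {"rf_stimulus": "shoe", "search_target": "shoe", "saccade_to_rf": False, "rate": 30},
    {"rf_stimulus": "shoe", "search_target": "banana", "saccade_to_rf": False, "rate": 20},
    {"rf_stimulus": "banana", "search_target": "banana", "saccade_to_rf": False, "rate": 12},
]

spatial, feature = attention_effects(trials, "shoe")
print("spatial effect:", spatial, "feature effect:", feature)
```

Restricting the feature comparison to trials where the eye movement goes elsewhere is what keeps spatial attention out of that measure.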
Now if you look at just the pure visual properties of the cells in the frontal eye fields and this area, the VPA, the ventral prearcuate area, and compare those. In the graph below, we've also compared them to cells recorded in the anterior IT cortex and this ventral sulcus principalis area as well. What you can see is, just looking at receptive field size, that cells in the VPA have receptive fields similar in size to those of the frontal eye fields. So they do have some spatial information in VPA.
But there's a mix of receptive fields in VPA compared to FEF. Some of them have receptive fields very similar to FEF, and some of them include the fovea and are quite large, more IT like. And we'll get to that.
If we look at the selectivity of cells in these different areas for different kinds of objects, the frontal eye field cells, consistent with what other people have reported, don't have selectivity for objects. They just care about that there's a thing in the receptive field, and that's all they care about.
Whereas this VPA area, like IT cortex, does show object selectivity in addition to the spatial selectivity. So if we just plot the response to the best versus worst objects, and it's normalized, you can see that the VPA selectivity is not quite as good as the IT cortex but is really not bad. And you have moderate selectivity in this sulcus principalis area.
So here we have this VPA area that has spatial receptive fields plus some non-spatial mix of properties and object selectivity. So what do they do in this search task?
What we're showing here, this is the population activity averaged across the population in the anterior IT cortex, VPA, and the sulcus principalis area, and we're looking at activity to the cue. And the red line shows the response to the preferred cue for the cell, whether it's a shoe, or a banana, or whatever. And so you see that in anterior IT cortex, there's strong selectivity. In VPA, again, this is averaged across the whole population of cells, it's pretty good but not quite as good as IT cortex. In the VPS, it turns out, there wasn't this cue selectivity.
And then if we synchronize at the time of the search array-- so the search array is appearing at time zero, and now we're looking backwards in time from the search array. One interesting thing you can see in the VPA population is that there's higher activity on the trials where the animal is going to be searching for the preferred stimulus for the cell.
Again, this is like the cells have become more sensitized let's say, when there's a cell that normally responds, say, to shoes. And the animal is going to be looking for shoes. These cells are already showing higher levels of activity.
And then, even more interesting, if you look at what happens after this saccade occurs, you see it retains the same selectivity. So now the animal's moving its eyes into the array. It's going to be looking at any of the different stimuli, but the activity is higher in those trials where the animal is searching for that cell's preferred stimulus. And that's not found in any of the other areas the way it is in VPA.
So there seems to be a signal probably related to working memory attention for the cue features, and it persists throughout the trial while the animal's looking for the cell's preferred feature.
Now let's look at the response to the stimuli in the receptive field depending on whether they are the preferred or the non-preferred feature for the cell.
So this is a little bit complicated, so I'm showing you the data for the frontal eye fields and for VPA. Here, the difference in response between the green line and the red line is the modulation of the response in the frontal eye fields depending on spatial attention, whether the animal's attending in the receptive field or not. So you can see there is a signal related to spatial attention in the frontal eye fields.
And the difference in response between the red line and the purple line is the signal related to feature attention. There is a feature attention signal in the frontal eye field, so in other words, the frontal eye field cells are giving a larger response to a stimulus in the receptive field irrespective of eye movements, but a larger response if it is the stimulus the animal is looking for. This has been shown in other work as well, and some people have related it to salience. So in other words, that stimulus is more salient, and the frontal eye field cells, their activity may be modulated by the degree of salience in the receptive field.
VPA, you see the same thing. You see there's a signal. If you look at the difference between the green line and the red line, this is the spatial attention signal in VPA. It's there. And the difference between the red and purple line is this feature attention signal. So both areas are showing both signals.
So which area is dominant for one versus another? What's the cause of these changes in the responses? One thing that we can look at is the latency of these attentional effects in both areas, and what we find is that for spatially-directed attention, the frontal eye fields and the VPA are showing these changes at roughly the same time. But for feature-directed attention, the VPA is showing this modulation of response much earlier than the frontal eye fields. So it's the 90 milliseconds compared to the frontal eye fields at 100 milliseconds.
So the VPA is showing the modulation for feature attention first, and the frontal eye fields, if anything, are a little bit before the VPA for the spatially-directed attention. You can see this in the cumulative plots of latencies for these different signals. So for spatially-directed attention, the cumulative plot of latency shows that the population latency has shifted earlier in FEF compared to VPA, whereas for feature attention, the population cumulative latencies are shifted earlier for VPA compared to FEF.
So that's at least consistent with the idea that VPA might be generating these feature attention-related signals, and the FEF might be generating the spatially-directed signals.
But we needed an active manipulation to test these ideas. So in this case, rather than doing a lesion, we did a Muscimol injection, a GABA agonist, which will suppress the activity in VPA. And we can measure the effect on both behavior and on the responses of cells.
And so what we're doing is we're deactivating VPA in one hemisphere. The other hemisphere serves as the control, and here what we're plotting, in this search task, is the animal's percent errors after injections, the animal's performance in the ipsilateral versus contralateral versus midline of the visual field, when we've done this deactivation in one hemisphere.
In the unaffected hemifield, there's really no difference in performance pre and post Muscimol injection, but contralateral or contralesional to the Muscimol injection, you can see there are many more errors when we've deactivated VPA. And it extends to the midline as well, but in the ipsilateral visual field, there's really no effect. So at least behaviorally, the animal's impaired in doing visual search for object features without VPA.
Now what about the effect on the responses in the frontal eye fields after we've done the VPA injection? To test that, we recorded from the frontal eye field cells before the Muscimol injection and then after it, while the area's deactivated.
And before the Muscimol injection, we can see these signals related to spatial attention and feature attention in the population like we saw earlier. But after the Muscimol injection, the modulation according to feature is gone. And the only modulation we see is a modulation related to spatial attention. So the cells are still responding more when the animal's going to make an eye movement into the receptive field. It's just that they don't care anymore whether the stimulus in the receptive field is the one the animal's searching for.
So this is the evidence that VPA was the actual source of that signal that then affected the frontal eye fields, and the frontal eye fields then lose that information about which is the most salient stimulus according to feature without VPA.
So to put this in a cartoon form as is shown here on this next slide. So imagine we have the search array where the animal's looking for a particular stimulus, so this is just a visual representation. And then if we plot what could be a cartoon representation of activity in the map of space in FEF, where now the intensity of the star here reflects the magnitude of the cell's response, we can see that FEF would have a map, basically, of salience as other people have proposed.
The more salient the stimulus, the higher the activity. So if the animal's looking for this particular red shape, the most salient activity in the map would be at the location of that stimulus in the array. And then that leads to the animal making the eye movement there.
Back in V4, we have maps of different features in the area, and before there's any attention effects, the activity in the map simply reflects whether that stimulus has that feature or not. So they're just feature maps. And then as time progresses, where the FEF map develops this salience representation, eventually we see that later in time, the V4 cells are showing this modulation of their activity according to whether the salient stimulus is in their receptive field.
So these are the feature attention effects we know that take place in V4, and we think are coming about as a result of these computations in the prefrontal cortex. The question is how are these signals getting here? One possibility is they're just coming directly from the frontal eye fields, so salient locations are the ones with the correct features. And they could be modulating activity in the V4 map. That's one possibility.
Or another possibility is in VPA, which has the representation of the target, what the animal is looking for, it has the representation of the location of that stimulus in the array. And it's feeding this back to FEF. It could be that everything just goes through FEF, or it could be that VPA feeds back directly to these areas like V4.
And so what's the route? The first question is do we actually know that VPA is essential for these feature attention effects in V4? That's the first place to start testing. Is that the source of feature information that biases V4 cells with feature-directed attention?
So for that first order question, we recorded from V4 just like we did in the FEF. Here we were now recording from V4 cells before and after we've deactivated VPA with Muscimol. And we're plotting the response to the target stimulus in the receptive field when the animal's looking for it compared to that same stimulus when it's not the target that the animal's looking for.
This is the population activity in V4. The red line shows when the stimulus in the receptive field is the target, and the blue line shows when that same stimulus is a distractor. This is before Muscimol, so there is this feature attention effect there. This is independent of spatial attention.
But then after the VPA deactivation, you can see that in the population, this feature attention effect goes away. So VPA is the source. It seems to be the source. Whether it's direct or indirect, it is necessary for the effects of feature attention in V4. This is just the population distribution-- this is the percent change in response with feature attention before and after the Muscimol injection, and you can see that before, there's a shift. And afterwards, this effect has gone away.
Whereas if you look for the effects of spatially-directed attention in V4, now we're looking at the difference between the green line and the blue line. You can see that with spatially-directed attention, there's enhanced activity in V4, which I've shown you before. And after VPA deactivation, this remains. So it seems to be largely unaffected. There's no significant effect. So VPA is specifically the source for feature attention in V4.
So again, going back to our little cartoon model, we know that VPA is necessary for these effects back in V4. But it still doesn't tell us whether these effects are occurring because of the interaction of VPA through the FEF? Or is it possibly direct projections of VPA to areas like V4? And these are just parallel pathways for affecting V4. One pathway from FEF affecting spatial attention, another pathway through VPA affecting feature attention.
The problem is that we don't know whether VPA actually has these direct projections to V4. If you look at the anatomical studies, it's very hard to determine from looking at individual animals and anatomical studies whether this area that's very close to FEF actually does have inputs back to V4 or not. It's possible that the VPA projections, if you look at the anatomy, are only through IT cortex, which suggests that maybe the effects of VPA are on IT but not V4. You just can't tell from the anatomical studies.
So we felt that we really needed to get a better understanding of how the prefrontal cortex, how those projections map onto the posterior cortex. And we needed to do it in a way in which we could look at the projections of all different parts of the prefrontal cortex in the same animal, and in this case, we wanted to look at functional connections. Not just whether this is a direct anatomical one, but do changes in activity in the prefrontal cortex cause a change in activity back in the posterior cortex?
So to do that, we used a method known as stimulation fMRI where we did fMRI scanning in these animals while we electrically stimulated different spots in the prefrontal cortex. I want to show you the results from our first animal. We've now replicated these results in a second animal.
We had a grid implanted over the prefrontal cortex, and you can see that here. And we had a large chamber with a grid, and we could put an electrode into any of these different holes and stimulate the cortex. And we could stimulate in different cells and so on.
And then we use the BOLD signal in the fMRI to show where we see at least a BOLD signal change in these areas. And this is showing you some structural images with the overlaid BOLD signal change. In this image, you see this is the actual site of the stimulation electrode. The electrode's coming in here. You can see this change in activity around the stimulation electrode, and you see some comparable activity in the contralateral hemisphere as the result of interhemispheric connections between the two hemispheres. And that gave us a good idea of where the electrode is.
We're not going to go through the connections point by point. I'm just going to show you some summaries of what the connectional data are showing us, and what we've done here is we've-- here you see the prefrontal cortex. Here's the sulcus principalis, the arcuate sulcus, which has the FEF in it. Here's VPA just outside FEF, and we're looking at now all of the sites that gave the most change in activity back in V4.
And when we do that, it turns out they're all in the frontal eye fields. Here, we're looking at what are the changes caused by these sites which have the most change in V4? You can see that they do extend. There's a lot of activity change in IT cortex, but they extend back into V4. There are also activity changes in the posterior parietal cortex, which we'll get to in a moment.
So we'd say based on this, the main source of inputs back to V4 are actually from the frontal eye fields, not VPA. If, on the other hand, we ask where are the sites that cause the most change in activity in IT cortex, those are in VPA. These are the sites that have the biggest effect on IT, and if we plot the activity changes caused by those sites, you can see now the focus is shifted into IT cortex with really virtually nothing back in V4. Again, there's also activity in the parietal cortex from the VPA activation.
So this is all suggesting that VPA is working either through the frontal eye fields or through IT cortex in terms of having its effect on V4. But in the course of doing these experiments, we discovered something else that we hadn't anticipated, although we can go back and look at the anatomy, and it is consistent with this.
But by stimulating all these different points in the same animal, we could do really precise comparisons of the activation changes caused by nearby points, including what happens as you go from point to point to point to point in the prefrontal cortex. How does it change in the posterior parts of the brain?
Before I get to that, this is a slide here just comparing the sites showing V4 activation versus IT activation. The blue sites are the ones showing V4 activation. They're here in FEF, and the sites with IT activation are here in VPA.
You can also notice that both these locations in the prefrontal cortex give you activation back in the parietal cortex, although there's a shift from ventral deep to superficial as you go from the sites activating V4 to IT. We'll just get to that in a moment.
Now I get to this question of what happens from point to point. As you move point to point in prefrontal, where does the activity change? How does it change in posterior cortex?
So here we're looking at the stimulation sites in prefrontal, and now we're looking at the center of mass of the activation back in posterior cortex. In this case, it's in the temporal cortex, predominantly in the STS. And here, based on a cluster analysis, points having similar activation patterns are clustered with the same color.
So here, this is deep in FEF. It's the most greenish color here, and you can see they give us the most posterior sites in STS. And as we move forward in prefrontal going through these different colors, it turns out the activation pattern in STS keeps shifting further and further forward.
So this posterior to anterior progression in the prefrontal cortex gives us a posterior to anterior progression through the STS part of IT cortex. So that mapping of prefrontal to STS was a real surprise to us. And it turns out there's a similar mapping in the parietal cortex.
So here again, we're doing this based on a cluster analysis as we go from the green areas forward into more anterior parts of prefrontal cortex and also more dorsal parts of the lateral prefrontal cortex. Now we're looking at the intraparietal sulcus. This is the parietal gyrus. You can see that you go from posterior to anterior in the intraparietal sulcus as well. And as you go dorsally in prefrontal, you also move out from deep in the sulcus onto the cortex here.
So there's some sort of topographic mapping of the posterior cortex in the prefrontal cortex but with the dorsal spatial system and the ventral object system with those maps overlaid. So the same spot in prefrontal is activating both the spatial system and the object system in a topographic sense.
It causes us to wonder about the mapping principle in the brain. For posterior visual areas, as we all know, they map the retina. They're all retinotopically organized. But in the prefrontal cortex, there seems to be a map as well. But it's a map of the rest of the brain. So you map the different streams onto the prefrontal cortex, and the difference is it's not strictly topographic. But you fold the ventral and dorsal streams on top of each other.
What is the significance of that map? We have absolutely no idea. But I think this is something that's going to be a very major topic of our future research as to understanding the topographic representation of something in the prefrontal cortex that combines space and objects.
I want to sum up by going back to our original question of how does VPA actually influence areas in the ventral stream? We know that this direct anatomical pathway doesn't exist. And since we know VPA is essential for the modulation of cells in these areas by feature attention, it seems like one likely possibility is it works through the frontal eye fields. Another possibility is it works through IT cortex, and those are being tested in current experiments. Hopefully, we'll be able to come back and tell you what the answer to that is: how does the VPA modulate activity in the posterior areas?
So I want to, just finally wrapping up, acknowledge all the people who did these experiments. The clutter experiment in IT with the decoding was done with Ying Zhang, Ethan Meyers, [INAUDIBLE], show in collaboration with Tommy Poggio.
The V4 and FEF experiments were done with Georgia Gregoriou, Steve Gotts, Huihui Zhou, and Andy Rossi and Leslie Ungerleider. And all the later VPA experiments were done by [INAUDIBLE]. And this latest experiment, which is still in progress, the FEF stimulation experiment, was done by Ray Shoe and Arsiabe Show. Thank you very much for your attention.
Associated Research Thrust: