What are the computational functions of feedback to early visual cortex?
Date Posted:
October 23, 2018
Date Recorded:
October 19, 2018
Speaker(s):
Prof. Daniel J. Kersten, University of Minnesota
All Captioned Videos Brains, Minds and Machines Seminar Series
Description:
Abstract: The existence of feedforward and feedback neural connections between areas in the primate visual cortical hierarchy is well known. While there is a general consensus for how feedforward connections support the sequential stages of visual processing for tasks such as object recognition, the computational functions of feedback for recognition and other tasks are less well understood. I will discuss several proposals for the functions of feedback including resolving local ambiguity using high-level predictive knowledge, binding information across levels of abstraction in the visual hierarchy, and engaging lower-level "expertise" as the task requires it. Several human behavioral and neuroimaging results will be described that support these proposals: the first showing how the larger spatial context modulates local activity in the visual system, the second how cortical activity in human V1/V2 depends on whether shape information is extracted in the presence or absence of clutter, and the third how tasks requiring the analysis of spatial detail influence responses in foveal cortex.
PRESENTER: Hi, everybody. So welcome to today's Brains, Minds, and Machines seminar. This seminar was organized by the CBMM Student and Postdoc Council. And we're super happy to have the opportunity to have Dan Kersten here for a talk. Dan is a professor of psychology at the University of Minnesota and a member of the graduate programs in neuroscience and in computer science, so a perfect fit for CBMM.
His early studies are in mathematics with his undergraduate here at MIT and a master's at Minnesota. He then turned to visual perception, completing a PhD in Psychology with Gordon Legge at the University of Minnesota and a postdoc with Horace Barlow in the physiology department at Cambridge University.
He also spent time as a visiting scientist at the Max Planck Institute for Biological Cybernetics and here at MIT, as well. And he has lots of really good, old friends here. His work has spanned from Bayesian models of object perception to early neural coding of images.
And the students felt very excited about inviting him here today because his conceptual clarity and laying out formal frameworks for vision, all the while with his keen eye towards our perceptual experience, has really inspired many of us studying perception. Today, he'll be talking about his investigations into the computational functions of feedback to the early visual cortex. And with that, please join me in welcoming Dan Kersten.
[APPLAUSE]
DAN KERSTEN: Thanks very much for that kind introduction and for the invitation to come. How's my volume? Doing OK. Good. At lunch today, one of the students asked how I got interested in vision given that I had a background in mathematics.
And I owe it to MIT's definition of what a humanities breadth requirement was back in the 1970s. Because I could major in math-- or you could major in physics back then-- and you could satisfy your humanities requirement by taking courses from Whitman Richards, Dick Held. I took a course from Jerry Lettvin.
And so neuroscience and brain science somehow qualified as humanities. Without that, I probably wouldn't have been inspired to go into vision later. So I appreciate that.
I also owe a big debt to MIT over the years because of the inspiration I've gotten from the many faculty and the students and the students of students over the years. I've probably followed their research more than any other university or college where I have colleagues.
So let me just start off with the basics to orient everybody. I think everybody's familiar-- at least, most people are familiar-- with the pattern of feedforward flow of information at a coarse scale throughout the visual system. This is mapped onto the human brain. And later on, I'll be mainly talking about what we believe are effects in V2 and V1, early visual cortical areas.
And people are probably also familiar with this. This is a version of an almost compulsory visual hierarchical diagram that was used in so many talks over the years from Felleman and Van Essen. This was a version due to Wallisch and Movshon, which tries to illustrate the relative contributions, or the relative proportions, of neurons in different visual areas in the visual hierarchy in terms of the area of these rectangles here.
And then, the thickness of these diagrams is an estimate of the amount of connections you have between areas. And I think it is a useful summary diagram.
So a lot is known about the computational functions of feedforward. There's certainly a lot of excitement over the last six, seven years in terms of how deep convolutional neural networks seem to capture some of the aspects of object recognition and its organization in dealing with the natural image input. But it's also been known for a long time that we have lateral interactions that are involved in visual computations within areas.
And in the topic for today, it's also been known from the connectivity and a long history of research that there is a role for feedback. There are bidirectional connections. And that's illustrated in the next diagram. So I'm going to, again-- just to reference those dotted lines there-- I'll be talking about effects that we believe are due in V1 and V2.
Just another point to underscore-- the relationship between feedforward connections and feedbackward connections in both the ascending and descending feedforward and feedback pathways. One of the things-- I'll have a brief opportunity to mention-- that one of the ways in which people are using fMRI for evidence of feedback is to-- it reflects the anatomical structure feedback connections, which tend to project backwards to superior, superficial, or deep layers of cortex.
And so even with human fMRI, you can get some kind of constraint if you have high enough resolution to begin to separate out layers. And I won't talk a lot about that, but I have one little graph to show.
OK. So the key questions of why feedback signals to low-level cortical areas. Today, I'm not going to give you any dramatically new ideas. What I hope to do is show you some experimental evidence from our own labs that is converging on several ideas. There's certainly a long history of results and theory relating attention, visual imagery, and working memory-- more recently-- effects early-on in the visual system, early-on in V1, for example, and V2.
Probably from a computational perspective, in terms of fleshing out potential reasons that are testable and of interest, are learning, this idea of building or tuning network hierarchies for frequently needed tasks, and then, also, sorting through which unresolved low-level features may belong to higher-level hypotheses, if you will. Whenever you have complex, ambiguous images locally, how do you resolve these ambiguities if you don't have enough information in the bottom-up flow?
And in that context, I'd like to focus on three sets of experiments that address this question of whether we can understand feedback signals in terms of ideas that have been around for a while. One is predictive error coding. Another one is binding across levels of abstraction. And finally, this idea that's probably less well known, and that is the idea of consulting low-level cortical expertise.
And all of these are somewhat vague ideas. And hopefully, as the field progresses, we'll have a better idea of how to test these ideas and distinguish them. And I'd like to also focus on processes that are probably automatic, at least in part, rather than feedback effects or top-down influences that may have more conscious direct control.
So just a little bit of history. So my own interest in the nature of feedback came from early work on Bayesian models. Some of that was inspired by work on regularization theory here at MIT in Poggio's lab. And it's this idea that, given an image or given some data, if you want to make good bets about the underlying causes or explanations S, scene properties, S, you could use Bayes' formula to break up the constraints into a likelihood and a prior term.
And although these terms don't necessarily have to be embedded in mechanisms, you could imagine them, or at least the dynamic processes that could be part of a feedforward structure. This whole idea of being able to test a prediction given a hypothesis S against the image data coming in as a measure of error was intriguing to me.
And so a lot of the early inspiration came from this idea in the 1980s that you could think about vision as solving an inverse optics problem where you had some external world property S produces an image. And you have the brain's representation. And if you had some kind of generative model, you could also run the inverse of your forward model back to test and see whether your image representations are capturing a correct interpretation.
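As a rough illustration of the Bayesian framing described here, the following is a minimal sketch, not the speaker's model. It assumes a small discrete set of scene hypotheses S, a hypothetical forward model `render` that maps a hypothesis to a predicted image, and Gaussian pixel noise for the likelihood. The names `posterior`, `render`, `templates`, and the toy 3-pixel image are all made up for illustration.

```python
import math

def posterior(image, hypotheses, prior, render, noise_sd=1.0):
    """Posterior p(S | I) over discrete scene hypotheses S via Bayes' rule."""
    unnormalized = {}
    for s in hypotheses:
        predicted = render(s)  # hypothetical forward (generative) model: scene -> image
        # Squared prediction error between the observed and predicted image
        err = sum((i - p) ** 2 for i, p in zip(image, predicted))
        likelihood = math.exp(-err / (2 * noise_sd ** 2))  # p(I | S), Gaussian noise assumed
        unnormalized[s] = likelihood * prior[s]  # likelihood times prior
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# Toy example: two made-up scene causes for a 3-pixel "image"
templates = {"edge": [0.0, 1.0, 1.0], "bar": [0.0, 1.0, 0.0]}
post = posterior(
    image=[0.1, 0.9, 0.8],
    hypotheses=list(templates),
    prior={"edge": 0.5, "bar": 0.5},
    render=lambda s: templates[s],
)
```

With equal priors, the hypothesis whose predicted image leaves the smaller error against the data receives the larger posterior, which is the "test a prediction against the image" idea in its simplest form.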
Probably for fun, I thought-- and maybe that's at my time of life where I'm thinking about-- we worry about people forgetting the contributions of earlier generations. And a generation before me was Donald MacKay. Some of you may be familiar with his son, David MacKay, who passed away a few years ago.
And this is a paper from 1956 in which he wrote, "It has been suggested that the correlate of perception, as distinct from reception, is activity which organizes an outwardly directed internal matching response to signals from receptors. This organizing activity amounts logically to an internal representation of the feature in the incoming signals to which it is adaptive, i.e. the feature which is thus perceived."
And I thought that was interesting. It sort of captures this idea of the use of a generative model for testing early-level hypotheses at the input level and even went on to talk about hierarchical structure. It's postulated wherein much of the organizing activity is concerned with modifying the probabilities of activity. So very general statement, but it reflects, I think, a program of research that many of us are doing and are interested in.
So today, I'm going to talk both about psychophysical and neuroimaging approaches to addressing the question of feedback, which is at a very coarse scale, so it's going to necessarily involve some basic assumptions which may or may not be true. But you have to start somewhere.
And first of all, the idea is that the experiments I'm going to talk about will take advantage of the hierarchical feedforward structure of visual cortex. And so that's the idea that you've got the input. We've got neurons with narrow receptive fields that effectively have tunnel vision. They feed forward, gradually accumulating information.
And eventually, we have information that's sort of summarized at the high level that captures a wide range of scale by these gradual sequential processing stages. But the idea is if we have measurements, or ways of inferring activity early on that's distant from where the local direct stimulation's coming from, then we have a way of measuring effects of context.
And so the idea is to present a stimulus over here, and then we look at contextual modulations. The second way, which I alluded to a little while ago, is that one can also look for context-dependent modulations of activity in superficial or deep layers of cortex.
Some caveats that are worth keeping in mind-- I've mentioned already before that we know we've got lateral connections. And so one has to argue, or at least consider the possibility, that these contextual interactions that you might measure at an early level are due to propagations within an area or within a level of abstraction.
And then, also, there's what I often think of as the elephant in the room-- I gave a talk once some years ago and Ray Guillery, the second author, accused me of being corticocentric because I was only talking about effects in V1 that might come from the cortex itself rather than from the thalamus. But there's certainly a rich body of evidence for thalamocortical interaction. So these are caveats. And the hope is that our interpretations will last despite these caveats.
So I would like to start a long time ago with work that was first done with Scott Murray and then later on with Fang Fang. And it got me interested experimentally in the question of predictive coding kinds of explanations. And so this was an experiment that was done in which we took a look at this so-called translating diamond stimulus.
And if you look at the one on the far left there-- we won't take time for you to experience it-- but one of the interesting properties is that it seems to be bistable. And you can either see a whole shape or it breaks up into local, individual shapes. And when we looked at fMRI activity in early visual system, we found that when the person perceived a whole diamond, activity in V1 was suppressed.
It was as if there was this high-level explanation that was predicting downward, subtracting off the prediction, and you had, effectively, an error signal that was low. But if the diamond-- we didn't have a good perceptual explanation-- this was the intuition-- then activity in V1 would go up.
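The intuition of "subtracting off the prediction" can be written down in a few lines. This is only a toy sketch under stated assumptions: early-area activity is treated as the summed absolute difference between the input and a top-down prediction, and all of the numbers and names (`residual_activity`, `v1_input`, and so on) are invented for illustration, not taken from the study.

```python
def residual_activity(input_signal, prediction):
    """Summed absolute prediction error, a stand-in for the early-area response."""
    return sum(abs(x - p) for x, p in zip(input_signal, prediction))

# Made-up local feature responses driven by the stimulus
v1_input = [1.0, 0.0, 1.0, 0.0]

# A good high-level explanation (e.g. "whole diamond") predicts the input well
good_prediction = [0.9, 0.1, 0.9, 0.1]
# No coherent high-level explanation: nothing is predicted
no_prediction = [0.0, 0.0, 0.0, 0.0]

explained = residual_activity(v1_input, good_prediction)
unexplained = residual_activity(v1_input, no_prediction)
```

Under this reading, the error signal is small when the percept explains the input and larger when it does not, matching the direction of the suppression described above.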
Now, one of the problems with this study that bugged us for a number of years was it wasn't clear whether the suppression of neural activity was localized in space and feature type. So for example, we did some measurements where we looked at areas that were really far from these red lines, far from the areas that were being directly stimulated retinotopically.
And we can still see some evidence of suppression. We went up and down. Our fMRI wasn't that good at localization. And so we were concerned about whether our interpretation was location- and feature-specific.
And the second question is whether this enhancement or suppression of activity that you see early on, how it may depend upon stimulus and task. Like if you have clutter or more information, more background, confusing elements, to what extent is your contextual-dependent modulation dependent on those contextual features?
So we did an experiment. This was an experiment done with Dongjun He and Fang Fang. By the way, Fang Fang there, at that time, he was a graduate student in my lab, and he's now executive associate director of the McGovern, its sister institute at Peking University and a dean of psychological sciences at Peking University. He's on a sled in Minnesota in that picture, so it's a long time ago.
Well, we thought we'd do a psychophysical experiment to address this question of whether suppression was localized to early cortical regions corresponding to feature-selective neurons directly stimulated. And so the idea was to take advantage of the known and well-established degree of spatial orientation selectivity in low-level cortical areas, particularly V1 and V2, and the sensitivity to whole forms in higher-level cortical areas.
But rather than doing fMRI, we thought, maybe we can address this using what Béla Julesz, known for random dot stereograms, called the psychophysicist's electrode, and that's to use psychophysical adaptation. And so we had two stimulated conditions, or two adaptation kind of conditions to draw on. And the first one is the familiar and old observation of the tilt aftereffect.
And so the idea was that if-- looking here, we have a vertical grating. It looks vertical to everybody. But then, if you were to stare at this tilted grating for a while-- a minute or two-- and then you go back and test it, look at the vertical grating again, it looks like it's tilted the other direction. And this is known to be fairly specific in terms of orientation and location.
In terms of whole shapes, another kind of adaptation effect is an adaptation to the aspect ratio-- in this case, of a diamond. So here we have a diamond shape. If you then look at what we call normal, and then if you adapt to a skinny one-- so you stare at the skinny one for a while-- and then go test on the original one, then the original one now looks fatter, OK?
So the thought was we can somehow combine these two effects into one stimulus manipulation and try to tease apart whether the adaptation effects on one condition-- for example, a whole percept versus local percept-- yield different adaptation effects. So here's how we combined them.
Well, I've always been fond of this moving diamond stimulus. And so we created a stimulus that looks like this, and it was just gently rotated. The little circles there indicate how all pixels in this image are just rotated. The idea was to prevent afterimages to those black and white little grating patterns there. So they get smeared over a small region of retinotopic space.
And then introduce two conditions. One condition is one in which we have these three horizontal uniform field blank bars there. And the idea was to cover up the vertices, and then, if you cover up the vertices, the motion that a person sees is of a diamond moving behind two bars. So you integrate all four of those grating patches into one whole, OK?
But if you just extend, or just drop, these lines a little bit below the edges of these gratings, then these grating patterns look like they're floating in front, and they don't belong to a whole shape anymore. They're four independent bits, or less dependent bits. And so we've got a diamond percept, and then we've got four oriented patches percept that are not grouped together into one coherent object.
And so just jumping ahead in the results, we could measure both a tilt aftereffect here, and we could also measure a shape aftereffect under these two conditions, whether a person perceived the stimulus in terms of a diamond or non-diamond. And we got this nice interaction effect here in which, under the non-diamond percept, we get a big tilt aftereffect and a reduced one when we're seeing the diamond.
And conversely, if you look at the shape aftereffect, we have a smaller shape aftereffect for the non-diamond percept and a big one for the diamond. And so our interpretation was that this perceptual grouping when you experience the diamond reduces the adaptation to local tilt, which we presume is affecting early-level neurons and, preferentially, V1 and V2.
And there are a number of other manipulations we did. You can even get adaptation across the eyes, suggesting that we've got both an adaptation to monocular neurons and binocular neurons in this stimulus. So we took heart. This was consistent with the idea of local low-level feature-specific suppression, akin to that idea of predictive error coding.
The next study that I'd like to tell you about addresses this question of-- just a little bit of background. There's a fairly large literature that has looked for these context-dependent effects on either an increase in activity measured through fMRI or a decrease measured through fMRI as a function of how elements get grouped. And so there are some conditions under which you're trying to group elements such as in the stimulus I have here, where we've got these little Gabor elements that are in sort of circular alignment. You get an increase in activity.
There are other experiments that also show an increase of activity when you have groups of elements that are in clutter. And there are also studies, for example, that show conditions under which you get suppression, like in the experiments that we have. And it's been a puzzle as to can you predict when you might expect an increase in activity or a decrease in activity in early cortical areas as a function of different stimulus conditions?
And so one of the ideas that we thought we'd address-- and this was done with a graduate student, Cheng Qiu, who's now working with Alan Stocker at University of Pennsylvania, and Professor Cheryl Olman, who's in my department at Minnesota-- where we had this idea that the enhancement that's sometimes seen at early levels [AUDIO OUT] when you're perceptually organizing something may have to do with how much clutter, how much separation is required by the visual system to pull out the form in the presence of distractor elements. And so in this experiment, we had the four conditions shown here.
We have what we called the aligned condition, where all these little Gabor elements, their orientation is cotangent to some invisible circle. We have unaligned, where we just randomized their orientation. And then, we can also have conditions where we embed those aligned circles in a clutter here and the unaligned in the clutter. And then, we had the subject do a task, a fixation task, and asked them whether these Gabor functions are aligned or not.
So here are the results. I should mention, too, that we ran localizers so we could analyze the data coming from voxels that were directly stimulated within that red annulus, as well as ones on the outside. And so we could pull apart the target regions from the background regions. And the main effects we got were in the target regions in V1 and V2.
And so the main story here-- so this is the BOLD response difference in-- if we [AUDIO OUT] aligned elements compared with activity in unaligned elements in the background, that's when we got the boost in activity. So unlike the suppression results that I showed when we had that diamond, we've seen an enhancement of activity in the presence of these background elements here. See that both for [AUDIO OUT] and the V2 voxels.
We had some evidence hinting at suppression of activity in this no background case. That corresponds to the top two conditions there. But it's not a strong effect. The big effect for us was this effect of background that seemed to amplify the activity associated with grouping these elements, these Gabor elements in the circle. So, in summary: in background clutter, V1 and V2 activity in that target region increased for aligned versus unaligned features.
Cheng Qiu went on to also do an analysis of the correlations between the time series in V1 and seed regions in V2. And this shows some of the results. So we did several different analyses. This is a psychophysiological measure which compares the time series in the space of the neural activity where you deconvolve your fMRI time series. And you can compare that-- cross-correlate that-- with the four conditions that we have here.
And the basic result there is summarized in this diagram, where the strongest correlations we had corresponded to this condition where we compared the aligned with the unaligned in [AUDIO OUT]. So in a nutshell, the idea is we have an increased correlation between V2 and V1 voxels under those conditions when a person is organizing the aligned elements in the presence of this background, as compared to the unaligned ones. So we interpret that as increased coordination, if you will.
The [AUDIO OUT] metrics. So if you use a seed region in V1 and analyze V2, you get the same kind of story. So it doesn't tell us about the causal relationship in terms of timing. None of this really does.
But it does suggest this idea that there's an increased role for feedback between areas when you're trying to organize ambiguous elements-- potentially ambiguous elements in clutter-- that activity goes up. So there's the summary. Some measures of functional connectivity between V1 and V2 increased when perceiving aligned versus unaligned contours in background.
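To make the notion of condition-dependent coupling concrete, here is a simplified sketch that computes a Pearson correlation between two time series separately for each condition label. It is a stand-in only: the analysis described in the talk works on deconvolved fMRI time series, and this sketch skips deconvolution and any regression modeling. The function names and the toy numbers are invented and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_by_condition(seed, target, labels):
    """Correlate seed and target time points separately within each condition."""
    result = {}
    for cond in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cond]
        result[cond] = pearson([seed[i] for i in idx], [target[i] for i in idx])
    return result

# Made-up time points for a V2 seed and a V1 target region
labels = ["aligned"] * 5 + ["unaligned"] * 5
v2_seed = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v1_target = [1.1, 2.0, 2.9, 4.2, 5.1, 3.0, 1.0, 4.0, 2.0, 3.0]

r = correlation_by_condition(v2_seed, v1_target, labels)
```

As noted in the talk, a correlation of this kind is symmetric in seed and target, so by itself it says nothing about the direction or timing of the influence.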
Here's another study that was done with Damien Mannion and also Cheryl Olman that's also consistent with this general idea that if you have a potentially complex scene that requires integration of locally ambiguous bits that you might see increased activity when those things belong together, sort of in the opposite direction of predictive error coding. So in this stimulus, the idea was to-- I've got to keep pointing it over here-- was to select a number of apertures that would hide most of the scene. So it's like you're looking at this natural image scene through a set of holes.
And if you adjust things right, and with a little bit of practice, and [AUDIO OUT], you can see through the holes. So you've got the holes there, and you can proceed to organize that whole scene behind the holes. And the experience is just like you're looking through a hole. So that's where you're well-organizing these patches that you see through the holes.
But then you can take another scene and pick out patches that don't belong with the first one and insert them or not insert them. In other words, you can leave the original patches, or you can insert the incoherent or incompatible patches in these apertures in the scene. And so this is an example here where there are some patches that just don't fit with the rest of the patches. They don't belong to that scene because they came from somewhere else.
And so now the question was, if one localizes voxels-- retinotopic voxels-- at these locations, do we see an increase in activity or decrease in activity [AUDIO OUT] those patches that belong or don't belong to that whole scene? And so the answer here is rather simple, and that is that we found that if the patches were coherent-- in other words, actually belonged to the scene-- then the localized activity there did go up relative to the other patches.
So this is also consistent with that idea of boosting activity if it belongs together. You've got the whole scene interpretation, and you've got activity early on in cortex that seems to be boosted in some sense, given the consistency.
And this is a paper published in 2015. And this is the only layer analysis that I'm going to talk about and the only one that we've done. But I thought I'd just point it out because at least it suggests that despite, I think, a lot of caveats that have to do with this kind of [AUDIO OUT]. This was done at a higher resolution than the other studies.
And what we have here on the x-axis is the distance from the white matter. And we have measured the coherent minus the incoherent response as a function of layer, and we do seem to see increased activity [AUDIO OUT] closer to the superficial layers of cortex, consistent with this idea that we might be picking up a signal that's being fed back to superficial layers. Anyway, I'm always a little hesitant about interpreting this, just because I know of all the problems that we've had in actually being confident about layer specificity and its relationship to neural activity versus vascular.
OK. So the summary so far is that from the adaptation experiments, we have evidence of suppression of local orientation-sensitive activity in early visual cortex as a consequence of higher-level perceptual organization, as a consequence of seeing that whole diamond shape. And so just from adaptation, we have results consistent with these predictive error ideas but also have evidence for enhancement under conditions of detecting feature alignment in clutter, consistent with this idea of binding across levels of abstraction, binding across levels of cortical hierarchy and, certainly, V1 and V2.
And this figure sort of summarizes the general idea. If you don't have a lot of clutter, then a high-level model such as a circle can do a good job at fitting elements that by themselves aren't naturally grouped.
But if you take that same stimulus there, and [AUDIO OUT] a bunch of other elements, then you lose that coherence. But you can get it back again if you happen to have stimulus conditions where you can locally group based on similarity and proximity. And then you can solve this problem of figure background segregation.
OK. So firstly, I've showed you evidence relating to the idea of predictive error coding, which is one way of sorting through potential features that do belong or don't belong with a high-level perceptual interpretation. We looked at this idea that looked for evidence of binding across areas where a high-level model links together low-level information at a different level of abstraction.
And so the third idea for what feedback may be doing, what its function may be, I'm going to call consulting lower-level expertise. I know it's rather vague. But part of the problem is we don't have, I think, really good ideas yet of how to interpret a small [AUDIO OUT] set of results that come from a number of labs and a particular result I'm going to talk about here.
But the general idea is that foveal V1 is an expert in processing fine spatial detail and orientation information. And there was a lot of real estate there I showed you in that wiring diagram from Wallisch and Movshon, both in V1 and V2.
So the fovea has a lot of real estate. And the foveal area on V1 itself has a lot of real estate, of course devoted to spatial processing. And so to look at this general idea that maybe cortical areas have expertise in ways that we don't normally think of, it seems to me we can address this using this general idea and explore it in the context of V1.
And I'll talk a little bit more about the interpretation later, but this idea of expertise at a lower level-- again, it's not a new idea. Ahissar and Hochstein and Lee and Mumford had similar ideas with the idea that V1 might be an expert at a spatial buffer. It's an expert at dealing with fine detail. So the question is, are foveal cortical neurons consulted for the fine-grain spatial analysis of details?
And here's another domain in which I was inspired by work here at MIT when Mark Williams was in Nancy Kanwisher's lab. So a little bit of background here. So in 2008, Mark Williams, based on research here, published this paper in which they did a multivariate pattern analysis of voxels that corresponded to a foveal region that wasn't directly stimulated while subjects made a decision about whether this object up here, a spiky-- you have spikies, smoothies, and cubies-- whether this spiky object here was the same at a detailed level as the [AUDIO OUT] in the lower left-- so the same category, but they differ slightly because their little spikes might not be arranged quite well.
And the interesting finding was that the voxels in these non-stimulated areas could be used to decode the category of the object that the person had been exposed to. In other words, [AUDIO OUT], just looking here, can you read out from foveal-stimulated voxels whether it was spiky, smoothie, or cuby?
And when I was talking to Nancy earlier today and talking about when this first came out, we thought, there has to be some kind of behavioral correlate of this. Does it actually affect behavior? And together with Damien Mannion, who I mentioned earlier, we tried a number of things. They didn't work.
And so I gave up temporarily. So I'm going to revisit a method that does work shortly. And apparently, that was inspired by subsequent work done by Mark Williams, which showed that there was a behavioral effect.
And they could get a behavioral effect by using a transcranial magnetic pulse applied to foveal cortical regions in the back of their subjects' brains at the right time. If you did it at the right time, you could disrupt their ability to do this task. And that [AUDIO OUT] was on the order of 350 to 400 milliseconds after stimulus onset. So keep that number in mind.
I'd rather not use TMS if I don't have to. I'd rather not use fMRI if I don't have to. And so we tried some psychophysical experiments. And along with two other groups, we discovered that one didn't need to use a TMS pulse to get interference or have a disruptive effect on your ability to make fine-scale discriminations of these patterns.
And the idea is to present the stimuli and then flash a noise pattern up right where you'd expect to activate those foveally-sensitive voxels from early all the way into V1 and V2. And the question was, if you insert that mask-- you really don't call it a [AUDIO OUT]-- it's not really overlapping the stimulus. If you interfere in this stimulus, this noise pattern, at the right time, can you disrupt performance?
And so I mentioned there were a couple groups, including one that worked later with Mark Williams-- Weldon et al-- who showed that you could get these interference effects in fovea with just a noise stimulus. Well, I had sort of complained to my colleague, Sheng He, that I had struggled along with Damien to get some kind of behavioral effect ourselves. This was before we started using this noise technique.
And so my main contribution to the rest of this talk here, actually, was to tell Sheng I'd got no effects, and I couldn't get a big effect. And Sheng went and worked with a student at the Chinese Academy of Sciences, and they came up with a paradigm that has produced a number of interesting results, which I'd like to tell you about now.
So the previous studies hadn't addressed three, I think, key interesting questions that we thought we would use this noise technique to address. And the first one, is foveal processing only engaged for [AUDIO OUT] fine spatial detail? Because that's the basic test that had been done so far.
A second question is, if there is this disruptive effect, can we get an estimate of its temporal window? How long does it last? And then, thirdly, is the feedback deployed automatically when the task requires it? And if I have time, there are a few other interesting questions I may bring up, too. But let's focus on these now.
So here's the basic paradigm. And so we've got, in terms of timing, we've got the fixation mark comes on. Stimulus comes on with these two-- in this case-- two spiky objects. Remember the task of the observer is to say whether they're same or different images-- not same or different categories-- same or different images at the image level.
And then, that's followed by one of five different delays in terms of when the mask gets flashed on. The mask is very brief. It's only 83 milliseconds. And then, the observer makes a response.
So first of all, we [AUDIO OUT] the basic observation of a disruption re-replicated. And here-- just take you through this diagram here. We've got measure of d prime-- how well people are making this decision of same or different-- as a function of these different noise onset times there. And the dashed line up here is the control condition to see what they did-- measure their performance-- when you didn't have this interference flash.
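[Editor's sketch: the d prime measure mentioned here can be computed from response counts. This is a minimal sketch assuming the simple yes/no formula, with "different" trials treated as the signal. The talk does not say which same-different model was used, and the function name and the log-linear correction are assumptions.]

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption): add 0.5 to each cell so that
    # rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one noise onset time
print(d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80))
```

[A d prime of zero means the observer cannot tell same from different; larger values mean better discrimination, independent of any bias toward one response.]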
And so at 50 milliseconds, we get a big suppression. And in a way, that's not so surprising. It's consistent with attention [AUDIO OUT]. The stimulus is actually still on when that flash comes up. And so we'd expect some kind of interference. You'd generally get fairly global effects of a flash based on distracted attention.
But then, within 150 milliseconds, performance goes up. Interestingly, we get this drop at 250 milliseconds, OK? So there's a couple things we can infer from this. One is that it looks like if we did some modeling, assuming this was an attentional effect and this depression here was an effect of the feedback to foveal cortex, and we get a fairly-- I'll show you some curves later-- but you get a fairly narrow time window of effectiveness that's on the order of about 100, 130 milliseconds where we can get this interference effect.
The second thing, remember I mentioned earlier the [AUDIO OUT] results from Mark Williams' lab about you could get interference on the order of 350 milliseconds or so? Why is that? Ours is 250.
Well, it's actually not inconsistent. Because with a TMS pulse, you have to give the system a longer time because the TMS pulse, you've [AUDIO OUT] have to take the time for the center noise mask to get through. So it's sort of in the ballpark-- 100 millisecond difference or more. It might be a little bit long, but at least it's not drastically inconsistent.
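[Editor's sketch: the modeling idea mentioned above, an early attentional cost plus a later dip attributed to feedback, can be written as a two-component curve. This is a minimal sketch, not the fitted model from the study. The exponential and Gaussian shapes and every parameter value are assumptions, except the dip near 250 milliseconds and the width of roughly 130 milliseconds, which come from the talk. Only the 50, 150, and 250 millisecond delays are stated in the talk; the other two in the loop are hypothetical.]

```python
import math

def predicted_d_prime(soa_ms, baseline=2.0, attn_drop=1.5, attn_tau=60.0,
                      dip_depth=0.6, dip_center=250.0, dip_fwhm=130.0):
    """Performance as a function of noise onset time (SOA), in milliseconds."""
    # Early cost from distracted attention, decaying with onset time
    attention_cost = attn_drop * math.exp(-soa_ms / attn_tau)
    # Later cost from disrupting feedback to foveal cortex: a Gaussian dip
    # whose full width at half maximum is dip_fwhm
    sigma = dip_fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    feedback_cost = dip_depth * math.exp(-0.5 * ((soa_ms - dip_center) / sigma) ** 2)
    return baseline - attention_cost - feedback_cost

for soa in (50, 150, 250, 350, 450):
    print(soa, round(predicted_d_prime(soa), 2))
```

[With these illustrative values the curve is low at 50 milliseconds, recovers by 150, and dips again at 250, which is the qualitative pattern described above.]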
So a narrow time window of about 130 milliseconds. Is fine detail required to get the effect? So in order to do that, we low-pass filtered these objects. I should say Xiaoxu Fan did it. I didn't actually low-pass filter them.
And now, we get a different pattern of results where performance is initially suppressed again, presumably due to this global attention distraction process. And then, performance gradually goes up. But we don't see any dip in this curve, suggesting that the effect we're getting has to do with the task of analyzing fine spatial detail-- again, consistent with this general intuitive idea that we're somehow tapping into the expertise in the fovea, of its ability to deal with fine detail, even though there's no stimulus there.
And the question is, what's going on? So no corresponding drop for low-pass filtered images. And I won't show you the data, but a number of other conditions were tried, including speed discrimination task, equally difficult-- dots moving in one direction in the upper right, other direction, lower left. And you get a pattern similar to this prior condition here.
AUDIENCE: Have you tried a condition where you have an [INAUDIBLE] of noise rather than a just center noise?
DAN KERSTEN: Yes. Great question. Yeah. You don't see the effects if you put the noise on the outside. Yep. We've got a graph on that, but-- yes. So the noise has to be in the middle. Yeah?
AUDIENCE: Is the [INAUDIBLE] frequency [INAUDIBLE] relevant [INAUDIBLE]?
DAN KERSTEN: Didn't directly test that. The presumption is it does. But yeah. That would be another condition to directly check. There's an interesting point to make here, and that is that in order to presumably recruit, or at least anticipate the need for, foveal processing, how does the visual system know it needs it if the stimulus is already sort of blurry in the periphery?
And one possible answer for that is that there's sufficient high-spatial frequency information for the system to know it's high-spatial frequency, perhaps related to crowding. And so it recognizes high-spatial frequency. There's a job for the fovea to do. And so let's prepare the fovea for that kind of analysis where you don't have the problems of clutter, at least not to such a large extent.
OK. Is feedback deployed automatically, or only when the task requires it? And in full disclosure, when Sheng told me he was going to do this experiment, I thought, wow. [AUDIO OUT] Cool. I'm jealous. I wish I had thought of it. I hope you find it the same, as well.
The idea is to take advantage of a well-known result going way back to Shepard and Metzler, and those mental rotation results. And that was the idea. If you have to decide whether two objects are the same or different, the amount of time to do it with a criterion level of accuracy depends upon the angular separation between the two.
It's suggestive of some kind of analog process in the brain where you're mentally rotating one object until it's aligned with the other, and then you make your decision. And if they start off with a big angle, it takes you longer to do the rotation. If it's a small angular difference between the two, you can do it faster.
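[Editor's sketch: the analog-rotation account described here predicts decision time that grows linearly with angular separation. This is a minimal sketch of that relationship. The rotation rate, the baseline time, the function name, and the largest angle in the loop are hypothetical values chosen for illustration, not numbers from the talk.]

```python
def rotation_time_ms(angle_deg, rate_deg_per_s=60.0, base_ms=1000.0):
    """Predicted decision time if one object is rotated at a constant rate."""
    # base_ms covers everything that does not depend on angle
    # (encoding, comparison, response); the rest is angle / rate.
    return base_ms + 1000.0 * abs(angle_deg) / rate_deg_per_s

for angle in (0, 40, 80):
    print(angle, round(rotation_time_ms(angle)))
```

[Under this account, a larger angular difference simply pushes the moment of comparison later in time.]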
So the idea was to incorporate that into this experiment with the interference patches and to have conditions in which we, [AUDIO OUT], as you saw before, where this patch there has to be compared with that patch down there, and they both have the same orientation, and then also have a condition where the alignment is different but only by-- I'll show you how many degrees in a minute, I forgot-- and then one that requires even more mental rotation, if you will-- a bigger difference between the two angles that separate those two.
And so this is the same test that subjects had before, except they were told to fixate and take the time to mentally rotate until they had them in sort of mental correspondence with each other, and then make the decision. But as before, we try to disrupt the performance by inserting this noise patch in the fovea, away from where the stimuli actually are, and look at whether we have a decrement in performance and, if so, at what times.
So here are the results-- the same kind of plot we saw before. We have d prime on the vertical axis, and we've got our different stimulus onset asynchronies plotted on the bottom.
And the top one, we basically replicated the first experiment. And this is where I showed you the curves where we can fit in an attentional factor and an interference factor to get that narrow time window I mentioned earlier. But we get this depression, or suppression interference effect, at 250 milliseconds.
However, if the subjects are looking at the stimulus, we've got a 40 degree difference. So they now have to take a little bit more time to do the mental rotation. Then we see we've delayed [AUDIO OUT] period at which we get this maximum interference.
And if they have to do an even bigger mental rotation, then we can get an effect of the delay. It's almost a half second. So it's a rather long period of time in order for this effect to occur.
OK. So just to summarize the basic results, when mental rotation was required as part of this peripheral object discrimination task, the temporal window shifts the time that foveal noise disrupts the peripheral object discrimination. And so this result, this mental rotation, is consistent with this idea that retinotopic cortex is not automatically engaged at a fixed time following the stimulus presentation but is recruited, in effect, or invoked, when the rest of the brain, if you will, is ready to do that processing.
So I'm not going to take time to talk about in detail-- there's several experiments in this study. But I did want to take time to summarize the results because I think they may be key to understanding what's eventually going on. And in a further experiment, we examine the question of whether this foveal interference depends upon saccade planning.
Now, it's a little more complicated experiment. But the answer is, yes, it seems to. In other words, if you have a task where the subject, part of their task is to fixate away from the target-- you have a spiky object up here, and they have to make a rapid fixation down there-- but if you insert this interference pattern before they make the saccade, we lose this suppression effect. So it's as if saccade planning seems to be part of it.
It's a little bit mysterious. So I don't think that can be the whole story. But it's still consistent with this idea of the visual system knowing about its own expertise, if you will, at a lower level and preparing a system for an analysis of fine detail.
And we also did fMRI experiments in which we looked at whether information not only about the category could be recovered from the non-stimulated voxels, as has been shown by Mark Williams in Nancy's lab, but whether the image property of the overall orientation and whether this cuby or this spiky was oriented horizontally or vertically. And you can recover both kinds of information from those foveal voxels that aren't directly stimulated.
I think I've summarized everything so far. And I wanted to leave you with just one set of questions I have that go back in my own history-- in fact, I might have even talked about these results at MIT, maybe 12 years ago or so-- and something I've been puzzling about for a long time. This is an older finding which involved the perceiving of the size of an object.
So hopefully, for most of you, this ring at the back of this hallway looks a little bit bigger than the front. So you can make these effects really big if you have the right cues. And a good example of that demo comes from Goldstein.
I don't know if any of you have seen this before, but this [AUDIO OUT] at railway track. They both have the same image size, OK? And if I move that image-- I'm not doing anything in terms of scaling the image-- but we can fit that bottom image right on top of the other. This is a really, really strong effect.
And with work done with Scott Murray, and then subsequent studies with Fang Fang, we looked at whether this effect manifested at all in primary visual cortex in terms of the spatial extent of activity. So we know from retinotopy that if a ring actually does get bigger, then you're going to have a spread of fMRI or BOLD-related activity. You go from representations that are more [AUDIO OUT] or posterior to ones that are more anterior.
And we found that you also get that increase in spatial extent of activity if it's just an illusory size change. And one reason I wanted to bring this up-- because I know there are a lot of bright young people here-- and I have never quite figured out what this means. And it really, really puzzle [AUDIO OUT].
And so one question-- so this just summarizes the results, and I just want to wrap up here quickly-- but I wanted to leave you with a few questions that might be related to this idea of the visual cortex as using feedback because of its expertise, if you will-- its expertise in spatial processing, processing [AUDIO OUT]-- things like size. And so what does that increase in perceived size result-- why does it result in this increase in spatial extent of activity?
And one hypothesis is that 3D object size is represented by the extent of activation of V1. And I think there are lots of reasons why I don't think that's the case. V1 could be used at some more abstract level as a kind of sketch pad to make relative comparisons.
But then there's this third idea. And that is that these illusions persist even if you have a hard segmentation problem. So I don't know if this works for you, but this ball here, for me, still looks bigger than the one below. And you can do quite a few manipulations of that sort.
So there's a challenge, just in terms of deciding which bits and pieces belong [AUDIO OUT] before you, just in terms of angular extent, before you decide how big is the object, and you incorporate 3D depth. And so this is, again, speculation, but maybe that's somehow related to this idea of V1.
We're getting these feedback effects in V1, these contextual effects in size, because of a process that needs to pull out, really, what is the angular extent? And after all, if we do have a retinotopic representation in V1, that's a reasonable place to get that angular estimate. But then, we still need an explanation for why we have a bias. Why is there this error in angular extent?
OK. So I need to wrap things up here. So I presented some evidence that one of the functions has to do with the selection and integration of local features through feedback.
We looked at reduction of local activity related to explained higher-level features, higher-level properties, in terms of predictive coding, but also at this [AUDIO OUT] that show that increases in neural activity depend upon the complexity of the segmentation problem, the background clutter, and so forth, suggesting a role for binding across levels of abstraction, and finally, evidence to support this idea that V1 may be involved in a richer set of computations than just spatiotemporal filtering, that it's also involved in high-resolution tasks that are recruited when the task demands it.
And this open question here of whether our results are also related to predictive remapping, related to saccadic planning. And maybe I'll just skip to the end here and say just a reminder that everything I talked about here owed a lot to other people. So thank you.
[APPLAUSE]