06 - Individual Subject Analysis: Part 2 of 2
Date Posted:
August 3, 2018
Date Recorded:
May 28, 2018
Speaker(s):
Gang Chen
All Captioned Videos AFNI Training Bootcamp
Description:
Gang Chen, NIMH
Related documents:
For more information and course materials, please visit the workshop website: http://cbmm.mit.edu/afni
We recommend viewing the videos at 1920 x 1080 (Full HD) resolution for the best experience. A lot of code and text is displayed that may be hard to read otherwise.
GANG CHEN: During the previous session, we talked about the typical approach, where we make a strong assumption about the hemodynamic response: we use a standard curve to model the hemodynamic response embedded in the signal.
So now we switch to a dramatically different approach, where we don't make such a strong assumption about the hemodynamic response shape. Instead, we're going to use the data itself to capture the hemodynamic response shape. The way we do it is with so-called TENT basis functions, that is, multiple basis functions.
So each basis function is a tent shape, like a camping tent. Mathematically, it is defined by this expression. The way we do it is basically, at each time point-- suppose we want, for example-- let's switch to the next slide-- suppose we want to cover 12 seconds of hemodynamic response. So like this. Each time interval is two seconds, so with six intervals we cover 12 seconds.
Over those 12 seconds, that period of time, in this case we use five TENTs. We put each TENT at a specific time location, and each TENT has unit height, so it's like a ruler placed at those time locations. With those unit rulers, the data at those time points determine the heights, and that's how we estimate the hemodynamic response.
So in the end, at each location we get a beta value. Those five betas, together with the beginning and end points which are held at zero, give us the overall hemodynamic response. In addition to that, we also get the hemodynamic response magnitude, which in this case we can pretty much approximate by the maximum value among those five betas.
So that's the basic idea. It's pretty straightforward and intuitive. But the major challenge is that, instead of one beta per condition or per task, in this case we have five betas. So we'll have to deal with them as a group. And also, at the individual level, how do we summarize those betas into some statistical evidence of a hemodynamic response?
So that's the idea of multiple basis functions. With that, basically each voxel will have those multiple beta values in the end.
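To make the TENT idea concrete, here is a minimal Python sketch (an illustration only, not AFNI's actual implementation): it builds unit-height tents spaced two seconds apart over a 12-second window, holds the two endpoints at zero, and reconstructs a response curve as a weighted sum of the five interior tents. The beta values are made up for illustration.

```python
import numpy as np

def tent(t, center, width):
    """Unit-height tent centered at `center` with half-width `width`:
    1 - |t - center|/width inside the support, 0 outside."""
    return np.maximum(0.0, 1.0 - np.abs(t - center) / width)

dt = 2.0                            # knot spacing in seconds
knots = np.arange(0, 12 + dt, dt)   # 0, 2, 4, ..., 12  (7 knots)
t = np.linspace(0, 12, 121)         # fine time grid for plotting

# The endpoints (0 s and 12 s) are held at zero, so only the 5 interior
# tents get free beta values (hypothetical numbers for illustration).
betas = [0.4, 0.9, 1.1, 0.7, 0.2]

hrf = np.zeros_like(t)
for b, c in zip(betas, knots[1:-1]):
    hrf += b * tent(t, c, dt)

# The response magnitude can be roughly approximated by the largest beta.
print("approximate peak response:", max(betas))
```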
Another advantage of this approach, just like the typical modeling approach with one standard curve, is that it does not require the stimulus timing to be locked to the TR grid, unlike the implementations available in other software packages.
So here is an example that demonstrates the modeling approach. In this case, we have 10 runs. Each run has 136 TRs, and each TR is two seconds. That's a data set that I believe was published more than 10 years ago. This is an experiment with four conditions. It's a two-by-two design; there are two factors.
The first factor is the video type: each video shows either a human or some tool movement. That's one way to categorize it. Another way to categorize each video is that it is either a full video or a point-light (dot) video.
So with the four different conditions, basically it's two by two, the four combinations. That's the layout of the experiment.
The four conditions can be categorized in two different ways. One is the object type: the image can be of a human or of a tool. For the human, it's just a picture or some depiction of a human. I will show you an image, a video, of those four conditions later.
A tool can be a hammer or a saw, just human tools. So that's one axis: either a human or a non-human object.
The other axis is that the video, for both humans and tools, can be a full video versus a point-light version, with dots depicting the human or the tool. So that's the two-by-two design, with four different combinations. That's basically it. We have four tasks, I mean conditions; each subject views those four kinds of images, and each one repeats many times.
So this basically shows a human image. Right? And this is also a human image, but made of dots. This is a tool; this is a saw, this is a hammer. Right? So these are the four different types of images. In the left column it's a human; in the right column it's a tool. That's one way to categorize those four tasks.
The other way is horizontal: the first row is the full video, and the second row is the dot version. So that's pretty straightforward. Here this basically shows a human moving, and this is a hammer. Right. So those are the four conditions.
Now if we look at the design matrix, it's pretty messy. That's because we have 10 runs. Right? For each run, we model the baseline with three parameters: the constant, the linear drift, plus the quadratic, so that's a second-order polynomial. Each run has three, and with 10 runs we end up with 30 columns for the baselines and the slow drift.
Then we'll have four big groups of columns; remember we have four conditions. Each group contains, in this case, seven TENTs, so we have seven regressors for each task. Four tasks, so that's seven times four, 28 regressors. With the traditional approach, we would only have four regressors for the tasks; here, for each task, we have seven regressors, and those seven betas characterize the hemodynamic response. So this model is more complicated, and of course you need more data.
The last six columns, of course, are the motion parameters. Right? I mean, we could plot this out, but the plot is very crowded. And also, because these are TENTs, there's not much to say about each TENT; unlike the traditional approach, where you see the curve, here you just see those dots shifted.
So that's why, if we were to plot it out, plotting the fitted responses would probably still be a good idea in principle. But here we have, I don't know, more than 50 regressors in total: 30 plus 28 is 58, plus 6, so that's 64.
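Just to make the bookkeeping concrete, here is a small sketch of how those 64 columns add up, using the numbers from the example above:

```python
n_runs = 10
baseline_per_run = 3              # constant, linear drift, quadratic drift
n_conditions = 4
tents_per_condition = 7
motion_params = 6

n_baseline = n_runs * baseline_per_run            # 30 baseline/drift columns
n_task = n_conditions * tents_per_condition       # 28 task columns
total = n_baseline + n_task + motion_params       # 64 columns in total
print(n_baseline, n_task, total)
```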
So in the end, let's say this is not a good data set to demonstrate with, because it doesn't show much about the subtleties of the differences in the hemodynamic response curves. On Wednesday, I'm going to show you a better data set which really shows those subtleties. Here, the curves pretty much look the same.
Or take the magnitude: we can't say much, since even the peak is roughly at the same location, and we don't see much difference in terms of the undershoot or the upstroke. So this is not really a good data set for demonstrating the modeling approach.
All right. So sometimes we call this approach deconvolution. I mean, convolution versus deconvolution is basically like multiplication versus division. In this case, we do not make an assumption about the hemodynamic response, so we have to find out the hemodynamic response. It's as if we only know one of the factors in the multiplication, the stimulus timing, but we don't know the hemodynamic response.
What we do have is the data, the final product. We know one of the multipliers but not the other, and that's why we call it deconvolution. It's like division. Right? The word convolution is like rolling something up, rolling a carpet together. Deconvolution basically goes the other direction.
So that's the concept of deconvolution. It's like division: you have the final result, the final product, which is the EPI data set, and you try to find out the hemodynamic response curve, the function.
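As a toy illustration of the deconvolution idea under simplifying assumptions (a single condition, TR-locked onsets, white noise), the sketch below builds the convolution matrix from the known stimulus timing and solves for the unknown response shape with least squares. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tr = 200                        # number of time points (TR = 2 s assumed)
hrf_len = 7                       # estimate 7 points of response (0..12 s)
onsets = np.arange(10, 190, 20)   # hypothetical stimulus onsets in TR units

# Stimulus indicator time series
s = np.zeros(n_tr)
s[onsets] = 1.0

# Convolution (design) matrix: column k is the stimulus shifted by k TRs
X = np.column_stack([np.roll(s, k) for k in range(hrf_len)])
for k in range(hrf_len):          # remove wrap-around introduced by np.roll
    X[:k, k] = 0.0

true_h = np.array([0.0, 0.5, 1.0, 0.8, 0.4, 0.1, -0.1])
y = X @ true_h + 0.2 * rng.standard_normal(n_tr)

# "Deconvolution": we know y and the stimulus timing, solve for the shape h
h_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(h_hat, 2))
```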
Of course, this approach is not always ideal; it has its pros and cons. The good thing is that, usually, most of the time, the detection power is much higher. And I'm going to show you at the group level, because that even more dramatically shows the capability of capturing the subtleties of the hemodynamic response.
But there are downsides, and that's partly why it's not widely popular: you have to know how to do it. Another challenge is how to handle the final result. You're not dealing with just one beta; with multiple betas, how do you describe the result? How do you summarize it? That's the challenging part. And also, if you don't design your experiment carefully, you may end up with equations that cannot be solved, because with so many regressors you may end up with a multicollinearity problem.
Some of the regressors overlap with each other so much that the model can no longer differentiate between conditions. That's another challenge. That's why, when you design an experiment, you had better avoid that situation through the way you arrange the sequence, I mean randomization, and also by being careful to separate the tasks in time to some extent.
So those are the two extremes: either you make a strong assumption, which is the traditional approach, or you don't make an assumption at all about the response shape. There is a middle ground, and that middle ground is the SPM approach, which uses a set of three curves.
The first one is called the canonical curve, this orange curve, the canonical hemodynamic response. Then there is a second one, the blue one, the temporal derivative, which is basically the first-order time derivative. The third one is called the dispersion curve.
The second one basically gives you wiggle room: even though the orange one says the peak occurs at a particular time, the blue one allows the location of the peak to shift a little bit. That's the flexibility of the temporal derivative; it gives the peak location a little bit of wiggle room.
The third one basically adjusts the width of the peak. For example, if we talk about the width at half maximum, the green one will squeeze it or extend it a little bit.
So with this approach, you basically modulate the peak location and the width of the peak a little bit. It doesn't do much about the undershoot. Also, in the end, nobody really cares about the shape differences. That's why, even though you end up with three beta values, usually people just throw away the second and third betas and focus on the first beta.
So this modeling approach gives you a little bit of adaptability in the shape. But in the end, people don't use that flexibility; they just focus on the first beta, the magnitude. So that's that approach. This option is also available in AFNI as a set of basis functions.
You can use just the first one, or the first two, or all three, just as in SPM; all three options are available. So that's the description of that approach.
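Here is a rough numpy/scipy sketch of the three-curve idea. The double-gamma parameters and the finite-difference derivatives used below are common defaults taken as assumptions for illustration; SPM's exact definitions differ in detail.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(t, peak=6.0, under=16.0, ratio=1.0 / 6.0):
    """Double-gamma shape commonly used as the canonical HRF
    (parameter values are typical defaults, treated here as assumptions)."""
    return gamma.pdf(t, peak) - ratio * gamma.pdf(t, under)

t = np.arange(0, 32, 0.1)
h = canonical_hrf(t)

# Temporal derivative: allows small shifts of the peak location
dt = 0.1
temporal_deriv = (h - canonical_hrf(t - dt)) / dt

# Dispersion derivative: allows small changes in the peak width, approximated
# here by differentiating with respect to the width of the first gamma
eps = 0.01
h_wide = gamma.pdf(t, 6.0, scale=1.0 + eps) - (1.0 / 6.0) * gamma.pdf(t, 16.0)
dispersion_deriv = (h - h_wide) / eps

# Three basis curves; each would be convolved with the stimulus timing
basis = np.column_stack([h, temporal_deriv, dispersion_deriv])
print(basis.shape)   # (320, 3)
```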
So, multicollinearity. Usually you will hardly ever have this problem, but if you do, there are two likely reasons. The first is basically human error: you made a mistake. Suppose for two conditions you accidentally use the same stimulus timing file for both. Of course, you end up with multicollinearity, because you end up with two regressors that are exactly the same. So that's one possibility; that's just a bookkeeping problem.
The second possibility is that you didn't design your experiment carefully. For example, you always have two tasks, one right after the other, following the same pattern: always AB, AB for those two conditions. Then, of course, you end up with a multicollinearity problem.
So those are the two scenarios. But if you design the experiment carefully and are careful about the stimulus timing, you can avoid it. And there are definitely tools that can help you diagnose the problem. So that's multicollinearity as a practical issue.
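A quick way to diagnose this kind of problem is to look at the correlations between the task regressors and at the condition number of the design matrix. Below is a toy sketch of the AB, AB scenario, with a crude made-up HRF; the numbers are hypothetical.

```python
import numpy as np

n = 200

# Hypothetical design flaw: condition B always follows condition A by one TR
a = np.zeros(n)
a[5::20] = 1
b = np.roll(a, 1)

# Convolve both with a crude HRF so the regressors reflect overlapping responses
hrf = np.array([0.0, 0.3, 0.8, 1.0, 0.7, 0.4, 0.2, 0.1])
ra = np.convolve(a, hrf)[:n]
rb = np.convolve(b, hrf)[:n]

X = np.column_stack([np.ones(n), ra, rb])
print("correlation A vs B:", round(np.corrcoef(ra, rb)[0, 1], 2))  # high
print("condition number:  ", round(np.linalg.cond(X), 1))
```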
Another one is about residuals. I mean, residuals are the leftover term that people usually don't care about. But here we do need to care, because the data are structured; there is a temporal structure. So why do we need to care about this?
The traditional regression model makes the assumption, when using ordinary least squares, that the residuals are white noise. It's noise, and it's white. White means there is no temporal structure; the residuals are purely noise.
In signal processing and engineering, people talk about white noise: there is no temporal structure, basically no correlation between neighboring data points. That's the assumption, and that's why it's called white noise. I mean, when you have trouble sleeping, maybe white noise will help you.
But in this case, the residuals in the data are usually not white noise if you simply adopt the traditional regression model. The main reason, I think, is physiological effects, breathing and heartbeat. And the scanner also has some thermal fluctuations that may contribute to it.
But anyway, this is a particularly serious problem for block designs. When each trial has a long duration, this problem is severe; the temporal correlation can reach 0.3. So how do we deal with it?
That means ordinary least squares is not good enough. But before we talk about the modeling approach, what's the impact? If we focus on the beta values, you probably will not see much difference whether you take care of the temporal structure in the residuals or not. The betas measure the BOLD response magnitude in the brain.
The relevant property in statistics is called unbiasedness. That means, roughly speaking, the betas are about the same either way. Roughly speaking. Of course, in reality there are some differences, but they are negligible, so we don't need to worry about that.
You may ask, if the impact on the betas is negligible, why do we need to talk about it? Especially since, when we go to the group level, most of the time people only take the beta values. Right? But there are a couple of reasons. Sometimes people do care about the individual subject and its result. Why?
Because, while it doesn't have much impact on the beta, it does have an impact on the standard error of the beta, the reliability information about the beta. That means the t statistic and the F statistic will change depending on whether you take care of the temporal structure or not.
What exactly is the impact? It is an inflation: the t value will be higher than it should be if you ignore the temporal structure. That's why we call it an inflated t statistic and F statistic. So that's one reason: people sometimes do care about individual subjects and their results.
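A small simulation can illustrate both points: with AR(1) residuals, the ordinary-least-squares betas stay essentially unbiased, but the naive standard error understates the actual spread of the estimates, so the t statistic is inflated. This is a toy sketch with made-up numbers, not any package's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(rho, n=200, beta=0.5, n_sim=2000):
    """Fit OLS to data whose residuals follow an AR(1) process with lag-1
    correlation `rho`; return the average beta, the average naive standard
    error, and the actual spread of the beta estimates."""
    events = (np.arange(n) % 20 == 0).astype(float)
    x = np.convolve(events, [0, .3, .8, 1, .7, .4, .2, .1])[:n]
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    betas, ses = [], []
    for _ in range(n_sim):
        e = np.zeros(n)
        w = rng.standard_normal(n)
        for t in range(1, n):
            e[t] = rho * e[t - 1] + w[t]
        y = beta * x + e
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        sigma2 = resid @ resid / (n - 2)
        betas.append(b[1])
        ses.append(np.sqrt(sigma2 * XtX_inv[1, 1]))
    return np.mean(betas), np.mean(ses), np.std(betas)

for rho in (0.0, 0.3):
    b_mean, se_naive, sd_actual = simulate(rho)
    print(f"rho={rho}: mean beta={b_mean:.3f}, "
          f"naive SE={se_naive:.3f}, actual SD of beta={sd_actual:.3f}")
# With rho=0.3 the mean beta is still about 0.5 (unbiased), but the naive SE
# is smaller than the actual spread, so the naive t statistic is inflated.
```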
Also, for group analysis, sometimes people do take both the beta and the t statistic to the group level. Why would people do that? Because the traditional approach is simple: you only take the beta values, and you make an assumption, namely that all the betas are equally reliable.
People just make that questionable assumption, and that approach is much simpler, easier to deal with. Also, not all software packages have the option of carrying both the betas and their reliability to group analysis. So for that reason, we do need to take care of the temporal structure.
Another reason is that sometimes we want to use the residuals to estimate the spatial correlation, the smoothness of the data. That's another reason we need to take care of this. So there are multiple reasons. Some of them are for the group analysis; some are for when we deal with the multiple comparisons correction at the group level, where we estimate the smoothness, the spatial extent. For those reasons, we need to take care of it.
For that, there is an approach called ARMA(1,1), autoregressive moving average. With that approach, I mean, we don't want to go into the details, but the idea is to whiten the noise, that's another term to describe it, or in other words to model the structure in the noise.
This is done voxel-wise: each voxel has its own temporal structure. There is a recent paper, I think, that compares the three software packages: how do they handle the temporal structure? It compares their approaches and their impact. You can read the paper.
Roughly speaking, I think AFNI has the most elaborate approach, because it does this voxel-wise. FSL is probably in the middle ground. SPM is the simplest, and its result is probably worse than the other two, I believe, according to that paper.
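Conceptually, these packages prewhiten the data: estimate the noise structure, transform both the data and the design so the residuals become approximately white, and then refit. A toy AR(1) version (much simpler than AFNI's voxel-wise ARMA(1,1) REML implementation) might look like this:

```python
import numpy as np

def ar1_whiten(y, X, rho):
    """Whiten an AR(1) series with the classic transform y_t - rho*y_{t-1}
    (the first row is scaled separately, Prais-Winsten style)."""
    n = len(y)
    W = np.eye(n)
    W[0, 0] = np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        W[t, t - 1] = -rho
    return W @ y, W @ X

def gls_ar1(y, X):
    """Estimate rho from OLS residuals, whiten, then refit (one iteration)."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b_ols
    rho = (r[1:] @ r[:-1]) / (r @ r)     # lag-1 autocorrelation estimate
    yw, Xw = ar1_whiten(y, X, rho)
    b_gls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return b_gls, rho

# Hypothetical usage with a design matrix X (time x regressors) and data y:
# betas, rho_hat = gls_ar1(y, X)
```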
So that's the temporal structure. Another practical issue is that we have multiple runs. Usually we scan each subject for more than, say, 10 minutes; usually longer, half an hour or even an hour or more.
So, multiple runs. There are different ways to deal with multiple runs, as I briefly mentioned before. This is a practical issue, though partly a theoretical one too. Basically, there are three different ways to do it, and it depends on what you are going to do in the end.
If we don't care about across-run differences, like a habituation effect, or comparing whether the same task has a different effect across runs, then the AFNI default is to glue the runs together: if you have multiple runs, you just treat them as one, from the beginning to the end.
Of course, when you specify the data structure, you need to tell the program that you do have multiple runs. The practical issue is how to deal with the multiple runs in terms of the slow drift, because we assume each run has a different baseline and a different slow drift.
So as long as you tell AFNI that you have multiple runs and where the discontinuities occur, the program automatically models the data with a different baseline and a different slow drift for each run.
So in the end, for each condition or task, you have just one beta, assuming you use the standard curve. With multiple basis functions, that's a different story. But anyway, that's the default approach.
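The per-run baseline and drift modeling can be pictured as block-diagonal columns in the concatenated design matrix. Here is a sketch of the idea (not AFNI's exact polynomial basis), using the 10-run, 136-TR example from earlier:

```python
import numpy as np

def per_run_drift_columns(run_lengths, order=2):
    """Build baseline + polynomial drift regressors separately for each run,
    placed block-diagonally so every run gets its own baseline and drift."""
    total = sum(run_lengths)
    cols = []
    start = 0
    for n in run_lengths:
        t = np.linspace(-1, 1, n)
        for p in range(order + 1):          # constant, linear, quadratic, ...
            col = np.zeros(total)
            col[start:start + n] = t ** p
            cols.append(col)
        start += n
    return np.column_stack(cols)

baseline = per_run_drift_columns([136] * 10, order=2)
print(baseline.shape)    # (1360, 30): 10 runs x 3 columns each
```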
That doesn't mean you cannot do it other ways. Suppose you do care about and want to compare whether different runs have different effects, whether there is a habituation effect; you want to compare the differences across runs. You can do that.
AFNI does have the option. You just treat the runs differently, basically creating different regressors per run. When you create the stimulus timing files, you label them differently, and then the program just treats them as different tasks. That's pretty straightforward.
Just for those of you who are familiar with FSL or SPM: I believe with FSL you cannot concatenate the runs. You have to treat each one separately, so if there are three runs, you have to build three separate models. That's why, in neuroimaging, I prefer to say individual-level versus group analysis, while some people say first-level versus second-level analysis. I believe that's SPM terminology.
But in FSL, they call the analysis of each individual run the first level. Then, because of the multiple runs, they have to average those runs, and they call that the second-level analysis. So the group analysis in FSL would be called a third-level analysis. It's a pretty messy situation, and that's why I usually avoid saying first level, second level, third level; I just say individual level or group level.
For SPM, you can concatenate the multiple runs, but you have to model each condition with separate regressors per run. So that's similar to FSL, though it does have the convenience that you can concatenate the runs.
In the end, they probably do have a way to summarize, to average, those multiple betas per condition. So that's the practical issue of dealing with multiple runs. Any questions about this? I mean, it's pretty straightforward; it's just a simple concept.
So I think the last practical issue is percent signal change. This is more unique to AFNI, because the three software packages have two different ways of dealing with scaling, the normalization of the signal. We know the data from the scanner are in arbitrary units.
The raw values we see, the signal numbers, are 2,000 or 3,000 or 1,000. They are arbitrary in the sense that you cannot attach any meaning to those numbers, because they can differ. Even the baseline can be different on the same scanner across different days, or for the same subject at a different time point, let alone if you switch to a different scanner, even of the same brand.
That arbitrariness is really a headache. That's why we usually have to standardize the signal, to calibrate it. In AFNI, that calibration is called scaling. How to standardize it, how to calibrate it, that's the subtlety here.
AFNI does it at each voxel separately. How? Suppose we have 200 time points. We divide those 200 numbers by the average of those 200 numbers, and then multiply by 100.
Once we do that, you still have a time series, but the values now fluctuate around 100: because you divide by the average, each value is roughly one, and after multiplying by 100, it's roughly 100.
In the model, remember, the baseline in AFNI is just the intercept, and the regressors of interest are something above that intercept. Because we divided by the average, the baseline is roughly at the average, so the real effect of interest is something above that.
So we can interpret the beta as an approximation of the percent signal change relative to the baseline. That's why, in AFNI, the beta values are interpretable, and we call them percent signal change.
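Conceptually, the scaling step is just this (a sketch of the idea; the real pipeline adds safeguards such as masking and clipping of extreme values):

```python
import numpy as np

def scale_to_percent(data):
    """Voxel-wise scaling: divide each voxel's time series by its own mean and
    multiply by 100, so the series fluctuates around 100 and regression betas
    can be read as approximate percent signal change."""
    mean = data.mean(axis=-1, keepdims=True)      # per-voxel temporal mean
    mean = np.where(mean == 0, 1.0, mean)         # avoid dividing by zero
    return 100.0 * data / mean

# Hypothetical toy data: 2 voxels x 200 time points in arbitrary scanner units
rng = np.random.default_rng(3)
raw = 2000 + 20 * rng.standard_normal((2, 200))
scaled = scale_to_percent(raw)
print(scaled.mean(axis=-1))    # both voxels now average exactly 100
```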
Remember, though, that it is an approximation. Why? Because ideally we want the beta to be interpreted as percent signal change relative to the baseline, but nobody knows what the real baseline is ahead of time. Even after we model it, we do get a so-called baseline.
But whether that's really the baseline, we are not so sure. Why? Because there is slow drift. Sometimes it goes up, sometimes it goes down, or sometimes you have both: it goes up and then comes down later.
So where do you say the baseline is? What's the definition of the baseline? We don't know the exact baseline or where it is. That's why, instead of dividing by the real baseline, which we don't know, we divide by the average of the time series.
Suppose there are 200 numbers; we divide by the average of those 200 numbers. But then, how good is that approximation, you may ask? Well, I wrote a commentary about this. If you do a back-of-the-envelope calculation, you will see that the approximation is good enough; the error is pretty much negligible, on the order of 1% of the percent signal change itself.
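A back-of-the-envelope check with made-up numbers shows why the approximation is so benign:

```python
# If the true baseline is B and the task raises the signal by about 1% for a
# modest fraction of the scan, the time-series mean exceeds B only slightly,
# so scaling by the mean instead of the unknown baseline changes the estimated
# percent signal change very little. (All numbers below are hypothetical.)
B = 2000.0              # true baseline in scanner units
effect = 0.01 * B       # a 1% task response
active_fraction = 0.3   # task is "on" about 30% of the time

mean = B + active_fraction * effect
psc_ideal = 100.0 * effect / B        # scale by the true baseline
psc_used = 100.0 * effect / mean      # scale by the time-series mean
print(psc_ideal, psc_used)            # differ by well under 1% of the effect
```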
So that's the AFNI approach; it's voxel-wise. Because the signal is calibrated, in the end you can compare across subjects and you can compare the beta values across different regions of the brain. So it's interpretable, and it makes sense when you do group analysis.
AUDIENCE: Doesn't that imply there is some threshold of activity, where your block design has a lot of intense stimulation over time? Isn't there some point where that average gets pushed up enough that it becomes unreliable?
GANG CHEN: Yeah, that may have some impact, especially if you have a lot of blocks and not a lot of baseline time, I guess. But still, the impact would be minimal, I think. Nobody has really checked, and nowadays that kind of design is unusual.
Yes, it's a legitimate concern, but my gut feeling is that it's probably still OK. I would have to do another calculation to see how bad it is, if that's really the case. Any other questions?
So that's the AFNI approach. I believe the alternative approach is that they don't do this voxel-wise. Instead, the whole brain shares the same calibration factor: you average across all the time points and also across the voxels, the spatial part. That's called grand mean scaling.
I don't understand the rationale, but that's the way it's done in the field. In the end, to me, those beta values are not interpretable. And if they are not interpretable and not comparable across subjects or across brain regions, then I have trouble understanding how you can do group analysis.
When you do group analysis, you want to compare the subjects or average them on some common ground. Right? If they are not comparable, then what's the point of doing that? So at least conceptually, that's a little bit problematic.
As far as I know, this also has an impact on how results are shown. When people show their brain images, the crosshairs, the colored blobs, in AFNI-- as Daniel showed this morning-- we differentiate between the beta and the t statistic.
In AFNI, we specifically ask you to show the betas. We don't want you to show the t. We don't care about the t, except that your t values pass some threshold; the specific t value-- whether it's 7.5 versus 3.7-- doesn't matter. As long as it's above the threshold, the t value doesn't mean much.
For us, the beta is the most important thing, because the beta shows the real response magnitude. The t value is maybe an indirect indication of the brain response, but t is dimensionless, because we divide the beta by its own standard error, so it doesn't have a physical interpretation.
Partly for these reasons, I wrote a commentary about this issue two years ago. Maybe this is dramatic, but I usually use an analogy from physics: if we want to measure the distance between the Earth and the Moon, in the end you don't tell me the distance; you just say, my t value is such and such. Is that something physicists would be happy with? Of course not.
But the strange thing is, in brain imaging, people don't talk about the brain's real response. They just show you the t value crossing some magic threshold. That has unfortunately been the reality in the field for over a quarter of a century. That's how it has been going. So that's this point.
And when you compare different studies, you want to compare the real magnitudes. You don't want to compare the t statistics.
Also, when you do a meta-analysis, which is pretty popular nowadays, you want to summarize across different studies. But nowadays meta-analysis is so crude: they don't use the betas, and they don't even use the t values either. They just find the so-called peak voxels, then make assumptions to meta-analyze across different studies. They don't even care about the beta values.
There are multiple reasons for that in the field. One major reason is that people don't report the real effect estimates. So that's what's going on. I could keep going on about this for a long time.
But the point here is that it's a good habit, and a healthy way for the field to move forward, to report the real effects. Not just the whole-brain image: we also want to report ROI-level results, and we want you to report the real betas, the effect estimates.
I believe with the other software packages there are probably workaround solutions. For example, I heard there is a package called MarsBaR, an SPM toolbox. You can use it to convert the beta values from your analysis to percent signal change. But if you really care about that, why not do it at the first step? Just do it during preprocessing; then you don't need MarsBaR to do that conversion. Why go through the extra steps?
I think one defense I heard from the FSL camp is that, if they did voxel-wise scaling, they might have trouble around the edge, the border of the brain, where there are some irregularities. But I don't understand exactly what kind of irregularities they're talking about.
Anyway, at least I haven't seen such a problem yet. Maybe there is. But the border of the brain, the mask, that's easy to take care of; we don't really trust those voxels on the border anyway. Why should we sacrifice the interpretability of the beta values just because of those few voxels? So that's my argument, my defense of the approach.
So I think this is pretty much my last slide. When we look back at modeling at the individual level, nowadays it's just routine. But usually we don't do a good job of modeling, because we can only explain a small portion of the signal in the data.
That's because we don't fully understand the mechanism of the BOLD signal. We just use whatever is available in whatever implementation, and in the end we have to be content with what we have. So that's the individual level.
At the group level, it's pretty much the same situation with the modeling approach, which I will elaborate on, probably on Wednesday. We have to take baby steps, building on previous results, to try to improve the modeling.
So basically, this talk focused on the individual level with the typical routine, which I summarize here on the last slide. I also showed this at the beginning as a preview.
In summary, basically we have three different ways to model. One is to just use the standard curve, an empirical curve. That's the simplest approach; in the end we have one beta per condition.
The second approach: we don't make any strong assumption about the shape. Instead, we try to pull the shape information from the data; we trust the data. You either use TENTs, multiple TENTs, or use cubic splines. The difference between the two is pretty negligible.
Also I want to emphasize that in AFNI-- and that's another subtle difference between AFNI and other implementations-- if we use TENTs as multiple basis functions, the stimulus timing does not have to be synchronized with the TR. It can be anything. I mean, the TR is when the scanner begins acquiring each volume.
So relative to the TR grid, the stimulus timing, the tasks, whatever the events or trials are, they don't have to start exactly at the beginning of a TR. They can start anywhere, even with this approach.
But I believe the other implementations do require the stimulus timing to be locked to the TR. If it is not locked, I don't know whether they have a solution or not, but at least in those implementations, that's a requirement. So that's a subtlety of the multiple basis function approach.
So the middle ground is the SPM approach, which is also available in AFNI. You have three basis functions: the canonical curve, then the two adjustments. The second modulates the peak location; the third adjusts the width of the peak.
Then we talked about a few practical issues, such as run concatenation and percent signal change. All right. I think I finished five seconds ahead of time.