17 - Group Analysis in fMRI: Part 2 of 2
October 16, 2018
May 30, 2018
Gang Chen, NIMH
All Captioned Videos AFNI Training Bootcamp
For more information and course materials, please visit the workshop website:
We recommend viewing the videos at 1920 x 1080 (Full HD) resolution for the best experience. A lot of code and type is displayed that may be hard to read otherwise.
PRESENTER: Most of the time it's fine, but it depends on the situation: how much heterogeneity there is across subjects. If the variability changes a lot across subjects, then we may face some issues. It's not just about one region, either; it depends on the region in the brain, too. So it's difficult to say ahead of time what the exact impact will be. In practice, you have to try the two approaches and see the difference. So that's basically the scenario.
So in theory, of course, it doesn't hurt you. Even if there is not much heterogeneity across subjects, the approach still works fine: when there is little heterogeneity, the subjects are simply weighted about equally. So theoretically speaking, this is always a good approach to adopt. The more practical issues are the computational cost and the added complexity. That's something you have to keep in mind.
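The weighting idea can be made concrete with a minimal Python sketch of precision weighting in the spirit of a mixed-effects meta analysis. It assumes, for illustration only, that the cross-subject variance `tau2` is supplied by the caller rather than estimated, and that each subject's within-subject standard error is recovered as beta / t. The function names are hypothetical, and this is not 3dMEMA's actual algorithm.

```python
# Sketch of precision weighting, mixed-effects-meta-analysis style.
# Assumptions (NOT 3dMEMA's implementation): tau2, the cross-subject
# variance, is given rather than estimated; each subject's standard
# error is recovered from its beta and t-statistic as SE = beta / t.

def within_subject_variance(beta, t):
    """Squared standard error implied by a beta and its t-statistic."""
    if t == 0:
        raise ValueError("a t-statistic of zero gives an undefined standard error")
    return (beta / t) ** 2


def weighted_group_mean(betas, tstats, tau2):
    """Group effect with each subject weighted by 1 / (tau2 + SE_i^2)."""
    weights = [1.0 / (tau2 + within_subject_variance(b, t))
               for b, t in zip(betas, tstats)]
    return sum(w * b for w, b in zip(weights, betas)) / sum(weights)


# Equal precision (every SE is 1): equal weights, so this is just the mean.
print(weighted_group_mean([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], tau2=0.5))
# Unequal precision: the noisy second subject (t = 1) is heavily down-weighted.
print(weighted_group_mean([1.0, 3.0], [10.0, 1.0], tau2=0.0))
```

With little heterogeneity in precision the weights are nearly equal, which matches the point above that the approach does no harm in that case.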
So now let's switch to the specific AFNI programs: which program you would use in which scenario. That's basically a roadmap. There are many factors to take into consideration when you decide which program to use. First, I want to mention how many programs we have for group analysis. There are quite a few. First of all, for t-tests, the program is called 3dttest++. Why is it called "plus plus"? Because there is an older program called 3dttest. That program is already obsolete, so we don't need to discuss it.
So 3dttest++, as the name indicates, handles simple t-tests. You can do a one-sample t-test, a two-sample t-test, or a paired t-test. You can also use it for a basic general linear model: if you have a quantitative variable, you can do regression.

You can also do a simple ANCOVA. Say you have two groups plus a covariate like age or IQ: you can throw it all into a general linear model. Even a paired t-test plus a covariate works, because you can reduce the paired t-test to a one-sample t-test.

To handle that case, with two conditions, you just take the difference between the two conditions and add the covariate. So the program actually does more than just t-tests, as the name might suggest. You can also use it for a general linear model, such as a paired t-test plus a covariate.
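The reduction of a paired t-test to a one-sample t-test on per-subject differences can be shown with a short Python sketch (standard library only, hypothetical data). Adding a covariate would mean regressing those same differences on it.

```python
import math
import statistics


def one_sample_t(values):
    """One-sample t-statistic against zero: mean / (sd / sqrt(n))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def paired_t(cond_a, cond_b):
    """A paired t-test is a one-sample t-test on per-subject differences."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return one_sample_t(diffs)


# Hypothetical betas for three subjects under two conditions.
print(paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # same as one_sample_t([1, 2, 3])
```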
So that's one scenario: t-tests. Another thing to consider is that 3dttest++ has multiple testing correction built in, which I will talk about this afternoon. With other programs, like the ANOVA programs, 3dMVM, and 3dLME (a linear mixed-effects modeling approach), the multiple comparison correction has to be a separate step.

3dttest++, by contrast, can perform the correction itself. That's another advantage of using 3dttest++. Even if you have an ANOVA scenario, because of that advantage you may break your ANOVA structure into multiple t-tests and use 3dttest++. That's something to take into consideration. So that's 3dttest++, for t-tests or a general linear model.
If we have an ANOVA, then basically you can use 3dMVM. That's a multivariate modeling approach that handles all sorts of ANOVA structures. With 3dMVM, we don't have the problem of dealing with incorrect test formulations, and the t-tests are also [INAUDIBLE]. So whatever your ANOVA or ANCOVA structure, as long as there is no within-subject covariate, you can use 3dMVM.
The last program is 3dLME, which uses a linear mixed-effects model. We use it to handle some particular situations, like missing data or a within-subject covariate. Those are the two common scenarios in which you may consider using 3dLME. Another use is computing the ICC, the intraclass correlation; you can use 3dLME for that.
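The ICC is the share of total variance that comes from differences between subjects. The Python sketch below uses the classic one-way random-effects ANOVA estimator, ICC(1), swapped in here for illustration. It is not how 3dLME computes it (3dLME works from the variance components of a mixed-effects model), but the quantity being estimated is the same idea. The data are hypothetical.

```python
import statistics


def icc_oneway(data):
    """ICC(1) from a one-way random-effects ANOVA.

    data: one row per subject, each row holding k repeated measurements
    (e.g. the same beta from k sessions).
    """
    n = len(data)
    k = len(data[0])
    grand = statistics.mean(x for row in data for x in row)
    row_means = [statistics.mean(row) for row in data]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical: three subjects, two sessions each.
print(icc_oneway([[1, 2], [3, 4], [5, 6]]))  # high: subjects differ far more than sessions
```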
There are also three older ANOVA programs. For simplicity, I'll skip them, because nowadays we just recommend using 3dMVM, unless you are already a seasoned AFNI user who knows how to use those programs. I will briefly mention them later.
So let's start with 3dttest++, as briefly mentioned before. Oh, I forgot: I should also mention 3dMEMA. It's pretty much the mirror image of 3dttest++. The major difference is that 3dMEMA takes both the beta and the t-statistic as input.
So we use these programs to handle t-tests and univariate general linear models. You can also handle simple ANOVA structures like 2 by 2, or even 2 by 2 by 2, as long as you know how to break the ANOVA structure into multiple t-tests, especially for the interactions. It's a little bit tedious, but once you get the hang of it, you can run 3dttest++ multiple times instead of just once.
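Breaking a 2 by 2 interaction into a t-test means computing one interaction contrast per subject and then running a one-sample t-test on those contrasts. The Python sketch below assumes both factors are within-subject. With a between-subject factor, the within-subject contrast would instead go into a two-sample test. The data and names are hypothetical.

```python
import math
import statistics


def one_sample_t(values):
    """One-sample t-statistic against zero."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def interaction_contrast(a1b1, a1b2, a2b1, a2b2):
    """Per-subject 2 x 2 interaction: (A1B1 - A1B2) - (A2B1 - A2B2)."""
    return (a1b1 - a1b2) - (a2b1 - a2b2)


# Hypothetical per-subject betas, ordered (A1B1, A1B2, A2B1, A2B2).
subjects = [(3, 1, 2, 2), (4, 1, 2, 2), (5, 1, 2, 2)]
contrasts = [interaction_contrast(*s) for s in subjects]
t_interaction = one_sample_t(contrasts)
print(contrasts, t_interaction)
```

In the AFNI workflow, the per-subject contrast would be a contrast beta from the individual-subject analysis, fed to a one-sample 3dttest++.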
So that covers the t-test programs. For ANOVA scenarios, there are 3dANOVA and 3dANOVA2, but we can skip those and simply use 3dMVM, because it's easier to understand and handle. There are several different ANOVA structures: purely between-subject, purely within-subject, or a mixture of the two types.
All right. Let's look at one example. This is the experiment design Greg already described yesterday, or even Monday. It has two conditions: reliable visual versus reliable auditory. In the first script, we use 3dttest++ to run a one-sample t-test on one condition, the reliable visual condition.
It follows the standard AFNI format for writing a script. First you have the program name. Then we use -prefix to specify the output file name. If we want to mask out everything outside the brain, you can provide a mask. Then you just list all the input files. In this case, remember, we focus on just one condition.
So we have 10 subjects in one group, and each subject contributes one beta, for the reliable visual condition. You may notice that in AFNI, the output file from each individual-subject analysis is a four-dimensional dataset. In AFNI terminology, each element along the fourth dimension is called a sub-brick, and each sub-brick is a three-dimensional volume. Each 3D volume can hold a beta, a t-statistic, an F-statistic, or a correlation value.
Here we only want the beta associated with one particular condition, reliable visual. So we use the sub-brick selector: square brackets specifying which beta we want as input. You may notice that within the square brackets, I use labels instead of numbers. Either one works, but I strongly recommend the label. Why? Because the label is self-explanatory: you designed it as the user, you put that label in, and you know what it is. A sub-brick number is just a number. Number 5 doesn't tell you much; it's not as informative as the label. That's one reason.

The other reason is that labels make it much harder to make a mistake. The sub-brick numbering in the output files from the individual-subject analyses may differ across subjects, and that can cause trouble if you use numbers. So I strongly recommend using the label instead of the sub-brick number. The format is always the label, followed by the pound sign, then "0_Coef" for the coefficient. So that's pretty straightforward.
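The robustness argument for labels can be illustrated with a small Python sketch. The sub-brick label lists below are hypothetical, modeled on the "label#0_Coef" convention from the talk. They simulate two subjects whose outputs store the conditions in a different order, so a fixed index picks different betas while a label lookup does not.

```python
# Hypothetical sub-brick label lists for two subjects whose outputs
# store the two conditions in a different order.
subj01_labels = ["Full_Fstat", "Vrel#0_Coef", "Vrel#0_Tstat", "Arel#0_Coef", "Arel#0_Tstat"]
subj02_labels = ["Full_Fstat", "Arel#0_Coef", "Arel#0_Tstat", "Vrel#0_Coef", "Vrel#0_Tstat"]


def index_of_label(labels, label):
    """Return the sub-brick index that carries the given label."""
    return labels.index(label)


# A fixed index such as [1] silently picks different conditions...
print(subj01_labels[1], subj02_labels[1])  # Vrel#0_Coef Arel#0_Coef
# ...while a label lookup always finds the intended beta.
print(index_of_label(subj01_labels, "Vrel#0_Coef"),
      index_of_label(subj02_labels, "Vrel#0_Coef"))  # 1 3
```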
So what about the mixed-effects approach, which takes both the betas and the t-statistics as input? That's the second half of this slide: 3dMEMA. The structure is pretty much the same, but instead of one set of input files we have two sets, and the second set is the t-statistics. As you can see here, the label has an appended part, "_Tstat". Again, the label is more informative than sub-brick numbers.
One more thing I want to mention is the last line in the 3dMEMA script, -missing_data 0. It is important for voxels along the edge of the brain. When you have a mask, brain coverage may be slightly different across subjects: one subject may have data at a particular voxel on the border, while the next subject may not.

Say you have 10 subjects, and at that voxel only seven of them have a beta value available; the other three just have zeros. So for that particular voxel along the border of the mask, you only have seven subjects. If you don't do something special and simply include the other three subjects with values of zero, the program will still run, of course, and it won't complain. But the analysis is not accurate, because you basically have missing data.

The way to handle that is to tell the program: wherever you see a beta value of zero, treat it as missing data. Then the inferences are based only on the subjects with real values, and the subjects with zero values are treated as missing. That gives the program more accurate modeling for the inferences. Any questions about this?
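At a single border voxel, treating zeros as missing amounts to dropping those subjects before computing the statistic. The Python sketch below shows that idea for a plain one-sample t-test. The data and function names are hypothetical, and the real programs do this per voxel inside their own models.

```python
import math
import statistics


def one_sample_t(values):
    """Plain one-sample t-statistic: every value, including zeros, counts."""
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(len(values)))


def one_sample_t_zeros_missing(values):
    """Same test, but exact zeros are treated as missing (dropped) at this voxel."""
    kept = [v for v in values if v != 0]
    if len(kept) < 2:
        return None  # too few subjects with data to estimate a variance
    return one_sample_t(kept)


# Hypothetical betas at one border voxel: 3 of 10 subjects fall outside their mask.
voxel = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 1.0, 0.0, 0.0, 0.0]
print(one_sample_t(voxel))                # zeros drag the mean down and inflate the variance
print(one_sample_t_zeros_missing(voxel))  # based on the 7 subjects with real data
```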
So that's 3dMEMA. 3dttest++ has a similar feature. The option is called -zskip. It's exactly the same concept, just a different option name.
So that's a one-sample t-test. A two-sample t-test is similar, except that you have two groups. For a paired t-test, you have one group of subjects but two conditions. Here's an example using 3dttest++ for a paired t-test. It's still the exact same dataset, now comparing the two conditions: reliable visual versus reliable auditory. That's why we have two sets of input.

We use the options -setA and -setB to specify the two conditions, plus the option -paired. If we leave that option out, the program treats the two sets as two groups: by default, without -paired, it performs a two-sample t-test instead of a paired t-test. So be very careful. For a paired t-test, don't forget that option.
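Forgetting the paired option can change the result dramatically. The Python sketch below, using hypothetical data, runs the same two conditions through a paired test and through a pooled-variance two-sample test, which treats them as independent groups. The pooled formula is an assumption for illustration about what an unpaired test computes. Subject-to-subject differences in overall level swamp the unpaired test but cancel in the paired one.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t-test: one-sample t on per-subject differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def two_sample_t_pooled(a, b):
    """Two-sample t-test with pooled variance (treats a and b as independent groups)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical betas for the same 4 subjects under two conditions.
cond_a = [2.0, 4.0, 6.0, 8.0]
cond_b = [1.5, 3.0, 5.5, 7.0]
print(paired_t(cond_a, cond_b))             # large: the consistent within-subject difference
print(two_sample_t_pooled(cond_a, cond_b))  # small: between-subject spread dominates
```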
We can do something similar with 3dMEMA, except that we have to provide the contrast as input, because the program cannot take the two conditions as separate input files. Notice the labels here: we provide the beta for the contrast as input, as well as the t-statistic for the contrast. Other than that, it's pretty much the same as the one-sample scenario.
So that's the t-test scenario. Then there are several ANOVA scenarios. First, we have the one-way between-subject ANOVA: just one factor, and it's a between-subject factor. Then you have the multi-way between-subject ANOVA: more than one factor, but all the factors are purely between-subject factors.

At the other end, there's the one-way within-subject ANOVA. We have one factor, and that factor is within-subject, meaning two or more conditions.
Here is an example. Even though it's a paired t-test, we can treat it as an ANOVA, because a paired t-test is an ANOVA with just one factor, and that factor has two levels. It's just a special ANOVA. So we can take the same paired t-test scenario and use an ANOVA structure. In this case, the program is the older 3dANOVA2, but you can use 3dMVM as well. These are just different ways to conceptualize the data structure, each with a corresponding program.
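The claim that a paired t-test is a one-factor, two-level within-subject ANOVA can be checked numerically: the ANOVA F equals the square of the paired t. The Python sketch below computes both from the same hypothetical data with textbook sums of squares. It is not a reimplementation of 3dANOVA2 or 3dMVM.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t-test: one-sample t on per-subject differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def rm_anova_f(data):
    """One-way within-subject ANOVA F: rows are subjects, columns are levels."""
    n, k = len(data), len(data[0])
    grand = statistics.mean(x for row in data for x in row)
    subj_means = [statistics.mean(row) for row in data]
    level_means = [statistics.mean(row[j] for row in data) for j in range(k)]
    ss_level = n * sum((m - grand) ** 2 for m in level_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_level - ss_subj  # subject-by-level residual
    return (ss_level / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))


# Hypothetical betas: 4 subjects, 2 conditions.
cond_a = [2.0, 4.0, 6.0, 8.0]
cond_b = [1.5, 3.0, 5.5, 7.0]
data = [list(pair) for pair in zip(cond_a, cond_b)]
print(paired_t(cond_a, cond_b) ** 2, rm_anova_f(data))  # the two agree
```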
We can keep going: the two-way within-subject ANOVA, then the two-way mixed ANOVA, which has one between-subject factor plus one within-subject factor. That's why it's called mixed.
Now let's come back to the demo example I mentioned before. Remember this? It's the two-by-three plus covariate example I showed at the beginning. The first factor is a between-subject factor with two groups: patients versus controls. The second factor is three emotion conditions: positive, negative, and neutral.

So we have two factors, plus a quantitative variable, age: two by three plus age. The traditional approach in neuroimaging, a univariate general linear model, would not be able to handle this, because of the age variable in the model. With the multivariate modeling approach, though, it's pretty straightforward.
The script is a little bit different from the typical scripts for other programs like 3dttest++, 3dMEMA, or 3dANOVA2. Here we need to put the data table, as I call it, at the bottom of the script. The table lays out the data structure.
The first column holds the subject labels, for example subject IDs. The last column holds the input files, in this case in NIfTI format. So remember: the first column is the subject IDs, and the last column is the input files. In between, you put all the experimental variables, whether factors or quantitative variables (covariates). In this case we have three columns for three experimental variables: two factors plus a quantitative variable.

The second column is group: either control or patient. The condition column says whether each beta is positive, negative, or neutral. The column in the middle is age, where you put each subject's age. So that's the data table, and it describes the data structure.
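The long-format layout of the data table (one row per subject per within-subject level) can be generated with a short Python sketch. The subjects, ages, file-naming scheme, and column order here are all hypothetical. The header names Subj and InputFile follow the convention described in the 3dMVM help, which is worth double-checking against your AFNI version.

```python
# Hypothetical subjects, ages, and file names, used only to show the layout.
subjects = {
    "s01": ("control", 25),
    "s02": ("patient", 31),
}
conditions = ["pos", "neg", "neu"]


def data_table_rows(subjects, conditions):
    """One row per subject x condition: Subj, group, age, cond, InputFile."""
    rows = [("Subj", "group", "age", "cond", "InputFile")]
    for subj, (group, age) in subjects.items():
        for cond in conditions:
            # Hypothetical naming scheme: one beta file per subject/condition.
            rows.append((subj, group, str(age), cond, f"{subj}_{cond}+tlrc"))
    return rows


for row in data_table_rows(subjects, conditions):
    print("  ".join(row))
```

Note that age repeats across a subject's three rows, which is exactly why it counts as a between-subject variable.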
The first line is the program name, 3dMVM. With -prefix we specify the output file name. With -jobs we specify the number of CPUs to use. This program usually takes much longer to run than 3dttest++, so we take advantage of parallelization across multiple CPUs.

The next two lines are a little bit esoteric. Let's first describe the second line, -bsVars. "bs" means "between-subject variables." In this case we have two between-subject variables. The first is group: patient versus control. Then we have age. Even though age is a quantitative variable, a covariate, it is also a between-subject variable. Why? Because each subject has just one value.

You'll also notice that I put a star there. That notation means we want both main effects plus the interaction. For variables A and B, A*B means A + B + A:B: the main effect of A, the main effect of B, plus the interaction between the two. A:B denotes the interaction between the two variables.
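The star notation follows the R-style model-formula convention: A*B expands to every main effect and every interaction among the listed variables. The Python sketch below performs that expansion. The function name is hypothetical, and this is an illustration of the notation, not 3dMVM's parser.

```python
from itertools import combinations


def expand_star(formula):
    """Expand 'A*B*C' into main effects and all interactions (A, B, C, A:B, ...)."""
    factors = [f.strip() for f in formula.split("*")]
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


print(expand_star("group*age"))        # ['group', 'age', 'group:age']
print(expand_star("group*age*cond"))   # 3 main effects, 3 two-way, 1 three-way
```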
The next option, -wsVars, specifies within-subject variables. Here we only have one, condition: positive, negative, or neutral. That variable is a factor, and it's a within-subject variable.

The next one, -qVars, specifies quantitative variables. In this case we have age, which is a quantitative variable, basically a covariate.
Then the next three lines. If we stopped before them, we would only get the omnibus F-tests: main effects plus interactions. That's it. But that's usually not good enough for us. We need t-tests to partition out those effects, which is why we use these lines to specify exactly which post hoc tests we want. For example, the first line here uses the general linear test (GLT) option, -gltLabel. The first number is basically an index. Then we put a label, which indicates what the test is about.

Then we use -gltCode to code exactly what we are looking for. Let's look at the code. Say we want to fix the group at the patient group. First you have the variable name, a space, and a colon. After the colon comes a space, then the weight, which is usually either one or minus one. So we fix the group at the patient group, and we also fix the condition at the positive condition. That's why the weights are one in both cases.
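A GLT code is a weighted combination of cell means. The Python sketch below parses a code string written in the "variable : weight*level" pattern described above and applies it to a table of cell means. The parser, the level names, and the cell means are hypothetical illustrations, not 3dMVM's implementation. This sketch also requires every factor to appear in the code, while 3dMVM handles unspecified factors and covariates in its own way.

```python
from itertools import product


def parse_glt_code(code):
    """Parse 'var : w*level w*level var : w*level' into {var: {level: weight}}."""
    tokens = code.split()
    spec, current, i = {}, None, 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == ":":
            current = tokens[i]          # a variable name followed by ':'
            spec[current] = {}
            i += 2
        else:
            weight, level = tokens[i].split("*")
            spec[current][level] = float(weight)
            i += 1
    return spec


def glt_estimate(spec, cell_means):
    """Weighted sum of cell means; in this sketch every factor must appear in spec."""
    total = 0.0
    for levels in product(*(spec[f].items() for f in spec)):
        key = tuple(level for level, _ in levels)
        weight = 1.0
        for _, w in levels:
            weight *= w
        total += weight * cell_means[key]
    return total


# Hypothetical cell means keyed by (group, cond).
cell_means = {
    ("control", "pos"): 0.5, ("control", "neg"): 0.2, ("control", "neu"): 0.1,
    ("patient", "pos"): 0.8, ("patient", "neg"): 0.6, ("patient", "neu"): 0.3,
}
print(glt_estimate(parse_glt_code("group : 1*patient cond : 1*pos"), cell_means))
print(glt_estimate(parse_glt_code("group : 1*control cond : 1*pos -1*neg"), cell_means))
```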
So that one is very simple, right? It's just the combination of group and condition: the positive condition in the patient group. That's the first post hoc test. The second one looks at the contrast between positive and negative for the control group.
The first time, this is probably a little bit overwhelming, and it's difficult to decode, unfortunately. Writing the script by hand may take a little time to learn, but there is a GUI called the 3dMVM validator. In that GUI, the only thing you need to do is create the data table underneath. The GUI then helps you lay out the script the right way, like this, and it has panels you can use to specify exactly those options, including the post hoc tests.

For beginners, that is probably the better way to start. Once you know the syntax, it's probably easier to write the script directly, and probably much faster than going through the GUI.
So that's basically the structure of the script. The options for between-subject, within-subject, and quantitative variables are new and take some getting used to. You need to specify those variables.

Based on the information provided, the program automatically generates the model. In this case, the model is group by age by condition, so you get all possible interactions among those three variables: the three-way interaction, the two-way interactions, plus the three main effects. That's basically how 3dMVM works.
Now let's switch to another scenario. So far we've only talked about the situation where each condition has one beta. That means we model the hemodynamic response with one curve. As I mentioned, that makes a strong assumption: regardless of the task, regardless of the subject, regardless of the brain region, it's the same curve. Whatever curve you use, it is fixed.

If we want to be a little bit fancier, we throw away that assumption and use the data to estimate the hemodynamic response shape. Then you end up with multiple betas. Suppose you use TENT functions, seven of them, for example. You end up with seven betas. So what can we do? That's the challenge at the group level, where we have multiple betas.
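With tent-style basis functions, each beta is the height of the estimated response at one knot, and the curve in between is linear interpolation. The Python sketch below reconstructs a response shape from such betas. It assumes evenly spaced knots starting at stimulus onset and uses hypothetical betas. It illustrates the idea behind AFNI's TENT basis rather than reproducing 3dDeconvolve's code.

```python
def tent(t, center, width):
    """Piecewise-linear 'tent' basis: 1 at its center, 0 beyond +/- width."""
    return max(0.0, 1.0 - abs(t - center) / width)


def response_shape(betas, knot_spacing, t):
    """Estimated response at time t: sum of betas times tents on evenly spaced knots."""
    return sum(b * tent(t, k * knot_spacing, knot_spacing) for k, b in enumerate(betas))


# Hypothetical 7 betas for one condition, knots every 2 s starting at onset.
betas = [0.0, 0.6, 1.0, 0.7, 0.2, -0.1, 0.0]
curve = [response_shape(betas, 2.0, t) for t in range(0, 13)]
print(curve)  # equals beta_k at t = 2k; linear in between (note the small undershoot)
```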
That's one scenario. Another scenario is in SPM, with what I call the adjusted-shape approach, where you have three basis functions. In the typical SPM approach, you throw away the second and third betas and focus on the first beta, the one for the canonical response curve. Then you proceed with the usual one-beta analysis.

So how do we handle multiple betas at the group level? The simplest approach is to ignore the complexity and just treat those multiple betas as a factor.

Suppose you have seven betas associated with seven tent functions. You just treat them as an extra factor, a within-subject factor, of course. Whatever you want to call it, time or basis function, you give it a label as an extra factor. Then you may ask: if we try that, what do we gain?
Here is an example that demonstrates this approach. In the first row, we use multiple betas; in this case, 10 tent functions. We have two conditions in this example, plus age as a covariate.

Actually, it's a two by two. The between-subject factor is children versus adults, and the within-subject factor is two conditions, congruent versus incongruent. So we have two factors, one within and one between, plus age as the covariate. In addition, each condition, congruent or incongruent, has 10 beta values, because we use 10 basis functions.

We compare two approaches. The first estimates the hemodynamic response with 10 betas from 10 basis functions. The second, shown in the second row, uses the SPM approach with three basis functions: the canonical curve, its time derivative, and the dispersion curve.
At the group level, as I mentioned before, we treat those 10 betas as a factor. So whenever we say something about an effect, we look at the interaction between the multiple betas and whatever effect we're talking about. In this case, the effect of interest is the interaction between group and condition, a two-way, two-by-two interaction.

With the traditional approach, with just one beta, we would look at that two-by-two interaction directly. Here, however, because we have 10 betas, the original two-way interaction becomes a three-way interaction: the interaction between those two factors and the 10 betas. That's this image.

Compare that with the SPM approach, in the right-hand corner, this one. There you only take the first beta, beta zero, and ignore the time-derivative and dispersion betas. So that's how much you see with the SPM approach, versus estimating the hemodynamic response shape. That's the big difference.
You may see this blue circle here, which the SPM approach does pick up, but a lot is missing in that approach. Some of that missing part could be recovered through extra information, by looking at [INAUDIBLE] main effect, as a complementary approach.

So this approach is more powerful, in the sense that you detect many more clusters than with the traditional approach. Why? Not only because of the clusters you end up with: you also get to see the estimated response curve. With the traditional approach, you just have one beta, so you can't talk about the curve. Here, because we estimate the hemodynamic response, you can immediately plot out the response shape.
Let's look at what those curves are specifically. The panels are for the two groups: the first is the adult group, the second is the child group. Focus on the first group. The two colors code for the two conditions: blue is the congruent condition, and red is the incongruent condition.

Look at each one. In the first one, for example, the magnitude of the peak is about the same for the two conditions, congruent versus incongruent. The difference is in the recovery, the undershoot part. The traditional approach won't catch that, and neither will the SPM approach, because in the end you throw away everything except the beta tied to the peak. The peak alone won't detect a subtle difference like the undershoot.

In the second scenario here, there is an undershoot for one condition at the beginning, right after the onset and before the peak. The traditional approach will not help you detect that either, because the difference is too subtle. You need this approach to see such nuances.

There are many other such scenarios, like this one, with undershoots at the beginning and at the end as well. That's why estimating the hemodynamic response shape improves detection power. So that's how we handle the scenario where you estimate the response shape.
There are other complications, like trend analysis, but I'll skip those. I want to say a bit more about covariates. There are several issues with covariates. One is that you may have outliers, and handling outliers is always complicated, but let's ignore that.

One specific complication I want to talk about in modeling covariates is so-called centering. Why is centering important? Relatively speaking, centering is not such a big deal theoretically; textbooks usually don't even discuss it. It's more of a practical issue. The main thing is that it affects the interpretation of the results.
Why? Consider any regression with a covariate, especially a quantitative variable. In this case we've been talking about age as a covariate; take IQ as another example. Suppose at the group level we have just one group of subjects and one condition. Why is centering a big deal? Because when we fit IQ as a variable, we model its relationship with the brain response as a straight line.

A straight line needs two parameters: the intercept and the slope. The slope is how shallow or how steep the change in brain response with IQ is. If you are only interested in the slope, that's the so-called marginal effect: when IQ increases by one unit, how much does the brain response change?

But what is the intercept? The intercept is the brain response when IQ is zero. That sounds very strange, right? Nobody is interested in the brain response at an IQ of zero, so that's a problem. What's really interesting is the average brain response when IQ is at some particular value, like the group average or the population average IQ.
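The intercept's meaning can be checked numerically. The Python sketch below fits the same straight line twice to hypothetical IQ and response values, once with raw IQ and once with IQ minus its mean. The slope (the marginal effect) is identical both times. The raw intercept is the response at an IQ of zero, while the centered intercept is the fitted response at the group-mean IQ.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical IQ scores and brain responses for five subjects.
iq = [90, 100, 105, 110, 120]
resp = [0.8, 1.0, 1.1, 1.2, 1.4]

b0_raw, b1_raw = fit_line(iq, resp)            # intercept = response at IQ 0 (about -1.0)
centered = [v - statistics.mean(iq) for v in iq]
b0_c, b1_c = fit_line(centered, resp)          # intercept = response at mean IQ (105)
print(b0_raw, b1_raw)
print(b0_c, b1_c)
```

Centering on a different value, such as a population mean IQ of 100, would make the intercept the fitted response at IQ 100 instead. The slope stays the same either way.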
That's what the centering issue is about. How can we achieve that? That's the crucial question, and that's where centering comes in.