20 - Group Analysis: Part 1 of 2
January 28, 2019
May 30, 2018
Gang Chen, NIMH
All Captioned Videos AFNI Training Bootcamp
For more information and course materials, please visit the workshop website:
We recommend viewing the videos at 1920 x 1080 (Full HD) resolution for the best experience. A lot of code and text is displayed that may be hard to read otherwise.
PRESENTER: So back to the point we discussed about how to model the covariate when you have a quantitative variable. One thing I forgot to mention is that in the literature, or in other software packages, when people have a covariate-- in their usage, usually some variable of no interest, like age or sometimes even gender-- they describe it by saying, I covary out, or I regress out, that variable. That description to me is a little bit vague and inaccurate, in the sense that whenever you put a quantitative variable, or even a categorical variable, in the model, you can't simply say I regress it out or covary it out. It's more accurate to describe it as controlling for that variable, or fixing or holding that variable at some particular value. That's really what it is about.
That is also tied to what I was mentioning-- the centering issue. But before we talk about centering-- one thing I briefly mentioned this morning is that quantitative variables always carry the danger of outliers. Traditionally, people may just remove those outliers. That's a little bit harsh and also arbitrary, because there is no principled basis to make such a decision, right? It's pretty much like picking a threshold. So a better way is probably to adopt an approach called robust regression. You don't remove the outliers. Instead, you downweight them-- downweight those outliers.
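The idea of downweighting rather than removing outliers can be sketched as iteratively reweighted least squares with a Huber weight function. This is only an illustrative NumPy sketch with made-up data and tuning constant, not the actual implementation behind 3dMVM's option:

```python
import numpy as np

def huber_regression(x, y, k=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights:
    points with large residuals are downweighted, not removed."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        r = y - X @ beta
        # robust scale estimate from the median absolute deviation
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / s
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))  # Huber weight function
    return beta  # [intercept, slope]

# Clean linear data (slope 0.5) plus one extreme outlier at the end
rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, 20)
y[-1] += 50.0  # outlier
b_robust = huber_regression(x, y)
b_ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
```

The ordinary least-squares slope is pulled strongly by the single outlier, while the robust fit stays near the true slope because the outlier's weight shrinks toward zero across iterations.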
That is available as an option in 3dMVM, but it only works for between-subject variables, like gender, or quantitative variables like age or IQ. It does not work when you have within-subject variables. So that option will help you deal with outliers. There are a couple of other subtleties whenever you have quantitative variables. I'm going to skip this, but it is a little bit tricky.
The reason centering is important, as I mentioned before: even in the simple case when you have one group of subjects and just one quantitative variable-- in this case, IQ-- if we only care about the slope, then centering does not matter. Whether you center or not, the slope is always the same, right? However, if you are interested in the intercept-- in this case, we want to look at the brain response while incorporating IQ as a variable-- then centering becomes very crucial, because it has a huge impact on the interpretation and, of course, on the result as well. If we don't center the IQ values, we end up with the brain response when IQ is 0, and of course that's not something we really want. That's why we need to artificially shift the original IQ values.
One possibility is to subtract 100 from every subject's IQ value. You artificially change it, so the new 0 corresponds to the population average IQ of 100. Then the intercept is interpreted as the brain response when IQ is at the population average. So that's one possibility-- centering at the population average. Another possibility is the group average. Suppose this group of subjects is from a college campus, so the average IQ is a little bit higher than the population-- suppose it's 115. In that case we center around the group mean, and the intercept is interpreted as the brain response at this group average. So you have options-- of course, not just these two. Depending on the context, you may center around other values that are convenient for the interpretation.
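The effect of each centering choice can be seen in a few lines. The IQ values and responses below are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical IQ values and brain responses (percent signal change) for 5 subjects
iq = np.array([95.0, 105.0, 110.0, 120.0, 130.0])
beta = np.array([0.8, 1.0, 1.1, 1.3, 1.5])

def fit(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b0_raw, b1_raw = fit(iq, beta)              # no centering: intercept is response at IQ = 0
b0_pop, b1_pop = fit(iq - 100, beta)        # centered at the population average of 100
b0_grp, b1_grp = fit(iq - iq.mean(), beta)  # centered at the group mean (112 here)
# The slope is identical in all three fits; only the intercept's meaning changes.
```

With group-mean centering, the intercept equals the group-average response, which is usually the quantity of interest.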
The point is, you need to keep in mind that centering is a crucial part of the interpretation of the intercept. Again, this is not a theoretical issue-- it's a practical one. You need to take care of it depending on the situation. Sometimes the program automatically centers each covariate at the group average. If that's what you want, that's fine. If not, then you have to take some extra steps before you put those numbers into the program. We'll come back to that later.
But that's just one group of subjects. If you have two groups, or two conditions, plus a covariate-- in this case, age-- then in the literature and in other software packages, people usually just say they covary out that variable. When they do that, they make a big assumption: they ignore the potential interaction between group (or condition) and age. In the end, they just assume the two conditions share the same aging effect-- no interaction effect. If you don't put that interaction effect in the model, of course, the program will give you what you are looking for. That doesn't mean it's necessarily a good or reasonable model. So ideally, you don't want to make such a strong assumption. Let the model assist you in finding out whether such an interaction exists or not. Ideally, you include the potential interaction and let the data and the model decide whether such an interaction effect is there or not.
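The difference between the two models can be sketched with a design matrix that includes the group-by-age interaction. The simulated slopes and noise level here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
age = rng.uniform(20.0, 60.0, 2 * n)
group = np.repeat([0.0, 1.0], n)  # two groups (or two conditions)
# Simulate different aging effects per group: slope 0.02 vs 0.05
y = 1.0 + 0.5 * group + (0.02 + 0.03 * group) * age + rng.normal(0.0, 0.01, 2 * n)

# Model WITH the group-by-age interaction term included
X = np.column_stack([np.ones(2 * n), group, age, group * age])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[3] estimates the slope difference between the groups (about 0.03 here);
# omitting the interaction column would force a single shared aging effect.
```

If the interaction coefficient turns out negligible, the simpler shared-slope model is justified; if not, assuming a common slope would bias the group comparison.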
So that's the interaction part whenever you have another variable in addition to the quantitative covariate. In addition to that, there's also the complication of centering. So in this case, of course, it's a little bit dramatic.
Suppose we have two groups of subjects, and each group has its own average. For example, one group is adolescents, and the other group is adults, with age as a covariate. For the adolescents-- kids-- suppose the group average age is 15.7. The other group is adults-- suppose it's 42.3. Then the issue becomes, what are you going to do in that case, right? In neuroimaging, I often hear people talk about this-- there's even a website specifically about it-- does within-group centering make sense?
But that argument to me is a little bit troubling, because I would say it depends on the context. Sometimes it makes sense to center around the mean across groups. Sometimes you will probably have to think about it carefully-- maybe within-group centering makes more sense. In this particular case, kids versus adults, if we center around the mean of the two groups combined, then we lose the definition of kids, right?
Because suppose you combine the two groups and the average age is 27.3. If you do that, the adolescent group is largely not kids anymore. Then the group comparison-- that would have a big impact on the interpretation, because when you compare the two groups, the age would be something in between. The kids group-- it's not kids anymore, right? So that's the problematic part.
So you have to be very careful. Sometimes, of course, it does make sense to combine the two groups, but that depends on the situation. It depends on where you center. You can see the group difference could change dramatically. Here, this dashed line in the middle-- that means you center around the common mean, and the two-group difference is huge, right? If we move to here, or move over here, then the difference changes dramatically. So that's something you have to be mindful about. That's it for centering.
The next topic is intraclass correlation. How many of you know intraclass correlation? Have you heard of the concept? No? So some of you know it. It's a concept from probably the 1970s-- close to the 1980s. This is the famous paper describing the concept. It is a special kind of correlation. When people apply the concept to FMRI, basically you want to measure reliability. What kind of reliability? For example, you scan the same group of subjects twice-- scan once, then scan again with exactly the same task. You want to see how reliable the effect is. Same group, same task, just repeated once. That's called an intraclass correlation.
That's different from the general concept of correlation-- Pearson correlation. In that case, the two variables, x and y, can be totally different. For example, the correlation between height and weight-- those are two totally different physical measurements. Here we're talking about exactly the same thing. In this case, it's the brain response-- percent signal change. Both are beta values measuring exactly the same thing; we just measure the reliability. That's why it's called intraclass, instead of inter. Inter is the example I used, height versus weight-- between two different types of variables. Intraclass means the same class. Same thing.
In the literature, people talk about this, and there are many papers now emphasizing reproducibility or reliability. That's where the concept of intraclass correlation can be applied. There are different versions-- three types: ICC(1,1), ICC(2,1), and ICC(3,1). The first index is the type; the second index we can pretty much forget about, so let's just focus on the first one. I think we can pretty much ignore the first type, ICC(1,1). The second one-- ICC(2,1)-- is pretty much the popular one, I think. That's basically where you treat both--
So the idea is there are two indices in the model here. The first one, i, is the subject. The j is-- in the case where we scan the subjects twice-- the repetition. That's two sessions, for example. You want to ask, between the two sessions, how good is the reliability? It boils down to the correlation between the two sessions. So if you formulate the situation as a model, we just have this-- a one-way random-effects ANOVA, or a linear mixed-effects model. That's the concept of intraclass correlation. So that's ICC(2,1).
ICC(3,1)-- sorry, probably this is a typo here. It should be two-way instead of one-way, because both the subject and the session are random effects. So you can think of that model as an ANOVA structure or as a linear mixed-effects model. For the third one, we treat the session-- I misspoke, too: the i is actually the session. Here the session effect is a fixed effect, but lambda j-- the subject-- is random. In that case, we have a two-way mixed-effects ANOVA, or again a linear mixed-effects model. In this case, we can also calculate the intraclass correlation, but the formula is slightly different. In the previous case, the denominator at the bottom has three variances, but ICC(3,1) only has two, because the session effect is a fixed effect.
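The two formulas can be sketched from the classic two-way ANOVA mean squares (following the Shrout and Fleiss definitions). This is an illustrative NumPy version, not AFNI's implementation:

```python
import numpy as np

def icc_2_1_and_3_1(data):
    """ICC(2,1) and ICC(3,1) from a subjects-by-sessions data matrix,
    via two-way ANOVA mean squares (Shrout & Fleiss conventions)."""
    n, k = data.shape
    grand = data.mean()
    sub_means = data.mean(axis=1)
    ses_means = data.mean(axis=0)
    ms_sub = k * ((sub_means - grand) ** 2).sum() / (n - 1)  # between subjects
    ms_ses = n * ((ses_means - grand) ** 2).sum() / (k - 1)  # between sessions
    resid = data - sub_means[:, None] - ses_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    # ICC(2,1): session treated as random -> three variance sources in the denominator
    icc21 = (ms_sub - ms_err) / (ms_sub + (k - 1) * ms_err
                                 + k * (ms_ses - ms_err) / n)
    # ICC(3,1): session treated as fixed -> only two variance sources
    icc31 = (ms_sub - ms_err) / (ms_sub + (k - 1) * ms_err)
    return icc21, icc31
```

For data where the second session is the first plus a constant offset, ICC(3,1) is 1 (the fixed session effect absorbs the offset) while ICC(2,1) is penalized for it, which illustrates the practical difference between the two.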
That's about as much as I want to say. There are different ways to calculate it-- we published a paper last year describing them. Right now, if you are interested in calculating ICC, 3dLME-- the linear mixed-effects modeling program-- has an option called ICC. If you invoke that option, you will get ICC(2,1). In the future, I will probably write another program called 3dICC, which allows you to calculate the other types plus a few other options. That's as much as I want to say about intraclass correlation.
The last part [INAUDIBLE] I want to talk about is called inter-subject correlation. Probably some of you have heard of naturalistic scanning, right? Anybody doing naturalistic scanning? No? In typical FMRI, you have a task, right? You ask the subject to do something in the scanner. That's one scenario. The other extreme is that the subject does nothing-- that's called resting state.
About 10 years ago, people started to think about something in between, because the tasks subjects perform in the scanner are too artificial. You push a button, or you view some emotional images-- or a building or a face. It's a little bit remote from what we experience in real life.
So people argued, maybe we should try something closer to real-life experience. The typical thing people do is movie watching-- the subject in the scanner watches a movie. It has events, right? So it's close to real life. That's one possibility. People have also done music listening in the scanner, or political speech, or speech in different languages. That's called naturalistic scanning. One extreme case-- in Germany, people collected data with subjects watching Forrest Gump. That's almost three hours-- a lot of data. They actually put the data online. I don't believe it's the whole brain-- for some reason, just part of the brain. It's probably high-resolution data.
Anyway, so I'm going to briefly talk a little bit about this, since nobody seems--
PRESENTER: OK. But you may do something different. But let's see what we can offer here. This is just the overview. Usually it's movie watching, music listening, speech, or sometimes playing games. The classic paper is this one, from 2004-- really over 10 years ago. That started this concept of inter-subject correlation. From a modeling perspective, some people [INAUDIBLE] use the semantics: they annotate the movie, for example, and then try something similar to a task-related analysis.
But here, this is more general, and also a simpler case: you just have the time series. That's why it's called inter-subject correlation. The subjects are pretty much synchronized-- they are doing exactly the same thing, regardless of the content. So you just calculate the correlation between any pair of subjects.
Suppose you have five subjects: A, B, C, D, E. You calculate the correlation between subjects A and B, A and C, A and D, and so on. With five subjects, you end up with, what, 10 combinations, right? So 10 pairs, and you calculate a correlation for each. That's why it's called inter-subject correlation. Then what are you going to do at the group level? That's the question, and that's the challenge as well. That's what I'm going to talk about. Here I list the complexity of this: three subjects give you three pairs, four subjects give you six, five give you 10, and that number goes up pretty quickly. With n subjects, you end up with n times (n minus 1) divided by 2 inter-subject correlations.
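The pair counting above can be sketched in a couple of lines:

```python
from itertools import combinations

subjects = ["A", "B", "C", "D", "E"]
pairs = list(combinations(subjects, 2))  # ('A','B'), ('A','C'), ('A','D'), ...
# n subjects give n*(n-1)/2 unique pairs
for n in (3, 4, 5, 10):
    assert len(list(combinations(range(n), 2))) == n * (n - 1) // 2
```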
Well, the challenge at the group level is: what are you going to do with those n(n-1)/2 values? Do a t-test? The problem is that those pairs are not independent of each other-- some of them are, some of them are not. So what are you going to do? That's the challenge. Why is it a challenge? Look at just one group of subjects: S1, S2, all the way to Sn-- those are the subject labels. The n(n-1)/2 pairs are basically half of the matrix, excluding the diagonal. You can focus on either the lower part or the upper part, because it's symmetric, right? A correlation matrix.
Suppose we focus on the lower triangular part-- that's the n(n-1)/2 correlation values. They are Pearson correlations, because we calculate the correlation between the same voxel in my brain and the same voxel in your brain. Now, the Pearson correlation value is bounded between -1 and 1. The way we handle that is to Fisher-transform it to a z-score, just for the convenience of assuming a Gaussian distribution. That's why on the right-hand side we have the z matrix. Again, we focus on the lower part.
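The Fisher transform mentioned here is just the inverse hyperbolic tangent; a small sketch:

```python
import numpy as np

r = np.array([-0.9, -0.5, 0.0, 0.5, 0.9])  # Pearson correlations, bounded in (-1, 1)
z = np.arctanh(r)    # Fisher z-transform: z = atanh(r) = 0.5 * ln((1+r)/(1-r))
r_back = np.tanh(z)  # the inverse transform recovers r exactly
```

The z values are unbounded, which makes the Gaussian assumption at the group level more palatable.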
So what are we going to do? That's one group. With two groups, we end up with this structure-- the colored structure. Even though it's one matrix, if we combine the two groups, we end up partitioning it into these two triangles here. This is the first group; this is the second group. We can compare the two groups, or there is the cross-group similarity. You can either compare the two groups directly, or compare each group versus the cross-group similarity. So there are different things we can look at.
So what exactly is the challenge? Suppose we have five subjects-- that gives 10 pairs, right? n(n-1)/2 is 10. We pool those 10 numbers, those z-scores-- 10 numbers here and 10 numbers here. You can see this is the correlation structure for those 10 z-scores. Why do we have that structure? Well, the diagonals are 1s-- nothing special about that. Then there are 0s and non-0s. For example, this one here-- we assume it's not 0. Why is that one not 0? Look at the x and y directions. Horizontally, that's Z(4,1)-- the z-score between subject 4 and subject 1. Vertically, that's Z(5,1)-- the z-score between subject 5 and subject 1. Those two z-scores share the same subject-- subject 1. Because they share that subject, it's reasonable to assume the two z-scores are correlated.
Same thing for this one, because these two z-scores share the same subject-- subject 4, right? And some entries are 0s, because those two z-scores don't share any subject. That's why this is difficult to handle when you go to the group level-- because of this structure. What kind of structure is this? Some entries are non-0, some are 0. Can you see a pattern? Well, maybe there is a pattern, but it's just hard to capture. As far as I know, there's no easy way to capture that structure.
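The structure just described can be sketched by marking which pairs of z-scores share a subject (a hypothetical 5-subject illustration):

```python
from itertools import combinations
import numpy as np

n = 5
pairs = list(combinations(range(n), 2))  # 10 subject pairs for 5 subjects
m = len(pairs)
shares = np.zeros((m, m), dtype=int)
for a in range(m):
    for b in range(m):
        # two pairwise z-scores are assumed correlated iff their pairs share a subject
        shares[a, b] = int(bool(set(pairs[a]) & set(pairs[b])))
# Diagonal entries are 1; each pair shares a subject with 2*(n-2) = 6 other pairs;
# the remaining entries are 0 (no subject in common).
```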
So what have people done so far? Historically, in the early days, people simply just [AUDIO OUT] Even though there's definitely a correlation structure there, they just assumed it away-- either they adjusted the degrees of freedom, or they did something like a t-test. That's one approach. Then, more recently, there's a Matlab toolbox called the ISC toolbox-- the Inter-Subject Correlation toolbox-- which is very popular. How do they handle this? They don't deal with it at the group level. Instead, they randomize the time series. First they calculate the real correlation matrix [AUDIO OUT] voxel-- those are the real z-scores. Then they use a concept similar to permutation tests: they randomize the time series, randomly changing the order of the temporal sequence of each voxel's data.
They randomize, then calculate the matrix, and they do that many, many times-- for example, 5,000 times. They use that as a null distribution, then compare the real z-score against the null distribution to come up with a p-value. That sounds reasonable, right? You randomly shuffle the data and use that as a null distribution. That's pretty much like, nowadays, group analysis with randomization-- except there, depending on the situation, they shuffle at the group level. In this case, they shuffle the time series at the individual-subject level, then use that as the null distribution. That sounds reasonable in principle, but how well-behaved is it? Nobody had thought to check whether it actually controls the false positive rate. [INAUDIBLE]. Nobody has done that. So that's one available solution.
Another solution is called leave-one-out. That's a little bit earlier, actually, by the same group. The idea is: suppose you have 10 subjects. If you calculate the inter-subject correlations with 10 subjects, you have 45 pairs. What do they mean by leave-one-out? Each time, they take one subject out, calculate the average time series of the other nine subjects, then calculate the correlation between that one subject and the average of the rest. They do that for each subject, right? With 10 subjects, first take the first subject out, [AUDIO OUT] the other nine, then calculate the correlation value. Do the same thing for the next subject. In the end, 10 subjects end up with 10 correlations. But each of those is one subject versus the rest of the group. That's called leave-one-out.
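A minimal sketch of the leave-one-out computation, with simulated time series (the shared-signal setup and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_t = 10, 200
signal = rng.normal(size=n_t)                        # shared stimulus-driven component
ts = signal + 0.5 * rng.normal(size=(n_subj, n_t))   # each subject = signal + noise

loo = np.empty(n_subj)
for s in range(n_subj):
    others = np.delete(ts, s, axis=0).mean(axis=0)   # average of the other nine
    loo[s] = np.corrcoef(ts[s], others)[0, 1]        # one value per subject
# The 10 values then go into an ordinary one-sample t-test at the group level.
```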
In the end, you have 10 correlation values-- not really inter-subject anymore. Then at the group level they just do the typical t-test. That's the alternative approach called leave-one-out. But in reality, all those methods don't perform well, and we published two papers about this. The first paper demonstrates that the previous methods do not work well. In that first paper we also adopted a nonparametric approach. I'm not going to talk about the details-- it works well, but the nonparametric approach is limited: its computational cost is a little bit high, and it's not flexible in terms of modeling. If you want a covariate, or something else, it's just not flexible enough.
Our first paper basically focused on either bootstrapping or a permutation test. But instead of doing that at the individual-subject level by shuffling the time series, we shuffled among the subjects. We calculated the correlation matrix first-- that's it-- then shuffled at the subject level. Depending on the specific scenario, we did subject-level permutation or bootstrapping. We performed simulations showing that the false positive rate controllability is pretty good, unlike the traditional approach. So that's the nonparametric approach.
The second paper we published in 2016. There we switched to a parametric approach, which is nice. [AUDIO OUT] the data [AUDIO OUT] For example, on the left-hand side is the inter-subject correlation between subject i and subject j-- a z-score, right? On the right-hand side we decompose it. First we have the intercept, which is basically the group-average correlation-- as a z-score, of course-- the population average. Then we have two extra terms, which are the contributions from the two subjects-- subject i and subject j. The last term is the residual. Once you do this decomposition, suddenly this is just a linear mixed-effects model: we have the fixed effect, the intercept, plus two random effects-- the contributions from those two subjects. Once you realize that, it's pretty intuitive.
Once we have this model, we make a reasonable assumption about the random effects-- a Gaussian distribution, from the same population, with some variance. Then we have this model, which is pretty nice, with some nice properties as well. Remember the correlation matrix I showed you-- some entries are correlated and some are not. We can exactly capture that correlation, and we show it's bounded between 0 and 0.5. It cannot go beyond 0.5, actually. That's a nice property, and there's a mathematical proof showing why [AUDIO OUT]. You can calculate exactly why it's 0.5: only when the residual variance goes to 0 do you reach 0.5. So it's very nice.
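Under that decomposition, the correlation between two z-scores sharing one subject follows directly from the two variance components. A small sketch (the symbol names are mine, following the decomposition just described):

```python
def shared_subject_corr(var_subj, var_resid):
    """Correlation between two z-scores z_ij and z_ik that share subject i, under
    z_ij = b0 + xi_i + xi_j + e_ij with Var(xi) = var_subj, Var(e) = var_resid:
    covariance = var_subj (the shared subject's term),
    variance of each z  = 2*var_subj + var_resid."""
    return var_subj / (2.0 * var_subj + var_resid)

rhos = [shared_subject_corr(v, e) for v in (0.1, 1.0, 10.0) for e in (0.0, 0.5, 2.0)]
# Every value lies in [0, 0.5]; 0.5 is reached only when the residual variance is 0.
```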
With that model, it's very convenient-- whether you have one group, two groups, or even a covariate, the linear mixed-effects model is adaptive. We also did simulations to show that the false positive rate control is pretty good. So that's the inter-subject correlation part. In case anybody is interested, we're planning-- I promised this a while ago-- to have a program to do this kind of group analysis.
Here's a basic example. We have two groups-- the exact same data we used in the second paper. In total, 48 subjects: 24 males and 24 females. The subjects watched six movie clips in the scanner; over 400 data points were collected in total. The first image shows the nonparametric approach-- either bootstrapping or a permutation test. The second one is the linear mixed-effects modeling I just showed you. The two results are pretty similar.
But the parametric approach is nice-- it also gives you extra information. I can also calculate the [INAUDIBLE]-- the third image. That shows how correlated [AUDIO OUT] among those pairs-- remember, that's the correlation in that big matrix. In addition, I can also show the variance-- the cross-subject variability. That's the fourth image. So it's pretty nice, and that's why the parametric approach is usually better than the nonparametric one, not just for inter-subject correlation-- even for typical [INAUDIBLE]. Even though people claim the nonparametric approach is better in terms of control [INAUDIBLE], in terms of modeling it's definitely much less flexible and less adaptive.
I think that's pretty much what I want to say about inter-subject correlation. People [AUDIO OUT]-- this can probably be extended to other modalities, like EEG or MEG, but it's not as popular as in FMRI. Any questions about intraclass correlation or inter-subject correlation?
Now let's switch to something a little bit different. I mentioned at the beginning that for typical FMRI data [INAUDIBLE] at the group level, we do whole-brain analysis, right? That approach [AUDIO OUT] can be challenging. For this part, I'll talk about the motivations-- why I wanted to do ROI-based analysis instead of whole-brain. Ideally, theoretically, I could apply the approach to the [AUDIO OUT], but right now that's not feasible. We will see why. To keep focus, let's just talk about ROI-based analysis. That means we'll have a bunch of ROIs.
My motivation-- a little bit of background-- is the cluster stuff. There's a controversy that the false positive rate controllability is not as strict as desired. There's a famous paper that was in the news, right? That's one thing that [AUDIO OUT] motivated me to start thinking about this approach, because the penalty is usually very high-- very severe-- with the cluster stuff. People sometimes really have trouble getting clusters to pass that threshold. So that's one major motivation.
Of course, there are a couple of other scenarios, like so-called graph theory. For example, with resting-state data you calculate the correlation matrix among some number of ROIs-- say, 200 ROIs-- so you have a correlation matrix from each subject. How do we handle that at the group level? That's another motivation for me. The same idea applies to DTI-- there you have connections, a big matrix too. You have gray matter regions, then you look at the white matter bundles and see whether they are connected-- you probably have that matrix as well. Then you go to the group level. How do you handle that? That's another challenge. Of course, we can also think about the inter-subject correlation I was talking about at the ROI level, but that's for later.
Those are the background issues in my mind when thinking about how to handle this: how can we do better than the [INAUDIBLE] analysis at the voxel level? That's basically the layout of this small session about ROI-based analyses. The motivation is that 2016 paper-- the Eklund paper-- which criticized different software packages for trouble keeping the false positive rate at the nominal level. My viewpoint [AUDIO OUT] ideal. If you really trust that approach, yes, that argument does hold; the criticism is valid. However, if we look at it differently, then we may have a different perspective. I will say more about that a little bit later.
Before we talk about that, let's start with: why do we care about the false positive rate, or family-wise error, or type I error, or whatever? The problem is that with traditional statistics, we always start with the so-called null hypothesis. For example, in the brain, we start with: in this region, nothing is going on. No activation. That's the null hypothesis. Whenever you do statistics, you have that null hypothesis as your starting point. You pretend nothing is going on.
Then you basically put up a straw man and start to attack it [AUDIO OUT]. How do you attack? You calculate, in the typical way, a t-statistic. OK, now I have a t-statistic, and it's large-- 3.7. The p-value is small-- 0.001. Now we have doubt about the original null hypothesis. We can say the original null hypothesis is unreasonable. We reject the null hypothesis.
We take the alternative hypothesis, which is-- now the description is that this region is statistically significant. People usually don't mention that word-- statistically. That is statistics. [INAUDIBLE] People usually forget to attach "statistically" when describing the significance. When I read a paper, I always attach that word. The reason is that it is simply about the p-value. It's not about the effect itself-- not whether that effect is meaningful. The effect itself can be big or small. When something is significant, it is only because of the statistical value; it has nothing to do with the effect, which can be big, moderate, or small. So implicitly, people simply mean statistical significance-- not that the effect is significant.
But more problematic is that, in the end, people make a decision, and that decision is binary-- dichotomous. Either you can make a claim, or, if the t-value is small and fails to survive the thresholding, then what? Does it mean that region was not activated? People have to make such a dichotomous decision, and in people's minds it simply becomes: there is no activation. But in reality, that region could still be activated-- there are simply several reasons it fails to reach significance. It can be that the effect itself is too small, and we would need a bigger sample size. That's one possibility, right? There are other scenarios in the brain as well: alignment is not perfect, or the data is just too noisy.
So that's the background. How can we avoid that dichotomous decision, and also challenge the approach-- the thresholding approach, the p-value, and the strategy of pretending nothing's going on in the brain? That's something we need to think about. Instead of setting up that straw man-- that witch hunt-- maybe we can do something different. The traditional approach always starts with a so-called conditional probability: you pretend nothing's going on-- that's the null hypothesis. That has caused a lot of confusion, misunderstanding, and misleading statements, because in people's minds, maybe they think the p-value is: given my data, the likelihood that nothing is going on in the brain. No, it's not. The real p-value is: pretending nothing's going on, what's the chance you would get the current data?
So those two conditional probabilities are not mathematically the same, but people usually misinterpret the p value as the first one. Whenever you report a p value-- 0.05 or whatever p value you have-- it's a conditional probability, and that probability is: pretending nothing is going on, what's the chance you would get the current data? That's the meaning of the p value. So there are many things-- I mean, people have been criticizing this framework. It's called NHST-- Null Hypothesis Significance Testing-- together with this dichotomous decision.
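The conditional-probability meaning of the p value can be made concrete with a small simulation-- a hypothetical sketch, not anything from the AFNI software. We pretend nothing is going on (true effect exactly zero), generate many null datasets, and count how often the t statistic is at least as extreme as the one observed:

```python
# Sketch: a p value is P(statistic this extreme | nothing going on),
# estimated by simulating datasets under the null hypothesis.
# The "observed" data here is made up for illustration.
import random
import statistics

def one_sample_t(xs):
    """One-sample t statistic: mean / (sd / sqrt(n))."""
    n = len(xs)
    return statistics.mean(xs) / (statistics.stdev(xs) / n ** 0.5)

rng = random.Random(1)
# Hypothetical observed subject effects (true mean 0.5).
observed = [rng.gauss(0.5, 1) for _ in range(20)]
t_obs = one_sample_t(observed)

# Null world: true effect is exactly zero. What fraction of null
# datasets produce a t value as large as the observed one?
n_sims = 10_000
count = sum(
    one_sample_t([rng.gauss(0.0, 1) for _ in range(20)]) >= t_obs
    for _ in range(n_sims)
)
print(f"t = {t_obs:.2f}, simulated one-sided p = {count / n_sims:.4f}")
```

Note that the simulation conditions on the null being true; it never tells you the probability that the null is true given the data, which is the common misreading.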
Let's even put those criticisms aside. From a practical perspective, applied to typical FMRI analysis: in the end, when you make a decision about a cluster, the question with the current approach is how to control the false positives. If it's just one voxel, life is easy, but for the whole brain you have, for example, 200,000 voxels. The common approach is the so-called cluster-based thresholding, right? The clustering is based on spatial extent-- how big or small the cluster is.
This is the so-called multiple testing problem. Some people call it multiple comparisons; more accurately, it's multiple testing, not multiple comparisons. The subtle difference is that for FMRI we're using the same model everywhere, so the test is exactly the same-- across 200,000 voxels, we repeat the same model. Multiple comparisons is more like an ANOVA with, for example, three conditions: a versus b, b versus c, a versus c. That is a multiple comparison issue.
So for brain imaging that is a minor issue, arising only if we really have multiple conditions. But what we are talking about here is not multiple comparisons. It's called multiple testing, because the same model is repeated many times-- as many times as the number of voxels. That's why it's called a multiple testing problem.
So the approach that would immediately jump into your mind is the Bonferroni correction. But nobody does that, because the penalty is just too severe. Nobody wants to apply that approach. So to avoid that severe penalty, people do the so-called cluster stuff. There are a few approaches. First of all, there is SPM's random field theory-- that's pretty much a similar concept. AFNI uses Monte Carlo simulations. In that approach, you choose a so-called primary voxel-wise threshold-- for example, a threshold of p 0.001. Whatever the corresponding t value is, you threshold based on p 0.001. Once you threshold, you find some clusters. A bunch of clusters. Some of them big, some of them small.
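The severity of the Bonferroni penalty is easy to see with a back-of-the-envelope calculation (the 200,000-voxel count is the example figure from above):

```python
# Why nobody uses Bonferroni for whole-brain voxel-wise testing:
# controlling the familywise error at 0.05 over ~200,000 voxels
# forces an absurdly strict per-voxel threshold.
n_voxels = 200_000
alpha = 0.05
per_voxel_p = alpha / n_voxels
print(per_voxel_p)  # about 2.5e-07 per voxel
```

A per-voxel p threshold on the order of 10^-7 corresponds to a t value that real, noisy FMRI effects almost never reach, which is why cluster-based approaches were developed instead.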
Then you decide which clusters pass some threshold in terms of cluster size. How do we decide that cluster-size threshold? In AFNI, for example, you run Monte Carlo simulations with noise. Not with your data-- with noise. You control the so-called false positive rate under the assumption that really nothing is going on; that's why we use noise. You use the same brain structure and spatially correlated noise data, randomly generate and shuffle them, then run it, for example, 5,000 times, and record the clusters of different sizes. Then you control how many of them pass-- for example, control it to 5%. That's how you come up with the cluster threshold-- the minimum cluster size. That's the AFNI approach. It's implemented in a program called 3dClustSim, and also available as an option in 3dttest++. That's the exact same thing.
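The Monte Carlo procedure just described can be sketched in a few lines-- a toy 2D version with unsmoothed Gaussian noise, not AFNI's actual 3dClustSim implementation (which works in 3D and models the spatial autocorrelation of real residuals):

```python
# Toy sketch of Monte Carlo cluster-size thresholding: simulate
# pure-noise "brains", apply a voxel-wise threshold, record the
# largest surviving cluster per simulation, and take the 95th
# percentile as the minimum cluster size controlling the
# familywise false-positive rate at 5%.
import random

def largest_cluster(mask, n, m):
    """Size of the largest 4-connected cluster of True voxels (flood fill)."""
    seen = [[False] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if mask[i][j] and not seen[i][j]:
                stack, size = [(i, j)], 0
                seen[i][j] = True
                while stack:
                    x, y = stack.pop()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = x + dx, y + dy
                        if 0 <= u < n and 0 <= v < m and mask[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            stack.append((u, v))
                best = max(best, size)
    return best

def cluster_threshold(n=32, m=32, n_sims=500, z_thresh=2.0, alpha=0.05, seed=0):
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        # Pure Gaussian noise, thresholded voxel-wise (the "primary threshold").
        mask = [[rng.gauss(0, 1) > z_thresh for _ in range(m)] for _ in range(n)]
        maxima.append(largest_cluster(mask, n, m))
    maxima.sort()
    # Smallest size k with P(max noise cluster >= k) <= alpha.
    return maxima[int((1 - alpha) * n_sims)] + 1

print(cluster_threshold())
```

In a real analysis the noise must share the spatial smoothness of the residuals-- that spatial correlation is what makes large noise-only clusters likely and the resulting threshold meaningful.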
In SPM, there's random field theory. It's a different concept, but the idea is pretty much the same: in the end, based on the spatial structure, you find some minimum cluster size, and you come up with the clusters that survive. So that's one approach. A second approach-- well, what is a little bit troubling about the first approach is that you have to start with a primary threshold: 0.001, for example, or sometimes 0.01, 0.05, or 0.02. But that choice is arbitrary. I will talk later about why it is arbitrary. I think we should take a break, then come back and continue the rest of this part.