Tutorial on Statistical Inference On Representational Geometries
Date Posted:
November 8, 2022
Date Recorded:
October 25, 2022
Speaker(s):
Heiko Schütt, NYU
Description:
Representational similarity analysis (RSA) is a popular method for comparing representations when a mapping between them is not available. One important comparison RSA is used for is between neuronal measurements and models of brain computation like deep neural networks. RSA is a two-step process. First, a matrix of pairwise dissimilarities between conditions is computed. This matrix is a summary of the representational geometry, which can be compared directly between different representations because it has the same dimensions. In the first half of this tutorial, I will go through some recent advancements that substantially improve the reliability and statistical accuracy of RSA. First, I will explain the reasoning for cross-validated distance measures for computing the dissimilarity matrix and for whitened similarity measures for comparing the matrices to each other. Then, I will explain why simultaneous generalization to new subjects and new stimuli is hard and present a solution based on bootstrapping. Finally, I will explain the necessary cross-validation-based extensions for flexible models. In the second half of this tutorial, I will give a guide on how to run these analyses using our new rsatoolbox in Python by going through demo notebooks that illustrate the functionality.
Relevant papers:
Schütt et al., 2021: Statistical inference on representational geometries
Walther et al., 2016: Reliability of dissimilarity measures for multi-voxel pattern analysis
Diedrichsen et al., 2021: Comparing representational geometries using whitened unbiased-distance-matrix similarity
GitHub repository: https://github.com/rsagroup/rsatoolbox/
GitHub demo repository: https://github.com/rsagroup/rsatoolbox/tree/main/demos
MARIA FERNANDA DE LA TORRE: Thanks, everyone, for coming. And of course, thanks to our speaker today, Dr. Heiko Schütt, who is joining remotely today from New York, where he is currently a postdoctoral fellow at the Zuckerman Institute. Prior to this, he got his PhD from Tübingen in Germany. And his work is mostly focused on models and methods for vision research. And today, he'll be presenting on statistical inference on representational geometries, which should be relevant to a lot of the work that we are doing in this department.
So without further ado, thanks for agreeing to do this. And yeah, feel free to go ahead whenever you're ready.
HEIKO SCHÜTT: OK, yeah. Thanks for the kind introduction. And thank you for having me. Welcome to this tutorial. So, yeah. I'll be talking about the statistical inference on representational geometries, which is also the title of the most recent paper we wrote on this topic. So this is essentially our improvements for representational similarity analysis, which is the method for comparing representational geometries Niko's lab has been working on for quite a while. And so, yeah. Before I go into any more details, to get everyone on the same page, I want to start with what is this representational similarity analysis and what do we use that for.
So the typical problem we face when we want to use representational similarity analysis is that we have a bunch of different conditions or stimuli which we presented to brains and got some measurements. And we also have something that we want to compare to those things, like a model here, it's a caricature of a deep neural network. And essentially, we want to know which of our models is most similar to those brain measurements we got.
And crucially, we do not have a direct mapping between those two representations because if we had that, we could, of course, just transform one into the space of the other and do the comparison there and wouldn't need anything special. But this is often not the case, right? In most cases, we don't know how our voxels in the brain correspond to features and locations in our deep neural network model. And so this is the case where we use representational similarity analysis.
What is RSA? Well, it's a two-step process. First, the measurements on both sides, so both for the models and for the data, are transformed into the dissimilarities between the different conditions or stimuli. So for each pair, we compute how dissimilar or distant they are in this space.
And once we've done that, we've got these RDMs, so representational dissimilarity matrices, and they are in a common reference frame, right? Because now the distance between conditions one and two here is-- should map directly to the distance in those-- and we've now got live transcriptions, apparently. Great. And that allows us to directly compare our models to the data. And then we can plot this. For example, this would be the different layers of a neural network. And for each of those, we get how similar it is to our data.
So as I said, this is a two-step process. First, we have computing the dissimilarities between the individual patterns, and this will often depend on how the actual data was acquired. And then there is the second step, where we compare these RDMs to each other, where we, again, have many choices.
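The two steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions that are not from the talk: the data are simulated, the "brain" and the "good model" are both distance-preserving embeddings of the same hypothetical latent patterns, squared Euclidean distance is used for step one, and a plain Pearson correlation for step two. All function and variable names are made up for the example.

```python
import numpy as np

def squared_euclidean_rdm(patterns):
    """Step 1: (conditions x channels) array -> (conditions x conditions) RDM."""
    diff = patterns[:, None, :] - patterns[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def upper_triangle(rdm):
    """Vectorize the off-diagonal upper triangle of a symmetric RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]

rng = np.random.default_rng(0)
n_conditions = 6
latent = rng.normal(size=(n_conditions, 3))  # hypothetical shared structure

# Embeddings with orthonormal columns preserve distances exactly, so the
# "brain" (50 voxels) and the "good model" (20 units) share one geometry
# although there is no channel-to-channel correspondence between them.
q_brain = np.linalg.qr(rng.normal(size=(50, 3)))[0]
q_model = np.linalg.qr(rng.normal(size=(20, 3)))[0]
brain = latent @ q_brain.T + 0.05 * rng.normal(size=(n_conditions, 50))
model_good = latent @ q_model.T
model_bad = rng.normal(size=(n_conditions, 20))  # unrelated representation

# Step 2: compare the vectorized RDMs.
rdm_brain = upper_triangle(squared_euclidean_rdm(brain))
rdm_good = upper_triangle(squared_euclidean_rdm(model_good))
rdm_bad = upper_triangle(squared_euclidean_rdm(model_bad))
score_good = np.corrcoef(rdm_brain, rdm_good)[0, 1]
score_bad = np.corrcoef(rdm_brain, rdm_bad)[0, 1]
print("good model:", score_good, "unrelated model:", score_bad)
```

Note that the comparison never needs to know which voxel corresponds to which model unit; only the condition-by-condition distances enter.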
And very much connected to the second part, we also want to have the actual statistical inference, so things like pairwise comparisons or comparisons against the noise ceiling or against zero, et cetera. So we want to not only get one number for each model, but crucially, we also want to get our uncertainty about this number. So this is the whole goal. And this is how-- a method how we can do this comparison.
So practically, when would we want to do this? The first criterion [INAUDIBLE] is that we want to have many measurements of features for each stimulus. If we have only a few, then again, we can very easily just look at those individual measures and don't need to-- don't need RSA. And the other part is that, of course, it only works if we care about similarity. If we instead wanted a map from our model to the data or the other way around, this is not the method we should be using.
And I listed here just some examples for when RSA could be applied. For example, the DNN layer comparison is actually something people have done quite a lot in recent years. You can, of course, do the same for handcrafted models. You can compare to behavior to find out where in the brain the dissimilarities match human-reported dissimilarities best, for example. And another classical one is to compare brains between different people or between different species, even, to-- the most classical one would be Niko's work from 2008 comparing monkey and human brains and finding that IT matches reasonably well between the two species.
OK. And now there is a relatively big slide on the theoretical reasons for using RSA, which I included here mostly because RSA originally was somewhat ad hoc in its conception. But by now, there have been quite a lot of investigations into what exactly RSA measures and why that might be a sensible thing to do. And so I wanted to mention this just to make you aware of-- that there is some theoretical background, although it will not be the focus of this tutorial, which is meant to be more practical.
So first, the thing that was already in the title is this representational geometry idea. So given the distances between the different conditions or stimuli, we get-- well, the distances right, which is important for generalization and similarity judgments, of course. That also gives you angles between those different patterns, so things like whether they are on a straight line or not. You could find out just from those distances. And of course, also the topology.
Another thing is if you actually match your dissimilarities to the statistical properties of your measured data, then also things like what kind of differences can you decode from the brain or encode in these things can be extracted from this information.
Another plus point is that it focuses completely on the observed conditions, which is the only space where you can actually make statements based on your data, right? So this is especially advantageous when you have really big dimensions where you might have lots of dimensions which you can't make any statements about from your data because you don't have any variation in your actual stimuli.
Then there is the point of what is it actually invariant to. And that's shifts and rotations. So we know exactly what kind of transformation you can do to this representation without changing the RDM. And that seems somewhat sensible. So you can-- especially permutations of the neurons don't change the RDM, and reweighting-- rotations also includes if you distribute the information you had in one neuron across a few others. You will also get the same RDM. So there is-- this is a sensible set of invariances which we have well characterized.
And then there is a mathematical point that's really important for us in terms of measuring, which is that distances are well preserved under random projections. So there is the Johnson-Lindenstrauss lemma, which says that the number of dimensions you need to get the distances right up to some error grows only logarithmically with the number of points, independent of the number of dimensions in the space. So the point is we have a good chance of recovering the RDM to some accuracy with a number of measurements that is much smaller than the number of neurons we actually have in the brain. So this is an important idea so that we can actually hope to get a reasonable estimate of this RDM from our data.
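A small simulation can make the random-projection argument concrete. The sketch below is only an illustration with made-up sizes: 20 conditions live in a space of 5,000 "neurons", and we observe 500 random Gaussian mixtures of them. In its usual statement, the Johnson-Lindenstrauss bound on the number of measurements depends on the number of points and the tolerated error, not on the original dimensionality.

```python
import numpy as np

rng = np.random.default_rng(1)
n_conditions, n_neurons, n_measured = 20, 5_000, 500

patterns = rng.normal(size=(n_conditions, n_neurons))

# Random Gaussian projection, scaled so that squared distances are
# preserved in expectation.
projection = rng.normal(size=(n_neurons, n_measured)) / np.sqrt(n_measured)
measured = patterns @ projection

def pairwise_sq_dists(x):
    """Upper-triangle vector of squared Euclidean distances between rows."""
    gram = x @ x.T
    sq_norms = np.diag(gram)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2 * gram
    return dists[np.triu_indices(len(x), k=1)]

# Ratio of projected to original squared distance for every pair.
ratio = pairwise_sq_dists(measured) / pairwise_sq_dists(patterns)
print("smallest ratio:", ratio.min(), "largest ratio:", ratio.max())
```

All ratios stay close to 1 even though only a tenth as many measurement channels as neurons are used.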
And then there is a last point that is very strongly related to the kernel matrix, which is the basis for two other methods that have recently been used for a similar purpose. One is pattern component modeling, PCM, which comes from-- also from more fMRI-like statistics research, and centered kernel alignment, CKA, which comes from the deep neural network literature for comparing those to each other.
And this last point I wanted to really drive home because those are recently used quite a lot. So here is the kernel matrix on the left and the representational dissimilarity matrix on the right, both for the same data, and you should immediately see that they contain a very similar type of information that the first four patterns here are somewhat clustered. They are more similar here in the kernel matrix and less similar-- less dissimilar according to the RDM and similar for the last six. So they are obviously quite related.
And actually, if you look at the formula-- so this is for the kernel matrix just for the linear kernel. And this is Euclidean distance between the patterns. Then you see that really, the distance is this very simple linear formula from the kernel matrix. And so the distances are completely recoverable from the kernel.
And now, if you are very fast with linear algebra, you might notice that here, the diagonal is exactly 0 for the RDM always and it's not for the kernel matrix. So there is some information lost here, but this information can actually be completely recovered if we do one little change for the RDM, namely that we add one more column, which is the comparison to the 0 pattern. So then if you set xj to 0, this just becomes exactly the entries from the diagonal of the kernel matrix, so these blue parts are exactly equivalent to the diagonal of the kernel matrix. In this way, you would then have a linear mapping between the two, which is invertible, which is about as closely the same thing as you could ever hope for in this kind of analysis.
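The relationship between the linear kernel matrix and the squared Euclidean RDM, including the extra column of distances to the zero pattern, can be checked numerically. This is a minimal sketch on random data; the variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)
patterns = rng.normal(size=(5, 30))  # conditions x channels

# Linear kernel (Gram / second-moment matrix): K_ij = x_i . x_j
kernel = patterns @ patterns.T
diag = np.diag(kernel)

# Squared Euclidean distances follow linearly: d_ij = K_ii + K_jj - 2 K_ij
rdm = diag[:, None] + diag[None, :] - 2 * kernel

# Direct computation for comparison.
direct = ((patterns[:, None, :] - patterns[None, :, :]) ** 2).sum(axis=-1)

# The RDM alone has a zero diagonal, so K_ii is lost. Adding the distance of
# every pattern to the zero pattern restores it: d_i0 = ||x_i||^2 = K_ii.
dist_to_zero = (patterns ** 2).sum(axis=1)

# With that extra column the mapping is invertible:
# K_ij = (d_i0 + d_j0 - d_ij) / 2
kernel_recovered = 0.5 * (dist_to_zero[:, None] + dist_to_zero[None, :] - rdm)

print(np.allclose(rdm, direct), np.allclose(kernel_recovered, kernel))
```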
And yeah, just because I forgot to mention this, there are a bunch of other names for the kernel matrix if you come from different fields. From physics, it would be the Gram matrix. In statistics, it would be the second moment matrix. It's been used much more often than the RDM in mathematics, and so it's important to see that those two are actually very related. And I wouldn't care that much which ones you'd use for the comparison section.
OK. So I hope that gave you some introduction on what RSA actually is and what-- when we would use it. And then in the rest of my tutorial here, I'll go through, first, the two steps of computing RDMs and comparing RDMs and what we found in more recent years what are helpful good choices for those steps to do.
And then I'll talk a bit longer about the actual statistical inference, where we recently, for the first time, actually tested that it is accurate and correct. And those two points are the points where we found deviations from that and fixed them. And I'll explain to you how-- where those problems arise and how to fix them. So that will be the outline of the talk. And then afterwards, we'll go through this in the toolbox, and I'll show you in some demo scripts how to actually do all of this.
Nonetheless, not only this later part can be interactive, so if you have any questions in between, please feel free to ask them and just speak up, because you are quite small, and so I won't see if you put up your hand or something like that.
AUDIENCE: I have some questions. Can you please elaborate a little bit on the link between RSA and the encoding models? You had a point in your theoretical slide.
HEIKO SCHÜTT: Mm-hmm. OK. Yes. So I can try to make that link quickly, essentially. So there are two links that-- to encoding models. One is through, actually, pattern component modeling and encoding models. There is actually a nice paper on this part from Jörn Diedrichsen with Niko Kriegeskorte on-- there is some equivalence for a regularized encoding model. So if you just put a 0 mean prior on your weights for the encoding model, the evidence for that becomes PCM, effectively.
And that is characterized by this matrix here. So the predicted correlations between all your patterns are the same when you use a regularized encoding model or whether you directly-- like, this is the second moment matrix of those predictions. And so this is-- one link is this, once you have the regularization. This is the evidence. So the one-- the prediction without fitting the models corresponds exactly to this.
And the other part is there is a-- like, this is-- I had it mostly for the-- just, yeah, this part here for encoding and decoding. This is actually easier to see, perhaps, for decoding. Even just if you have all the-- like, if your distances encode how confusable all pairs of stimuli are, then for-- you can always extract how well you can decode which stimulus it is from a given pattern. And you can also put a limit then by inverting this, effectively, on the encoding models as well for an-- and that's then for an arbitrary mapping to-- from stimulus space to this subspace spanned by the stimuli. Therefore, you can recreate that. Did that make sense?
AUDIENCE: Yes, thank you.
HEIKO SCHÜTT: Great.
AUDIENCE: There's a-- [INAUDIBLE] followup question, I think.
AUDIENCE: So is there any work on how does the data that you look at affect the similarity? So basically, how much data is enough to accurately characterize the surface? And similarly-- maybe I'll give you an example. What if you're trying to work on the visual cortex, and what you have is more responsive [INAUDIBLE] more responsive? Like, your data is way off and you've got [INAUDIBLE]?
AUDIENCE: Let me know [INAUDIBLE] repeat the question, if you can't hear us.
HEIKO SCHÜTT: So the first part, I think I got. The second part is-- I didn't fully get, unfortunately. But let me first start answering the first one, and then we can repeat the second one, I think. So the first one was just on how much data you need to accurately estimate those things. We have effectively some direct formulas to transform measurement error on the original patterns into uncertainty about the dissimilarity estimates, at least for like squared Euclidean distances and stuff like this, which we will see a few of actually during this tutorial.
So we can directly transform this. The reason why there is no general recommendation for this is always, these things will scale with the size of the noise versus the signal-- like, the signal-to-noise ratio in your original data. And that is vastly different for different measurement methods and different cortical areas, et cetera. So always your estimates for how many measurements you need will scale with how good or bad your original signal is.
And so effectively, we can give you the scaling laws. Like, if your signal-to-noise ratio is that much worse, how much more data would you have to collect to get equally good dissimilarity estimates? But an absolute cutoff is probably not that sensible, just because the signal-to-noise ratios vary that much.
OK. Well, then, let's jump in with step one. How do we compute RDMs, right? So these observations come mostly from these two papers by Hamed Nili and Alex Walther, who just ran simulations, mostly, and tried to find out what kind of dissimilarity measures yield the most reliable results in the inference. And generally, continuous measures are more reliable than any kind of accuracies or things like that. So if we do some kind of decoder, we would use the log likelihood ratios or t-values on the discriminants and things like that over things like accuracies.
AUDIENCE: What do you mean by continuous measure?
HEIKO SCHÜTT: I mean-- this refers mostly to the-- if you're looking at things like a decoder between two conditions, that will often give you a continuous output value how likely each of the-- how far-- how well discriminable those two conditions are, and that's the value you want to use, rather than things that are thresholded to give you how many of the actual condition-- the actual tests you ran were discriminated or not. So continuous in the sense of really, it's a real variable, not three out of five correctly classified. That's all for continuous. Did that answer the question?
AUDIENCE: Yeah, thank you.
HEIKO SCHÜTT: Great. And then the general recommendation is to follow the noise shape for what kind of distance you use. So for fMRI-- what that concretely means for fMRI is that we often assume that it's Gaussian noise, just on our data. And then a sensible measure of distance is Mahalanobis or Euclidean distance and [INAUDIBLE] squared or not squared of this.
If we were instead looking at spikes, for example, then often the assumption is we have somewhat Poisson-like noise, where the variance grows with the signal. And we can use a symmetrized KL divergence between Poisson distributions, for example, or alternatively, transform the data with a variance-equalizing transform, which would be a square root, and then model it basically as a Gaussian and consequently use the Mahalanobis distance for those data as well. So this is the guiding principle: if we want to interpret it in terms of the statistical properties, we should use the matching thing for the kind of distributions that we assume.
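Both options mentioned for spike counts can be written down compactly. The sketch below is an illustration under assumptions: neurons are treated as independent Poisson variables with strictly positive rates, "symmetrized" is taken to mean the sum of the two KL directions (conventions with a factor of one half also exist), and the function names are made up for the example.

```python
import numpy as np

def symmetrized_poisson_kl(rates_a, rates_b):
    """KL(a||b) + KL(b||a), summed over independent Poisson neurons.

    For Poisson distributions KL(a||b) = a*log(a/b) - a + b, so the sum of
    both directions simplifies to (a - b) * (log a - log b).
    Rates must be strictly positive.
    """
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    return float(np.sum((rates_a - rates_b) * (np.log(rates_a) - np.log(rates_b))))

def sqrt_transform_sq_distance(rates_a, rates_b):
    """Squared Euclidean distance after the square-root transform."""
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    return float(np.sum((np.sqrt(rates_a) - np.sqrt(rates_b)) ** 2))

rates_1 = [5.0, 20.0, 1.0]
rates_2 = [8.0, 15.0, 1.5]
print(symmetrized_poisson_kl(rates_1, rates_2))
print(sqrt_transform_sq_distance(rates_1, rates_2))

# Why the square root equalizes variance: the raw count variance grows with
# the rate, but the variance of sqrt(count) stays near 1/4.
rng = np.random.default_rng(3)
for rate in (20.0, 200.0):
    counts = rng.poisson(lam=rate, size=100_000)
    print(rate, counts.var(), np.sqrt(counts).var())
```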
And then the last point on this slide is just the correlation distance has been used a lot, I assume mostly because Niko used it in his very early work. But there is not really a corresponding noise to the correlation distance. So there's not a really good theoretical reason to use that particular one, actually.
And now--
AUDIENCE: [INAUDIBLE] for models [INAUDIBLE] computing an RDM for the brain and then RDM for the model, right?
HEIKO SCHÜTT: Yes.
AUDIENCE: We have an assumption about [INAUDIBLE]? What do we do there?
HEIKO SCHÜTT: Yeah. In principle, it's like you could use some noise assumptions in your model, too. Almost always people just use Euclidean distances and just don't make any assumptions about correlations there. Then the only point to keep in mind is to match it to what you're doing in the model. So if you're using squared Euclidean distances in the model, then you should probably use the squared Euclidean or Mahalanobis distance in the data, and if you use the unsquared one, then do it on both sides. Because otherwise, the scaling will be off either way. But that's-- it mostly is just Euclidean distances in the model space.
Yeah. I mean, maybe one thing I should say is in model space, you would have equal-- you could have similar considerations, though. If you're saying, for example, the actual scalings of my features are meaningless anyway, you could do things like normalizing your features first or using a cosine similarity instead of a Euclidean distance or something like that. And that can be sensible. But it should, again, then match what your assumptions are about what about your model features, then, is sensible and meaningful and what is not. Does that make sense?
AUDIENCE: [INAUDIBLE] I guess a followup on it-- one other strategy that I think some people use is using some sort of transfer function to map the model activation to somewhat a similar measurement of, let's say, the brain, and then use that as the basis for comparison, like an fMRI-like signal from the model. Then you [INAUDIBLE].
HEIKO SCHÜTT: Yes.
AUDIENCE: [INAUDIBLE]
HEIKO SCHÜTT: Yes. And we-- that is actually one of the main reasons why people have then flexible models, which we'll talk about much later in this tutorial. But yeah, this is-- this would be one common way of giving your model more flexibility, effectively, to match the data, is to have some mapping that first transforms your model data in some way, which you assume is similar to the distortions your measurement, like your fMRI, applies to the original neural firing rates.
That would be the idea that you want to match this in your model, and then you'll have some transformation. If you do that, then definitely I would just recommend to use the same measure of the similarity in this output space as you do in your data, right?
OK. Great. So as I wanted to say, there are two more things that recently came-- I mean, a couple of years ago by now, came to the fore as important for calculating those RDMs. The first one is taking covariances into account. So what is shown here is-- assume the black star there would be the true average response pattern, just in 2D here. And then we have some noise distribution around this.
And then the point is although these red and blue stars are equally far away from the black one in this 2D space, we would probably want to say that the blue star is closer to the black one for essentially two reasons. One is to improve the reliability of the distance estimates. So take a dimension, like this one, where the spread is very large. If we down-weight those, we will get a more reliable number out of our dissimilarity calculation. And the other thing is to improve statistical interpretation, right?
So if we want to interpret dissimilarity as something like how easily could I confuse those two patterns? How often would they be misclassified? Then, of course, the black and blue one are more similar to each other than the black and the red one.
So how do we do this? This is the formula, actually, for what is called Mahalanobis distance. So effectively, here this xi minus xj transpose times xi minus xj, that would be the normal Euclidean distance, this inner product. And we essentially just scale everything here in the middle with the precision matrix, so the inverse of the covariance matrix of the noise.
And this can be rewritten as a transformation, this minus one half here, so the matrix square root of the inverse, applied just to the individual patterns. And this way of applying it is called prewhitening or multivariate noise normalization. So this would be the method how we can take into account this covariance structure.
And this general thing here also includes the univariate noise normalization. So that's the same thing, just with a diagonal matrix here, which then just scales individual measurement channels up and down, corresponding to the variance of their noise. So this is already covered. And then it's really just dividing by the square root of the variance, so dividing by the standard deviation.
Whichever one we want to use here, we need, of course, some estimate of the noise covariance. So where do we get this from? There are essentially two ways. One, which is common-- the first one is common for fMRI, where you first solve a big GLM to estimate your pattern activations. And then you can use the residuals of this GLM to estimate how strongly the different voxels are correlated to each other.
Or alternatively, if you just have many measurements for each of your individual patterns or conditions here, then you can also use the deviations from the mean of these measurements to get your noise covariance, which is theoretically slightly closer to what you would actually want. But both of those are valid estimates of the noise covariance you can use.
And last but not least, there is an important part in this procedure which is called shrinkage. So if you just take the noise covariance from this, usually you will be very far off from the true covariance matrix. This is due to just limits on the data. This is a number of voxels by number of voxels kind of matrix here, and usually, we just don't have enough data to estimate this properly.
So these shrinkage estimates produce a far more stable estimate of this covariance, which has a smaller error, and very importantly, always yields an invertible matrix, which is really important because that's really what we are plugging into our formulas here on the left. So this is an important step. It's essentially just a few formulas to transform your estimate, but this turns this from almost completely random numbers to something sensible. So please keep that in mind.
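The equivalence between the Mahalanobis formula and prewhitening can be verified directly. In this sketch the noise covariance is a made-up positive definite matrix rather than an estimate from data, and the symmetric inverse square root is computed with an eigendecomposition; these choices are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_channels = 4

# Hypothetical symmetric positive definite noise covariance.
a = rng.normal(size=(n_channels, n_channels))
noise_cov = a @ a.T + n_channels * np.eye(n_channels)

x_i = rng.normal(size=n_channels)
x_j = rng.normal(size=n_channels)
diff = x_i - x_j

# Mahalanobis distance (squared): difference scaled by the precision matrix.
precision = np.linalg.inv(noise_cov)
d_mahalanobis = diff @ precision @ diff

# Prewhitening: apply Sigma^(-1/2) to each pattern, then take the ordinary
# squared Euclidean distance between the whitened patterns.
eigvals, eigvecs = np.linalg.eigh(noise_cov)
whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
d_whitened = np.sum((whitener @ x_i - whitener @ x_j) ** 2)

# Univariate noise normalization: keep only the diagonal of the covariance,
# i.e. divide each channel by its noise standard deviation.
d_univariate = np.sum(diff ** 2 / np.diag(noise_cov))

print(d_mahalanobis, d_whitened, d_univariate)
```

The first two numbers agree up to floating-point error; the univariate version generally differs because it ignores correlations between channels.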
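The effect of shrinkage on invertibility can be seen in a toy case with fewer noise samples than channels. The sketch below shrinks the sample covariance toward its own diagonal with a fixed, hand-picked weight. That fixed weight is a simplifying assumption for illustration; shrinkage estimators used in practice typically compute the weight from the data.

```python
import numpy as np

def shrink_to_diagonal(residuals, weight):
    """Blend the sample covariance with its diagonal.

    residuals: (samples x channels) array of noise residuals.
    weight: hypothetical fixed shrinkage weight in [0, 1].
    Returns (shrunk covariance, raw sample covariance).
    """
    centered = residuals - residuals.mean(axis=0)
    sample_cov = centered.T @ centered / (len(residuals) - 1)
    target = np.diag(np.diag(sample_cov))
    shrunk = (1 - weight) * sample_cov + weight * target
    return shrunk, sample_cov

rng = np.random.default_rng(5)
residuals = rng.normal(size=(20, 50))  # 20 samples, 50 channels
shrunk_cov, sample_cov = shrink_to_diagonal(residuals, weight=0.3)

# The raw estimate is rank deficient and cannot be inverted, while the
# shrunk estimate has strictly positive eigenvalues.
print("rank of sample covariance:", np.linalg.matrix_rank(sample_cov))
print("smallest eigenvalue after shrinkage:", np.linalg.eigvalsh(shrunk_cov).min())
precision = np.linalg.inv(shrunk_cov)
```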
OK. The second part for estimating RDMs I wanted to mention is that there is a distance estimation bias. And this comes from just the fact that distances are always positive. So if we, for example, just take this one black star here just with a completely isotropic noise around it and draw two samples from it, they will have some distance to each other. So even if the two underlying patterns are completely the same, if I add noise to both, they will then have some positive distance, right? And that will mean that always noisy points of this will yield a positively biased estimate for the distance between the two points. And of course, larger noise yields larger bias.
We can even write down formulas for this. Effectively, the bias here in the squared Euclidean distance scales just exactly with the variance of this-- of the original distance estimate here.
AUDIENCE: Would you clarify why there's always a positive bias for distance? Because if we have noise, it can [INAUDIBLE] two points [INAUDIBLE].
HEIKO SCHÜTT: Yeah. Yeah, in principle, they can be closer. Effectively, what you get is if you assume-- yeah, let's assume these two points are actually the two points. Then along this one dimension here where they are different, this one direction, you will actually get an unbiased part effectively, right? Like, they're sometimes further apart, sometimes closer together. But along all other dimensions, you just randomly displacing them will just increase the distance.
And it's just-- yeah, it turns out that it's exactly this formula. After relatively boring math, you can just calculate what this bias is. It's easiest to see, I think, really, for zero distance things. Because then it's clear that the true distance is zero. And there are definitely cases that happen due to the noise that produce a positive distance. And then the average of all of those will also be a positive number. And so at least for zero distance things, it should be immediately clear that it's truly-- there must be a positive bias for all of those estimates.
And for the other ones, it just turns out for these squared Euclidean distances, it's actually always-- like, it's a constant offset, so the true distance doesn't matter for the amount of bias you get. But that's just-- that is-- you see by formally solving these equations, I think, is the way to see that.
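The constant offset can be checked by simulation. This sketch assumes independent Gaussian noise with the same standard deviation on every channel and on both patterns; under that assumption the expected bias of the naive squared Euclidean distance is 2 times the number of channels times the noise variance, regardless of the true distance. The numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n_channels, sigma, n_sims = 10, 1.0, 20_000

def mean_naive_sq_distance(true_a, true_b):
    """Average squared distance between independently noisy copies."""
    noisy_a = true_a + sigma * rng.normal(size=(n_sims, n_channels))
    noisy_b = true_b + sigma * rng.normal(size=(n_sims, n_channels))
    return np.mean(np.sum((noisy_a - noisy_b) ** 2, axis=1))

zero = np.zeros(n_channels)
far = np.full(n_channels, 2.0)
true_sq_distance_far = np.sum((zero - far) ** 2)

expected_bias = 2 * n_channels * sigma ** 2
bias_same = mean_naive_sq_distance(zero, zero) - 0.0
bias_far = mean_naive_sq_distance(zero, far) - true_sq_distance_far

print("expected bias:", expected_bias)
print("bias at true distance 0:", bias_same)
print("bias at true distance", true_sq_distance_far, ":", bias_far)
```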
AUDIENCE: [INAUDIBLE]
HEIKO SCHÜTT: OK? And this estimation bias could be reasonably fine if it was the same for all the distances. Then we just have an offset in our RDM, and if we correlate them later, that might actually not be that problematic. But it's definitely really problematic if the bias is different for different distances.
And when does that happen? Well, first, if we have different variances for different conditions, which can happen under many circumstances. Then it would be super problematic because these conditions would then appear more distant from all the others than they actually are. And of course, if there are covariances between conditions. This mostly happens due to being temporally close. So if you have measurements that were always taken in the same order for some reason, then things that are closer in time will often be more correlated and thus less distant, less dissimilar, from each other than other conditions.
So how do we solve this? The typical answer we give is to use cross-validation. So here, the idea is if we have multiple measurements for each pattern, so just for the red and the blue here, then we can-- instead of multiplying the same difference with itself here, we can multiply the difference vector from one pair of measurements with the difference vector from another pair of measurements. And as long as these two measurements for the patterns are independent, which you can often enforce by just measuring them sufficiently far apart in time or in separate runs for fMRI, then your distance estimate for this will actually be unbiased.
So how does this work? Here it is in formulas. So instead of-- here, this would be just taking all the pairs of differences and multiplying them together-- you now here just exclude the ones that come from the same run. And then you get an unbiased estimator. As I said, this works to remove the bias if the noise is zero mean, which is always true under our assumptions, and uncorrelated between those runs.
OK. And then it's just a fancy name. We can call this crossnobis for the cross-validated Mahalanobis distance. And of course, this works also without the noise normalization. So it could also be a cross-validated Euclidean distance, basically, where you skip the part about noise normalization.
Yeah. For people working with spiking data, it works equally well for the symmetrized KL. There's also a cross-validated version of that. And then there's the last point, which we are often asked about, is these things can now create negative distance estimates. So once we multiply the difference in one run times the difference in another run, we can get negative values out of this. And unfortunately, this is inevitable for an unbiased estimator. Again, when looking at things that have a true distance of zero, once we can get positive estimates, sometimes there must be some negative estimates that also happen to balance this estimate.
This turns out to be actually unproblematic for all the steps we further do with these distances, but people often stumble over this, so: this is normal and happens for these things. And if you want to correct it back such that it is a valid distance matrix again, you need to invest some more corrections to these kind of estimates.
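The cross-validated estimator can be sketched as follows. This is an illustrative NumPy version, not the toolbox implementation: it assumes one pattern estimate per condition per run, independent zero-mean noise across runs, and no noise normalization (prewhitening the patterns first would turn it into a crossnobis-style estimate). The function name is made up.

```python
import numpy as np

def crossvalidated_sq_distance(runs_a, runs_b):
    """Cross-validated squared Euclidean distance between two conditions.

    runs_a, runs_b: (runs x channels) arrays of per-run pattern estimates.
    Inner products of difference vectors are averaged over all pairs of
    *different* runs; same-run products (the biased terms) are excluded.
    """
    diffs = runs_a - runs_b
    products = diffs @ diffs.T
    different_runs = ~np.eye(len(diffs), dtype=bool)
    return products[different_runs].mean()

rng = np.random.default_rng(7)
n_runs, n_channels, n_sims = 4, 10, 5_000
true_pattern = np.zeros(n_channels)  # both conditions identical: distance 0

estimates = np.array([
    crossvalidated_sq_distance(
        true_pattern + rng.normal(size=(n_runs, n_channels)),
        true_pattern + rng.normal(size=(n_runs, n_channels)),
    )
    for _ in range(n_sims)
])

# Unbiased: the average is near the true distance of zero, which is only
# possible because a good share of the individual estimates is negative.
print("mean estimate:", estimates.mean())
print("fraction negative:", (estimates < 0).mean())
```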
OK. So conclusions for computing RDMs. Generally, the ones that we understand best are squared Euclidean distances with these noise normalizations, so Mahalanobis distances. If you want to be a bit more general, the thing to aim for is to match the error distribution in your data. And the Euclidean Mahalanobis are the ones that match the Gaussian distributions.
And then the two points you-- using taking the noise covariance into account is a good idea once you shrink your estimate for that. And most of the sample estimates are biased. And we can make them unbiased by using cross-validation. OK.
And so before I come to comparing RDMs, are there any more questions on this construction of the RDMs? That doesn't sound like there are, so we can continue to comparing RDMs. OK.
So how should we-- yeah, someone? I heard someone? But anyway, how should we compare RDMs? Well, first of all, we have a matrix which is symmetric, so we can restrict ourselves to just looking at the upper triangular part of the matrix and make it a vector. And now anything that compares either vectors or these matrices would work in principle, right?
However, there are a few things to observe. First, we should probably ignore scale because we almost never know it: again, it scales with the noise, so even the units of percent change or things like that in brain data would not be predicted by any models. So just scaling the whole RDM up and down probably shouldn't change our evaluation.
With that restriction, there are still many methods that have been used. Correlations are very popular. The cosine similarity is slightly more modern, I guess, just taking the dot product of the normalized vectors. This has the advantage that the distance to the zero point is taken into account. Then rank correlations, very similar to correlations, and recently, some distances on the manifold of matrices have been proposed as well.
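The vectorization step and the two most common scale-invariant comparisons can be written in a few lines. This is a minimal NumPy sketch with hypothetical function names and random example patterns; it illustrates the definitions rather than any particular toolbox function.

```python
import numpy as np

def rdm_to_vector(rdm):
    """Keep only the upper triangular part of a symmetric RDM (diagonal excluded)."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]

def cosine_similarity(x, y):
    """Dot product of the normalized vectors; the zero point matters."""
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def pearson_correlation(x, y):
    """Cosine similarity after removing each vector's mean."""
    return cosine_similarity(x - x.mean(), y - y.mean())

# Example RDM: squared Euclidean distances between 5 random patterns.
rng = np.random.default_rng(1)
patterns = rng.standard_normal((5, 30))
diff = patterns[:, None, :] - patterns[None, :, :]
rdm = (diff ** 2).sum(axis=-1)
vec = rdm_to_vector(rdm)

print(vec.shape)                                # 5 conditions give 10 pairs
print(cosine_similarity(vec, 3.0 * vec))        # unchanged by rescaling
print(pearson_correlation(vec, vec + 100.0))    # unchanged by adding a constant
print(cosine_similarity(vec, vec + 100.0))      # changed by adding a constant
```

Both measures ignore the overall scale of the RDM. Only the correlation also ignores an added constant, which is the sense in which the cosine similarity takes the distance to the zero point into account.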
But of course, the question is what works best and what should we do, actually. So, yeah. There seems to be some reverb or something I sometimes get.
Anyway, the problem we observed in this recent paper is that there are, again, covariances between the entries of an RDM. So in a very similar picture as we had in the construction for the RDMs, the entries of an RDM are correlated with each other, so these error ellipses are not round. And so then if we had some-- here it's illustrated the other way around. So if we have two models here and a data set, it might be-- the data set might be closer to this one, but actually more likely under m1. And the correlations ignore this fact. So if you just calculate normal correlations between the RDM entries, they essentially assume independence, and this is wrong and makes the inference weaker.
So very similar as in the first step, a similar solution for this is to look at what is this covariance and then correct the measures for it. And in this paper, we just solved this actually for this squared Euclidean and related distances and can thus correct it.
So what does this covariance actually look like? This is now a covariance matrix between the entries of an RDM. This might be relatively small for you to read, but the first line would be the distance between the first and the second condition, the next one is the first and third condition, first and fourth condition, et cetera. And any pair of two distances is correlated if they share a condition. So for example, here, the distance from stimulus one to stimulus two is correlated with all the ones that include stimulus one and all the ones that include stimulus two, and the covariance is 0 for the rest.
And so you can go through all the lines, and that's-- those are exactly the points where you see some correlations. This pattern is perhaps even easier to see when you have a larger number of conditions. Then, still, those would be all the dissimilarities that contain the first stimulus, and they are all correlated to each other. And then the stimulus pair 1, 2 is correlated to everyone in the block one and the ones in block two. And then correspondingly, 1, 3 is correlated to all of those, and then the ones in block three, and so on and so forth. So this gives you the whole covariance matrix.
And nicely enough, in contrast to the part about estimating dissimilarities, this is not an estimate, but we can just compute what it is even for-- even if all the original estimates were uncorrelated to each other, right? So we don't need to estimate this anymore. And once you have different variances, it looks a bit more complicated. OK.
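The block pattern described above can be reproduced directly. The sketch below is an assumption-laden illustration: it takes the noise to be i.i.d. across conditions, considers only the signal-independent part of the covariance, ignores the overall scale factor, and uses the form "elementwise square of the pair-contrast Gram matrix" as I understand it from Diedrichsen et al., 2021. The function names are hypothetical.

```python
import numpy as np

def pair_contrast_matrix(n_cond):
    """One row per condition pair (i, j) with +1 at i and -1 at j."""
    rows, cols = np.triu_indices(n_cond, k=1)
    contrasts = np.zeros((rows.size, n_cond))
    contrasts[np.arange(rows.size), rows] = 1.0
    contrasts[np.arange(rows.size), cols] = -1.0
    return contrasts

def rdm_covariance_structure(n_cond):
    """Covariance pattern between RDM entries, up to a scale factor."""
    contrasts = pair_contrast_matrix(n_cond)
    gram = contrasts @ contrasts.T   # 2 on the diagonal, +-1 for one shared condition, else 0
    return gram ** 2                 # elementwise square: 4, 1, or 0

# Pair order for 4 conditions: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
V = rdm_covariance_structure(4)
print(V.astype(int))
```

Entries for pairs that share one condition are nonzero, and entries for pairs with no condition in common are zero, which is the pattern shown on the slide.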
Those are the formulas. And I don't want you to go into too much depth for explaining them. But one important part is they have two parts. One, this right one here, depends on the delta, so on the differences between the patterns, actually, while the first part doesn't. So there is an overall part that just comes from the noise. And then there is a part that depends both on the noise size and on the actual difference between the two parts.
And this is true whether or not we are using cross-validation in our estimates. And in fact, maybe as a help for understanding the cross-validation, they are extremely similar. It's just that they scale slightly differently in the amount of this noise. So the ones with cross-validation are slightly more noisy, which makes sense, as we are excluding some of the pattern differences from our calculation.
OK. And once we have these covariances-- yeah, I just said that, we can then create what we call the whitened similarity measures here. For that, we actually ignore this signal dependent part because that would mean that we need a different covariance matrix for any pair of distances, whatever the original, underlying one is. And then we can take the cosine similarity and Pearson correlation. So if you have not seen them before, those are the formulas for cosine similarity and correlation in vectorized form, right? And we can turn them into whitened versions, simply again by plugging into all the scalar products the inverse covariance matrix, so the inverse of the matrix I just showed you.
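Plugging the inverse covariance into every scalar product of the cosine similarity can be sketched as follows. This is a minimal NumPy illustration, not the toolbox code: the covariance is the simplified i.i.d.-noise, signal-independent pattern (an assumption), and the function names are hypothetical.

```python
import numpy as np

def rdm_covariance_structure(n_cond):
    """Signal-independent covariance pattern between RDM entries (i.i.d. noise assumed)."""
    rows, cols = np.triu_indices(n_cond, k=1)
    contrasts = np.zeros((rows.size, n_cond))
    contrasts[np.arange(rows.size), rows] = 1.0
    contrasts[np.arange(rows.size), cols] = -1.0
    return (contrasts @ contrasts.T) ** 2

def whitened_cosine(x, y, cov):
    """Cosine similarity with V^-1 inserted into each scalar product."""
    wx = np.linalg.solve(cov, x)   # V^-1 x without forming the inverse explicitly
    wy = np.linalg.solve(cov, y)   # V^-1 y
    return (x @ wy) / np.sqrt((x @ wx) * (y @ wy))

rng = np.random.default_rng(3)
n_cond = 5
V = rdm_covariance_structure(n_cond)
x = rng.random(V.shape[0]) + 0.5   # two example RDM vectors
y = rng.random(V.shape[0]) + 0.5

print(whitened_cosine(x, y, V))
print(whitened_cosine(x, y, np.eye(V.shape[0])))  # identity covariance: ordinary cosine
```

With an identity covariance the measure reduces to the ordinary cosine similarity, so the whitening changes only how strongly each combination of RDM entries counts.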
And that creates those whitened versions. And of course, we tested that they are actually better. Before I show this, this one, the whitened thing here, is actually exactly equivalent to centered kernel alignment if we assume iid noise over the stimuli. So for anyone who liked this one before, that one is exactly the same as that.
And if we look at this now, here, the dashed lines are always the whitened measures, and the continuous lines are the ones without whitening. And you have here plotted how often we select the correct model, so how often the correct model had the higher evaluation, plotted against how much noise we had. Yeah. So we generally see that these whitened measures lead to a higher proportion of correct choices in the model selection. And that is true independent of-- it actually gets slightly more so the more patterns we have in our data.
So conclusions for RDM comparisons. Distances are correlated if they share a condition, and we can compute exactly what this covariance looks like, and then can use whitening. And that just yields general-purpose, slightly better measures of RDM similarity. And if you want to read more on that, read this paper.
And now for the last part on the uncertainty and tests. We will run slightly longer, I fear, for the talk. Anyway, so our question is, how do we get error bars on our model evaluations, and how can we then use this for tests?
There are generally two big approaches that have been used for this step. The first one is to just use tests across subjects, so once we have more than one person or one animal measured for our things, we can then use the variance across subjects for testing. And then you can use simple t-tests or rank sum tests or things like that to do your comparisons.
And that's completely valid, but the thing you're testing for is whether this will generalize to new subjects, and there is no generalization to new stimuli, although we might often want this. And the other approach is to use bootstrapping, or in general resampling methods, where we can actually control how far we want to generalize. So obviously, this will be the part I will be talking about most because we do-- that's where we made progress towards better things.
So to give you an explanation of this, I first want to go through how we actually validated the inference very quickly because I'll show you intermittently some results of these evaluations to show the points where you should know where those numbers come from.
So to test generalization, we need some population of stimuli and of subjects. And so what we did to do this is to run a DNN-based simulation, where we would then use randomness in the chosen stimuli to represent. We just randomly choose images. That's our randomness across stimuli. And for subjects, we change the readout from the deep neural network.
So concretely, we choose random images from ecoset as our stimuli, use AlexNet as a deep neural network, and then the randomness across subjects is just randomly placing the voxels with a random weighting across the feature dimensions as well. And with those two randomness parts, we can then simulate data. We do so by going through effectively an fMRI simulation scheme where we first get the true activations for each channel, each stimulus, and each of our subjects, so for each random placement of the voxels, and then run through creating a temporal sequence and estimating back activations with the GLM and then concretely creating our dissimilarities. So this is similar to what we think might be the noise in well measured fMRI data.
AUDIENCE: How did you create [INAUDIBLE]?
HEIKO SCHÜTT: So the subjects in this case were always a new random sample of voxels. So we just randomly sample where those voxels were placed and how each voxel weights the features. And then a different subject is just a new random draw of those voxels. That's all the individual variability we have in here.
AUDIENCE: So I'm not sure what you mean by voxels in deep net. Like, here, you have-- could you-- what's that w?
HEIKO SCHÜTT: So the-- yeah, so just to make it really concrete, so to-- for the first channel of the first subject, we will randomly draw a position from this 2D grid x and y and a random weight from 0 to 1, uniform, and then add up from a local average as illustrated here, the feature maps at around this location, add them up, and that gives us the response of that first channel to all the stimuli for the first subject.
And then the point is simply we will have one sample of voxels for subject one, and then if we simulate a new subject, we just draw new positions and new weights for the features for this subject. That's all the variation. So there's just a different set of voxels we use, effectively. Does that make sense?
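The subject simulation described in this answer can be sketched as follows. This is a simplified illustration under stated assumptions: random numbers stand in for the network's feature maps, the local average is a square box of hypothetical `radius`, and the function name and array layout are made up for this example. The actual simulations additionally ran through a temporal fMRI model and a GLM, which is omitted here.

```python
import numpy as np

def simulate_voxel_responses(feature_maps, n_voxels, radius, rng):
    """feature_maps: (n_stimuli, n_features, height, width) -> (n_stimuli, n_voxels)."""
    n_stim, n_feat, height, width = feature_maps.shape
    responses = np.empty((n_stim, n_voxels))
    for v in range(n_voxels):
        # Random voxel position on the 2D grid (kept away from the border for simplicity).
        x = rng.integers(radius, width - radius)
        y = rng.integers(radius, height - radius)
        # Random weight in [0, 1) for every feature.
        weights = rng.uniform(0.0, 1.0, size=n_feat)
        # Local average of each feature map around the voxel position.
        patch = feature_maps[:, :, y - radius:y + radius + 1, x - radius:x + radius + 1]
        local_mean = patch.mean(axis=(2, 3))     # shape (n_stimuli, n_features)
        responses[:, v] = local_mean @ weights   # weighted sum over features
    return responses

rng = np.random.default_rng(4)
feature_maps = rng.random((8, 6, 10, 10))        # stand-in for one network layer
subject_1 = simulate_voxel_responses(feature_maps, n_voxels=15, radius=2, rng=rng)
subject_2 = simulate_voxel_responses(feature_maps, n_voxels=15, radius=2, rng=rng)
print(subject_1.shape, subject_2.shape)
```

A new subject is just another call with fresh random positions and weights, while the stimuli and the underlying feature maps stay the same.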
AUDIENCE: Yeah.
HEIKO SCHÜTT: OK, great. Yeah. And so once we have this, the measure we used was primarily the relative uncertainty, as we call it. So we take what we get as the standard deviation of the model evaluations from bootstrapping and divide it by the actual standard deviation across multiple repeats of our experiment, which we can just run in simulation. And this ratio should be one if our method is exactly accurate.
And then to measure how good our method is, how powerful, we calculate the signal-to-noise ratio. So how big is the variance across models? So in our case, the different layers of AlexNet compared to the variance across all the repetitions for different models.
And then we can look at this. So, for example, for just-- if we just use a condition bootstrap, so we just bootstrap the conditions and actually just vary the conditions, so we use the same voxel placement but change the stimuli, then indeed we end up with ratios here relatively close to one, at least once we have sufficiently many conditions. And similarly, if we just bootstrap the subjects and just vary the subjects, we again get values that are pretty close to one once we have sufficiently many subjects. So this seems to work fairly well for those simple bootstraps.
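A bootstrap over conditions for a single model evaluation can be sketched like this. It is a minimal NumPy illustration with hypothetical names and simulated data. One design choice here is an assumption for this sketch: pairs in which the same original condition was drawn twice are dropped, because their dissimilarity is trivially zero in both RDMs.

```python
import numpy as np

def squared_distance_matrix(patterns):
    diff = patterns[:, None, :] - patterns[None, :, :]
    return (diff ** 2).sum(axis=-1)

def cosine_similarity(x, y):
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def bootstrap_conditions(data_rdm, model_rdm, n_boot, rng):
    """Resample conditions with replacement and re-evaluate the model each time."""
    n_cond = data_rdm.shape[0]
    rows, cols = np.triu_indices(n_cond, k=1)
    evaluations = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.integers(0, n_cond, size=n_cond)   # bootstrap sample of conditions
        keep = sample[rows] != sample[cols]              # drop pairs of a condition with itself
        data_vec = data_rdm[sample[rows], sample[cols]][keep]
        model_vec = model_rdm[sample[rows], sample[cols]][keep]
        evaluations[b] = cosine_similarity(data_vec, model_vec)
    return evaluations

rng = np.random.default_rng(2)
model_patterns = rng.standard_normal((20, 30))
data_patterns = model_patterns + 0.5 * rng.standard_normal((20, 30))  # noisy measurement
evaluations = bootstrap_conditions(
    squared_distance_matrix(data_patterns),
    squared_distance_matrix(model_patterns),
    n_boot=500,
    rng=rng,
)
print("bootstrap standard deviation:", evaluations.std(ddof=1))
```

The standard deviation of the bootstrap evaluations is the quantity that is divided by the true standard deviation across repeated simulated experiments to obtain the relative uncertainty.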
However, we then run into our first problem immediately, namely that if we bootstrap both, so we choose new stimuli and new subjects, we actually get substantially larger bootstrap estimates for our variances and standard deviations than we get as actual variation across experiments. So we are substantially overconservative. And this factor of 1.4 over here is a factor of 2, roughly, in the number of subjects you would need to get anything significant, et cetera, right? So this is a significant error.
So how do we fix this, and what's the problem here, anyway? It turns out that this has nothing to do with RSA, actually, but just with bootstrapping across two random factors. What happens essentially is that if you just bootstrap the subjects you also-- you don't get an estimate of variability across the subjects, but you get variability across subjects plus variability due to noise. And similarly, if you do the condition bootstrap, you get variance across stimuli and variance across the noise. And then, unfortunately, if you simultaneously bootstrap both, you get, well, subject stimuli, but then actually three times the variance across the noise. So more or less once similar to those two, and then an extra one.
And we confirm this in substantial simulations, also ones which have nothing to do with RSA. Even just a simple linear sum, this happens. And we can actually go through the math. It's, again, relatively boring math, but you can solve this, and this is indeed true. There are some factors in front of them, like n over n minus 1, but this is the formula for this variance estimate you actually get.
Now, on the plus side, once we have this kind of formula, we can also create an adjustment which just takes those numbers and produces an estimate that, on average, has the right variance. And so this is our adjustment for this two-factor bootstrap. And then, I mean, just by plugging in, you should see this works out.
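Given the decomposition just described (subject bootstrap: subject plus noise variance; stimulus bootstrap: stimulus plus noise variance; both: subject plus stimulus plus three times noise variance), one combination that recovers subject plus stimulus plus one noise variance is twice the sum of the single-factor estimates minus the two-factor estimate. The sketch below states that combination with hypothetical names and ignores the n/(n-1) factors mentioned above.

```python
def corrected_two_factor_variance(var_subject_boot, var_stimulus_boot, var_both_boot):
    """Combine three bootstrap variance estimates into the two-factor correction."""
    return 2.0 * (var_subject_boot + var_stimulus_boot) - var_both_boot

# Illustrative (made-up) variance components.
var_subjects, var_stimuli, var_noise = 0.4, 0.3, 0.1

# What each bootstrap estimates in expectation, following the decomposition above.
subject_boot = var_subjects + var_noise
stimulus_boot = var_stimuli + var_noise
both_boot = var_subjects + var_stimuli + 3.0 * var_noise

corrected = corrected_two_factor_variance(subject_boot, stimulus_boot, both_boot)
target = var_subjects + var_stimuli + var_noise
print(corrected, target)
```

Plugging in: 2(S + N + C + N) - (S + C + 3N) = S + C + N, which is the variance of an experiment with new subjects and new stimuli. In practice all three inputs are noisy bootstrap estimates, so the corrected value is itself only an estimate.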
And if we do this, we then get much smaller deviations from one and get essentially as good as for just resampling the stimuli or conditions. So this is the correction for generalization to both subjects and stimuli.
And now, for the last point, I want to quickly talk about flexible models. So those are any models where you have parameters that you fit to the data to match the RDMs better. And why would you want to do this? The main reason is actually-- we mentioned this earlier, is that there might be transformations due to the measurement model that we don't know about.
So this is illustrated here. So let's first go through what this image shows. These are 4 to 5 layers of AlexNet and MDS embedding of the RDMs. So these multiple things of the same size are different random local averaging, so similar as we had our subject variability in the simulations. And then the size here is the size of the averaging region.
And you see that for each of those layers, the RDM does substantially change if you average over larger areas. This is similar to smoothing your maps before computing differences. However-- so the RDM changes substantially, and we might not know how big our voxels are in our model space, so we want to take this into account. Nonetheless, the models seem to be here fairly well separated, so we think we should-- if we can construct a model that can produce any of the RDMs along this path, we should be able to find out which model is best fitting.
And so how do we deal with these kinds of flexible models? First of all, we need them for these measurement models when we want to use those. And then we recommend here essentially to just use cross-validation. And then to estimate uncertainty, we run this cross-validation within the bootstrap. And that seems to work fine at first.
So let's go through this. So we have-- for each RDM, we choose some stimuli as fitting. We choose some fitting subjects and evaluate on a separate set of subjects and stimuli. And then-- so this would be one thing, and we cycle through all the combinations, as in typical cross-validation. And then we have bootstrapping around this, bootstrapping both the stimuli and the subjects. So this will then allow us an estimate of the uncertainty again of our model evaluations.
So we call this procedure overall bootstrap cross-validation. And the thing to think about is here that we have this bootstrap and the cross-validation, and they determine different things. The bootstrap determines the generality of our inference, so we resample-- if we resample subjects, we try to construct an inference that will generalize to new subjects or the population of subjects and correspondingly across stimuli, right? And if we do both, for both.
And similarly, the cross-validation determines how general your model needs to be. So your model needs to generalize from the training data to the test data with the same parameters. That's what you're enforcing. So if you cross-validate only across subjects, that would mean you fit the same stimuli, but have to predict them in new subjects. If you cross-validate only across stimuli, you're allowed to have different weights for different people, but you have to explain other stimuli as well, et cetera.
And essentially, most combinations of these are somewhat sensible and have some application somewhere. And all of them are possible in the toolbox. But those are the two things you need to think about. How far do you want your inference to generalize? And how far do you want your models to generalize? And that should determine your choices for bootstrapping and for cross-validation.
And now, unfortunately, there is one problem with this procedure, namely that cross-validation causes variance. So this shouldn't be too surprising once you think about it, that if you choose a different permutation of the stimuli, for example, so choose the-- and this changes your cross-validation folds, you will get slightly different numbers for your models. This is not surprising in the first place. However, this is usually a very small fraction of variance that we can safely ignore.
Not so in RSA because in RSA you have an effect of some of the things being ignored. So I illustrate this here for the training and test sets of stimuli. The test sets are in red. This is threefold cross-validation for the stimuli. And now, if you overlay all the test sets, you should see that only the red ones here ever entered the evaluation, while all the gray ones never entered our calculation of model performance. And that's the reason why if you permute the stimuli, these gray regions and red regions will change positions, so a different set of the dissimilarities you estimated will actually enter the model evaluation. And thus, this variance component is fairly large in RSA.
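How few dissimilarities enter the evaluation can be counted directly. In this minimal NumPy sketch (hypothetical names, balanced random folds assumed), a dissimilarity is evaluated only if both of its conditions fall into the same test fold.

```python
import numpy as np

def evaluated_pair_mask(n_cond, n_folds, rng):
    """Boolean matrix: True where both conditions of a pair share a test fold."""
    permutation = rng.permutation(n_cond)
    fold_of = np.empty(n_cond, dtype=int)
    fold_of[permutation] = np.arange(n_cond) % n_folds   # random, balanced fold assignment
    return fold_of[:, None] == fold_of[None, :]

rng = np.random.default_rng(0)
n_cond, n_folds = 12, 3
mask = evaluated_pair_mask(n_cond, n_folds, rng)
rows, cols = np.triu_indices(n_cond, k=1)
fraction_used = mask[rows, cols].mean()
print("fraction of dissimilarities evaluated:", fraction_used)
```

With 12 conditions in 3 folds, each fold of 4 conditions contributes 6 pairs, so 18 of the 66 dissimilarities are evaluated. A different random assignment selects a different set of 18, which is where the extra variance comes from.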
So one way of solving this would be to just run, really, many cross-validation random assignments within each bootstrap. And then it would eventually get very small variance, and we could ignore it again. However, that would be prohibitively expensive computationally because for each of those we already need to run this in each bootstrap sample.
So what we can do instead is, following these formulas, we can just extrapolate from a few cross validations and how variable their mean is to-- and how variable it is for just one cross-validation assignment to infinitely many ones. And, yeah. This works very well. So what I put here is if we just have the uncorrected estimate, so this would be how much the variance is compared to the final estimate and the corrected one. And you see even if you just run two cross-validation random assignments, you can already accurately estimate the variance with many ones. And that's the same value you would converge to if you ran many cross-validations [INAUDIBLE] fold.
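The extrapolation can be sketched under a simple additive assumption: the variance of the mean over N cross-validation assignments is the bootstrap variance plus the cross-validation variance divided by N, and the variance for a single assignment is the sum of both. Solving these two equations for the bootstrap variance gives N/(N-1) times (variance of the means minus 1/N times the single-assignment variance). The code below uses hypothetical names and simulated numbers to check this.

```python
import numpy as np

def bootstrap_cv_variance(evaluations):
    """evaluations: (n_boot, n_cv), n_cv random fold assignments per bootstrap sample."""
    n_cv = evaluations.shape[1]
    var_of_means = evaluations.mean(axis=1).var(ddof=1)   # bootstrap + cv variance / n_cv
    var_single = evaluations.var(axis=0, ddof=1).mean()   # bootstrap + cv variance
    return n_cv / (n_cv - 1) * (var_of_means - var_single / n_cv)

rng = np.random.default_rng(0)
n_boot, n_cv = 20000, 2
bootstrap_effect = rng.standard_normal((n_boot, 1))   # true bootstrap variance: 1.0
cv_noise = rng.standard_normal((n_boot, n_cv))        # variance from fold assignment: 1.0
evaluations = bootstrap_effect + cv_noise

uncorrected = evaluations.mean(axis=1).var(ddof=1)
corrected = bootstrap_cv_variance(evaluations)
print("uncorrected:", uncorrected)   # near 1.5 in this simulation
print("corrected:", corrected)       # near 1.0, the value for infinitely many assignments
```

Even with only two assignments per bootstrap sample, the corrected estimate targets the variance one would get from infinitely many assignments, which matches the recommendation to spend computation on more bootstrap samples instead.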
And now to decide how many should you actually run, we just plot here what happens to the variance of the variance estimate, so how accurately can we estimate this when we either increase the number of cross-validation here in gray or if we increase the number of bootstraps. And increasing the number of bootstrap samples works much better. So essentially, the recommendation is to use just two cross-validation assignments within each bootstrap. And then you run many bootstrap samples to get an accurate estimate of the variance within these things. This is here written out.
And now I think I'll actually jump over these, but we did validate this also in both fMRI data, where we get quite close matches to what we would expect, and in calcium data, where we can recover which brain areas those are and see the exact same effects of the two-factor bootstrap working better. So this is not just in simulations, but also in resampling data, we see similar results.
OK. And with that, slightly after time, I can conclude this introduction talk tutorial thing. First, I hope I convinced you somewhat that RSA is a useful method for making these high-dimensional comparisons between brain data and models. To compute the dissimilarities, think of the noise structure and use cross-validation. If you compare RDMs, using these whitened measures yields more reliable inference. And we have these two corrections that are necessary if we want to generalize to new subjects and stimuli or if we want to use flexible models.
And all of that is, as we promised, available in the RSA toolbox for you to use. And effectively, I'll now switch to explaining to you how to run the demos.
So this is the view you end up when you click on this link. And here on this Code button you can get a link to download this. So we will not-- first of all, I shall ask, are there any questions about what I talked about? Otherwise, I would transition to the demo part.
AUDIENCE: I'm still confused about what you mean by flexible model in this context.
HEIKO SCHÜTT: A flexible model would be just anything, any kind of model, where you have parameters that are fitted to the data, right? So in the classical, like old-school RSA, you just have one representation so you get one RDM, and that's your prediction, and there is nothing to be fitted about that. However, in many cases for these measurement models or if you think there should be an arbitrary weighting of the features or something similar, you would then have a flexible model, so just a model that has some parameters that need to be fitted to data. That's what we call a flexible model.
AUDIENCE: Are we talking regression? What do you mean when you say the parameters? Like, you have a layer of network [INAUDIBLE] brain data, and then the classic RSA, you don't do anything between the activations. You just compute the RSA, right?
HEIKO SCHÜTT: Yes. And--
AUDIENCE: And with flexible model, you fit something, but the regression model [INAUDIBLE]--
HEIKO SCHÜTT: Yes. So for--
AUDIENCE: --then you do RSA?
HEIKO SCHÜTT: Yeah. So the two, perhaps, most typical kinds of flexible models here are either you have something like these voxel sizes here. Then it's a selection or interpolation model. So you just give the set of RDMs, and the flexibility is choose one of these RDMs. Or alternatively are these weighted models, so you have a bunch of RDMs and assume that the RDM in the model is sum of those RDMs. So this would be what we call a weighted model that is-- the most typical thing would be indeed the feature weighting. So if you, for example, allow a weight for each of the features to be applied more strongly or weaker to the-- contribute more or less to the RDM, and that would be another way of fitting the samples.
Yeah. But in principle, you are very free in what you would do there. As long as you have an effective method to find the best parameters for a given RDM, you're good to go. [INAUDIBLE].
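The two kinds of flexible models mentioned in this answer can be sketched in NumPy. The function names are hypothetical and the fitting rules are assumptions chosen for simplicity: the selection model picks the candidate RDM with the highest cosine similarity to the training data, and the weighted model is fitted by ordinary least squares, which is not necessarily the criterion or optimizer a real analysis would use.

```python
import numpy as np

def cosine_similarity(x, y):
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

def fit_selection_model(candidate_vectors, data_vector):
    """Return the index of the candidate RDM that matches the data best."""
    scores = [cosine_similarity(candidate, data_vector) for candidate in candidate_vectors]
    return int(np.argmax(scores))

def fit_weighted_model(component_vectors, data_vector):
    """Least-squares weights such that the weighted sum of components approximates the data."""
    weights, *_ = np.linalg.lstsq(component_vectors.T, data_vector, rcond=None)
    return weights

rng = np.random.default_rng(5)
components = rng.random((3, 45))                 # 3 RDM vectors for 10 conditions (45 pairs)

# Selection: the data are a scaled, slightly noisy copy of candidate 1.
data_selection = 3.0 * components[1] + 0.01 * rng.standard_normal(45)
print("selected candidate:", fit_selection_model(components, data_selection))

# Weighted: the data are an exact weighted sum of components 0 and 2.
data_weighted = 0.5 * components[0] + 2.0 * components[2]
print("fitted weights:", fit_weighted_model(components, data_weighted))
```

In both cases the fitted parameter (an index or a weight vector) is what has to generalize from the training subjects and stimuli to the held-out ones.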
OK. But then, let's go to the flexible-- to the actual practical part. I'll just quickly click through the idea of how this is implemented in the toolbox, and then we can go over and do this. So to calculate RDMs in the toolbox, you first need to create a data set object which lives in this place and is effectively just the NumPy array of the data, measurements by channels, so the voxels in MRI, the channels in MEG/EEG, the neurons in neural data, and a dictionary of descriptors, so which measurement is which pattern and which names the channels have.
And then there is the function that takes such a data set object as input and gives you an RDM as an output. And that's this calc thing. And it just has a method input how to do this. And then it just works the same way no matter which dissimilarity you want.
For comparisons, there is-- again, there is just a function that takes two of these RDMs objects as input and gives you back their similarities. And as for the calculation, there is a method argument. And this implements all the methods we know of, effectively.
And then for model tests, there is-- again, we need to create a special thing, this Models object, which has some internal RDMs as its basis and then implements two functions, effectively, one given some parameters, give me the prediction of the model, and a method for getting the fitted parameters. And this is-- we even need to do this Model part if we want to have fixed models, so models without any fitting. There's just ModelFixed, which just takes one RDM, and then we'll just predict this one RDM, ignoring the input.
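The two-function interface described here (a prediction for given parameters, and a fit that returns parameters) can be illustrated with a stand-in class. The class name, method names, and signatures below are hypothetical and only mirror the description in the talk; they are not the toolbox's own definitions.

```python
import numpy as np

class FixedModelSketch:
    """A model without free parameters: it always predicts the same RDM vector."""

    def __init__(self, name, rdm_vector):
        self.name = name
        self.rdm_vector = np.asarray(rdm_vector, dtype=float)

    def predict(self, theta=None):
        # Parameters are accepted for a uniform interface but ignored.
        return self.rdm_vector

    def fit(self, data_vectors):
        # Nothing to fit, so there are no parameters to return.
        return None

model = FixedModelSketch("layer_1", [1.0, 2.0, 3.0])
theta = model.fit(np.ones((2, 3)))    # two "subjects", three dissimilarities
prediction = model.predict(theta)
print(model.name, theta, prediction)
```

Because fixed and flexible models share the same two methods, evaluation code can call `fit` on training data and `predict` on the result without caring which kind of model it was given.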
And these ones we can throw in with an RDM object into evaluation functions, which will do all the evaluation methods I've shown you so far. OK.
So let's try this out. For anyone who wants to follow along, those are the commands to get this running if you have git and Anaconda installed, which most of you, if you're familiar with Python, will probably have. And I'll actually do this on screen. And, yeah. You find this-- also these links on the website of the GitHub repository.
OK. So let me close my-- and then we will go through those three demos. And just because I'm closing this now, I want to thank my collaborators working with us. And let's see how we can do this practically.
So this is a terminal. And I'll just start by doing all this installation stuff. And first I create an all-new conda environment, rsatutorial, with Python. Right. Yes, we want to install this stuff. Right. And anyone who wants to try can-- this should all run on your computers, too.
Right. So now we are in this conda thing. I already typed in the git clone thing, so I have here a thing called rsatoolbox I change in there and install this, which takes a moment. I'm downloading all the [INAUDIBLE] packages. Do, do, do. So.
OK. And then we need jupyter because-- that's because the demos are in those notebooks. [INAUDIBLE]. OK. Is anyone trying to do the same in the room?
AUDIENCE: Yeah. It seems to work so far.
HEIKO SCHÜTT: OK, great. Just to [INAUDIBLE] and [INAUDIBLE] your notebook. So here we are. Great. So this is also just the whole GitHub repository, right? So you have little rsatoolbox as well and the EC file now, but we go into demos. And yeah. Essentially, the three ones I would like to go through today would be the one for the data set, for calculating dissimilarities, and then the one for the bootstrap to show you those things.
So let's start with the data set. Right. So I can-- yeah, so it says here this will explain how these data set objects work. And again, this will be the format you have to get your data into for using the toolbox. So if we just import all those relevant things-- yeah. I forgot that it always takes a moment once you install scipy and numpy [INAUDIBLE] compile stuff.
OK. So you should just-- and what it does now here is that it's just here loading as measurements this data set for the demo, which is then this measurements thing. And we have some number of conditions and the number of voxels. And this is a plot of it, so you have here some conditions clearly producing less strong responses than others. And this would be just your response matrix here.
And now we can create such a data set object. It's also described in text here. But effectively, the main command is this one, where we have-- create this data set from the measurements matrix and the three descriptors. So descriptors is just a dictionary with some arbitrary things we might describe. In this case, it's just saying it's session one, subject one, for example. And then we have observation descriptors and channel descriptors that are just descriptors for correspondingly what kind of conditions are those and voxels.
And we can actually print this, and then we'll show you here this is the whole thing. And there's descriptors that look like this and just prints out this information from the data set, right? Yeah. And you can then here do things like creating this from random data as well, of course, or subsetting this kind of data. So this is the basic data set thing.
And it has nice functions. So if you just-- well, I can run, this for example. And here in the end, we have a command called subset_obs here, and this will select a subset of the observations where you can just say, OK, based on conditions where it's 0, 1, 2, 3 or 4, and it will produce this subset of the data. And this works for each of the dimensions. And you can also split by channels, for example. So I have this per ROI, et cetera. And this way, you can create here your data set and manipulate them.
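What such a data set holds, and what subsetting by an observation descriptor does, can be mimicked with plain NumPy and dictionaries. This sketch does not use the toolbox: the helper function is hypothetical, the data are random, and the descriptor names (`conds`, `voxels`, `session`, `subj`) are chosen to resemble the ones mentioned in the demo.

```python
import numpy as np

rng = np.random.default_rng(6)
measurements = rng.standard_normal((12, 5))            # 12 observations x 5 channels

descriptors = {"session": 1, "subj": 1}                # describe the whole data set
obs_descriptors = {"conds": np.repeat(np.arange(6), 2)}            # one entry per observation
channel_descriptors = {"voxels": np.array([f"voxel_{k}" for k in range(5)])}  # one per channel

def subset_observations(measurements, obs_descriptors, by, values):
    """Keep the observations whose descriptor `by` takes one of `values`."""
    mask = np.isin(obs_descriptors[by], values)
    kept_descriptors = {key: np.asarray(val)[mask] for key, val in obs_descriptors.items()}
    return measurements[mask], kept_descriptors

subset, subset_descriptors = subset_observations(
    measurements, obs_descriptors, by="conds", values=[0, 1, 2, 3, 4]
)
print(subset.shape, np.unique(subset_descriptors["conds"]))
```

The key point is that measurements and descriptors are always subset together, so every row keeps its condition label.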
So if you, for example-- yeah. What was it called for example? Yeah, just data here. Then it has here access to the different ones. And get_measurements will get you your measurement matrix back. It has numbers for the different things and allows you to do all this. It also has save and load commands as well so we can, in principle, create those and save and load them to disk as a whole. OK.
So this is the basic data set structure. Does this make sense, or are there any questions for generating these data set objects? OK. Then maybe the only thing is here with the multi-subjects. Then usually what we do here is that we make a list of these data sets. So in this case, once we've run this, this is a list of these data set objects, and you can then index it with 0 to get the first one. And I think in this case, there are five in total. OK? Great.
So that's already the first one down. So let's look at calculating some dissimilarities, which is probably a step further. OK. We can again run this. So in this case, this is basically you're loading the same kind of data, just as a basis for calculating this. And then calculating an RDM is super easy. It's literally-- this one calculates the RDM, right? So it just takes the data as an input and then a descriptor to tell it which of the measurements go to which condition or stimulus. And in this case, we call this conds.
And then we can get this. And by default, it does just a normal squared Euclidean distance. And then it says here we have an RDMs object with one RDM over 92 conditions with squared Euclidean distance, and it copies over the description of this. So this is how you calculate an RDM.
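Conceptually, the default computation averages the measurements that belong to the same condition and then takes squared Euclidean distances between the condition means. The sketch below shows this in plain NumPy with a hypothetical function name. Dividing by the number of channels is an assumption made here for scaling and may differ from what any given implementation does.

```python
import numpy as np

def squared_euclidean_rdm(measurements, conds):
    """Return condition labels and the RDM of squared distances between condition means."""
    conds = np.asarray(conds)
    labels = np.unique(conds)
    means = np.stack([measurements[conds == label].mean(axis=0) for label in labels])
    diff = means[:, None, :] - means[None, :, :]
    rdm = (diff ** 2).sum(axis=-1) / measurements.shape[1]   # per-channel scaling (assumption)
    return labels, rdm

# Three conditions, two measurements each, two channels.
measurements = np.array([
    [0.0, 0.0], [0.0, 0.0],    # condition 0
    [2.0, 0.0], [2.0, 0.0],    # condition 1
    [0.0, 4.0], [0.0, 4.0],    # condition 2
])
conds = [0, 0, 1, 1, 2, 2]
labels, rdm = squared_euclidean_rdm(measurements, conds)
print(rdm)
```

The descriptor plays the role of `conds` here: it tells the computation which rows of the measurement matrix belong to the same condition.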
And as is explained here, this has this method argument. And we can quickly have a look at this link, which goes to our-- well, not existing index? That's not good. We can fix this. OK. But we have a list of these distances. That was embarrassing, but anyway, the point is we can just put in the method argument here, and this will get us correlation distances, for example. And to get nonsquared Euclidean, we actually just use a square root transform of [INAUDIBLE].
Right. Now we can access the values in these RDM objects simply with get_vectors and get_matrices, right? And these will produce our contents of the data. And we can plot them as well. So there is a plotting function for RDMs. And this is the RDM from these simulated data. And it's very simple as well.
As a side note here, if you're not using our calc function to get RDMs, we can also skip this and produce them by hand. So in this case, I just showed this here by saying get_vectors and then running this on there, but effectively, you can-- this is the constructor for this RDMs object, and you can just give an arbitrary RDM here, and then you get your RDMs object out of this, so in case you have some way of getting the RDM that we have not yet implemented, you could still get your RDMs object in this way.
If we're doing this for-- well, if we want to do this for several RDMs, this is completely implemented for this list style of data set. So if you make a list of these data sets, like this data list here, then you can just run an RDMs calculation on this whole list, and you get an RDMs object with multiple ones of these. So this would be the way to do this for multiple subjects. And then the nice thing is that in the end, you have one object which contains all your RDMs for all your subjects for later evaluations. And again, this has here simple ways of printing this out as well.
Yeah. Similar to the data set, there are subset and subsample methods here for the RDMs. They differ slightly in that subset treats its argument as a set, so it will think that 1, 3, 4 is the same as 1, 3, 3, 4 here, in this case, while the subsample one actually repeats the third subject then. So those are functions to subset.
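The difference between the two is easiest to see on a toy list. The sketch below uses two hypothetical helper functions, `subset` and `subsample`, written in plain Python to mirror the behaviour described here; they are stand-ins, not the toolbox methods themselves.

```python
def subset(items, labels, wanted):
    """Keep each matching item once; repeated values in `wanted` are ignored."""
    wanted_set = set(wanted)
    return [item for item, label in zip(items, labels) if label in wanted_set]


def subsample(items, labels, wanted):
    """Return items in the order of `wanted`, repeating them where requested."""
    out = []
    for w in wanted:
        out.extend(item for item, label in zip(items, labels) if label == w)
    return out


rdms = ["rdm_s1", "rdm_s2", "rdm_s3", "rdm_s4"]  # placeholder RDMs
subjects = [1, 2, 3, 4]                          # one subject label per RDM

print(subset(rdms, subjects, [1, 3, 3, 4]))
print(subsample(rdms, subjects, [1, 3, 3, 4]))
```

The repeating behaviour of subsample is what a bootstrap needs, because a bootstrap sample draws subjects with replacement.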
And equivalently, there are ones for the patterns. We call them here. So those would be taking out some conditions or a subsampling of them, again with the distinction between subset and subsample. And the show_rdm also takes a bigger set of RDMs. And it shows here the five different ones we got from adding noise to them earlier on.
So this should be fairly convenient so far for getting all this. Then one of the parts I explained in the talk was to use cross-validated distances or dissimilarities, I should say. And we can also do this.
Now, if we use, for example, this crossnobis thing here, which is one of those cross-validated ones, we additionally need this cv_descriptor. So that's the one that selects which runs are supposed to be separate, so which ones we think are sufficiently separate from each other. Yeah. Sessions would be a very sensible thing, so this would be, I have a set of different measurement sessions and I assume the sessions are truly independent.
And then otherwise, this works the same way as always. And we get our cross-validated RDM this way. And that's all we needed to do to calculate this. And yeah. This should be as convenient as any other distance metric here.
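As an illustration of why the sessions matter, here is a minimal sketch of a cross-validated squared Euclidean distance between two conditions. The pattern difference from one session is multiplied with the difference from another session, so independent noise averages out instead of adding a positive bias. The function name and the toy patterns are made up; the crossnobis method additionally applies noise normalization, which is left out here, and the toolbox may scale the result differently.

```python
from itertools import permutations


def crossvalidated_distance(patterns_i, patterns_j):
    """Cross-validated squared Euclidean distance between two conditions.

    patterns_i[s] and patterns_j[s] are the patterns measured in session s.
    """
    diffs = [
        [a - b for a, b in zip(p_i, p_j)]
        for p_i, p_j in zip(patterns_i, patterns_j)
    ]
    # Only combine differences that come from two different sessions.
    products = [
        sum(x * y for x, y in zip(diffs[m], diffs[n]))
        for m, n in permutations(range(len(diffs)), 2)
    ]
    return sum(products) / len(products)


# A consistent difference across two sessions is recovered as is.
consistent = crossvalidated_distance([[2.0, 0.0], [2.0, 0.0]],
                                     [[0.0, 0.0], [0.0, 0.0]])

# Pure noise that flips sign between sessions gives a negative estimate,
# whereas the within-session squared distances would both be +1.
noise_only = crossvalidated_distance([[1.0, 0.0], [-1.0, 0.0]],
                                     [[0.0, 0.0], [0.0, 0.0]])
print(consistent, noise_only)
```

Negative values are expected for such estimates: they are what makes the average unbiased when the true distance is zero.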
And similarly, if we want to take the noise covariance into account, then we, of course, need a covariance. Those are the two-- like we have these multivariate and univariate noise normalization things here. And they come simply from these functions, precision and covariance from measurements. Here, this would be how you get the covariance matrix. But we never use this, so we-- in the toolbox, you always need the precision matrix to run this.
But it is simply this call. There is this function precision from measurements. And it takes another method input here. The diag one creates a diagonal matrix, and the shrinkage ones are calculated-- have these two names, shrinkage_eye and shrinkage_diagonal, so those would be the ones-- this one shrinks to equal variance along the diagonal. This one shrinks to the data diagonal.
If someone wants to know about these shrinkage estimates, I think they are very interesting, but it takes a while to introduce how exactly those two are different. But effectively, those are slight variants on how to do the shrinkage. And yeah. This is the kind of shrunk covariance matrix you get here with some estimates of off-diagonal structure and primarily the diagonal here.
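The idea behind the two shrinkage targets can be sketched with a fixed shrinkage weight. In the sketch below, `shrink_covariance` and its parameter `lam` are hypothetical: the toolbox's estimators choose the amount of shrinkage from the data and may differ in detail, whereas here the weight is simply picked by hand to show the two targets.

```python
def shrink_covariance(cov, lam, target="diag"):
    """Blend a sample covariance with a simpler target matrix.

    target="diag": keep each channel's own variance, shrink off-diagonals to 0.
    target="eye":  shrink towards equal variance on the diagonal as well.
    lam: shrinkage weight between 0 (no shrinkage) and 1 (target only).
    """
    n = len(cov)
    mean_var = sum(cov[i][i] for i in range(n)) / n
    shrunk = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                t = cov[i][i] if target == "diag" else mean_var
            else:
                t = 0.0
            row.append(lam * t + (1 - lam) * cov[i][j])
        shrunk.append(row)
    return shrunk


cov = [[4.0, 2.0], [2.0, 2.0]]  # made-up sample covariance of two channels
to_diag = shrink_covariance(cov, 0.5, target="diag")
to_eye = shrink_covariance(cov, 0.5, target="eye")
print(to_diag)
print(to_eye)
```

In both cases the off-diagonal entries are pulled towards zero; only the "eye" variant also pulls the variances towards their common mean.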
If we had residuals, then there is an equivalent function precision from residuals. So just in-- it doesn't remove the means here, as it did for the measurements. But otherwise, it does the same thing and takes exactly the same kind of shrinkage arguments here to get you the precision matrix.
Now, once we have this, we can then also calculate, still with this calc_rdm, this thing, with Mahalanobis as the method, and we need this additional thing, which is the noise, which should be the noise precision matrix. And this works. And now we are just showing those RDMs here; they are ever-so-slightly different from each other, but not fundamentally. So it's just that the diagonal and shrinkage estimates will give you gradually more reliable values for those distances. OK.
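For reference, the squared Mahalanobis distance weights the pattern difference by the noise precision matrix, so noisy channels count less. A minimal sketch with made-up numbers (the helper `mahalanobis_sq` is hypothetical and ignores any additional scaling the toolbox may apply):

```python
def mahalanobis_sq(x, y, precision):
    """Squared Mahalanobis distance d' P d for the difference d = x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * precision[i][j] * d[j] for i in range(n) for j in range(n))


x, y = [2.0, 0.0], [0.0, 0.0]

# With an identity precision this is just the squared Euclidean distance.
plain = mahalanobis_sq(x, y, [[1.0, 0.0], [0.0, 1.0]])

# A channel with noise variance 4 has precision 0.25 and is down-weighted.
weighted = mahalanobis_sq(x, y, [[0.25, 0.0], [0.0, 1.0]])
print(plain, weighted)
```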
AUDIENCE: Just in the data set here, it's n repetition of each condition? Is that how you get to cross-validated, correct?
HEIKO SCHÜTT: Yes, they are. So I mean, I can--
AUDIENCE: How do you identify repetition of position? You have sessions, as [INAUDIBLE]?
HEIKO SCHÜTT: Yeah. So let me-- maybe I'll just show you here. So you have-- yeah. I mean, it's called data set here, so we can just-- it's data set, right? So it's a data set thing, and it has some measurements matrix. And then these observation descriptors is what really drives those computations, right? So it has one annotation called conds where it says the first one is for condition zero. The second one is for condition [INAUDIBLE]. This is for condition zero. Fourth is for condition one, and so on and so forth.
And then correspondingly, there's the sessions argument, which is 0, 1, 2, 0, 1, 2, 0, 1, 2, in this case. So you have one line in this data set object for each of your measurements, annotated with the session and the condition in which it was measured. And that's exactly the two inputs you need for this cross-validation, right? So you then tell this function with-- here, once we do cross-validation, you need the descriptor for which ones are the conditions and the cross-validation descriptor to tell it which are the independent measurements [INAUDIBLE].
AUDIENCE: OK.
HEIKO SCHÜTT: Yeah? And yeah. Just to say, like this-- this dictionary of these annotations is observation descriptors. And we just had to give this originally for creating the data set, right? And essentially, any list-like thing with equally many entries as there are measurements in this data set works. That's what we need for the data set. And I can just look it up. It's-- yeah. So it's somewhere up here, where we actually define the data set. So it's-- here it is the observation descriptors, but that's not the one we're looking at now. Yeah. It is within here. Yeah, here, cross--
AUDIENCE: [INAUDIBLE]
HEIKO SCHÜTT: Here it is for the cross-validated dissimilarities, right? It says conditions here, and then it's just repeating them correspondingly. And this is where we set the observation descriptors conditions sessions and then pass it as this observation descriptors to the creation of the data set.
AUDIENCE: Yeah, makes sense.
HEIKO SCHÜTT: Great. Yeah. And then just to say, this works just fine to create these RDMs. And this is with cross-validation. Also nothing more needs to be done, effectively, right? So just create, and it's just the fancy plotting thing which takes these [INAUDIBLE].
And maybe if you're ever in doubt for calculating those, there's, of course, a help for this function, which tells you that you're supposed to pass this data set object, and those are the options and what kind of arguments you should pass to them, right? OK.
So this would be how we calculate RDMs with the toolbox. And, yeah. As you hopefully see, this is not getting more complicated if you want to use the cross-validated or Mahalanobis distances. This should just work the same way it does for a simple correlation or Euclidean distance. OK. Are there any questions for this one?
OK. Then let's jump to the third and last demo on bootstrapping. OK. OK. There is some introduction text, which we will skip over. In this case, we just load the data. So this is not doing anything interesting. It's just loading the data. And then we create these model RDMs. It's just an RDMs data for these things. And just for fun, this is the actual RDMs, and we will soon see them. So these are the different RDMs here for one of the particular layers of AlexNet. OK. Yeah. And these are actually versions for different measurement models.
As we've seen before, we can print out information about these RDMs. And like this is just-- so so far, we've seen this in the previous demo. We just created this RDMs data set, and we can look at it. That's great. Now, first actual inference will be fixed model inference. So again, this code for you just doesn't do much interesting. It just loads the data and puts them into the right format of matrix. And gets all the different names for it. And then we can choose here a data set for the RDMs.
So this is which repetition, which noise level, or how big we want the voxels to be. And then we get our RDMs data in the same way we earlier created the-- by hand kind of thing in the previous demo. So we just run this creation and get our RDMs data. And we can look at those. Those would be some fairly noisy data RDMs, OK?
Now, if we want to run fixed models on this, we will create a model for each of our layers, getting the right RDM here with a subset, measurement model complete for this one, for example, and then create all the-- I'm just printing out here which models were created. The models created are the convolutional layers and the fully connected layers and the probabilities.
And so now we have here a model for each of those. And the model creation here is extremely simple. The only two arguments we need for creating a model here is a name, here i_model and an RDM. And that's also the only thing that is saved within this object, really.
So we can go through this. And yeah. Then we have our data RDMs and can run our first inference. So we have our first model inference where we put in the models and the data and some method how we would want to compare them, maybe just a simple correlation distance. And we can run those. And ta-da, we have a prediction accuracy measured as the Pearson correlation for each of those models with error bars computed across the different subjects, pairwise comparisons, and error bars and everything. So this is the simplest kind of these inferences.
We can also just print those results. This is often asked for. So these are evaluations plus/minus standard error of the mean, and p-values for the two kinds of comparisons. In this case, they are all clearly significantly less than the noise ceiling. And some of them are better than zero.
And as it says here, there are get_something functions for this object that gives you all the information you might want, so the confidence intervals, error bars, means, noise ceiling, standard error measurements thing here, and test ones for all the tests, tests against the noise ceiling, pairwise tests, and tests against zero. And they just return a matrix of p-values. So in this case, you just get this matrix of p-values, OK? So this works as a very simple interface to all these things. And this plot function creates this fairly nice summary plot that contains all those statistics as well. OK.
And then we can come to bootstrapping. The simplest one would be to use eval_bootstrap_rdm, so this would be bootstrapping the RDMs, a.k.a. subjects in this case. And we can run this. This now takes a moment because it actually runs through those bootstrap samples and then produces a plot essentially in the same format as we just had. And here, just in the printout, you should see that these values are extremely similar to those ones because bootstrapping across RDMs is really relatively boring, because for those, the t-tests will actually work fine, so you will get exactly the same kind of things here.
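The reason this case is "boring" can be seen in a few lines: resampling subjects with replacement and taking the spread of the resampled means gives nearly the same standard error as the usual formula behind a t-test. The sketch below uses made-up per-subject scores and a hypothetical helper `bootstrap_sem`; it illustrates the principle only and is not how eval_bootstrap_rdm is implemented.

```python
import random
import statistics


def bootstrap_sem(per_subject_scores, n_boot=1000, seed=0):
    """Standard deviation of the mean score across bootstrap samples of subjects."""
    rng = random.Random(seed)
    n = len(per_subject_scores)
    boot_means = []
    for _ in range(n_boot):
        # Draw n subjects with replacement and average their scores.
        sample = [per_subject_scores[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    return statistics.stdev(boot_means)


# Made-up model evaluations (e.g. correlations), one per subject.
scores = [0.21, 0.35, 0.28, 0.40, 0.31]

boot = bootstrap_sem(scores)
classic = statistics.stdev(scores) / len(scores) ** 0.5
print(boot, classic)
```

With few subjects the plain bootstrap value comes out slightly smaller than the classic one, because resampling n subjects behaves like dividing by n rather than n - 1.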
We can run-- for running the same across-- bootstrap across the patterns works the same way. Again, takes here like half minute to run. But it's-- from the format, it just stays exactly the same. In this case, I think the error bars get somewhat larger, so there is more variance due to the patterns in this case than due to the subject. OK. Yeah. And as I said, and then correspondingly, fewer of those pairwise comparisons actually become significant. But in principle, this works just as well.
We still have the plain bootstrap, which does the bootstrap across both. And it produces even larger error bars. I will not run this one now, just to save a bit of time. But importantly, we have this dual bootstrap here, which does exactly the correction I was talking about for the bootstrap over both subjects and stimuli. So maybe-- yeah, maybe I have to run those two after all, sorry.
Yeah. So we will just-- now just run the bootstrap actually across both dimensions and the corrected one. And then we will see eventually in those outputs here that the-- while the actual evaluations are similar between the two, the standard errors for our uncorrected bootstrap are substantially larger than for the corrected one. And the correction is accurate, as we showed in the simulations. Yeah. Unfortunately, that now takes another minute to run. But you can believe me that those runs are quite true. OK.
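A minimal sketch of the arithmetic behind the correction, assuming it takes the form described in Schütt et al., 2021: the bootstrap over both factors counts the measurement noise three times, while each single-factor bootstrap counts it once, so combining the three variance estimates removes the excess. The function name `corrected_variance` and the numbers are illustrative only, and the toolbox may apply further safeguards that are not shown here.

```python
def corrected_variance(var_rdm, var_pattern, var_both):
    """Combine three bootstrap variances into a corrected two-factor estimate.

    var_rdm:     variance from bootstrapping subjects (RDMs) only
    var_pattern: variance from bootstrapping stimuli (patterns) only
    var_both:    variance from bootstrapping both at once (overestimates)
    """
    return 2 * (var_rdm + var_pattern) - var_both


# Illustrative components: subject variance 2, stimulus variance 1, noise 1.
subject, stimulus, noise = 2.0, 1.0, 1.0
var_rdm = subject + noise                      # 3.0
var_pattern = stimulus + noise                 # 2.0
var_both = subject + stimulus + 3 * noise      # 6.0, noise counted three times

corrected = corrected_variance(var_rdm, var_pattern, var_both)
print(corrected)
```

In this toy case the corrected value equals subject + stimulus + noise, which is smaller than the uncorrected two-factor estimate, matching the smaller standard errors seen in the printout.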
So this runs the correction for the two bootstrap things, so let's-- well, yeah. So this would conclude the part on just bootstrapping. As you should appreciate here, switching between the different bootstrapping methods is simply replacing this name of the bootstrapping function. Otherwise, everything stays exactly the same. And the only drawback for further generalization is that it takes a bit longer to compute. But I hope you agree not terribly long. And now we run these. And it's a bit harder to see this way, but we can, again, print those two and see that the standard errors of the mean all get somewhat smaller by the correction. And correspondingly, our p-values. OK.
And then cross-validation for flexible models. That will be the very last one. For that, of course, we need some flexible models. What I choose here for the-- for our demonstration are these model select ones. These are ones where you just have a bunch of RDMs here for the different averaging sizes, and then fitting just consists of trying all of them out and choosing the best one. And that's then the RDM it keeps predicting for the rest of the time. We can create those. That was very fast.
And then we can first do manual cross-validation, where we just create training, test, and ceiling sets from these kind of functions here where it splits those off and then run the cross-validation. I mean, one thing that is immediately obvious is this is the same data, so these fitted models do far better in this case, which is somewhat emphasized here because this is, of course, simulated data. So one of the RDMs it is allowed to choose from is the actual model of how these data came about, so it's not surprising that it works well. But in general, of course, all the models will do better with fitting, usually, on these kinds of data sets.
Anyway, this gives us cross-validation. However, that's what it is reporting in these warnings. It does not get any error bars from cross-validation, so there are no tests from just one cross-validation [INAUDIBLE]. Yeah?
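Fitting a selection model, as described here, amounts to scoring every candidate RDM against the training data and keeping the winner. A minimal sketch in plain Python, using Pearson correlation as the score; the helper names `pearson` and `fit_select` and the toy vectors are made up, and the toolbox's selection models may use whichever comparison method you pass in.

```python
def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_select(candidates, train_rdm):
    """Return the index of the candidate RDM vector that best matches training data."""
    scores = [pearson(c, train_rdm) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# Three candidate RDM vectors (e.g. different averaging sizes), made-up values.
candidates = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 3.0, 2.0]]
train_rdm = [2.0, 4.1, 5.9]  # made-up training data RDM

best = fit_select(candidates, train_rdm)
print(best)
```

Because the choice is made on the training data, the chosen RDM has to be evaluated on held-out data, which is why these models need cross-validation.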
And then we can run the bootstrap cross-validation. Same commands, just setting here k_pattern and k_rdm to some values, which will yield the splits. And then you get error bars and all your tests for the flexible models as well. And as a nice side note, also, now the right model wins, so the model that was actually used to generate the data here, the third convolutional layer.
And with that, I think I've shown you all the things I mentioned in the tutorial that we were supposed to be able to do, and we can. If you want to see more on how to use the toolbox, there are a couple more demos with hopefully somewhat helpful names. And we actually-- like the-- yeah, we also have here our-- just go to the-- our Read the Docs help for this, which gives you an overview on getting started on how to calculate some RDMs and explanations of all the different dissimilarity metrics, et cetera.
And I hope you'll find everything there you need to know. If you don't, always feel free to write me an email and ask. And with that, I would be done.