Suite2p: a fast and accurate pipeline for automatically processing functional imaging recordings
Date Posted:
August 9, 2021
Date Recorded:
July 29, 2021
Speaker(s):
Carsen Stringer, HHMI Janelia Research Campus
Description:
The combination of two-photon microscopy recordings and powerful calcium-dependent fluorescent sensors enables simultaneous recording of unprecedentedly large populations of neurons. While these sensors have matured over several generations of development, computational methods to process their fluorescence are often inefficient and the results hard to interpret. Here we introduce Suite2p: a fast, accurate, parameter-free and complete pipeline that registers raw movies, detects active and/or inactive cells (using Cellpose), extracts their calcium traces and infers their spike times. Suite2p runs faster than real time on standard workstations and outperforms state-of-the-art methods on newly developed ground-truth benchmarks for motion correction and cell detection.
Speaker Bio: Carsen Stringer is a group leader at HHMI Janelia Research Campus. She did her postdoctoral work with Marius Pachitariu and Karel Svoboda at Janelia, and did her PhD work with Kenneth Harris and Matteo Carandini at University College London. She develops tools for understanding high-dimensional visual computations and neural representations of behavior.
PRESENTER: Carsen has done some of the most impressive and amazing work that any young person, I feel I can say that because I have so many gray hairs, any young person I know has done in the last few years, in the area of neurophysiology and information processing in the brain. Carsen did her undergraduate degree at Pittsburgh, and then she did a spectacular PhD with my friends Ken Harris and Matteo Carandini at University College London. And there Carsen developed Suite2p, and then has published over the last couple of years a string of papers, showing amazing high dimensionality of visual cortex responses, the astonishing widespread nature of activity related to internal state and even movements in the visual cortex.
And then just earlier this year, after her PhD, Carsen moved to Janelia, as a postdoc with Marius Pachitariu and Karel Svoboda. And just a couple of months ago, Carsen published an amazing paper showing high-precision encoding by visual cortex neurons, at a level of precision that far surpasses the precision seen in behavioral experiments. So all of this has, to my mind, opened more questions. But it takes brilliance to take a field and open it up.
And it is Carsen's combination of computational ability and insight that has done so. So we are delighted to welcome Carsen, and to hear from her and learn from her. And I'm proposing to do that too, myself. So thank you for coming, Carsen.
CARSEN STRINGER: Thank you so much. That was a really, really kind introduction. So, yeah, I am Carsen Stringer, and a lot of these scientific breakthroughs have been possible because of working on this tool, Suite2p, which allows us to do this automated segmentation of many thousands of neurons. So what Suite2p does is it takes these raw TIFF files or HDF5 files, Scanbox, mesoscope, Bruker TIFFs, we have lots of data formats we support.
And it takes these raw movies that come off the microscope. It first does a motion correction step, which is this, where we need to correct this jiggling of the frame, because the mouse, or other animal, is awake during our recording. So there will be movement. And then we do ROI detection, where we find regions of interest where pixels are correlated across time. And that's how we find these little ROIs here.
We also have support now for Cellpose inside Suite2p. For those of you who've heard of it, it's a way to also do anatomical segmentation of our neurons. So that might be useful for a couple of things, and I'll show a couple of use cases for that later. And then from these ROIs, we extract these traces, which are the sums of the pixels within each of these ROIs. And we get these traces, and then we also perform deconvolution on these traces.
And then, finally, we output. The default output is NPY files, and we can also output MAT files and NWB files, and you can load all of these different file types into our graphical user interface, which you can use to explore the cells. And so today, in the tutorial, I chose to use Colab, because I think it's easier to get everyone up and running. But I do recommend, if you are going to use Suite2p for your own processing, that you at least download the outputs of the data, and look at them in the GUI, because it's a nice way to see correlations across cells, and explore spatial patterns in particular. Part of the reason we do imaging is so that we have the spatial information about the cells we're recording from.
And so Suite2p is very fast. And we've done that on purpose, because, like when [? Regonka ?] was describing these experiments, the reason we were able to do them was because we could quickly process our data and then come back with new hypotheses the next day, and test them on the mice with new images, or just new stimulus sets in general. And so this is an example of the runtime in the Google Colab you have. It should take around six minutes to do motion correction on 4,500 frames.
And it'll take less than two minutes to do the cell detection and the cell extraction. And so I'm going to quickly, so [? Regonka ?] mentioned some of the science. I'm going to very quickly go through it. And I can talk very fast. So I'll go through it fast, but you don't need to get all of it, but I just want to put it out there as types of science you can do with it. Then I'll go through in more detail and more slowly how Suite2p works, and some of the future directions that we have planned for Suite2p and population analysis in general in the field.
In order to record many thousands of neurons, what we do is we often do many planes simultaneously. And so this is an example of many planes we're recording in depth here. And this is a recording of a mouse in total darkness. And so here's a zoom-in of one of these planes. And then it's going to zoom out and you'll see the recording across all of the planes. And so this is visual cortex, there's no visual images because the mouse is in total darkness.
And so, despite that, there's all these different patterns of activity. And so we didn't really know what these patterns of activity were. But one thing we could look at was the behavior of the mouse, while we're doing these recordings. So we put infrared lights on the mouse and used an infrared camera in order to record what the mouse is doing while we do these large-scale recordings. And we extract all these neurons using Suite2p, because otherwise no one is going to circle 10,000 cells by hand, so, yeah.
Having so many neurons allows you to start seeing these multidimensional patterns. And that's what we initially saw. And we didn't know how to explain them, because, in the past, many people had used one dimensional variables like running speed, pupil, or whisking. But really there's patterns of activity that aren't just related to running speed. There's other different, diverse patterns in this neural activity.
And so this is a plot of neurons by time. And we've sorted it by our unsupervised embedding algorithm, which is similar to t-SNE, which kind of puts neurons that are correlated to each other next to each other on this axis. And so we also needed a multidimensional variable to explain this multidimensional activity. And what we found was the multidimensional variable we needed was the facial movements of the mouse.
And so we took the principal components of the mouse's face and used it to predict this neural activity. And so we built a model on some subset of frames, our training data, and then we looked at it on our test frames. So here's test data neurons by time. And here's our prediction from the faces where every neuron in this plot is replaced with its prediction from the face down here.
And so you can see we're predicting like this onset of running, that has this kind of shape. We're predicting these neurons during maybe different types of whisking that happen when the mouse is running, different types of whisking that happen when the mouse stops, when the mouse grooms in between running. There's lots of different patterns that the face drives within visual cortex, which is conventionally thought to be just a visually driven area.
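To make that modeling step concrete, here is a minimal sketch of predicting neural activity from face motion components with a train/test split over time. It uses plain ridge regression rather than the reduced-rank model in the paper, and the array names are placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

def predict_from_face(neural, face_pcs, train_frac=0.75):
    """neural: (T, n_neurons) activity; face_pcs: (T, n_face_pcs) face motion PCs.
    Fit on the first chunk of time, evaluate on held-out test frames."""
    T = neural.shape[0]
    ntrain = int(T * train_frac)
    model = Ridge(alpha=1.0).fit(face_pcs[:ntrain], neural[:ntrain])
    pred = model.predict(face_pcs[ntrain:])
    resid = neural[ntrain:] - pred
    # fraction of test-set variance explained, per neuron
    var_exp = 1.0 - resid.var(axis=0) / neural[ntrain:].var(axis=0)
    return pred, var_exp
```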
And so, speaking of it being visually driven, the next question we had was, do these behavioral signals impair coding of visual stimuli? Because you don't want to see things completely differently just because you're running. And so the next question we asked, and again, this is in much more detail in the paper, is are these behaviors related to the stimulus directions in the mouse's brain? And so we recorded activity where we had periods with stimuli and periods without stimuli, and we sorted this axis so that correlated neurons are close to each other.
And so these are the neurons that are more driven by stimuli. These are the neurons more driven by behavior. You can see these neurons just that are driven by behavior just continue on, whether or not there's visual stimuli being shown. And then these visually driven neurons, these directions in space, that are visually driven, they don't have very much activity during these behavior periods, because they're actually orthogonal to the directions of neural activity during behavior.
And so this is a very complicated point. So I don't expect you to get it from the slide. But the punch line is that you have lots of neurons driven, for instance, when the mouse grooms, it drives a bunch of neurons. But those neurons aren't related to a specific picture or image. So every time I show a cat image, there are certain neurons that are firing. But these are not the grooming neurons, for instance.
So there's this discrepancy, there's a difference between which neurons are firing for different behaviors, versus which neurons are firing for stimuli. And because this is a high dimensional space, it's maybe not so surprising that they end up being orthogonal, because any kind of random direction you take in a high dimensional space is not likely to be aligned. OK, so basically, we have these behavioral patterns, which drives some noise.
Behavior accounts for over a third of the noise that we see. What about the rest of this noise? Does this noise impair coding? OK, so the next question we asked was, does neural noise impair coding? So we have some of this noise from behaviors, which we don't think is going to influence things.
But this is only about a third of the noise we see. There's like another 2/3 of noise that we're not explaining with the behaviors. So we wanted to know if this kind of noise would impair coding, because we do know, at a single neuron level, we do have noisy neurons. So when we show a bunch of visual stimuli of different orientations, we have this average tuning curve in white, and we have these single responses to single stimuli in gray around this tuning curve.
And so there's a lot of this variability in the firing. And this has been observed for a long time. It's not just something that we've seen. Another thing that's been observed for a long time is that the mouse's behavior is also noisy. So if I ask a mouse to choose right if the angle is greater than 45 degrees, it's supposed to look right if it's 46 degrees, for instance, and look left if it's 43 degrees. A mouse's behavior on this is quite noisy.
So the discrimination threshold of a mouse's behavior is around 29 degrees, which means 75% of the time the mouse can distinguish 70 degrees from 45 degrees. So this is a pretty big angle difference. That's much bigger than this three degrees I'm showing with these example stimuli. And so the behavior is noisy, and the single neuron responses are also noisy.
And so the question is, does the noise from these single neurons in visual cortex drive our noisy behavior. And so this is a kind of connection that the field has made for a long time, because, I mean, we think noise in one area might drive noise in decision-making. But one caveat here is that mouse visual cortex doesn't just have one or tens of neurons that we usually record from.
It has close to a million neurons. And so now, with these large scale population recordings, we're recording up to around 60,000 neurons at a time. We're really getting a large sampling of lots of neurons in this area, that we wouldn't get before. And so then we ask this question, how well can we decode the angle of a stimulus using the population neural responses? And we found that the population responses were very precise, that basically, from the neural population, we could decode 45.3 degrees from 45 degrees.
So this is a really small difference. This is much smaller than these stimuli. And also, if you ask a human to do an orientation discrimination task, they would get a discrimination threshold of around one to two degrees. So the mouse's brain's performance on this task is basically better than a human's performance.
So then we concluded in the study, so the single neuron responses that are noisy in visual cortex are not driving noisy behavior. We have some other analyses in the paper too, to support that, but rather it could be the case that downstream decoders of this information from visual cortex just aren't set up to necessarily do this task in a useful way, like they're not interested in decoding orientations, because maybe that's not biologically relevant for them in their day-to-day life. There's other hypotheses, too.
We honestly don't know why this isn't true, and it's an open question in the field.
AUDIENCE: Just a quick question. I think that's really interesting that mice seem to only be able to discriminate this 30 degree difference. That's only when they have a sort of active response that you have to train, right? Which is different than, say, if you have a reflexive measurement of whether the mice can tell us that two orientations are different. Then maybe the difficulty has to do with their training, rather than their ability to tell the difference.
CARSEN STRINGER: Yeah, so there's this relative angle difference that has to be computed for this task, which is difficult. And I agree with you. And so mice do better on change detection tasks, where they have to say if an angle has changed. Their performance there is better, but it's still not at the below-one-degree level.
But you are right. Definitely the way the task is set up definitely changes things, and probably also how you train the mice.
PRESENTER: I have a question, Carsen.
CARSEN STRINGER: Yeah, please.
PRESENTER: It is true that the average neuron in V1 has the kind of noisy, unreliable response you show on the left. Is it possible that the very best neurons are actually just as good as the behavior? There is a literature that says behavior really draws upon the most precise neurons, potentially.
CARSEN STRINGER: No, that's definitely a good point. And we did actually test that. Using the best neuron, we actually do even a little better than the mouse's performance, but it's relatively close to the mouse's performance.
PRESENTER: You mean the individual best neurons are not better than the mouse's performance.
CARSEN STRINGER: Yeah, they're on the same level as the mouse's performance.
PRESENTER: On the same level, I see.
CARSEN STRINGER: Yeah. Rather than like building a more complicated decoder that takes into account more neurons.
PRESENTER: And does it matter, the layers and the location of the neurons that you are analyzing? Are these essentially layer 2, 3 neurons?
CARSEN STRINGER: We have 2, 3, and 4, but we didn't see differences. But we also haven't recorded deeper. And maybe there might be differences if we record deeper. But at least between layer 2, 3, and layer 4, we haven't seen differences in--
PRESENTER: But I agree that this is almost an order of magnitude difference. It is unlikely to be just the population that we are recording. It is something deeper than that.
CARSEN STRINGER: Yeah, I think it's encouraging. It means like there's a lot of this orientation information there, but there's another transformation happening into a space that I think is relevant for the mice, like they probably use these angles for depth perception, or for edge detection if they're running around walls and stuff like that. So I think there is another transformation that we still have to discover, as a field of these kinds of signals.
All right, and so the last thing is, we found that there were these precise signals in response to orientations, but we used this one-dimensional variable because we knew we could easily build decoders. If you want to know the full dimensionality of neural responses, you would want to look at lots of natural images. And that's what we did in another study.
And so we wanted to know, are these neurons coding just orientation, or are they coding lots of diverse features of natural images. And so there were lots of studies in the past that said that neural responses are relatively low dimensional in cortex. And there was a nice review article, kind of summarizing all of these different studies.
And there's another nice study by Cowley et al that goes through these results. Many of these studies had very interesting scientific results, but their definition of dimensionality was limited by the fact that they were only recording so many neurons and so many stimuli. So if you only have 12 different stimulus cases, you're only going to have at most a dimensionality of 12. And, likewise, if you only have 100 neurons, you're going to have at most a dimensionality of 100. And it'll likely be lower than that, because there will be noise in these neurons.
And so it will be hard to find those underlying dimensions. And so using Suite2p, we were able to basically make this matrix an order of magnitude larger and record around 10,000 neurons by 2,800 stimuli. And then you're now able to ask these questions about dimensionality, I think, in a more rigorous way, rather than being limited by your stimulus set and your neuron set.
And so, when we did this, we were comparing two hypotheses. Are neural responses low dimensional, with these kinds of smooth, broad tuning curves, where you only have a couple of dimensions that are significant? These are eigenspectrum plots of these population responses. We also considered the hypothesis that neurons are high dimensional.
So instead of having these broad tuning curves, they each code for a single dimension, their own direction in this space. And so that's the hypothesis of sparse coding. You have lots of neurons encoding different directions.
And so that would create this flat spectrum. So these were the hypotheses we came into this with. And what we found was, actually, it was neither of these. It's kind of an in-between of the two, where the eigenspectrum has this power-law decay, where these low dimensions have more variance than these higher dimensions. But there's a lot of these higher dimensions that are encoding these kinds of fine features of natural images.
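As a rough illustration of what a power-law eigenspectrum means in practice, here is a simple sketch that computes the eigenspectrum of trial-averaged responses and fits an exponent. The actual analysis in the paper uses a cross-validated PCA procedure to separate signal from noise variance, which this sketch omits; the array names and fit range are placeholders:

```python
import numpy as np

def eigenspectrum_exponent(resp, fit_range=(10, 500)):
    """resp: (n_stimuli, n_neurons) trial-averaged responses.
    Returns the normalized eigenspectrum and a fitted power-law exponent."""
    resp = resp - resp.mean(axis=0)
    eigs = np.linalg.svd(resp, compute_uv=False) ** 2   # PCA eigenvalues (up to a constant)
    eigs /= eigs.sum()
    n = np.arange(1, eigs.size + 1)
    lo, hi = fit_range
    hi = min(hi, eigs.size)
    slope, _ = np.polyfit(np.log(n[lo:hi]), np.log(eigs[lo:hi]), 1)
    return eigs, -slope   # exponent alpha; the paper finds alpha close to 1
```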
And so we think that the neural population codes are balancing capacity, which is to maximize information. So the higher dimensional you are, the more information you can keep about the external world, about these visual stimuli you're seeing. So, for instance, deep neural networks are pretty high dimensional. They take the pixel-level information and kind of expand it into a high-dimensional space to do computations.
And so we think that's a useful thing for networks to do in general. But at the same time, you probably also want to keep the smoothness criterion, that similar stimuli are represented similarly, which means having this decay and not being completely flat. So if I were completely flat, like this one, each neuron is doing its own thing, basically, and encoding a different stimulus.
So if I move a little bit in stimulus space, I have a completely different representation from other stimuli. So you want these lower eigendirections, these eigenvectors that have more variance, to kind of smooth across your space, so that you're smoothly changing across neurons as you move in your visual space. That might be useful for keeping continuity of the visual world, or keeping continuity of image classes. For instance, neural networks can sometimes easily be confused by images when you do small perturbations. But in the brain, we don't want that.
We don't want a cat plus some random Gaussian noise to not be a cat anymore. And so we want these smooth representations in the brain. But we don't know much other than this. Like we don't know what this code is in visual cortex. So I think this is still an exciting, open question. We're trying to understand what types of features are being encoded, and what types of computations are being done in mouse visual cortex, with a high dimensional code like this, that has this kind of property.
AUDIENCE: Can I ask a quick question?
CARSEN STRINGER: Yeah, please.
AUDIENCE: I guess I actually have two questions. The first is, what, you said that these are natural images. Are they like natural mouse images? Like what types of images are these?
CARSEN STRINGER: Yeah, it's a good question. So they're from ImageNet, and we kind of curated them from a few different categories, like birds, cats, mice, grass, mushrooms, and a few more, I think.
AUDIENCE: OK, cool. Thanks. And then the second question is about this idea of smoothness that you're describing, where you need to have things that are similar be represented kind of similarly. And one thing that I'm curious about is what exactly you think it means to be similar. So the Gaussian noise is maybe one example. But maybe you have a rotation of an object, and actually the pixel values are very different between those two things. But it's the same object. And so I'm just curious if you have any thoughts about how exactly we should be measuring similarity, in this regime.
CARSEN STRINGER: That's a really great point. And I think it depends on where you are in the processing hierarchy. So I don't think, at least in primary visual cortex, we don't think mice have those kinds of invariances to those types of rotations. So the smoothness might be limited to a very small set of angles of rotation, rather than if the whole object rotates it still has this concept of an object, that would still be represented by similar neurons.
There's been a recent study from the Tolias lab, I don't remember the first author, that's looked at these rotated stimuli in mice. And they have some small effects, like there are some neurons that are coding for that. But that would be later in the hierarchy, so not necessarily in the recordings that we've done. Yeah, good questions.
PRESENTER: Carsen, I have a question as well. So the natural images in this study are natural images that are visible to the human eye, right? So they have a certain spatial frequency, a spectrum to them. And of course, due to physical limitations of receptor spacing, as well as receptive fields of ganglion cells, we believe that mice have much lower spatial frequency cutoffs.
And so what a mouse sees is likely a low pass filtered version of what the human experimenter has put up, right? So--
CARSEN STRINGER: Yes and no.
PRESENTER: No? Why no?
CARSEN STRINGER: Well, so they have fewer receptors. But, like we saw with the angles, there's still, at least in the population activity, enough of a continuum that there is some coding of some of these precise locations. So we've also done small stimuli where we just have a circle rotating. And so in that case, you're really looking at local information and those kinds of small changes in the stimulus.
So I think acuity is a little higher than people claim. But I do agree with you in general, because we've looked at comparing to deep neural networks. And it does work better if we downsample the images before we compare to the mouse brain. And the responses of the neural network look more similar to the mouse brain using downsampled images. So there is a decrease in acuity, but maybe not so stark as people have previously reported.
PRESENTER: That's interesting. Yeah, maybe we can follow up later a little bit.
CARSEN STRINGER: Yeah, I'm not sure, in terms of the images we showed, like if we should a priori low pass filter them, to give them the type of image we think is appropriate for them, or if we should give them the full image, which is what the world looks like. And then we, at the output side, do have to consider the fact that there is some low pass filtering going on.
PRESENTER: Is it possible to reconstruct, from the thousands of neurons that are responding to the image, what the image is that is driving the neurons?
CARSEN STRINGER: Yeah, so we've tried a little bit of that, too--
PRESENTER: Potentially, could you use the high dimensional information to reconstruct the image and compare it to the actual image? That might be a very direct comparison of what it is that the neurons are really able to respond to.
CARSEN STRINGER: No, that is a great idea. A lot of times the reconstructions look very low pass filtered, in part, I think, because we also just don't have the right models yet. So you also have to have a good model to do that.
PRESENTER: Correct. Absolutely.
CARSEN STRINGER: I think, yeah, but even, I will say, even the models from retinal ganglion cells in monkeys look pretty low pass filtered when you try to predict from them. So, yeah, I'm not sure, if there's a huge difference between mouse and monkey.
Ned, did you have a question too? I saw you unmuted. But not to put you on the spot.
AUDIENCE: Oh, mine was similar to Janelle about the smoothness. But you answered it, yeah.
CARSEN STRINGER: Cool. I mean, I will say I didn't really answer it, because I don't know what these dimensions are so much. And I think it's still definitely an open question in the field. And so we share all of our data. And so all this data is available. And I think, in general, building better models of mouse visual cortex is a good research direction.
So I'm going to go into the processing pipeline, and I'll try to go slower now. And so feel free to stop me anytime with questions. So the first step in Suite2p is to do motion correction. And so here's an example of a movie of cells firing in mouse visual cortex. This is an injection of GCaMP6s.
And so to do the registration, we first find a reference frame. I'm not going to go into how we find the reference frame. But assuming you have a nice crisp reference frame, we're going to take every frame in our movie, and we're going to try to match it to this reference frame and find the offset in pixels from this frame to this reference frame. You can see here I've plotted the difference between the frame and the reference image. And you can see there's this kind of difference that looks like I need to shift this plane to the left, in order for them to overlap correctly.
And so we could do this on the regular frames. But it actually works better to do it on the frames after they've been whitened. So if we whiten the images, this is what they look like. So here's the reference image and here's the example frame. And now you can see these little edges, and we're really trying to get these edges to line up right. And so this in general works better to get more precise estimates of our offsets.
And whitening is another step you have to do on top of all these things. But we get it for free, basically, because we're going to do the whitening in the Fourier space, and actually we're going to do this cross correlation also in the Fourier space. So this cross correlation, I should explain that. So we have our target image. And we're going to compute the correlation of our frame at a bunch of different offsets with this target frame.
And so we could do that by sliding the frame across our target frame. But the faster way to do it, actually, is to take the Fourier transform of the frame and the target, because in the Fourier domain, a cross correlation is actually just a multiplication. So we can take a multiplication of the FFT of the frame and the complex conjugate of the FFT of the reference target.
And that computes our correlation between the frame and the reference. And because we're going to convert to the Fourier domain anyway, it's really easy for us to whiten once we're in the Fourier domain. So if we whiten these images, and do this phase correlation, we get a correlation map that looks like this, where our offset from the center of this map tells us our shift, which looks like it's four pixels in this direction and one pixel, maybe, down in this direction.
And if we were to not whiten, our peak would look more like this, which is a lot less crisp. And it makes it more difficult to find the exact offset of these frames. And it's especially useful, this whitening step, if you have noisy frames. And we have some other strategies, if you have very low SNR frames in Suite2p.
For instance, you can add on temporal smoothing, and things like that might help with your issues with registration in the low SNR, high noise regime. All right, so this is the whole frame registering. But you have a problem that the whole frame might not move the same way; you might have little pieces of the frame that are going to move in different directions. So this will give us an overall picture of how the whole frame moves.
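Before the block-wise version, here is a minimal sketch of that whole-frame step: whitened (phase) cross-correlation computed in the Fourier domain. It assumes NumPy, and the function and variable names are illustrative, not Suite2p's internals:

```python
import numpy as np

def phase_correlate(frame, ref, eps=1e-5):
    """Estimate the rigid (dy, dx) shift of `frame` relative to `ref`."""
    F = np.fft.fft2(frame)
    R = np.fft.fft2(ref)
    # cross-power spectrum; dividing out the magnitude whitens both images,
    # which is what sharpens the correlation peak
    cross = F * np.conj(R)
    cross /= np.abs(cross) + eps
    cc = np.fft.fftshift(np.real(np.fft.ifft2(cross)))   # zero shift at the center
    ymax, xmax = np.unravel_index(np.argmax(cc), cc.shape)
    dy = ymax - frame.shape[0] // 2
    dx = xmax - frame.shape[1] // 2
    return dy, dx

# usage sketch: dy, dx = phase_correlate(frame, ref_img), then shift the frame
# back by (-dy, -dx) to align it to the reference
```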
And what we're going to do is divide the field of view into these little blocks, and do this registration on each of these blocks separately. And so I take each of these blocks, and I compute its correlation with that corresponding block of the reference frame. And I get each of these little phase correlation maps.
And so you can see each of these little squares. And now each of these little squares for each of these blocks gives us an arrow like this. And so this tells us what direction each of the pixels in these blocks should move. And so this is at the block level, and now we need it at the pixel level. And so I have all these arrows, basically I have maybe 200 of these arrows. But I really need a 512 by 512 image of these arrows.
Does anyone know what step you would do to get from here to an image where you know where every pixel should move? Does anyone have any idea what they'd do? So I have these arrows for each of these different blocks. And this defines the arrow for each of these pixels in this area.
But what I can do is, I can do a bilinear interpolation from here to my full image. And this is an upsampling step we can do. And this will give us the pixel shifts in each of these little areas. And so some of you might have done upsampling before, or if you work with neural networks, there's like downsampling, upsampling layers.
And so we do this upsampling trick here in order to get a pixel shift for each of these little pixels here. And it gives us a smooth transition: especially like here, where there's two different arrows, we want the pixels in between these two to have the average of those two arrows. And so we get a pixel shift for every single pixel in the image, and then we shift the pixels to these new positions, also using bilinear interpolation, basically, because you might have a subpixel shift, and so you're going to want to take the average of the surrounding pixels that are in that area.
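A minimal sketch of going from per-block shifts to per-pixel shifts with bilinear interpolation, and then warping the frame. It leans on SciPy for the interpolation, and the names are illustrative:

```python
import numpy as np
from scipy.ndimage import zoom, map_coordinates

def upsample_block_shifts(dy_blocks, dx_blocks, Ly, Lx):
    """dy_blocks, dx_blocks: (nblocks_y, nblocks_x) shifts, one arrow per block.
    Returns (Ly, Lx) per-pixel shift fields via bilinear (order=1) upsampling."""
    nby, nbx = dy_blocks.shape
    dy_pix = zoom(dy_blocks, (Ly / nby, Lx / nbx), order=1)
    dx_pix = zoom(dx_blocks, (Ly / nby, Lx / nbx), order=1)
    return dy_pix, dx_pix

def warp_frame(frame, dy_pix, dx_pix):
    """Shift each pixel by its own (dy, dx), again with bilinear interpolation,
    so subpixel shifts average over the surrounding pixels."""
    Ly, Lx = frame.shape
    yy, xx = np.meshgrid(np.arange(Ly), np.arange(Lx), indexing='ij')
    coords = np.stack([yy + dy_pix, xx + dx_pix])
    return map_coordinates(frame, coords, order=1, mode='nearest')
```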
OK, and then you get your final motion corrected image, and I put these circles here so you can see how well it's doing. And so this was a hard case, because, actually, we didn't screw the head-bar in all the way. We wanted it to move as much as possible, so we didn't lock the head-bar in, to try to get it to move more.
I mean the mouse was fine. It was mostly locked in. But we just wanted a bad recording to try to make it difficult on ourselves. So this is like our way of, qualitatively, this looks decent, but you might want also a quantitative estimate of how well we did on motion correction. And so this is tricky, because we don't have a ground truth.
We don't know how the mouse wiggled its head, how these different cells are moving within the brain. And so we have this metric in Suite2p that we use for this, that we have a trick for doing this, where we take the whole movie, which is x, y, and time. It's this big block. And we're going to take the principal components of this movie and see what they are.
So this is the registered movie. And so if I take the principal components of the registered movie, we hope that they're all activity-based. We hope it's different cells firing in different places, because that's when the cells are on. That's when this population of cells is on together, or this population of cells are on together.
But if we didn't correct all the motion, then some of these PCs might actually correspond to parts of the brain moving separately. And we do see this. And, for instance, if you have z-drift, one of your components will likely be the z-drift across the recording, across time.
And so we take these principal components, and we look at the difference between the top and the bottom. So this is the V part of this matrix, which is like the time component. And we can take the frames when V is high and when V is low, and average each set. And that's our top and our bottom of this component that's changing across time.
And then we can look at the difference, which we have here, and also in the GUI we have, I'll show this, so it's a little easier to see, we have this play option where you can have it switch back and forth between the top and the bottom frame. And so I'm going to let you all look at this for a second, and see if you can see any motion between these two.
Or do you just see different cells turning on? So I think this is a pretty-- this is a case where there's not too much motion. There might be like a little bit in some of the edges. But you get used to doing this. So I've done this for a lot of recordings, and so you kind of get used to seeing these things. And you can look at the differences, and, in our review paper we've written, we have examples of what these differences would look like for simulations of like z movement, of rotations, of shear, of like other types of movements.
So you can get a sense of what these different types of differences would look like, depending on if you have z-drift or not, or other problems with your registration. And so the way we quantify this, because, again, we don't just want to look at pictures, is we register the top frame and the bottom frame to the reference image. And we see how far off they are from the reference image. And we take that difference.
And so the registration offset between these two frames: basically, the rigid offset is 0, that's in green, for all of our PCs. So that's great, we did a good job with rigid registration. The non-rigid offset on average is relatively low, below one pixel. And then the non-rigid max, which is the maximum difference from all of our little blocks across the whole frame, is still less than 0.5. So that's pretty good.
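A rough sketch of that quality metric, reusing the phase_correlate function from the rigid-registration sketch above; the array names and the number of frames averaged are placeholders (and a truncated SVD would be used in practice rather than a full one):

```python
import numpy as np

def registration_metric(mov, ref, n_pcs=10, n_avg=100):
    """mov: registered movie (nframes, Ly, Lx); ref: reference image.
    For each top PC, average the frames where its time course is highest and
    lowest, re-register those two averages, and report how far apart they land.
    Offsets near zero suggest the PCs reflect activity, not residual motion."""
    nframes = mov.shape[0]
    X = mov.reshape(nframes, -1).astype(np.float32)
    X -= X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)   # U[:, k] is the time course of PC k
    offsets = []
    for k in range(n_pcs):
        order = np.argsort(U[:, k])
        top = mov[order[-n_avg:]].mean(axis=0)   # frames where the PC is high
        bot = mov[order[:n_avg]].mean(axis=0)    # frames where the PC is low
        dy1, dx1 = phase_correlate(top, ref)
        dy2, dx2 = phase_correlate(bot, ref)
        offsets.append(np.hypot(dy1 - dy2, dx1 - dx2))
    return np.array(offsets)
```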
So this is a case where registration did fairly well. But in the GUI, you can look at these, we have this plotted. And you can use the left-right arrow keys to go through them and look at these different movies. So you might have one that looks really bad, and you can look at it and see is, oh, maybe that's actually z-drift, and I can be careful about whether, how I interpret my signals, depending on those types of movements. So that's it for registration. So does anyone have any questions about registration?
AUDIENCE: I have two questions. The first question is, do you have examples of what it would look like if, say, that there was z-drift, like the difference animation that you have, just so we can get like a visual of it?
CARSEN STRINGER: Yeah, I should include one. I don't have an example here, but we should post one on the readthedocs. So what it would look like, at least in a difference image, is you'll have cells coming in and out. When a cell comes in and out of the plane, it's actually more like a sphere with an empty hole in the middle, because the nucleus often isn't labeled.
So it's kind of this weird thing where you'll have it, it'll be smaller when it's like at the top. And then when you get it cut through the middle, you'll have more pixel weights on the outside, and you'll lose the middle. So you'll have a bunch of these rings in your difference image. And those are pretty clear signs that it's probably a z-drift component.
But we have images of this in our review paper, too, that you can look at. But I don't have an actual GIF of it, which would be nice, from a real recording.
AUDIENCE: And then the second question I had is, this seems to do great registration for like a plane. Do you do it through a volume at all, through Suite2p?
CARSEN STRINGER: No, we do not support volume registration at the moment. So some people who have used Suite2p for volume registration, they'll usually register each plane separately, and then find the shift between the planes afterwards, and put them together. What kind of recordings are you trying to do volumetric registration in?
AUDIENCE: I guess, I was just curious, if that's a capability you have yet.
CARSEN STRINGER: No, no, yeah, good question. But, yeah, the people I know using it in that way are doing somewhat sparse recordings, like [INAUDIBLE] [? peri-caligulis, ?] where that sort of 2-step approach works. But you might need something different, for instance, if you're doing really dense imaging in a dense brain, like zebrafish or something like that.
But, I mean, oftentimes, people are separating their planes by enough, so that they get different neurons. So then 3D registration wouldn't work. But there's definitely cases where people need 3D registration too. But, yeah, unfortunately we don't support that.
Well, so you can see all these things in the GUI. And we'll see in the notebook, in the output ops, a bunch of these things will show up, too, that you can look at, in terms of the shifts that you have, and also what these offsets are. OK, and so the next step is cell detection. And so there's two ways you can do cell detection in Suite2p.
And so one way is anatomical. The best anatomical cell detection is still, I think, one where you aid it with functional information. So instead of using the mean image, you use the max projection image, which is the maximum value of intensity of a pixel across the whole movie. And so that's what this image looks like, for the recording on the previous slide.
And so, if we run Cellpose on this, we get around 679 cells, which isn't bad. But with functional detection, we actually get, I will have it on the next slide, actually, on the order of 2 times more ROIs or more. So there's a lot of cells that still won't show up in this max projection image, that might only fire a couple of times and might be kind of faint.
And so it's still, usually, it's the case that you want to use the functional algorithm. And so there are some use cases for the anatomical algorithm. And that's the case, for instance, if your baseline calcium intensity is high. So for instance, here's a hippocampal recording. And the intensity is relatively high for all the cells. And the segmentation looks pretty good.
It doesn't look like it's missed very many cells in this volume, or sorry, in this plane. There's also a case like, for instance, this is a deep imaging recording, and so these cells were pretty faint to begin with. And so the functional segmentation didn't work so well. But the anatomical segmentation on the max projection image actually looks pretty good.
And another case you might want to use it, if you're trying to do recordings across many days, and you're sometimes missing cells because they weren't detected in one day versus another, you might want to also try doing this anatomical segmentation. But the people I know still doing that, they often still do functional segmentation on each day. And then they'll ultimately make a big library of all the functional ROIs and then apply them across days again.
And so they'll ultimately still be using the functional ROIs. They might just have found them on a different day. So, generally speaking, I still don't always recommend it. But in certain cases I think it can be useful to have this tool.
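For reference, here is roughly what running Cellpose on the max projection saved by Suite2p looks like from a notebook. The output path is a placeholder, the ops file is assumed to contain the max_proj image that recent Suite2p versions save, and the exact eval signature varies between Cellpose versions (this follows the 1.x/2.x-style interface):

```python
import numpy as np
from cellpose import models

# load the ops saved by Suite2p for one plane (path is a placeholder)
ops = np.load('suite2p/plane0/ops.npy', allow_pickle=True).item()
max_proj = ops['max_proj']            # max projection image across the movie

model = models.Cellpose(model_type='cyto')   # generalist cytoplasm model
masks, flows, styles, diams = model.eval(max_proj, diameter=None, channels=[0, 0])
print(f'anatomical segmentation found {masks.max()} ROIs')
```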
AUDIENCE: I have a question, actually. So I've tried running the functional segmentation, and I would get a lot of what seemed to be like axon boutons that were detected when I wanted them to be cell bodies, even though I adjusted the diameter, whereas when I did the Cellpose segmentation it did a much better job at getting the cell bodies. So I was wondering if people have encountered this issue before with functional detection.
CARSEN STRINGER: Yeah, I've seen that sometimes in low SNR cases. And also, you can set the spatial scale. So in the new version of the algorithm, the diameter is not used anymore, it's a spatial scale. But I have seen that happen in a case where, particularly in viral injections, those axons or boutons, yeah, actually, I would think it's more dendrites that are very bright. So maybe I haven't seen it like yours.
AUDIENCE: Yeah, it might be dendrites as well. It was in a transgenic animal that I was using. But I still had that issue.
CARSEN STRINGER: Hmm. Yeah, I guess some other things, maybe binning more or less in time might help, and so making sure that you set the time scale right. You can also high pass filter in time a little bit. There's a few other things that you could try, that maybe we can talk about. But, yeah, it's also kind of hard for me to know without seeing the data, so--
AUDIENCE: Mm-hmm.
CARSEN STRINGER: Yeah, it's a good question. I think like what would be really nice too, and I haven't done this, is like there's lots of people using Suite2p that, honestly, I also don't know about. And it would be cool if they would post pictures of their segmentations and their parameters somewhere, so we can see what works for some things and what doesn't work for other things.
Yeah, good question. So if the functional detection isn't working, it's great that the anatomical was working better for you. So I'm glad. And it is nice to have it as an option. Oh, and another thing, too, is Cellpose is a deep neural network. So it's trained on image data. So if you have images that aren't getting segmented so well, but you think they should work, you can also submit images to Cellpose, and we can add them to our training data set.
Or you can retrain Cellpose as well. So it's different than the functional case. The functional imaging case I'll go through now is a clustering algorithm. But this one is actually a deep learning algorithm. So it needs to see data that's somewhat similar to yours to actually perform well. So, in our training data, we do have hippocampal neurons, and we have cortical neurons from visual cortex.
So those instances of data are there. But other types of data aren't. And then we have lots of cells from all over the place, but not any other 2p-specific data sets.
AUDIENCE: How much data do you need to train up Cellpose?
CARSEN STRINGER: That's a good question. So we used 600 images. But there were lots of cells in each image. So we had around 70,000 cells. And, ideally, you need an image that's somewhat in the same class as the images that you're trying to segment. And then it works fairly well.
So you need at least like, I would say, probably like 200 to 300 images, to get a good baseline model. But then, on top of that, you need to have some images in your training set. You don't need to have a hundred of them from your type. But you need a few from your type to tell the network what it should expect from those types of images.
So it can learn the baseline flows from a few hundred images, but then it needs to see some images that are sort of similar to the types of images you're using, to do fairly well. But we did it, like on purpose we tried to find as many different types of images as possible, so it actually does work pretty well out of the box, for lots of different types of images. But there are certain images that aren't very related to our training set that don't work so well, as well.
But, yeah, so also all the cellpose data is online, too, so you can download it at like cellpose.org/dataset. So if you want to play around with that data, you can, too, because no one should have to circle 70,000 cells again, like our lab technician did.
OK, so the functional segmentation, we're going to take this movie. And it's difficult to work with this movie, which could be many thousands or hundreds of thousands of frames. And we bin it. So the first thing we do to make it more manageable is we bin it in time. And so the bin size is equivalent to the GCaMP time scale.
So like, for instance, if I am recording at 10 Hertz, and my time scale of GCaMP is one second, I'll bin in bins of 10 frames. And so that's because, on average, we expect the signal to be the same for those 10 frames, because the GCaMP's decaying. So we expect that each of these bin samples is an independent sample that we can use for our extraction.
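As a concrete version of that binning rule, a small sketch with illustrative names and simple mean binning:

```python
import numpy as np

def bin_movie_in_time(mov, fs=10.0, tau=1.0):
    """mov: (nframes, Ly, Lx). Bin so each bin spans roughly one indicator
    timescale, e.g. fs = 10 Hz and tau = 1 s gives bins of 10 frames."""
    nbin = max(1, int(round(fs * tau)))
    nframes = (mov.shape[0] // nbin) * nbin           # drop the leftover frames
    return mov[:nframes].reshape(-1, nbin, *mov.shape[1:]).mean(axis=1)
```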
And so we take this bin movie, and then we high pass it in space and time. And so that gives us these things sticking out. You can see these cells firing inside this movie, that we're going to try to find. And so now we take this movie, and we're going to basically filter it in space, like it's this kind of down sampling technique that we use.
So we filter in a 3 by 3 bin, and get these kinds of smaller little pictures of the cells. So this might seem kind of weird. The reason we're doing this is that the first bin size is 3, and so that's going to bin in space across pixels that are within three pixels of each other. So that's kind of small things.
And so, in those maps, there will be peaks there, wherever there's an object that's like three pixels or greater, and the three pixel ones will probably be the bigger ones. And then if we start down sampling the smaller maps, we start, basically we're binning even more. And so, if the signals are different for different pixels and different maps, they're not going to be as bright, basically, as they would be at the top level.
So say I've got something that's three pixels across, and I bin again to six pixels, but the thing next to my three-pixel thing is totally different. Then those aren't going to add up to be anything big. I won't see anything big in that map.
But say I have a bunch of cells that are a diameter of 6 pixels, and I've done this binning of six pixels. Then I'll have peaks in that map, wherever there's approximately a cell, because I have objects of about that size. And so you can keep downsampling and keep downsampling, and you will have the biggest, we expect the peaks to be different sizes on each map depending on how much we bin. But basically you're trying to approximately find things at different scales in each of these maps.
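A toy sketch of that multi-scale idea: repeatedly bin the (time-binned, filtered) movie in space and look at per-pixel temporal variability at each scale, so that peaks at a given scale suggest objects of roughly that size. This is only meant to convey the intuition, not Suite2p's actual detection code:

```python
import numpy as np

def multiscale_variance_maps(mov, n_scales=4):
    """mov: (nbins, Ly, Lx) time-binned, high-pass-filtered movie.
    Returns one variance map per spatial scale (each scale 2x coarser)."""
    maps = []
    m = mov.astype(np.float32)
    for _ in range(n_scales):
        maps.append(m.std(axis=0))                    # temporal variability per pixel
        Ly, Lx = (m.shape[1] // 2) * 2, (m.shape[2] // 2) * 2
        # 2x2 spatial binning: correlated neighboring pixels add up,
        # uncorrelated ones average away
        m = m[:, :Ly, :Lx].reshape(m.shape[0], Ly // 2, 2, Lx // 2, 2).mean(axis=(2, 4))
    return maps
```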
And we have this overall, this is a map combining things of different sizes. And we basically find peaks within this map, and that's kind of where we seed our ROIs. So the idea is, it's kind of confusing, but it's like we have this whole movie that's like 512 pixels by 512 pixels by time, by bin-time. And we're just like making these smaller versions of the movie.
And we expect signals to be bigger if we binned parts of the movie that are correlated to each other. So like if I've got a cell here, and I bin the 6 by 6 pixels around the cell, that signal is going to be pretty high throughout. But another area where there's no cells, I bin that, it's just going to look noisy.
And so we have these different maps at different scales, that have these kinds of traces at these different sizes, that we can use. And so then we basically find peaks in these maps across time and space, basically. Like we say whichever map, it could be we find a peak that's biggest in this map, and we start with that one. And that would be seeded with a 3 by 3 map. And maybe we find a pixel in the 24 by 24 map that's biggest.
And we seed it with a 24 by 24 block. And then we start making our cell. Let me show the initialization, so that's more clear. So, yeah, so we start with a block here. Say we found this pixel that had a lot of variance in my 6 by 6 pixel map. So I'm going to start with a 6 by 6 square.
And that's what I'm going to seed as my cell. And I'm going to grow this cell; this is like a doubly sparse algorithm. So we're trying to find these sparse peaks, basically, like in this map, that are correlated. And we also are only going to grow it based on when the cell is active.
We expect the cell to be active like kind of sparsely through this signal. So we only take, we have a threshold that we define, which is this threshold scaling parameter in Suite2p, which you can change. And that tells us how many frames are active for this ROI. And we only use those frames to determine the shape of this ROI.
And so that might sound kind of weird, like why wouldn't we use all the frames. And that's because we might have overlapping cells on top of each other. So we're only going to use these frames that where this particular box is super-active. And you could have a box that's slightly offset, that could be active at slightly different frames.
And so, like, we still have some problems with merging, that we're thinking about how to solve. But this helps to reduce the number of merges we have, of cells kind of becoming large blocks, and keeps cells distinct from each other. Yeah, so we basically have these different pixels.
We have these different frames, and we basically take the average activity around this box, basically, and say, what is the average activity on these active, what we call active frames. And that kind of makes our cell shape. And actually it wound up being the case that we found something slightly smaller than this box, where the cell, where the pixels were significantly active. And then we created this little ROI.
And in this case, it didn't keep growing. In this case, we started with this box, and it got a little bit of some dendrites going on next to it. And then here's another case where we got some dendrites coming out of it. And in the end, you'll get a picture that hopefully looks like this, if it's working correctly.
But you can see like there's a lot of-- there are some parameters, like this threshold, for instance, and also, basically, we try to define this threshold based on the map where we see the most peaks. And so you can force what size that should be with the spatial scale parameter, say, you think it should be 24. You can set the spatial scale so that it should be a size 24.
So that will define how big your threshold has to be. But, yeah, so there's a few parameters in here that can change how things run. But our hope is that, in general, it should work to try to find sparse signals in your imaging, and hopefully not merge too many cells. So there's an example that hopefully things look good if they look like this. If you have dendrites, with this growing procedure, we can also find some dendrites in your data. And, yeah.
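A very simplified sketch of the seed-and-grow step on "active frames" only; the threshold and growth rule here are stand-ins for the actual procedure (which is controlled by parameters like threshold_scaling in Suite2p):

```python
import numpy as np

def grow_roi_from_seed(mov, seed_mask, thresh_scaling=1.0):
    """mov: (nbins, Ly, Lx) binned movie; seed_mask: boolean (Ly, Lx) seed box.
    Use only the frames where the seed is strongly active to weight pixels,
    so a neighboring cell active at other times doesn't get merged in."""
    trace = mov[:, seed_mask].mean(axis=1)               # seed time course
    thresh = trace.mean() + thresh_scaling * trace.std()
    active = trace > thresh                              # sparse set of active frames
    weights = mov[active].mean(axis=0)                   # pixel weights on those frames
    # arbitrary cutoff for the sketch: keep pixels at least half as bright as the seed
    new_mask = weights > 0.5 * weights[seed_mask].mean()
    return new_mask, weights
```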
AUDIENCE: I have one question. I was wondering if you can do detection on multiple channels. So say you have RCaMP and GCaMP. Can you do it for both?
CARSEN STRINGER: Yeah, that's a good question. So right now, basically, we have support for it, but only in a notebook. So there's no support in the GUI. And the way it works is you will run it once for both channels. And then you'll swap which channel is the functional one, and it won't run registration again.
And you'll run it again with the other channel, RCaMP for instance, as your functional channel. But there's no, and I don't think we would want this, but there's no interfacing between the two channels during the detection, because I think, generally speaking, you'll have different things being expressed in different channels. But I'm not sure.
AUDIENCE: There's a frame rate parameter. What effect does the frame rate have on the functional extraction?
CARSEN STRINGER: Yeah, so the frame rate, definitely. So the frame rate defines how much you decide to bin. Once you set the frame rate, for instance, if the frame rate is 10, and your GCaMP time scale is one, then the binning will be 10. And so, if you think the binning should be different, I'm not sure, I think there's a parameter that you can change for that.
But then you could also, I mean, you could hack it and say really my frame rate is 60, and I should bin more to average more across time. Or I should bin less. But generally that should be captured in your time scale, tau, and in the frame rate: if your indicator decays faster, tau should be smaller. Does that make sense?
AUDIENCE: Well, if I use GCaMP6f, should I adjust this GCaMP time scale?
CARSEN STRINGER: Yeah, so I'm not sure what the default is in Suite2p, but I think around one is still good. But you could try doing like 0.5 or something like that, if you think you're missing some cells. Yeah, it's also tricky, like how much do you want to average across frames, because there might be noise in your recording.
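For reference, this is roughly how those two parameters are set when running Suite2p from a notebook; the data path and the particular values here are placeholders (shorter tau for faster indicators like GCaMP6f, longer for GCaMP6s):

```python
import suite2p

ops = suite2p.default_ops()
ops['fs'] = 10.0     # per-plane frame rate in Hz
ops['tau'] = 0.7     # indicator timescale in seconds (shorter for faster indicators)

db = {'data_path': ['/path/to/tiffs']}   # placeholder path
output_ops = suite2p.run_s2p(ops=ops, db=db)
```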
And so something we've recently added, to hopefully help with some noise, that you could account for by binning instead, is to add this denoising step, which I'm literally just going to flash up, because I realize we're out of time. But we do this PCA denoising where we divide the frame into these little patches again, which are basically the same patches we used for non-rigid registration. And we take the top 32 PCs in each of these patches, and we recreate our movie, that little like binned movie that you saw, with the PCs.
And so we get this smoother looking thing, which will then give us, well, it's a little hard to tell on this image. But actually the ROIs look better. It's easier to see here. You can see it really follows this dendrite here.
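A sketch of what that per-patch PCA denoising amounts to, for a single patch of the binned movie; the choice of 32 components follows the description above, and the rest is illustrative:

```python
import numpy as np

def pca_denoise_patch(patch_mov, n_pcs=32):
    """patch_mov: (nbins, py, px) binned movie for one spatial patch.
    Reconstruct it from its top principal components to suppress noise."""
    T = patch_mov.shape[0]
    X = patch_mov.reshape(T, -1).astype(np.float32)
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    k = min(n_pcs, S.size)
    X_denoised = U[:, :k] * S[:k] @ Vt[:k] + mu      # rank-k reconstruction
    return X_denoised.reshape(patch_mov.shape)
```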
So by default now, PCA denoising is on in Suite2p. And so, if you want to turn it off, you have to set the ops parameter, I think it's ops['denoise'], to 0. But, in general, we think it helps. And we have the classifier, and actually I ended up having to change the classifier, because I added all these nice cells with these pretty dendrites.
But then the classifier was really angry at me that I had done this. So now we, in order to compute some of the statistics we use for our classifier, we erode the dendrites, in order to basically cut off these dendrites when we compute the statistics of compactness that we use in order to determine if an ROI is a somatic ROI or a dendrite. Yeah, but I should skip over this, because we're out of time.
This is another way to try to improve the classifier that's not yet implemented in Suite2p, so you can basically not worry about it yet. But hopefully we're going to improve the classifier soon, with some ground truth data. And then the last thing is, here's the Suite2p GUI, which, again, I recommend you try out, especially if you have big recordings where you can look at things across space. And it can help you to explore your data.
And so, in summary, I showed you that registration is very accurate, and that, in cell detection, we have Cellpose, which can help in some cases, and PCA denoising, which also can help in some cases. I think it's most cases, so now it's on by default. But this is a word to the wise: if you think it's not helping, you can also turn it off. I didn't show this, but we have a classifier built on some ground truth data.
And so I think that's working better. But I haven't added it yet to the code base. It needs a bit more work. And so what's next in Suite2p is we're going to add de-mixing, overlapping cell signals, and so this is something that's pretty close to being done. But there's also this fact that we do end up in some cases with some merged cells, so we're thinking about how to combine this demixing step with some splitting of some of these merged ROIs that we sometimes get, and also building a better classifier.
And so, yeah, my last thing is about add-ons. We're happy to take suggestions on GitHub, just as an issue. But you can also contribute code, so feel free to make pull requests, we're very happy to accommodate them. And I'm happy to give tips if you're not sure where to start, if you need to change something in the code.
And finally I'd like to thank, so the main collaborator on all this is Marius Pachitariu, and most of the data I showed today was from his lab, and a lot of the code development was also with him. And then Nick DelGrosso and Chris Ki helped us last year with the code, refactoring Suite2p so it's a little easier and a little more modular for people to run different pieces separately.
And so, yeah, thanks, everyone, for your time. I'm sorry I went a bit over.
PRESENTER: Thank you very much, Carsen. [CLAPPING] Let's give Carsen a big hand.
AUDIENCE: I have a question. So what would be your suggestion for decreasing the time it takes to go in and manually curate the ROIs?
CARSEN STRINGER: Yeah, that's a good question. So there are two things, and it depends on what types of ROIs you have to curate. If it's clear that it's a size thing, or something like that, you could do this in a Jupyter notebook. If you're doing post-processing with some statistics, you could say, let's throw out all ROIs greater or less than this number of pixels.
Likewise, you could do the same thing with compactness and draw these hard boundaries. The classifier is a logistic regression classifier that uses the skewness, the compactness, and the number of pixels in the ROI, but it trades off between all of these. So if you have something very compact that's not too skewed, it might still keep it, because it's kind of an average size.
It might just be a noisy cell that's on the boundary of your recording sample. So there could be cases where those sorts of things happen, and then you just want to draw those cutoffs in your post-processing. In the GUI, you'll see the statistics, have you seen that you can see the statistics for each ROI you click on?
So, as you look at those, you can think about where you would want to draw those cutoffs. That's the first thing I'd recommend, if it's feasible. The other thing I would recommend, if that's not enough, is to try to train your own classifier on the data. But if you're not adding more statistics than just skewness, compactness, and number of pixels, I don't think the classifier is going to change too much for the specific data.
So generally, I recommend trying the first approach first.
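As a sketch of that first approach, here is how you might pull up those statistics in a notebook, assuming the standard suite2p/plane0 outputs and the 'npix', 'compact', and 'skew' fields mentioned above (adjust paths and field names to your data):

```python
import numpy as np

# Load the per-ROI statistics that the GUI shows when you click on an ROI
# (paths assume the standard Suite2p output folder; adjust plane0 as needed).
stat = np.load('suite2p/plane0/stat.npy', allow_pickle=True)
iscell = np.load('suite2p/plane0/iscell.npy', allow_pickle=True)

# Look at the distributions to decide where to draw hard cutoffs.
npix = np.array([s['npix'] for s in stat])
compact = np.array([s['compact'] for s in stat])
skew = np.array([s['skew'] for s in stat])
print(npix.min(), npix.max(), np.median(compact), np.median(skew))
```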
AUDIENCE: Right, I actually don't have any pictures at the moment. But I had a quick question about running the code from the notebooks. I was trying to use one of my own classifiers that I built from classifying a bunch of cells for my specific images. Actually, if you give me a couple of minutes, I can probably pull it up later. But, just in general, where do I implement that? I was trying to use that classifier so that it would automatically choose the right ROIs based on what I'd already been choosing myself, and could not get it to work, so--
CARSEN STRINGER: Ah, OK, yeah, we should talk about that. So like the ops classifier path thing didn't work?
AUDIENCE: Yeah, and I may have been implementing it incorrectly. But I saved an NPY file with the classifier that I made, and then couldn't get it to use it.
CARSEN STRINGER: OK. Yeah, we should talk about it. Yeah, maybe we'll have a little time afterwards, or like, yeah, ping me. I think I saw the issue on it too. But I didn't--
AUDIENCE: OK.
CARSEN STRINGER: Yeah, I didn't actually implement that part, so I wasn't sure how to answer it. But, yeah, you should ping me afterwards, and we can talk about it.
AUDIENCE: OK. Sounds good.
CARSEN STRINGER: Yeah, because then this is probably a problem that everyone is having. So--
AUDIENCE: OK, sounds good. I'll also look for the video that I probably made the classifier from, and that might clear things up.
CARSEN STRINGER: Cool. Yeah, that'd be cool to see. Here I have a couple of things I changed. So something that often happens, if you're running on your laptop, is that the batch size, which defaults to 500, makes you run out of RAM, and decreasing it usually helps. And then the threshold scaling, I actually increased it here to find fewer ROIs, because a bunch of these dendritic ROIs were being found that I didn't really care about.
And I wanted to make it run faster for this. So I basically just set the threshold scaling to 2, which is 1 by default. But if you want to find more cells, you would set it, for instance, to 0.5. And there's also another thing, which I don't have here; let me see how far I've run up to. If you get to this point, sometimes you'll see it, which I see it is.
Yes, so there's the classifier path, which is supposed to work. Anyway, there's a number-of-iterations key, I can't remember what it's called, that's also in the documentation. Something that happens to people sometimes is they hit 5,000 ROIs, and they keep hitting 5,000 ROIs, and it's because you need to let it run for more iterations, basically.
And that happens sometimes, too, if you have really dense recordings. I'm just going to say one more thing, which is that we have the ops fs parameter, which is the sampling rate of the recording; that and tau, the indicator time scale, are the really important parameters. And then otherwise you're pretty much good to go. Here I'm setting the db, and that's pretty much it.
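For reference, here is a minimal sketch of how those settings might be passed to Suite2p from a notebook, using the run_s2p interface; the data path is a placeholder, and the values shown are just the ones discussed above:

```python
import suite2p

ops = suite2p.default_ops()
ops['fs'] = 10.0               # sampling rate of the recording (per plane)
ops['tau'] = 1.0               # indicator time scale; smaller for faster indicators
ops['batch_size'] = 200        # lower than the default 500 if you run out of RAM
ops['threshold_scaling'] = 2.0 # >1 finds fewer ROIs, <1 (e.g. 0.5) finds more

# db overrides ops and points at the data; the path here is a placeholder.
db = {'data_path': ['/path/to/tiffs']}

output_ops = suite2p.run_s2p(ops=ops, db=db)
```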
Cool. So the people who are using Suite2p, are you all doing cortical recordings, or are there different brain areas?
AUDIENCE: I'm not doing cortical recordings. I'm going subcortical, like through a lens.
CARSEN STRINGER: Ooh, cool. What area?
AUDIENCE: LDN. The thalamus.
CARSEN STRINGER: Oh, awesome. I haven't seen that before. Is it a GRIN lens?
AUDIENCE: Yep. Yeah.
CARSEN STRINGER: Cool.
AUDIENCE: So it's pretty easy to use, and I've had pretty good success with Suite2p. But I'm not sure that I'm doing everything right. I've also tried to adjust the classifier myself and didn't know if I was doing it right, so I'm interested in seeing how you're actually supposed to do it.
CARSEN STRINGER: Yeah, let me, so I think you are supposed to be able, in theory, to set this ops classifier path to the right path. Right now it's 0. But if I were to set it to the right path, I'm not sure, I would have to upload a classifier right now, basically. And in theory, if I upload a classifier, then I could use the path.
But you would have to run it in a notebook, not run it from the GUI.
AUDIENCE: Wait, are you saying there's no way to change it through the GUI, or--
CARSEN STRINGER: So you can run it in the GUI if you load a file and then say apply this classifier. But if you want it to run the same classifier every time you process a recording, you would have to run it in a notebook. Or in the GUI you could say load ops, and have an ops file where that path is saved, and upload it there, because it's not exposed as one of the parameters you can change in the GUI, basically.
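For reference, this is roughly what that notebook run would look like; the 'classifier_path' key is the one named in the discussion, and, as noted above, this path may still need some debugging. All file paths here are placeholders.

```python
import suite2p

ops = suite2p.default_ops()
# 'classifier_path' is 0 by default; pointing it at your saved classifier .npy
# is, in theory, how you run your own classifier on every run (may need
# debugging, per the discussion above). The paths below are placeholders.
ops['classifier_path'] = '/path/to/my_classifier.npy'

db = {'data_path': ['/path/to/tiffs']}
output_ops = suite2p.run_s2p(ops=ops, db=db)
```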
AUDIENCE: Correct me if I'm misunderstanding, but my impression was that Suite2p does all of the hard work of motion correction and cell detection and all that, and then the classifier just sorts the ROIs into things that are cells and things that are not, right? So that part is not actually that hard computationally.
CARSEN STRINGER: Yeah, it isn't computationally hard. And also, if it's your default classifier that you have set, the one you've been changing, then that should run as well, and you should see when it runs. I have the text here for this run, actually, you can see it, oh, actually, I could have stolen this path here.
You can see the classifier it's running. So if that path is the path of the classifier that you've been changing, you can go to that file and check that it's the one that's been updating, and then you know you're good. It would change classifier_user, I think, if I had opened the GUI, it makes those files.
Actually, that's something I should change, because if you run it first in a notebook, I don't think it will make those files for you by default, and if you're running it just in a notebook, you're probably not curating cells anyway. So this notebook shows you what the outputs are and how to load them, which is nice if you're not usually using Python, and how to visualize them, like the offsets in x and y from the registration, which is something you might want to look at if you're having issues there.
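As a sketch of that kind of check, assuming the standard suite2p/plane0 output folder and that the per-frame rigid shifts are stored under 'xoff' and 'yoff' in the saved ops:

```python
import numpy as np
import matplotlib.pyplot as plt

# Load the saved ops for one plane (standard Suite2p output location assumed).
ops = np.load('suite2p/plane0/ops.npy', allow_pickle=True).item()

# Rigid registration offsets per frame; large jumps can flag problem frames.
# ('xoff' / 'yoff' keys assumed here, per what the notebook plots.)
plt.plot(ops['xoff'], label='x offset')
plt.plot(ops['yoff'], label='y offset')
plt.xlabel('frame')
plt.ylabel('shift (pixels)')
plt.legend()
plt.show()
```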
You can look at the registered frames here with this little scrollbar. I'm not sure if I need to run this again. It's a little slow here; it's faster in the GUI, basically. So everything in terms of exploring the data will be better in the GUI. But if you're going to run post-processing analyses and things like that, then you're going to want to do it in a notebook.
And here is loading in the ROIs. They have all these different keys. And this is where I might add some code: if I have my iscell, all my zeros and ones, I might say, for i, stat in enumerate(stats), if the stat's compactness is less than 1.5, set iscell to 0. So here I could set all of those hard boundaries I was talking about for the classifier, and set my iscell accordingly. And then, once I'm done with this, I'll save my outputs to the same path where they were before.
Basically I would resave this output, and then I can load it back in the GUI. As long as you've saved it to the same path where you loaded it from, the GUI is going to take this file again, and you can look at whether you're happy with these manual boundaries that you've drawn for your ROIs. And there's also a way here to make the cells.
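Here is a minimal sketch of that post-processing step, assuming the standard stat.npy / iscell.npy outputs; the 1.5 compactness cutoff is just the illustrative number from the discussion, and the plane folder path is a placeholder.

```python
import numpy as np

plane_dir = 'suite2p/plane0'  # placeholder path to one plane's outputs
stat = np.load(f'{plane_dir}/stat.npy', allow_pickle=True)
iscell = np.load(f'{plane_dir}/iscell.npy', allow_pickle=True)

# Apply a hard boundary on one of the statistics (illustrative cutoff;
# choose yours by inspecting the values shown in the GUI).
for i, s in enumerate(stat):
    if s['compact'] < 1.5:
        iscell[i, 0] = 0.0   # first column is the cell / not-cell label

# Resave to the same path so the GUI picks up the new labels on reload.
np.save(f'{plane_dir}/iscell.npy', iscell)
```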
I don't really have anything else exciting in here. It's, yeah, hopefully some useful code for people to use, but otherwise not too exciting.
AUDIENCE: How would you go about adding a new classifier, if, say, these boundaries are something that you want to use consistently? Would you just save that into its own .npy file and then upload it?
CARSEN STRINGER: Ah, that's a good point. No, so we don't currently have that available.
AUDIENCE: I see.
CARSEN STRINGER: So the only way, yeah, so a classifier file is actually just a file that keeps your statistics and whether each ROI is a cell or not. And then we fit the logistic regression model to that data every time, basically, because that's very fast, and then we apply it.
So that logistic regression won't learn these hard boundaries. We would have to add that on, and that's something you could ask for in an issue as an enhancement, something we could add as an option: if you have those keys in your classifier, we could use them in the pipeline. But right now that classifier file is just the stats and whether each ROI is a cell or not, so that we can match the stats to your ground truth.
Yeah, it's a little weird. But that's basically what's inside the classifier files that you're making. If you wanted to make them in a notebook, you could make them by concatenating a bunch of iscell files, but you probably don't want to do that manually. Ooh, I like that.
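Conceptually, the idea is something like the following sketch. This is not Suite2p's internal code; it uses scikit-learn's logistic regression as a stand-in to show that the "classifier" is really a stored training set of statistics plus cell labels, refit each time because that is fast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_training_set(stats, iscell):
    """Stack the stored ROI statistics and cell / not-cell labels."""
    X = np.array([[s['skew'], s['compact'], s['npix']] for s in stats])
    y = iscell[:, 0].astype(int)
    return X, y

def fit_and_apply(train_stats, train_iscell, new_stats):
    """Refit a logistic regression on the stored stats, then score new ROIs."""
    X, y = build_training_set(train_stats, train_iscell)
    model = LogisticRegression().fit(X, y)
    Xnew = np.array([[s['skew'], s['compact'], s['npix']] for s in new_stats])
    return model.predict_proba(Xnew)[:, 1]  # probability each ROI is a cell
```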
AUDIENCE: So here I'm actually looking at mostly dendrites. This is in visual cortex of the mouse, layer 2/3 or so, and I used the spatial scale parameter for detection. I get a lot of non-specific ROIs, which is fine, because then I made that classifier, which sorts them into these kinds of structures. And then I have another classifier that picks out some of what I'm imaging at more of the cellular level, because these are apical dendrites.
CARSEN STRINGER: Cool.
AUDIENCE: And so, yeah, it works quite well, and I get this good activity in the dendrites, so it's neat to see. But it would just be helpful to implement the classifier, because then I wouldn't have to-- basically I start off by making everything not a cell, and then clicking over a couple of dendrites, because with sparse labeling I can just click over the couple of dendrites that there are. But--
CARSEN STRINGER: No, we should get that working, because that would be good. It doesn't--
AUDIENCE: I actually tried it again, so I started looking through the notebooks and trying to figure out where I was supposed to put the file path. And I think, if I played with it a bit more, I could get it to work. But it's helpful to see this notebook that was just shown on Colab, because with what you pointed out, I can make sure I have it the same in my own data, or in my own attempts.
But this is essentially what I'm trying to do, because there are different videos in which I'm also looking at these projections, just in different layers, and it'll look completely different from this. I don't know, this doesn't really play, I guess. If I pull up the video, maybe in ImageJ, it might do something more exciting, but this is more like--
CARSEN STRINGER: Oh, yeah, you can also do the-- have you used the register binary feature?
AUDIENCE: No.
CARSEN STRINGER: OK, actually go up to the top. Yeah, all right, I'm going to make you do it now. If you go up to the top, you see the registration menu, and click, yeah, there. This is something I didn't show, but you can press play and play your movie here, too.
AUDIENCE: Oh, that's where you play it. I never knew how to play it in the GUI. So actually, another thing is that I've already registered all of my videos using an ImageJ macro, so I haven't tried the Suite2p registration for this yet. But that's something I can go back and do, to see if it makes any difference in the analysis. But here's the video. So is this the ROI that's currently selected, the one that's here?
CARSEN STRINGER: Yeah, you should be able to like click around the frame. And you might be able to click other ones, too.
AUDIENCE: Oh, OK. Oh, that's cool.
CARSEN STRINGER: Yeah, I'm making you demo Suite2p for me.
AUDIENCE: Yeah, so like there's this active dendrite, oops, I meant to go this one here.
CARSEN STRINGER: Oh, all right, so actually you have a good case here where we can check: the registration actually looks pretty good from this. You could also open up the registration metrics in the same window, and see what those metrics look like too.
AUDIENCE: OK, but if I didn't do the registration in Suite2p, then would that--
CARSEN STRINGER: Oh, then, you won't have the registration metrics, if you turned it off completely.
AUDIENCE: Yeah, so I did, I set that to zero for the purposes of this analysis for now.
CARSEN STRINGER: Ah, yeah, so you won't have the metrics.
AUDIENCE: Yeah.
CARSEN STRINGER: But a good way to check, especially if you're doing dendrites: if the registration is poor, then for things like dendrites and faint cells you'll see a bunch of things found next to them that are kind of offset, because there were frames where things were messed up. So if you're not seeing a lot of that, the registration is probably pretty good.
But if you're seeing things like that in some of your recordings, then you know that, OK, maybe I need to run the registration again.
AUDIENCE: OK. Yes, I think that's a good thing to check. Now that I'm looking at the playing video, with the ROIs drawn on it, you can see better if it's shifting a lot.
CARSEN STRINGER: Yeah.
AUDIENCE: Wait, here's the big dendrite.
CARSEN STRINGER: Cool, all right, that's better than I expected, looking at that video, that it found those.
AUDIENCE: Yeah, it actually works surprisingly well. And what I've also been using is, I think there's a bit of code in the original folder, when I downloaded it, that generates the map of all the ROIs. I've been feeding that into another algorithm, and I'm analyzing astrocytic GCaMP data in another GUI as well, and then correlating these two signals. So the fact that it can generate these really detailed ROI maps is very helpful for my analysis.
CARSEN STRINGER: Awesome. Cool.
PRESENTER: So we are at 2:30, or actually a couple of minutes over. I had a quick comment, Carsen. Your pipeline is impressive for identifying neurons and pixels that have significant signal over noise, to put it simply. And, of course, in our lab we also have Katya and Tyler doing spine imaging, where we have a complementary problem: we know where activity should be. We know the bumps that are the spines, or there is a dendrite that is linking them, for instance, or here is the wire on which there are bumps, and those are all synapses.
How do we pull out activity in a reliable way? I'm not going to take up any more time, but I know you're going to meet with Katya, and I appreciate your giving some thought to this. Or maybe you've already solved this problem.
CARSEN STRINGER: Yeah, I don't think we've solved it. But I think adding the denoising step at least is helpful. Like those cells that I really briefly showed, with those longer dendrites finally being found, I think the denoising is helping in that case.
PRESENTER: I see, well, OK, well, I'll look forward to hearing more about that, later on, then.
CARSEN STRINGER: Yeah, good question.
PRESENTER: OK, with you. Maybe we should continue this conversation. All right, well, thank you very much, Carsen. This was amazing. Your work is amazing. And we are delighted that you were able to spend some time with us and talk to our students. Thank you.
CARSEN STRINGER: Thanks. And one last thing from me: if you have questions, post them on the GitHub, because then everyone gets to see my answer to them. I'm sorry, I usually only trawl through the GitHub about once every month and a half, but I promise I will get to your question eventually. So, yeah, thanks, everybody, for listening.