Leveraging the Allen Brain Observatories
Date Posted:
August 11, 2021
Date Recorded:
August 10, 2021
Speaker(s):
Saskia de Vries, Allen Institute for Brain Science
All Captioned Videos Brains, Minds and Machines Summer Course 2021
CHRISTOPH: She did a PhD with Markus Meister at Harvard, then continued to work at Stanford on invertebrate vision, and then came to us with all this expertise. And she's really been the key person driving these experiments. She'll now talk to you about this large-scale survey, as well as all the infrastructure, the outreach, and the Python programs-- the Notebooks.
She gives many lectures and tutorials. And she's also one of the organizers of our own-- just like you guys-- at MBL, we teach our own summer courses-- also I'm happy to say in person here on Friday Harbor, on one of the islands north of Seattle. Saskia?
SASKIA DE VRIES: Thank you, Christoph, for the introduction. Thanks for letting me join you this morning. As Christoph said, I'm going to be telling you a little bit more about some of the observatories that he introduced you to, and digging a little bit into the data, as well as showing you some of the tools for accessing the calcium imaging data, specifically.
SASKIA DE VRIES: So as you heard from Christoph's lecture, we have a number of parallel pipelines that we've been building over the last several years to systematically record in vivo physiology in the mouse visual system. And they leverage slightly different modalities. So we've got calcium imaging as well as the dense electrophysiology.
And we have some that have active behavior, and some that are passive viewing. And other than those key differences-- which are very big differences-- everything else we try and keep as consistent and as stereotyped as possible so that these data sets can be as interoperable as we can make them. So I'm going to be telling you mostly about the visual coding 2-photon data set, which was the first data set that we released that uses calcium imaging to record the activity of hundreds of cells at a time in response to a battery of different visual stimuli.
And this was set up to create, essentially, a physiological survey of the mouse's visual cortex. And what we did was, we sampled data from many different cortical visual areas. So this is an image of the surface of the mouse cortex. This actually shows you some of the areas that are defined through the common coordinate framework.
You can see the barrel cortex up here at the top. Here's primary visual cortex-- this big area back here. We've collected data from primary visual cortex, as well as several of the higher visual areas that surround it. So we've got data currently from five of these higher visual areas that are near it.
We've also leveraged a lot of the genetic tools that are available in the mouse in order to record data from specific cells. And so we can use genetic tools to express the calcium indicators. We use GCaMP, a genetically encoded calcium indicator.
And we can target it to specific cells. And so we can either express it-- we have some drivers that are what we call pan excitatory. They're expressed across all of the cells, all of the excitatory cells across all of the layers. Others are specific to excitatory cells that are found in specific layers, and sometimes, even in subtypes of specific layers.
So here, for instance, we've got two different drivers that both target different populations in layer 5-- corticothalamic-projecting neurons, as well as corticocortical-projecting neurons in layer 5. And then, we also have drivers that allow us to target the inhibitory subtypes of neurons-- VIP, somatostatin, and parvalbumin-- that allow us to get these specific interneurons exclusively. So these tools let us sample different types of cells based off of transgenic characterization, as well as layer specificity, in order to expand this survey in these different dimensions.
And finally, as I mentioned, we use a battery of different visual stimuli that includes some classic stimuli, such as drifting gratings or sparse noise, as well as naturalistic stimuli-- flashed natural images or natural movies-- in order to really densely survey the visual responses of these neurons and populations. And so having these different axes in our data set lets us ask questions about how visual information is represented and transformed through this circuit, whether we see differences in visual responses or physiological profiles across these different cortical areas, layers, and cell types, and whether the statistics of the stimuli that we show can affect the encoding properties of the neurons and the populations. So again, I mentioned this is 2-photon calcium imaging.
And you saw a similar video during Christoph's talk. But just to refresh you, we can image a few hundred cells at a time. This is one imaging session that is being played back on a faster speed. And you can see in this panel here that different cells-- they fluoresce at different times.
So when the cells fire spikes, calcium floods into the cell. And that causes the calcium indicator to fluoresce. And so we see that different cells light up at different times. And this can be related to the stimuli that we're showing, which I'm showing you on the far right panel, where you see noise, or movie stimuli.
It can be affected by the running activity of the mouse. I'm just going to play this again. As you can see, this mouse-- at different times, it chooses to run. Sometimes, it just stands still.
We know that can have an effect on some of the activity in the cortex. And then, we also have a camera that records the pupil, or the eye of the mouse that's watching the monitor. And so we can record both the pupil area as well as the position. And you can see this red dot that we've superimposed on the stimulus that corresponds to the position of the pupil during the experiment.
So these are all of the data that we're recording simultaneously. And we have a pipeline that processes and packages these data. And what it does is, we have some algorithms to identify and segment out the different ROIs that we believe are different cells in our field of view.
We can use that to extract the fluorescence traces for each cell. That represents the activity of the neuron. We've got all of the information about which stimulus is shown at what time. And we can get those temporally aligned. Those are indicated with these shaded colors in the plot, here.
We've extracted the eye position and pupil area, as well as the running speed of the mouse. And all of this data, as well as a number of other pieces of data, get packaged together into an NWB file. This is a standardized data format for physiological data. So you can see here a list of most of the components that are in this file for each individual session.
That includes these fluorescence traces, the masks for the ROIs, the stimulus information, the actual stimulus templates-- these are the images and movies that are shown-- the running speed, as well as a lot of metadata about the animal and about the experiment that can be useful for people to have available when they analyze the data. So using this pipeline, we were able to collect over 1,400 hours of data from over 250 different mice. And so you can imagine that that creates quite a lot of data, and requires quite a lot of infrastructure for processing all of that data.
And just looking at these fields of view, you can see there's a lot of differences in the density of the neurons, which yields differences in the signal-to-noise of each field of view. And so all of our data processing methods have to be very robust to this data. And like I said, we've collected over 1,400 hours of imaging.
This table shows how this breaks down across the different transgenic lines that we have. We have 14 different transgenic lines, and we've collected data across six different visual areas. So primary visual cortex here, and then these five higher visual areas.
So in total, we've collected data from over 63,000 neurons in 456 different experiment containers. Now, each container consists of three different imaging sessions. So we return to the same field of view, the same set of cells, on three different days in order to sample our full visual stimulus set.
So we have many more individual sessions, but 456 containers. And just again to unpack this for you-- I went over this a moment ago, but just to reiterate-- these different transgenic lines map onto both excitatory and inhibitory cells. Some of them are broad, pan-excitatory lines. Others are layer-specific.
Then, our inhibitory ones are specific to these interneurons. The vast majority of our data is collected using GCaMP6f. There's a little bit of data that was collected using GCaMP6s, primarily the data for the parvalbumin-Cre line, but also a little bit of data with one of our pan-excitatory lines, this Slc17a7 line.
And you can see that not every Cre line was sampled across all of the different visual areas. And so this table allows you to see how that sampling was done, as well. Right.
So I mentioned that each field of view-- we returned to the field of view on three different days, in order to sample our full stimulus set. So the first session will include, for instance, drifting gratings and natural movies. And then, the second session will be static gratings and natural scenes. The last session has the locally sparse noise stimulus.
Each session has at least five minutes of spontaneous activity, as well as five minutes of one of the movie clips that we repeat in each session, so that we have one piece of stimulus that gets repeated across all three days. Each session is packaged in its own separate Neurodata Without Borders file. And so for each experiment container, there are three different NWB files that capture all of that data.
And so then, we have our software kit, the Allen SDK. This is a Python analysis toolkit that allows you to access the data. It allows you to find experiments based off of metadata. It allows you to extract the activity traces, as well as other pieces of data that are in the NWB file.
And there's a lot more documentation about this. This is a Python toolkit. You can install it using pip install allensdk. And I'm going to take a few minutes to give you a brief demo of how you can start pulling out these data so that you can start working with them.
And so I want to point you-- I've created a folder. We have a repository called Brain Observatory Examples up on GitHub. And we have a folder here for your course, BMM 2021, where we've got a couple of files in there that I'm going to show you. This is one of the Notebooks in this folder.
Let me actually just jump over to the repository. This is the 2-photon visual coding tutorial. I've also provided a Neuropixels visual coding tutorial that my colleague, Josh Siegle, put together.
And so if you're interested in accessing the Neuropixels data, this is a great place to start to just see how to access the data and how it's organized. Also, for both of these data sets, we've created a cheat sheet. So I have a physical copy, just a two-sided piece of paper that orients you to the dimensions of the data set, as well as key functions for the SDK.
So if anybody's using these data and trying to figure it out, that's a really useful resource for that. And so those are located in this repository, as well. All right.
So let me jump over to this Notebook. This is a Jupyter Notebook. And like I mentioned, you need to pip install allensdk in order to use it. But other than that, getting started is pretty easy with these Notebooks.
And I'm sure you're pretty familiar with it. We don't have very many dependencies. We try and keep it simple, here-- just a few dependencies, not to install, but to load.
And then, the key thing here for getting started with the allensdk is to instantiate the brain observatory cache. And so again, you are importing the brain observatory cache from the SDK. And then, if you only pay attention to one thing during this demo, it's going to be this little piece right here.
I'm pointing right here to a manifest file. And this is a file that I have on my local machine. And all of the data that I access will get downloaded relative to this manifest file. And so if I download an NWB file, it might take a few minutes for it to download that first time.
But once it's on my machine, if I'm pointing the cache to this manifest file, it's going to be able to find the NWB file that's previously been downloaded. So this path, as you can see, is unique to my computer. It's a local manifest file on my machine. If you're running this for the first time on your local machine, what you want to do is actually just remove this whole component here, and just say boc equals brain observatory cache.
And it'll create that manifest on your machine in your working directory. And then in the future, you want to put the path to that file in this location. And it'll always point to the right place.
So that's the only little trick there. But once you have that figured out, everything works really nicely. Because all of your data is organized. If you access any analysis files or the event files, all of those will be organized in that same place.
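For reference, a minimal sketch of that setup, assuming the AllenSDK has been installed with pip and using a placeholder path for the manifest file:

```python
# Install once with: pip install allensdk
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# All data downloaded through the cache is stored relative to this manifest
# file, so later runs find previously downloaded NWB files instead of
# re-downloading them. The path here is just a placeholder.
boc = BrainObservatoryCache(manifest_file='brain_observatory/manifest.json')
```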
All right. So we instantiate the cache. And this gives us access to all of the data for the 2-photon visual coding data set. And so I want to start by showing you how you can use some functions for the boc to get oriented to the parameters of the data set that I've already unpacked for you a little bit.
But we'll use the SDK here. So I'm going to use this function under the boc that's called get all targeted structures. And this gives us a list of all of the visual areas that we have data from. These are all cortical areas for the 2-photon data set.
You see this VISp is primary visual cortex, V1. Then, these are all of the other, higher visual areas that surround it. And so we have some similar functions that give us a lot of the other information that we might use to pick an experiment.
So get all Cre lines gives us a list of the transgenic Cre lines that we use. These are the drivers that drive the calcium indicator expression. But we also have a list of the reporter lines. These are the actual GCaMP reporters.
And for the most part, there's pretty much a one-to-one mapping-- if you're using a given Cre line, it was only collected with one reporter. And so there aren't many situations where this is super important. There's only one case, as I mentioned before, with the Slc17a7 line, where we did collect some data with GCaMP6f-- this Ai93-- and a little bit with Ai94, which is GCaMP6s.
Otherwise, it just had to do with which tool worked best for a particular Cre line. And it might be relevant for things that you're thinking about. It might not. But that is there as well.
We can get a list of all of the stimuli that are shown. So this is all the different stimuli that are shown across all of the different sessions in a single experiment container. And this is a list of all of the session types. And you'll see here that we list four different session types that start with the prefix three session.
And that's because partway through our data collection campaign, we made a modification to our session C. This is the session that has locally sparse noise. And so in the early data that we collected, we used this three session C that had the locally sparse noise stimulus.
Then, we noticed that the small pixels weren't working very well to drive responses in the higher visual areas. And so we added a larger pixel size. So we made one modification, and we changed it to three session C2.
This now has locally sparse noise four degrees and locally sparse noise eight degrees in place of just locally sparse noise, which had 4-degree pixels. So that's a subtle detail. But I just want to point that out.
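As a quick sketch, the survey calls just described look something like this (the comments summarize what each call returns; the exact values come from your own cache):

```python
# Survey the dimensions of the data set.
print(boc.get_all_targeted_structures())  # cortical visual areas, e.g. 'VISp' plus higher areas
print(boc.get_all_cre_lines())            # transgenic driver (Cre) lines
print(boc.get_all_reporter_lines())       # GCaMP reporter lines (e.g. Ai93, Ai94)
print(boc.get_all_stimuli())              # all stimuli shown across a container
print(boc.get_all_session_types())        # three_session_A, _B, _C, _C2
```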
All right. So this gives us some of the parameters. And we can use some of these parameters to find data that we're interested in looking at. So I'm going to pick one visual area, and one Cre line that I'm going to specify.
I'm going to use this function that says get experiment containers. And this is going to return a list of experiment containers that were collected for this visual area, VISp, primary visual cortex, and for this Cre line, Cux2. This is an excitatory Cre line that's in superficial layers.
We imaged it in both layer 2/3 and layer 4. And so we can see here this list of about 16 experiment containers. They all have a unique ID. And as I mentioned, for this particular Cre line, we imaged them at different depths.
So some of them were imaged at 175 microns. Others were imaged at 275. But they're all in one targeted structure. They're all at the same Cre line.
They have different donor names. And that's a unique identifier for the particular mouse. But you might see, for instance, here's one experiment container collected from a mouse where we also collected another container from that same mouse at a different imaging depth.
So we use mice for multiple fields of view. Or we might image them in different visual areas to get as much data from the mice as we can. All right.
So let's see. There are 16 experiment containers for this particular visual area and this Cre line. This is out of-- what was it-- a total of 456 experiment containers. And so you can see that you could pick different targeted structures or Cre lines in order to find experiment containers based on what you're looking for.
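A rough sketch of that container query, using the VISp and Cux2 values from this demo (any other values from the get_all functions above would work the same way):

```python
import pandas as pd

# Query experiment containers for one area and one Cre line. The strings
# below are the values used in this demo.
containers = boc.get_experiment_containers(
    targeted_structures=['VISp'],
    cre_lines=['Cux2-CreERT2'],
)
containers_df = pd.DataFrame(containers)
print(len(containers_df))          # number of matching containers
print(containers_df.head())        # id, imaging_depth, donor_name, ...
```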
This is just a visual to remind you that our experiment container consists of three different sessions-- session A, session B, and either session C or session C2. And so we can pick one of these experiment containers from the list above. I'm just going to pick the very first one. And let's get a list of all of the individual sessions.
And so now, you see we have three sessions that were imaged. They're imaged from the same mouse. So the donor ID is the same.
And you can see that they're three different session types-- session type A, B, and C. And they're imaged three different days. So this is the age of the mouse when the data was collected.
So you could actually see these were collected in reverse order. We collected C first, and then B, and then A. And then again, each session has its own identifier that's different from the experiment container identifier. And this is what we use to access the NWB file.
So I want to get the ID for the session that has the natural scenes stimulus, just so we can look at that. And so I'm going to use this function that says get ophys experiments. So instead of getting experiment containers, I'm now looking at ophys experiments.
I'm specifying the experiment container ID that we already have selected above. And now, I'm specifying I want the session that has the stimulus that's called natural scenes. And this gives me a unique identifier for this particular session.
Now I can take this unique identifier that I'm calling session ID. And I'm passing it to a function called get ophys experiment data. And this will allow us to access all of the data in the NWB file. So I'm creating an object that I call here data set. And we can use this data set object to now access all of the data for this particular session.
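Putting those two calls together, a minimal sketch looks like this (continuing from the containers_df built above):

```python
# Take one container, find the session that includes the natural scenes
# stimulus, and load its NWB file (downloaded on first access).
container_id = containers_df.id.values[0]
exps = boc.get_ophys_experiments(
    experiment_container_ids=[container_id],
    stimuli=['natural_scenes'],
)
session_id = exps[0]['id']
data_set = boc.get_ophys_experiment_data(session_id)
```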
So I'm going to show you many of the different pieces-- not all of them. There's quite a lot. But this will get you oriented. So for instance, we can look at the max projection.
And so I'm using this function called get max projection. I think you've probably noticed a theme, that all of our functions are get this, get that. And so once you start poking around, it becomes pretty intuitive. So the max projection is an image of our field of view.
It's the maximum value for each pixel in that field of view across the entire movie. And this basically gives you a really nice snapshot of the cells that were in the particular location that we imaged. So you see a lot of nice-looking cells in this field of view. One of the other things we have is ROI masks. So I use the function get roi mask array.
And this returns an array that is-- 512 by 512 is the size of our imaging field of view. And we've got a single plane in our array for each of the ROIs. And so you can see that that first dimension tells you how many cells are in this particular field of view. There's 174.
And if we just flatten this, we can plot all of these masks here. And you can compare this with the max projection. It'd be more effective if we put them side-by-side. But you can see that these ROIs line up really nicely with those cells that you see in the max projection.
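A short sketch of pulling the max projection and the ROI masks and plotting them side by side, as suggested above (continuing from the data_set object):

```python
import matplotlib.pyplot as plt

max_projection = data_set.get_max_projection()   # 512 x 512 image of the field of view
roi_masks = data_set.get_roi_mask_array()        # (n_cells, 512, 512) masks

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.imshow(max_projection, cmap='gray')
ax1.set_title('max projection')
ax2.imshow(roi_masks.sum(axis=0))                # flatten the masks into one image
ax2.set_title('%d ROI masks' % roi_masks.shape[0])
plt.show()
```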
All right. So now, we want to start looking at the activity. So we're going to look at the traces. One of the things that I want to point out is that there's a lot of different versions of the fluorescence traces that are available.
So if we just do this tab complete, it gives you a list of all of the functions, all the things that you can get out of this data set object. And we've got, for instance, get fluorescence traces. This will give you the raw fluorescence that we extract from the movie.
If any of you are familiar with calcium imaging, though, you know that you have to do some processing to the fluorescence traces before you can use them. So there's extracting the fluorescence from the movie. We want to correct for any neuropil. If there's processes from neighboring neurons that are passing through that ROI, we want to correct for any activity from neighboring cells that might be getting picked up with the mask.
And so you can also access here the neuropil traces. So we create a second mask that surrounds the cell mask, the ROI mask. We extract the fluorescence from that. That's our neuropil trace.
We have a correction method that subtracts this neuropil trace, weighted by an r value, from the cell trace. And so we have functions here that give you the neuropil traces and that give you the r value. And so if you want to look at those traces and those r values for each cell, they're available.
Once we've done the neuropil correction, we've got what are called corrected fluorescence traces. So these are cells that have-- the neuropil subtraction has already been performed. But another step after that is that sometimes, you've got two cells where the ROIs overlap a little bit. And we can demix the signals of those cells, even where those pixels overlap.
We can figure out what activity belongs to this cell, and what activity belongs to that cell. And so that's a step we call demixing. And so the demix traces are the traces that have gone through both the neuropil subtraction and this demixing algorithm. But then, what we usually end up working with is, after we've done the demixing, we calculate the change in fluorescence-- the delta F over F, where we calculate the change in fluorescence normalized by a baseline fluorescence activity using a sliding window.
And this allows us to see how much that fluorescence is changing across the session. And so this is the function that we use. So I've actually put it in here-- get delta F over F traces. This returns both timestamps and this delta F over F array.
You can see, again, we've got 174 cells. And then, these are the number of time points that we've imaged at. So here, I'm just going to plot the trace for the first cell. We can zoom in on this window over here.
So you can see with a little bit more resolution. But you see, there's times where the cell is really active. And we see these nice peaks. Other times, it's fairly silent. And then, its baseline is close to zero.
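A minimal sketch of pulling the dF/F traces and plotting one cell (the other trace accessors named above are listed in the comment; continuing from data_set):

```python
import matplotlib.pyplot as plt

# Other accessors include get_fluorescence_traces, get_neuropil_traces,
# get_corrected_fluorescence_traces, and get_demixed_traces; dF/F is what
# we usually end up working with.
timestamps, dff = data_set.get_dff_traces()      # dff has shape (n_cells, n_timepoints)

plt.figure(figsize=(10, 3))
plt.plot(timestamps, dff[0], lw=0.5)             # first cell in this session
plt.xlabel('time (s)')
plt.ylabel('dF/F')
plt.show()
```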
All right. So I'm just going to plot-- that was one cell. Here are the first 50 cells in this experiment. Let's just plot them as a raster plot, here.
This is the entire one-hour imaging session. The x-axis here is the imaging frames for the 2-photon movie. And you see these different cells from this particular session. And different cells are active at different times.
And we see bursts of activity for different cells in different places. All right. So different cells are active at different times. What can this possibly be about?
Well, let's start to look at the stimulus. So we can pull up some information from our experiment about what stimulus is being shown when. I'm going to start very coarsely by just getting what we call the stimulus epoch table. And this just tells us which stimulus type is shown, and when it starts, and when it ends.
So we're not talking about individual trials. We're just saying we've got 10 minutes of static gratings followed by 10 minutes of natural scenes, 5 minutes of spontaneous activity. And then, we show natural scenes again. And so what we've actually done is, we've interleaved the different stimuli amongst each other so that we don't just show all of one stimulus at the beginning and all of the other at the end-- because then, if there are differences in the responses, maybe it's because the cell is dying, or things like that might be happening.
So I can take the stimulus information. I'm just going to shade in the plot above of the delta F over F traces for those first 50 cells. And we're going to color in the epochs of the different stimuli. So you can see that we've got-- blue, I think, is our static gratings. And then, this orange color is the natural scenes that we see-- three different epochs.
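A rough sketch of that shading step (the stimulus names and colors here are illustrative; check get_all_stimuli for the exact names in your session; continuing from dff above):

```python
import matplotlib.pyplot as plt

epoch_table = data_set.get_stimulus_epoch_table()   # columns: stimulus, start, end (2P frames)

fig, ax = plt.subplots(figsize=(12, 8))
for i in range(50):                                  # dF/F for the first 50 cells, offset vertically
    ax.plot(dff[i] + i, color='gray', lw=0.25)
colors = {'static_gratings': 'blue', 'natural_scenes': 'orange',
          'natural_movie_one': 'green', 'spontaneous': 'red'}
for _, row in epoch_table.iterrows():                # shade each stimulus epoch
    ax.axvspan(row['start'], row['end'],
               color=colors.get(row['stimulus'], 'purple'), alpha=0.2)
ax.set_xlabel('2-photon frame')
plt.show()
```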
And now, when we look at the activity of these 50 cells, we start to see that some of these cells are really active. So this cell here-- it's really active when we show the natural scenes. But it's less active for the static gratings, the spontaneous activity. But the natural scenes come back. And the cell is active again, but then shuts down again when it goes back to gratings.
So we're starting to see that there might be some stimulus specificity for some of the neurons that we're recording from, here. Another piece of data that we have here in our NWB file is the running speed. So I use this function to get running speed. I'm going to plot the running speed of this mouse.
So it has this really long, continued burst of running late in the experiment with a few little bursts of running earlier. I'm going to add this to our plot. Excuse me. So we've got the activity of these cells, the timing of the stimulus epochs, and now the mouse's running speed all together in one figure.
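A short sketch of pulling the running speed (I believe the call returns the speed first and the timestamps second; continuing from data_set):

```python
import matplotlib.pyplot as plt

dxcm, dxtime = data_set.get_running_speed()      # speed (cm/s) and timestamps

plt.figure(figsize=(12, 2))
plt.plot(dxcm, color='gray', lw=0.5)
plt.xlabel('2-photon frame')
plt.ylabel('running speed (cm/s)')
plt.show()
```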
And we can pull up some individual cells from this. I'll just show you three of my favorite cells from these examples. So this one, this is actually the top row here. This is, I think, a really fun cell, where we see it's not super active, except for during this spontaneous activity, as well as these two bursts here.
There are little 20- or 30-second gaps between our stimuli-- between, for instance, the natural scenes and the natural movies, here-- where we show mean luminance gray, which is the same thing that we showed during the spontaneous activity. So the luminance is the same across the whole hour. But there are no patterns.
There's no stimulus on the monitor. So whenever we switch from the natural scenes to the spontaneous activity, we see this big increase of activity. The cell's really active during this whole spontaneous epoch. But then, it gets quiet again when we return to the stimulus.
But after that, in these little gaps between the stimuli, we see these bursts of activity. And so this is a cell that seems to actually be suppressed by patterned stimuli, but shows big bursts of activity when we remove those patterns. It's a type of response that we call suppressed-by-contrast, which is a pretty interesting phenomenon that we see.
Here's another cell taken from the plot above where we see that the activity of the cell seems to follow the running activity of the mouse. It's not a perfect one-to-one match, but we see that when the mouse is running, there's more activity from the cell. And when the mouse is stationary, the cell's pretty quiet.
Another cell from that same experiment that's almost the opposite-- it's not that it doesn't respond at all when the mouse is running. But it's less active when the mouse is running than it is when the mouse is stationary. And you can see, it's pretty well driven by these natural scenes. But those response amplitudes are much smaller when the mouse is running than when it's stationary.
And you see this interdigitating of running and activity during these other stimuli that suggests that this is a cell that is anticorrelated with running. So just by very coarsely pulling up stimulus timing, activity traces, and running speed, we can already start to see some pretty interesting phenomena in the data. All right.
Just a few more things that I want to show you before I go back to telling you some things that we've learned with the data. But I've been showing you the delta F over F traces. We do also have event times that we've extracted from these fluorescence traces using a method that was developed by Daniela Witten and her student Sean Jewell. They're at the University of Washington here in Seattle.
And these are the events that we've actually used for our analysis and in the work that I'm going to show you. These are available through the SDK. They're not in the NWB file currently. And so the way that you access them isn't using the data set object, but is actually going back to the brain observatory cache where we have a function called get ophys experiment events.
And again, you pass it the session ID for the individual session. And it returns an array. It's the same shape as the delta F over F-- so the number of cells by the number of time points.
But these are the extracted event times and event magnitudes. And so here, I'll just show you for one cell. Here is the delta F over F trace. And here are the events that we've extracted for that trace. And so some analyses are easier to do with these events than others.
And so those are available here. I can make the same plot I made before-- these 50 cells with the stimulus and the running speed. And so we see the same thing with the events as we do with delta F over F. But they're a little bit easier to use for some types of analysis. All right.
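A minimal sketch of pulling the events and overlaying them on the dF/F trace for one cell (continuing from boc, session_id, and dff above):

```python
import matplotlib.pyplot as plt

# Events come from the cache, not the data_set object, and have the same
# shape as the dF/F array.
events = boc.get_ophys_experiment_events(ophys_experiment_id=session_id)
print(events.shape)                              # (n_cells, n_timepoints)

plt.figure(figsize=(10, 3))
plt.plot(dff[0], lw=0.5, label='dF/F')
plt.plot(events[0], lw=0.5, label='events')
plt.legend()
plt.show()
```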
And then, I showed you the stimulus epoch table. This basically says, this is the 10 minutes that we're showing stimulus type A. But we also have more specific trial information. So we have the function get stimulus table. And then, you give it the name of the specific stimulus.
And so we can get a table that gives us-- for each trial of the natural scenes here, it tells you what image was shown, and then when it started, and when it ended. And these again are the indices that map into these fluorescence traces. And we have a similar one for the static gratings stimulus.
But here we tell you the parameters of the stimulus, because those are created programmatically-- these gratings stimuli-- so the orientation, the spatial frequency, the phase of the grating, and then again, when it starts and when it ends. The frame number here, the image number for the natural scene stimulus-- and this is also true for the natural movie-- this maps into our stimulus template. And so I mentioned, this is something that's included in our NWB file.
And so this function, get stimulus template, returns an array of all of the images for the natural scenes. We have the same thing for the natural movies, as well as for the locally sparse stimulus. And so this is an array. It has 118 images.
And that's the number of images that we show. And so for instance, we can pick a random scene number. Here's scene number 101.
And this is the actual image that we showed to the mouse, this burrowing hole in the ground. The first trial on this table is frame number 81. So we can look at that. Scene number 81 is this fence with a shadow on it.
So we have a lot of different images. This lets you dig into them and see what they are. And you can build that into your analysis. And then, we can use, for instance, the stimulus table to say, we want to look at all of the times that we showed this image, scene number 101. We're going to look at the response of cell number one for all the different times that I showed image 101.
And now, I'm plotting all these different trials. And you see, this has a really robust response, although the amplitude of the response is quite variable across all the different trials for this particular cell. So now you can start putting these pieces of information together in order to start looking at the activity and responses of these different cells.
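A rough sketch of those steps, pulling the per-trial table and the stimulus template and overlaying one cell's responses to one scene (column names as described above; continuing from data_set and dff):

```python
import matplotlib.pyplot as plt

stim_table = data_set.get_stimulus_table('natural_scenes')   # frame, start, end per trial
scenes = data_set.get_stimulus_template('natural_scenes')    # (118, rows, cols) array of images

scene_number = 101
plt.imshow(scenes[scene_number], cmap='gray')
plt.title('scene %d' % scene_number)
plt.show()

# Overlay one cell's dF/F around every presentation of this scene.
cell_index = 0
trials = stim_table[stim_table.frame == scene_number]
plt.figure(figsize=(6, 4))
for _, trial in trials.iterrows():
    start = max(int(trial['start']) - 10, 0)
    end = int(trial['end']) + 10
    plt.plot(dff[cell_index, start:end], color='gray', alpha=0.5)
plt.xlabel('frame relative to trial window')
plt.ylabel('dF/F')
plt.show()
```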
I'm going to jump back to my talk. I'll just tell you we have this function, get metadata. This is where you get all of the metadata about the animal-- the sex of the animal, the age of the animal, as well as which actual device we collected the data on. So we have a number of different microscopes.
Some of this might not be super relevant for you. Other pieces of metadata here might be super important. So that's just something to point out.
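And a short sketch for that metadata call (continuing from data_set):

```python
metadata = data_set.get_metadata()               # Cre line, imaging depth, sex, age, device, ...
for key, value in metadata.items():
    print(key, ':', value)
```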
I have some more stuff in this Notebook. There's a little exercise here on looking at signal correlations as a function of the distance between neurons. So if you're interested, this is a great way to get more familiar with the pieces of data that we've pulled out. But I'll leave that to you to do. It's all in this Notebook, so it should be pretty self-explanatory.
So that should give you a bit of an insight into how to access the data and use it. So I'm going to tell you a little bit about what we've done with these data, some of the analyses we've done, and what we've learned from it, just to give you a little flavor of some of the stuff that we've been able to pull out.
But as Christoph mentioned, there have been a couple of dozen papers from external groups. There's just a lot more that can be extracted from these data sets. So this is work that's in our platform paper. I co-led this work with my colleagues Jerome and Michael in collecting and analyzing these data.
So I do want to give them a shout-out at this point as we dig into some of the data. And so to start, I want to show you one of the cells in our data set. So there are over 63,000 neurons. Here's one of those 63,000 neurons.
This is a pyramidal cell in layer 5 of V1, showing responses to the different stimuli that we show. So for instance, we show a drifting grating stimulus that has sinusoidal gratings that move in eight different directions, and at different temporal frequencies. These are the speeds of the grating.
And we have this plot down here. We call this a star plot that summarizes the cell's response. So each arm of this plot corresponds to a different direction of motion, and each ring corresponds to a different speed, with the slow speeds in the center and the fast speeds on the outside. And at the intersections of the rings and the arms, you see a number of dots.
And each is a single trial. And the color of the dot corresponds to the strength of the response on that trial of this neuron. So if it's a dark red dot, that means the cell had a strong response. If it's a white dot, it means that the cell didn't respond.
And so for most of these conditions over here, for instance, you don't see any dots. Because they're all white. And so there's still plenty of trials being shown of all those conditions. There's just no responses.
But we see there are really reliable responses up here, which correspond to these horizontal gratings that are moving upward, moving vertically. There are weaker responses for the downward motion-- some sort of response, but much stronger responses for these upward-directed gratings.
We also showed static gratings. These are, again, sinusoidal gratings at six different orientations. But there's no motion. These are just flash gratings.
There's six different orientations and five different spatial frequencies. And these correspond to the period of the grating, the width of the gratings. And so again, this plot shows-- each arm is the orientation. Each ring or arc here is the spatial frequency, the low spatial frequency in the center, the high spatial frequency on the outside.
And we see that the responses here cluster at, again, this horizontal grating-- maybe slightly off axis. We see these responses over here, and these intermediate spatial frequency values. We used a locally sparse noise. These are white and black spots that are flashed in different locations on the monitor.
And we see that the responses here cluster at, again, this horizontal grating-- maybe slightly off axis. We see these responses over here, at these intermediate spatial frequency values. We also used a locally sparse noise stimulus. These are white and black spots that are flashed in different locations on the monitor.
We show 118 different natural images. What we call a coronaplot-- each ray of this plot is a different image. So there's 118 different rays.
Again, each is a single trial. There's 50 trials of each stimulus. And so where you see these long rays of red dots are the stimuli that have consistent responses across many trials. So there's about four images that this cell has a pretty reliable response to.
And the other images-- it might respond once or twice, but it really doesn't have a very robust, reliable response to them. And then finally, we showed a couple of different natural movie clips. And the responses to this movie-- this is essentially a raster plot that we've looped around in a circle like a clock.
So each row here is a single trial. These red tick marks correspond to the events. And you can see across the different trials. There's a couple of times where we see pretty reliable responses across the 10 different repeats of the movie.
The outer ring in blue is the average across the 10 trials. And that really hits home, these really robust responses of these couple of time points down here. So if we take all of these different responses, and we start to piece them together, we think about-- the cell has this spatial receptive field.
It responds to light things on the side of the image. We saw that it responded to horizontal, maybe slightly angled edges, particularly when they're moving upwards. These were the four images that the cell responded to that we saw from the coronaplot.
And we can look at the responses of the neuron to the positions of these spots. So we do this separately for the white spots and the black spots to look at the on subfields and the off subfields. For this particular cell, we didn't see any off subfields. But we had this nice spatial receptive field for the on stimulus, the white stimulus-- on this side of the monitor, as we showed it to the mouse, so over to the left.
This is the movie clip at the time when it responded. You'll notice, there's this bright street organ that comes into view right on the side, right where that light, spatial receptive field is. And so we can start to piece these together and think about, all right. This cell likes it when there's this light, maybe with a contrasting edge, maybe at a slight angle on the left side of the image.
We can put all these together in this way, and we can start to try and work out what the features are that drive this cell the best. And so one of the things that we wanted to do with this data set was to test this idea of what's called the standard model.
So for many decades, people in the field have been recording from individual neurons, and we could record their spatial and temporal receptive fields. And we've come up with the standard model that consists of a linear filter that has both spatial and temporal features.
This diagram just shows the spatial part, but it's a spatiotemporal linear filter, the output of which gets passed through a static nonlinearity. And we can predict the responses of a cell based off of this spatiotemporal feature. And how well a visual stimulus matches this feature will predict how well the cell responds to that particular stimulus.
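To make the idea concrete, here is a toy linear-nonlinear sketch, not the wavelet model we actually fit; the filter and stimuli are random placeholders purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
spatial_filter = rng.standard_normal((16, 16))   # hypothetical receptive field
stimuli = rng.standard_normal((100, 16, 16))     # 100 hypothetical stimulus frames

# Linear stage: project each frame onto the filter.
linear_drive = np.tensordot(stimuli, spatial_filter, axes=([1, 2], [0, 1]))

# Static nonlinearities: half-wave rectification (simple-cell-like) or
# squaring (complex-cell-like, phase invariant).
simple_prediction = np.maximum(linear_drive, 0)
complex_prediction = linear_drive ** 2
```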
So these examples-- this is from a nice review paper from Carandini et al. from a decade ago. These are examples of a retinal ganglion cell, or an LGN cell. We've got these concentric circles where there's a light spot surrounded by dark. And so when you have features in the stimulus that align with that particular location, with that particular light and dark combination, we get a nice, strong response.
But if you have the opposite-- so say, if you have a dark spot on a light background-- you'd see no activity at all. But the light spot, you'd get a nice, large response. Moving into the visual cortex, we've got our simple cells.
These now have elongated oriented receptive fields that then pass through this half wave nonlinearity, the rectification. And so if you've got a dark edge with light flanking it on the sides at a particular orientation, you get this strong response. But the opposite-- a light edge in that particular location-- would not drive the cell.
So it's pretty phase sensitive, whereas complex cells show that same, elongated oriented structure. But they're now phase insensitive, because they've now got this squaring nonlinearity. And so we've got the phase invariance of the complex cell.
All right. So just to reiterate this-- I'm sure this is pretty familiar to all of you. So this is a simple cell receptive field. If we've got a grating stimulus that lines up with the orientation, where the dark falls in the dark, and the light is on those two other sides, we're going to see a nice response from this simple cell.
And if we were to show, for instance, this natural image, where the stem of this flower falls in that same location, we'd expect to see that same type of nice response from this cell, as well. And so this is what I was just talking about when I was trying to draw together all of the features of these different stimuli that are driving this particular example cell really well. You can start to think about a spatial feature in this area that is driving the cell.
Now, this is one cell. And we can't go through each cell like this laboriously. And so we wanted to test the standard model in our data set.
And so we implemented the standard model using a 3D wavelet projection, dense wavelet basis that captured both spatial and temporal features at the level of the mouse visual acuity. So we're using spatial and temporal features that are appropriate. We follow this with both linear and quadratic nonlinearities so that we can capture both simple and complex types of responses.
And then, we can fit the weights for all of these different spatiotemporal features with either of these rectifications in order to find the combinations that best predict each individual cell. And we can train and test this either using the natural stimuli-- these are the images or the movies-- or using the artificial stimuli. These are the gratings and the noise.
And we can build the models that best predict the responses of each cell. We also include the mouse's running speed, because we know that this is an important factor for many cells in the visual cortex, that the running can modulate these responses. And so for the example cell that I showed you before, we can look at how well we can predict this.
We do a nested six-fold cross-validation. So we're doing this with cross-validation here. And so we can look at how well it predicts the responses. And we see, this does pretty decently.
We can find the features that best predict the responses. We get R values here for both artificial and natural stimuli of about 0.5, 0.6, which is pretty good. We consider this to be a well-performing model.
But if we look across our entire data set, it's not the norm. So most of our data, actually, is very poorly predicted using the standard model. So this is a density plot. So the color corresponds to where the bulk of the cells are, which is down here very close to 0.0.
So the majority of our cells are really just not being well predicted by this model. So we can start to think a little bit about why this might be, why the standard model isn't doing a great job. Now, just to step back for a second, we've known that the standard model isn't perfect.
This isn't a new thing. There's actually a lovely book chapter called "What Is the Other 85% of V1 Doing?" that points out a number of the deficiencies of the standard model. So it's not that we're completely shocked by this. But we're maybe a little bit shocked by this.
I think we were expecting more of our cells to be up here, like the example, and not so close to 0.0, as we found in this data set. But to think about what might be going on, I'm going to pull back to this visualization. This is basically the same plot that I made in the demo a few minutes ago, where this is just the activity of 50 cells from one experiment with the stimulus epochs shaded above it.
And I pointed this out during the demo as well, that there are some cells that are active during some stimuli, and then not active during the others. This is an example, here, of a cell that is really active during the static gratings. And when it switches to natural scenes, the cell gets a lot quieter and stays really quiet until we come back and show the static gratings again. So it's not like the cell died.
It's not like we lost the cell. But it has these really robust responses for the static gratings that we don't see for other stimuli. You can see similarly, here is a cell that shows a nice response for the natural movies. These movies are shown 10 times in a row. And so you see this nice little periodic response here.
But when it switches to the natural scenes, which have the same spatial statistics, the same spatial structure, the cell stops responding very robustly in this particular example. So we see that different cells really seem to respond to some stimuli and not others. And they do so with different reliabilities-- how consistently a cell responds.
So if there's one image that a cell responds to-- say that image of the tiger waiting in the water-- and if that's the image that drives the individual cell the best, if I show it 50 times, how many times does the cell respond to that particular image? And we can think about that percentage of trials on which it has a significant response to its preferred image as the reliability of that response. And so we compute that reliability for these different stimuli.
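As a rough sketch of one way to compute a reliability-like number (not necessarily the exact criterion we used), comparing each trial's response to a threshold derived from a baseline distribution:

```python
import numpy as np

def reliability(trial_responses, baseline_samples, percentile=95):
    """Fraction of trials whose response to the preferred condition exceeds
    a threshold taken from a baseline (e.g. spontaneous) distribution.
    Both arguments are 1D arrays of per-trial response magnitudes."""
    threshold = np.percentile(baseline_samples, percentile)
    return np.mean(trial_responses > threshold)
```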
And you can see here, I'm doing a pairwise comparison looking at the correlation of the reliability of the responses of a cell to all of the different stimuli that we show. So we've got here, static gratings, drifting gratings, natural scenes. And then, these are all natural movies.
And you'll see, there's natural movie 1A, 1B, and 1C. And this is the movie that gets repeated in each of the three different sessions-- in session A, session B, and session C. And so this is really important. Because this is the same stimulus being shown to the same cells, but on three different days. And this gives us a little bit of a benchmark as to the variability that we might expect from day to day of the same neuron's response to the same stimulus.
And so there we see, when we're looking across natural movie 1A to natural movie 1B and 1C, we see R values that are about 0.4, maybe 0.5. So these are some of the higher values that we're seeing. But mostly, what we see, for instance, if you look at drifting gratings-- if you look at the correlation of the reliability of a cell's response to its preferred drifting grating condition compared to the reliability of its response to the natural scenes-- that correlation is quite low. What this tells us is that knowing whether a neuron responds to drifting gratings reliably doesn't tell us anything about whether it responds reliably to natural scenes, or natural movies, or static gratings.
It doesn't mean that it won't respond reliably to those stimuli. So it's not anticorrelated. It just tells us that even if we know a cell responds reliably to one stimulus type, we really just don't know if it's going to respond reliably to any of the other stimulus types that we show.
AUDIENCE: Can I ask a quick question?
SASKIA DE VRIES: Yes, sure.
AUDIENCE: So what we're looking at is the correlation of a single neuron?
SASKIA DE VRIES: Yes. We've averaged together-- this is the average across all of the neurons in our data set. But we've computed it per neuron. Yeah.
AUDIENCE: And then, when you talk about the different sessions, are those over the course of the same day? Or is that across different days?
SASKIA DE VRIES: Those are across different days. Sessions A, B, and C are on three different days. Yeah.
AUDIENCE: And what is the drift like in these type of experiments? Is it possible that you might be getting signals in the same part of your matrix, I guess, but it might be a different neuron and not the same across the different days?
SASKIA DE VRIES: Great question. So yeah. That's a really important point. So we segment the cells in each session. And then, we match the ROIs from one session to the next.
And so if we image a cell in session A, we may or may not even find that cell in session B. So if a cell is really not active in session B, it might not even show up in our segmentation. We might not have an ROI.
So not every cell is matched across all three sessions. And this is an important thing when you're working with the data to keep in mind. So I think it's about 30% of the cells are matched across all three sessions.
Another 30% are matched across two of the three sessions. And then, there's the 30% of cells that we find in only one of the three sessions. And so this analysis here is only done for cells that we've matched across all of the sessions.
AUDIENCE: Thanks.
SASKIA DE VRIES: Yeah. OK. Great question. Thank you.
This here is just a pairwise comparison across the different stimuli, two at a time. We can now dig into this a little bit more deeply by doing a comparison across all four stimuli. And so we fit a Gaussian mixture model to look at the relationship between the reliabilities of the responses across all four stimuli.
So this is drifting gratings, static gratings, natural scenes, natural movies. And so what we've done now is, we've collapsed all of the different natural movies that we show. And we just take the most reliable response of an individual cell. And we use that as its reliability for the natural movie class.
And so we do this Gaussian mixture modeling. And we get these 30 different clusters that you see in this heat map, here on the left. And what this shows you, what we're showing in the heat map is the mean reliability for that cluster for each of the four stimuli. And we've set this color scale so that things that are red are what we consider to have a reliable response.
And I'll explain in a second. The ones that are blue are ones that we think are not reliable responses. And we set that threshold by looking at this first cluster, here. So we know that there's some cells-- and we've known this just by looking at the data and working with the data-- we know that there's some cells that just don't have reliable responses to any of the stimuli that we show.
And so the cluster that has the lowest mean reliability across all four stimuli, we assign to this none cluster. And we then set the threshold based off of the values of these reliabilities. We take the maximum value, here. We add one standard deviation. And that becomes the threshold that we use to set this color scale.
So that's the white, here. Any reliability above that, we consider to be responsive. And it gets colored red. Below that, it's unresponsive. And it's colored blue.
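A rough sketch of this clustering and thresholding step with scikit-learn, on a hypothetical cells-by-stimuli reliability matrix; the threshold rule in the last lines is one reading of the description above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical (n_cells, 4) reliability matrix: one value per cell for
# drifting gratings, static gratings, natural scenes, natural movies.
rng = np.random.default_rng(0)
reliabilities = rng.random((1000, 4))            # placeholder data

gmm = GaussianMixture(n_components=30, random_state=0).fit(reliabilities)
labels = gmm.predict(reliabilities)
cluster_means = gmm.means_                       # mean reliability per cluster and stimulus

# "None" cluster: lowest overall mean reliability. Threshold for calling a
# response reliable: that cluster's maximum mean plus one standard deviation.
none_cluster = cluster_means.mean(axis=1).argmin()
threshold = cluster_means[none_cluster].max() + cluster_means[none_cluster].std()
```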
So we've got this one cluster that doesn't respond to any of the four stimuli. And then, we find clusters that have different patterns. So these two clusters have reliable responses to the natural movies, but none of the other stimuli.
These here-- and you can zoom in on this one here-- it's got reliable responses to the drifting gratings and natural movies, but not the static gratings and natural scenes. And so we can group these different clusters, now, based off of which stimuli they respond reliably to, and which ones they don't into different classes. And we can start to see the patterns among the classes.
Now, one thing that I think is important to point out, again, is that even though we have cells that show consistent patterns in terms of reliable responses to both drifting gratings and natural movies, if you look across the clusters within this class, there's still a lot of heterogeneity, where in some of these clusters, the cells respond really reliably to the drifting gratings and less reliably to the natural movies. In others, they're pretty equal. There's still that heterogeneity, where how reliably a cell responds to one stimulus tells us very little about how reliably it responds to another stimulus type.
So we repeat this clustering analysis 100 times with different initial conditions in order to evaluate the robustness. We can now look at how many cells fall into each of the possible classes. So there are 16 possible classes based off of the combinations of these four stimuli.
And so I'm showing you here the percent cells that fall into each of these classes. And so it's important to note, there's many classes where we just don't have any cells. There's no cells that, for instance, only respond to drifting gratings and static gratings together and none of the other stimuli. That just doesn't happen.
Our biggest class-- and this was very surprising to us-- our biggest class, though, is these none cells, the ones that don't respond to anything. 35% of the cells in our data set fall into this class. We've got 10% of cells that respond to all of the stimuli, that have reliable responses to all four stimuli. And you see them down here in these clusters, there.
But it's only 10% of the cells in our data set. A couple of the other big classes-- this one here is the drifting gratings and natural movies. These are stimuli that both have motion features to them that we think are important for driving these cells. And then, the natural scene, natural movie-- this purple bar-- these are the ones that have the spatial statistics, the natural scene statistics in both the natural images and the natural movies, where we think that these cells are particularly attuned to those statistics.
So coming back to our standard model, we can start to see how these classes map onto our model performance. This was the plot I showed you before comparing the model performance for natural stimuli and artificial stimuli. The cells in our none class now-- if we just show those cells-- that really is that bulk of data down there at 0.0.
And this makes sense. If you don't have reliable responses to stimuli, you're not going to be able to predict those unreliable responses. And so now, it makes a lot more sense why we have such a bulk of data close to 0.0.
The cells in our natural scene, natural movie category-- these actually have decent predictions for the natural stimuli, but not for the artificial stimuli. And so we see this tail of cells where the R values-- the median R value here is about 0.3. It's not stellar. But it's not terrible.
And you can find a lot of cells like that in the literature. So we can predict the responses to the stimuli that the cells have reliable responses to. And then, for the cells that were in our all category, that responded to all four stimuli-- here now, we've got R values of about 0.4.
We see a nice, balanced performance between both the artificial and the natural stimuli. And these look a lot more like what we expect based off of the textbook idea of simple and complex cells with the standard model, where there's features that, regardless of whether it's artificial or natural, if you have something that matches those features, it's going to drive a response. And we can then predict it reasonably well.
And so now we see, there is some kernel of that textbook standard model in our data set. But it's a little bit hidden by some of these different types of responses that are a bit more prominent. Coming back to the functional classes, this is the plot I showed you before-- just the number of cells, or the percent of cells in each of our categories. I've just rotated the plot. We can look at how these are distributed across the different areas that we've collected data from.
And so here's V1, and then the five higher visual areas. We see that, for instance, the fraction of cells that don't respond reliably to our stimuli increases as we move away from V1 into the higher visual areas. And in RL, we see that almost all of our cells-- it's like 85% of the cells in RL-- fall into this none category.
And so this suggests that as we move away from V1, the responses either are becoming less visual, or they might be becoming more complex-- not in the sense of simple and complex, but more sophisticated. They're responding to features that aren't represented in our stimulus set. We have a finite stimulus set.
And it could just be that we're not hitting the right feature for some of the cells in our data set, particularly in these higher visual areas that are further up the hierarchy and might be selective for more specific features. Another thing that I find really interesting, though, is that in these higher visual areas, up until RL-- so if we just ignore RL here-- the percent of cells that are responsive to the motion stimuli, the drifting gratings and natural movies, remains fairly constant, while the ones that are responsive to the natural scenes and natural movies become a smaller and smaller fraction.
And if you also consider that the fraction of responsive cells gets smaller, then of the visually responsive cells in these higher visual areas, a larger percentage are motion selective compared to V1. And so we see that there is this enhancement of motion information, particularly in these two medial areas-- AM and PM. And this maps a bit onto our ideas of dorsal and ventral streams, and how they map onto the mouse visual system.
And then finally, we can also look at this breakdown across the different cell types in our data set. This is just looking within V1, across these different transgenic lines and across the different layers. I won't go through all of it, but if you focus in here, we have Sst and Vip.
These are two of the inhibitory interneurons. Emx is one of our pan-excitatory lines. And you can see really big differences here, where, for instance, Sst, these somatostatin cells-- they have very few none cells.
And actually, about half of them respond to all of the stimuli. They have very robust visual responses. Whereas the Vip cells-- if you only use a drifting grating stimulus, especially a high-contrast drifting grating stimulus, you're going to think that Vip cells don't have visual responses. But actually, we see they respond really robustly to the natural stimuli that we show, both the natural scenes and the natural movies.
They fall into that purple class, here, the majority of those cells. So we do see some differences across some of the Cre lines. The biggest, most interesting differences are between the inhibitory populations.
The excitatory populations are a bit more similar. And you can see that up here in these excitatory lines-- they all look roughly the same.
So to conclude this part: about 10% of the neurons look like textbook cells that are reasonably well predicted by the standard model for both artificial and natural stimuli. But it's only 10% of the cells, and they're more common in V1 and rarer in the higher visual areas. And so there's a lot of other things going on in the mouse visual cortex.
And there's a lot of neurons in our data set that don't respond to the stimuli. And so we have a lot of interesting questions now, about what they're doing and whether it's about more complex visual features, or whether it's integration of different types of sensory modalities. We know that there's a lot of modulation by motor activity, or by other sensory stimuli-- auditory, or whisking stimuli in the visual cortex. And so there's a lot of questions about what that might be doing within the visual cortex. And then finally, we also know that the standard model-- we know there's deficiencies in this model.
And so there's this challenge of, can we develop better models that capture these neural responses, especially as we move across these different areas and think about the interactions across the different layers and areas? All right. So I have a few more minutes, and I want to very quickly tell you-- for those of you who might be interested in using these data, or honestly, using any data of this flavor-- about a paper that I just published along with Josh Siegle and Peter Ledochowitsch.
So Josh was one of the leads on our Neuropixels data set that Christoph was telling you about. We took our two pipelines and our two data sets and did a head-to-head comparison of the calcium imaging data and the Neuropixels data, the extracellular electrophysiology. Because we want to understand what these two modalities are doing, and how we can relate the data and the conclusions that we draw from them, both with our own data and in the literature as a whole.
And here we're really leveraging the fact that we've got these complementary pipelines. These are our CAD drawings of the Neuropixels rig and of the 2-photon rig. The position of the mouse, the position of the monitor-- all of these things are exactly the same between these different experiments. The only thing that's different is whether we're imaging or using electrodes.
And those are big differences, right? These are very different types of recordings, and some of the differences are obvious. The electrode goes through all of the layers, whereas our calcium imaging is in a single plane, collecting a lot of cells across a lateral extent. We've got spike-time resolution with the Neuropixels recordings, whereas we've got a longer kinetic response from the calcium indicator.
So those are some known differences. But we want to see, what effect does the modality have on the types of metrics that we look at, and the types of conclusions we draw from our data sets? And really, I want to point you to this paper. But I'll just highlight a couple of the key things that we discovered.
So in this comparison, we found that when we look at preference metrics-- this is essentially, what stimulus drives a cell the best? What are the tuning properties of a cell? In this case, I'm showing the preferred temporal frequency: what speed of grating drives a cell the best?
We actually see that these match very nicely between the two modalities. So here in green are the results with calcium imaging. In gray are the results with Neuropixels, across five visual areas. We see really similar distributions. And we've calculated a Jensen-Shannon distance, which is very low in this case between these two distributions.
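As a concrete illustration of both pieces-- the preference metric and the distributional comparison-- here is a small Python sketch. The response matrices are synthetic stand-ins and the temporal-frequency values are an assumed condition set; only the recipe (argmax over conditions, then a Jensen-Shannon distance between the two preference histograms) mirrors what's described above:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
temporal_freqs = np.array([1, 2, 4, 8, 15])           # Hz; assumed condition set
resp_2p = rng.random((500, temporal_freqs.size))      # mean response per cell x TF (calcium, toy)
resp_ephys = rng.random((800, temporal_freqs.size))   # mean response per unit x TF (ephys, toy)

# Preferred temporal frequency = the condition with the largest mean response.
pref_2p = temporal_freqs[resp_2p.argmax(axis=1)]
pref_ephys = temporal_freqs[resp_ephys.argmax(axis=1)]

def preference_distribution(prefs):
    """Fraction of cells preferring each temporal frequency."""
    counts = np.array([(prefs == tf).sum() for tf in temporal_freqs], dtype=float)
    return counts / counts.sum()

# Low values mean the two modalities give nearly the same answer.
jsd = jensenshannon(preference_distribution(pref_2p),
                    preference_distribution(pref_ephys), base=2)
print(f"Jensen-Shannon distance between preference distributions: {jsd:.3f}")
```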
And so looking at just what stimulus drives a cell the best, we're going to get pretty much the same answer with the two modalities. When we look at responsiveness, though, we actually see pretty big differences. Here, we're asking essentially about the reliability of the cell's response.
We say that a cell is responsive if it has significant responses on at least 25% of the trials of its preferred stimulus condition-- so, for natural scenes, the image that drives it best out of all the images. There are other definitions of responsiveness, but this is the one that we use, and we use it consistently across the two data sets.
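A minimal sketch of that criterion, assuming you already have per-trial responses for each condition and some per-trial significance test-- the simple percentile-against-a-null comparison below is an illustrative stand-in, not the pipeline's actual test:

```python
import numpy as np

rng = np.random.default_rng(0)
n_conditions, n_trials = 118, 50
trial_responses = rng.normal(0.0, 1.0, (n_conditions, n_trials))  # e.g. summed events per trial
null_responses = rng.normal(0.0, 1.0, 10000)                      # null distribution (e.g. shuffles)

preferred = trial_responses.mean(axis=1).argmax()      # condition with the largest mean response
threshold = np.percentile(null_responses, 95)          # per-trial significance threshold (assumed)
frac_significant = (trial_responses[preferred] > threshold).mean()

is_responsive = frac_significant >= 0.25               # the 25%-of-trials criterion described above
print(f"significant on {frac_significant:.0%} of preferred-condition trials; "
      f"responsive={is_responsive}")
```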
What we see is that we always get higher responsiveness in the ephys data than in the calcium imaging. It's 50% to 60% of cells with calcium imaging, and about 70% to 75% of cells with ephys. And again, this is that Jensen-Shannon distance, which is now actually a substantial distance.
And this difference in responsiveness, we think, stems from selection bias in the ephys recordings. When you're doing these extracellular recordings, the electrodes are most likely to pick up large cells with large spikes, and lots of those spikes. Basically, what we think is happening is that with calcium imaging, we record a lot more cells that have less activity, that are just quieter overall.
And so if we select out the quiet cells and only focus on the cells that are really active, we see that the responsiveness in our 2-photon data set gets larger. That's what's plotted here: if we include 100% of cells, this is our responsiveness across the five visual areas-- the different colored traces are the different visual areas, with the legend over there. But as we exclude more and more of the least active cells, that responsiveness goes up.
And our Jensen-Shannon distance goes down. And so we see the closest match between the ephys and the calcium imaging when we're only including 20% to 40% of our calcium imaging data set. So there's a big chunk of our calcium imaging data set that we think is simply being missed in our ephys data set because of this selection bias, where we need lots of spikes in order to sort those spikes.
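The subsampling procedure itself is simple to reproduce. Here is a sketch under toy assumptions-- synthetic event rates and a made-up coupling between activity level and responsiveness-- just to show the mechanics of the exclusion curve described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 5000
event_rate = rng.gamma(shape=1.0, scale=0.05, size=n_cells)   # mean events/s per cell (toy)
# Assume, purely for illustration, that more active cells are more likely to be responsive.
responsive = rng.random(n_cells) < np.clip(2.0 * event_rate / event_rate.max(), 0.0, 0.95)

# Progressively exclude the least active cells and recompute the responsive fraction.
for exclude_frac in [0.0, 0.2, 0.4, 0.6, 0.8]:
    cutoff = np.quantile(event_rate, exclude_frac)
    keep = event_rate >= cutoff
    print(f"excluding bottom {exclude_frac:.0%} by event rate: "
          f"{responsive[keep].mean():.0%} responsive ({keep.sum()} cells)")
```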
That's the other thing. There's the recording, and there's the spike sorting.
And so in order to reliably isolate the spikes of a unit, you have to have enough spikes, and they have to be big enough to really pull them out of the noise. And so that selection bias in ephys, we think, is eliminating a lot of these quieter cells that are in the cortex.
Another component of this is that there can be some contamination in the ephys between neighboring units, where if there's a cell that has a big burst of spikes, those spikes might get sorted in with the spikes of a unit that's being recorded nearby on the probe. And so we did a similar selection, where we selected the ephys units based off of their ISI violations.
This is one of the QC metrics that is computed for each of the units in the data set. As we get stricter and stricter with that ISI-violations criterion, we get fewer responsive neurons. And we get the best match when we put the strictest criterion that we can on the ISI violations in that ephys data set.
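In code, this is just a threshold sweep over a unit-level QC column. The sketch below uses a toy DataFrame; the column name 'isi_violations' is meant to echo the QC metric discussed above, but check the data set's documentation for the exact field and its scale:

```python
import pandas as pd

# Toy units table with an ISI-violations QC metric and a responsiveness flag.
units = pd.DataFrame({
    "unit_id": [101, 102, 103, 104, 105, 106],
    "isi_violations": [0.00, 0.01, 0.05, 0.20, 0.45, 0.80],
    "responsive_dg": [False, True, True, True, True, True],
})

# Sweep from a loose to a strict ISI-violations criterion and watch how the
# retained units and responsive fraction change, as described above.
for max_violations in [0.5, 0.1, 0.01]:
    kept = units[units["isi_violations"] <= max_violations]
    print(f"isi_violations <= {max_violations:>4}: {len(kept)} units, "
          f"{kept['responsive_dg'].mean():.0%} responsive")
```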
So we think there's both: we're missing some quiet cells in the ephys, and there's also a little bit of contamination between units that contributes to this difference in responsiveness. And then, the other big difference that we see between the two-- and this is a huge difference-- is in selectivity metrics.
And these are metrics that basically ask, how much does a cell prefer one condition to another? So for instance, direction selectivity-- when cells respond to gratings that move in one direction and don't respond to gratings that move in the other direction, we can compute a metric of that selectivity, of how strongly it prefers the one direction to the other. Orientation selectivity is a similar type of thing. And lifetime sparseness is a more generic version of that type of selectivity metric.
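For concreteness, here are minimal Python versions of two such metrics. The direction selectivity index and lifetime sparseness formulas below are the standard ones (lifetime sparseness in the form of Vinje and Gallant, 2000); the example responses are toy values, and the paper's methods should be consulted for the exact definitions used in the data set:

```python
import numpy as np

def direction_selectivity_index(pref_response, null_response):
    """(preferred - opposite) / (preferred + opposite); 1 = fully direction selective."""
    return (pref_response - null_response) / (pref_response + null_response)

def lifetime_sparseness(responses):
    """1 when a cell responds to a single condition, 0 when it responds equally to all.
    Assumes nonnegative mean responses per condition."""
    r = np.asarray(responses, dtype=float)
    n = r.size
    return (1.0 - r.mean() ** 2 / (r ** 2).mean()) / (1.0 - 1.0 / n)

# Toy examples: a highly selective cell versus a broadly responsive cell.
selective_cell = np.array([0, 0, 8.0, 0, 0, 0, 0, 0])
broad_cell = np.full(8, 1.0)
print(lifetime_sparseness(selective_cell))                                # ~1.0
print(lifetime_sparseness(broad_cell))                                    # 0.0
print(direction_selectivity_index(pref_response=8.0, null_response=1.0))  # ~0.78
```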
And so what I'm showing here is the distribution of lifetime sparseness, in this case for the drifting grating stimulus, but we've done this analysis for all of the stimuli. And you see that for the calcium imaging, in green, the selectivity distribution is actually quite high, and our cells are highly selective.
They respond strongly to only a few stimulus conditions, whereas for the ephys data, in gray, we see that the distribution skews quite low. And the Jensen-Shannon distance is actually quite high here. So these distributions are very different.
And this difference, we think, comes down in large part to what's going on with our calcium indicator. To unpack this, we used a forward model, where we took the spikes that we recorded with Neuropixels and passed them through a model of the fluorescence we'd expect to see from them. So here's the original spike train, in blue. And here, we've simulated the delta F over F that we would see for this particular spike train.
And we were able to simulate this because we've done some calibration experiments. We have a whole other body of work on this calibration, where we image cells while we have loose-patch recordings, so we have spike times and fluorescence from the same cells. And we can then parameterize this forward model with the amplitude, decay time, and noise level of how the fluorescence traces relate to the recorded spikes.
In the paper, we've also got some figures where we span a wide range of these different values to show how they can influence the results. But the key point here is that we can simulate the delta F over F and then do the same event extraction that we do for our analysis of our calcium data. And so you can see how well that matches the original spike train, in this case.
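Here is a minimal sketch of this kind of forward model, assuming each spike adds a transient with a fixed amplitude and exponential decay, plus Gaussian noise. The parameter values are made up for illustration, not the calibrated ones from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
frame_rate_hz = 30.0
duration_s = 20.0
t = np.arange(0.0, duration_s, 1.0 / frame_rate_hz)

spike_times = np.sort(rng.uniform(0.0, duration_s, 40))   # toy spike train (seconds)
amplitude, tau_decay_s, noise_sd = 0.2, 0.7, 0.03          # assumed forward-model parameters

# Each spike contributes an exponentially decaying calcium transient.
dff = np.zeros_like(t)
for s in spike_times:
    after = t >= s
    dff[after] += amplitude * np.exp(-(t[after] - s) / tau_decay_s)
dff += rng.normal(0.0, noise_sd, t.size)                   # imaging noise

# Event extraction would then be run on `dff` exactly as for real calcium data,
# and the tuning metrics recomputed on the extracted events.
print(f"simulated dF/F trace: {t.size} samples, peak = {dff.max():.2f}")
```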
But what you can see-- so this is the original spike times for the cell, showing us responses to the gratings that move in different directions. And on the right, we've taken the mean response across the trial-- so basically, counting up the spikes across each trial. And you can see that there's conditions that drive really strong responses. So these are different directions. There's also different temporal frequencies.
That's why you see a little bit of structure here. When we pass this through the forward model, the preferred direction stays the same-- this is what I showed you a couple of slides ago-- and we still see the largest responses for this downward-angled direction.
But the thing that I want to point out is that trials with a few spikes-- not zero, but a few spikes-- get quashed by the forward model, whereas trials with big numbers of spikes get a nonlinear boost in response amplitude. So we're seeing a superlinear boost of strong responses, and a quashing of trials with just a single spike or a few spikes, which get washed out by what we think the calcium indicator is doing. And so if we take all of our ephys data and pass it through this forward model, our distribution of selectivity now matches our calcium imaging much more closely than the original electrophysiology data did.
And one thing that I want to point out is that what your selectivity looks like with calcium data is very sensitive to the parameters that you use for your calcium event extraction. The reason for this is that I could come up with a different event extraction that would add in a lot of false positives-- I could just add in a whole bunch of extra spikes.
And it's going to lower my selectivity metric, because there's going to be a lot more spikes for the conditions-- the non-preferred conditions. But it's not because those are actual, real spikes. It's just adding noise to that data. And again, if you look at our paper, we've got some nice figures that unpack this a little bit, and compare a couple of different methods for event extraction.
One of the reasons we like the L0 method that we use is that it has a fairly low false positive rate, as you can see in this example. It's a pretty conservative, but pretty accurate, estimation of the events. So those are some key differences between these two modalities that I think are really important to keep in mind, because they can have a lot of implications for downstream analyses, such as the clustering analysis that we did of the responses across different stimuli.
The bottom line is that neither calcium imaging nor extracellular electrophysiology perfectly captures the activity of the neurons in the brain. They're both missing things, in different ways. And so when you're analyzing the data and interpreting the results, and also when you're reading work in the literature, it's really important to keep the limitations as well as the advantages of these different methods in mind in order to properly interpret what's really going on in the underlying neurons.
So with this, I'll wrap up. Our website will point you to all of our data sets, with a lot more documentation. And again, our SDK allows you to access the data-- just pip install allensdk.
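As a starting point, here is a minimal sketch of pulling one Visual Coding 2-photon session through the AllenSDK's BrainObservatoryCache. The manifest path and the filtering choices (area, Cre line) are just examples; see the SDK documentation and example Notebooks for the full set of options:

```python
# pip install allensdk
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# The cache downloads metadata and NWB files on demand and stores them
# alongside this manifest file (path here is an example).
boc = BrainObservatoryCache(manifest_file="brain_observatory/manifest.json")

# Find experiments in primary visual cortex for one Cre line (example filters).
experiments = boc.get_ophys_experiments(
    targeted_structures=["VISp"],
    cre_lines=["Cux2-CreERT2"],
)
print(f"{len(experiments)} experiments found")

# Download one session and pull its dF/F traces.
session = boc.get_ophys_experiment_data(experiments[0]["id"])
timestamps, dff = session.get_dff_traces()
print(dff.shape)  # (n_cells, n_timepoints)
```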
And we also have a forum where you can post questions if you run into problems using the data, or if you want to understand more about it. There are already a lot of questions there, so you might find that somebody has already asked yours. But it's a great place to post questions.
And we will all be happy to answer them. And so with that, I want to thank everybody on our large team. There's lots and lots of people who contribute to all this work on running these pipelines and analyzing the data.
It's a wonderful team, and they have all done just great work here. And again, thanks to our founder Paul Allen for his vision and support in creating such exciting open data sets.