Neurons that structure memories of ordered experience in human
October 27, 2021
October 26, 2021
Gabriel Kreiman (on behalf of Jie Zheng), Children's Hospital Boston, Harvard Medical School
Abstract: The process of constructing temporal associations among related events is essential to episodic memory. However, what neural mechanism helps accomplish this function remains unclear. To address this question, we recorded single unit activity in humans while subjects performed a temporal order memory task. During encoding, subjects watched a series of clips (i.e., each clip consisted of 4 events) and were later instructed to retrieve the ordinal information of event sequences. We found that hippocampal neurons in humans could index specific orders of events with increased neuronal firing (i.e., rate order cells) or clustered spike timing relative to theta phases (i.e., phase order cells), which are transferable across different encoding experiences (e.g., different clips). Rate order cells also increased their firing rates when subjects correctly retrieved the temporal information of their preferred ordered events. Phase order cells demonstrated stronger phase precession at event transitions during encoding for clips whose ordinal information was subsequently correctly retrieved. These results not only highlight the critical role of the hippocampus in structuring memories of continuous event sequences but also suggest a potential neural code representing temporal associations among events.
GABRIEL KREIMAN: This talk was supposed to be given by Jie, shown right here. Unfortunately, this morning she said she was not feeling very well, so she asked me to give the presentation in her stead.
So I apologize to everyone. You're stuck with me. I'm going to be using her slides. And this is not going to be even close to as nicely polished a presentation as she would have given.
So hopefully you will all get to hear from the person who actually did the work at some point. But in any case, I'll try to pretend that I am Jie and give a brief summary of her work here. So the topic that we were investigating here was how cognitive boundaries are detected to structure episodic memory formation in the human brain.
And all of the work that I'm going to talk about was conducted by Jie in collaboration with a couple of people, most notably my good friend, [INAUDIBLE], shown here. So before I get into showing you some of the cool data that Jie collected, I want to start by discussing the notion that it is now essentially possible to store our entire lives on a hard drive.
And just as a back-of-the-envelope calculation, just to consider the sense of vision, because it's somewhat easier to calculate, let's say that we want to store everything we see in our lifetimes. Let's assume that we have about 10 to the 6 pixels per second. The exact numbers here don't really matter.
And we can debate about how many pixels per second you actually want to store, but let's say we want to store 10 to the 6 pixels per second. And we want to store each pixel with 3 bytes so that means that we have about 256 shades for each color. And let's say that we are lucky enough to live all the way to 100 years old.
So that's about 10 to the 9 seconds. So all in all everything we'll see in our lives amounts to about 10 petabytes. Of course, our life is not just visual processing. There are lots of exciting sounds, all the internal ruminating that we do, olfactory sense, touch, lots of other things that transcend visual processing.
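As a rough check on the arithmetic above, here is a minimal sketch of the estimate. It uses only the round figures quoted in the talk (10^6 pixels per second, 3 bytes per pixel, a 100-year lifespan); the function and constant names are illustrative. Note that 100 years is closer to 3.2 x 10^9 seconds than to 10^9, and it is that figure that gives a total near the quoted 10 petabytes.

```python
# Back-of-the-envelope estimate of lifetime visual storage.
# All inputs are the round figures quoted in the talk, not measurements.

PIXELS_PER_SECOND = 10**6                 # assumed visual data rate
BYTES_PER_PIXEL = 3                       # one byte (256 levels) per color channel
YEARS = 100                               # assumed lifespan
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # about 3.16e7 seconds


def lifetime_bytes(pixels_per_second, bytes_per_pixel, years):
    """Total bytes needed to store every pixel seen over a lifetime."""
    seconds = years * SECONDS_PER_YEAR
    return pixels_per_second * bytes_per_pixel * seconds


total = lifetime_bytes(PIXELS_PER_SECOND, BYTES_PER_PIXEL, YEARS)
petabytes = total / 10**15                # decimal petabytes (1 PB = 10^15 bytes)

print(f"lifetime seconds: {YEARS * SECONDS_PER_YEAR:.2e}")
print(f"storage: {petabytes:.1f} PB")
```

With these inputs the total comes out at roughly 9.5 PB; changing any input by a small factor moves the answer by the same factor, which is why the exact numbers do not matter for the argument.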
But the main point I want to make is that it's not inconceivable that basically we could potentially store everything. And yet, our memories are very, very far from this kind of complete storage of information. Our episodic memories are fragile. They often deceive us.
And, in fact, we forget most of what we see and most of what we experience. A large fraction of our lives, we actually end up forgetting. So we're very interested in trying to understand how our brains distill information from our daily experiences to form episodic events, and how these episodic events are defined, and what are the filtering mechanisms that condense and compress information to form episodic memories?
So the particular question that I'd like to focus on today is, what defines an event? So our lives are of course continuous. And yet, most of our episodic memories are somewhat discrete. We tend to remember particular events that are somewhat isolated from other events in our lives.
So there has to be some sort of a mechanism that will detect boundaries in our daily experiences. And I'm trying to illustrate that here with these scissors, basically some sort of boundary in our continuous movie, our continuous life stream, that will mark the end of one event and give rise to the onset of another. So we set out to try to investigate what defines an event, specifically in the context of episodic memory.
So we refer to this as cognitive boundaries that lead to the formation of discrete mnemonic episodes. So here is the flavor of the type of experiment that we ran to try to better understand the formation of these episodic boundary bins. We worked with subjects who were presented with very short video clips, about 8 seconds in duration.
And then, after watching these video clips, in a random fraction of the video clips they were asked some simple questions about the content, just to make sure that they were paying attention and had actually seen and attended to the contents of the video clip. And importantly, we had three different types of video clips. In one of the video clips, there was no boundary.
So this was a continuous sequence that lasted for 8 seconds. And what I mean by continuous is just that the transition between any frame and the adjacent one was minimal as illustrated here. In the second category, we had what we refer to as soft boundaries. So this is the way typically Hollywood movies are filmed, where every now and then you have a sharp transition between one frame and the next.
If you've never paid attention to this when you watch a movie, next time you watch a movie just reflect on what's actually happening. And you'll see that every three, four seconds basically in the movie Hollywood directors like to insert a cut. And therefore, there's a transition.
The narrative of the story is basically continuous, but at the visual pixel level, at least, there's a major transition from one frame to the next. And we refer to this as soft boundaries to distinguish them from the third category, which we refer to as hard boundaries.
So these are cases where we simply took two completely distinct movies, and we just concatenated four seconds of the first movie with four seconds of the second movie. So we refer to this as a hard boundary, because there is no continuous narrative between the first part and the second part.
So after subjects watched all of these video clips, they were tested in terms of what they remembered about the clips using two different tasks. In one task, which we refer to as the scene recognition task, subjects were presented with one frame, a single frame. And they were asked to indicate whether they had seen that particular frame during the video clips, yes or no, that is whether the frame was old or new.
So we randomly either selected a frame from the video clips or another frame that they had not seen. So with 50% chance, the frame was either old or new. And the second test was a time discrimination task, where we presented two frames, in this case, two frames that were actually from the video clips. And the subjects had to indicate which of the two had occurred first, whether the one on the left was first or the one on the right was first.
OK. In addition to that, we also asked for confidence level. So not only did they have to say old or new or left or right, but they also had to indicate on a scale whether they were very sure, very unsure, less sure, or completely sure about their answer. OK. So here are some of the behavioral results. So here I'm showing the results for the scene recognition task.
Remember, they are shown one frame. They have to say whether it's old or new. Chance is 50%. So here is accuracy. So subjects performed with an accuracy that's slightly below 80% correct, mostly irrespective of the type of boundary, whether it was a no boundary, a soft boundary, or a hard boundary.
For all of these clips, people were almost at 80% correct in this task. Their reaction times are shown here. They are also very similar across different video clips, as well as the confidence levels.
Next I want to show you performance as a function of the distance between the target frame and the previous boundary. For everything that we're going to do comparing soft and hard boundaries, in the no boundary condition we're going to align everything to the middle of the clip, that is, at four seconds, which is where all the hard boundaries were and where most of the soft boundaries were. So we're aligning the target frames to the previous boundary or, in the case of no boundaries, to four seconds.
So here we are showing accuracy as a function of the time from the past boundary. In the case of no boundary, there was no correlation. However, in the case of soft boundaries and hard boundaries, there was a negative correlation, meaning that subjects remembered slightly better those events that happened close to the boundary compared to those events that happened a few seconds after the boundary.
So these soft and hard boundaries do seem to play a role at the behavioral level in dictating what people will or will not remember. OK, so now I want to show you the behavioral data for the time discrimination task. So here is the accuracy. And here there was a difference between the different types of boundaries.
So for the no boundary and the soft boundary conditions, subjects were, again, somewhat below 80%, probably around 70% correct in discriminating which frame came first. But interestingly, in the case of the hard boundaries, people were almost at chance. So it was very hard for people to discriminate the order of the two events when there was no continuity in the narrative.
So there's no logic basically to which frame comes first. And in this case, people were essentially at chance. It also took longer for subjects to try to recall which frame came first. And also subjects had lower confidence in these hard boundary conditions.
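To make the notion of a negative correlation between accuracy and time from the boundary concrete, here is a minimal sketch of a Pearson correlation. The talk does not say which correlation statistic was used or how accuracy was binned, so both are assumptions, and the numbers below are invented toy values that only exercise the function; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values (NOT study data): accuracy per time bin after a boundary.
toy_time_from_boundary_s = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
toy_accuracy = [0.84, 0.82, 0.80, 0.79, 0.77, 0.76, 0.74]

toy_r = pearson_r(toy_time_from_boundary_s, toy_accuracy)
print(f"toy correlation: {toy_r:.2f}")
```

A negative value means accuracy falls as the target frame moves further from the preceding boundary, which is the pattern described for the soft and hard boundary clips.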
OK. So now we repeated all of these tasks in patients that had invasive electrodes implanted for clinical reasons. These are patients with pharmacologically intractable epilepsy. And they are implanted with electrodes in order to localize where the seizures are coming from for potential surgical resection.
So we worked with neurosurgeons that implanted these electrodes. And what I want to talk about today is data that was recorded from these high-impedance microwires. So there's this depth electrode that's implanted. And through the lumen of this depth electrode, there are eight microwires that are passed through.
And these are high impedance microwires that sometimes allow us to record the activity of individual neurons. So we have a pre-implantation MRI. There is planning of where the electrodes are going to be targeted based on clinical considerations.
The doctors insert the macroelectrodes. Then they insert these microelectrodes. And then we have a post-implantation MRI to try to localize where those electrodes are. And this is one particular sample of one of these very nice recordings that we obtained.
This is not necessarily typical. For many of these microwires, we get absolutely nothing. That is, we get basically noise, but every now and then we get lucky, and we get beautiful recordings of spikes of individual neurons in the vicinity of these microelectrodes. So here's a sample of the data that Jie collected in these seven different regions.
And for the purposes of today's talk, I want to focus on the medial temporal lobe, specifically in the recordings of 343 neurons in the hippocampus, neurons in the amygdala, and neurons in the hippocampus, because we think that the medial temporal lobe might be particularly relevant for the formation of episodic memories, and for delimiting these cognitive boundaries. Should time permit, I'll show you data from these four other regions as well. OK. So here's an example recording.
This is an example from a single neuron. So what I'm going to show you now is the neural data aligned to these boundary events. So what you're seeing here is one point-- each of these dots corresponds to a single spike from this neuron. This is a raster plot. Each line corresponds to one movie-- one video clip.
So here these are not repetitions. Each video clip is unique; subjects see each video clip only once. So here are the 30 video clips that had no boundaries. Here are the 75 video clips that had a soft boundary and 30 video clips that had a hard boundary.
And what you can see here is that there was an enhanced firing rate from this neuron, both in the soft boundary condition and in the hard boundary condition, but not in the no boundary condition. Again, in the no boundary condition, time 0 corresponds to four seconds. That is the middle of the clip.
So this response was quite remarkable and robust. As you can see here, this is the raw data. So you can see that in almost every single trial, this neuron fires shortly after the boundary.
So we refer to this as a boundary cell, as a B cell. And now I want to show you an example of a different type of neuron that Jie also found, and that's shown here. So this neuron, which she calls an event cell or an E cell, fires exclusively during the hard boundaries, but not for the soft boundaries. So even though at the visual level there's a major transition from one frame to the next, this neuron only fires when there's a discontinuity in the narrative of the story and not in the soft boundary case.
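The boundary-aligned rasters described above can be sketched as follows: spike times from one clip are re-expressed relative to the boundary (or to 4 seconds for no-boundary clips), and the firing rate just after the boundary is compared with the rate just before it. The function names, the 0.5-second comparison windows, and the toy spike train are all assumptions made for illustration; the talk does not state the windows or the criterion used to label a neuron as a B cell or an E cell.

```python
def align_to_event(spike_times, event_time, window=(-0.5, 1.0)):
    """Spike times (s) relative to event_time, kept if inside window."""
    lo, hi = window
    return [t - event_time for t in spike_times if lo <= t - event_time < hi]


def rate_change(spike_times, event_time, pre=0.5, post=0.5):
    """Post-event minus pre-event firing rate (spikes/s) for one clip."""
    n_pre = sum(1 for t in spike_times if event_time - pre <= t < event_time)
    n_post = sum(1 for t in spike_times if event_time <= t < event_time + post)
    return n_post / post - n_pre / pre


# Toy spike train (seconds from clip onset) for one 8-second clip; invented values.
toy_spikes = [1.0, 3.9, 4.1, 4.2, 4.3, 6.0]
boundary_time = 4.0   # hard boundaries sit at 4 s; no-boundary clips also use 4 s

toy_aligned = align_to_event(toy_spikes, boundary_time)
toy_change = rate_change(toy_spikes, boundary_time)
print(len(toy_aligned), toy_change)
```

Under these assumptions, a cell whose rate change is reliably positive for both soft and hard boundaries would look like the B cell example, while one with a positive change only for hard boundaries would look like the E cell example.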
AUDIENCE: Gabriel, can I ask you a question?
GABRIEL KREIMAN: Yes. Absolutely. Yeah, sure.
AUDIENCE: So this is relative to the boundary. So at the onset of the movie and the offset of the movie, what happens?
GABRIEL KREIMAN: Great question. And I'm pretty sure that if that's not the next slide, it's coming up in a couple of slides. So if you can hold-- that's a fundamental question. I'll get to that.
And it's actually pretty mysterious, I think. But I'll show it to you anyway. So we were worried that these were purely visual transitions, that if you were to record from the retina, you might also see large abrupt changes, particularly in the context of a neuron like this one.
And the question from Carlos-- just to pre-empt what I'm going to show you in a couple of slides, the question from Carlos is quite apropos, because the fact that this neuron does not show a response at the onset and offset of the video clip suggests to us that it's not just a pure visual transition that we're seeing here. Moreover, for this particular cell and the E cells in general, because they don't fire in the soft boundary condition, we think that this is not just a reflection of a visual change between one frame and the next. We were still concerned that maybe hard boundaries are different from soft boundaries.
Maybe there's a larger change in contrast, a larger change in color. There may be many, many visual features that are distinct. So we spent quite a lot of time-- and if anybody is interested, I would be happy to show you some of the results. We spent a lot of time building models to try to predict and differentiate soft boundaries and hard boundaries.
The short answer is that there is no obvious change that distinguishes soft boundaries and hard boundaries in terms of low-level visual properties or anything that we can easily detect with any kind of computer vision model. So we think that the main distinction has to do with the discontinuity in the narrative here, rather than the change in particular visual features.
OK. So I think this is what's-- I think this is-- oh, no, before I get to Carlos's question-- so I showed you one example cell. So here I'm showing you the average activity of 42 different B cells and 36 different E cells, all of them.
So here's a color map on the right here. You see the color scale. So yellow means high firing rate. Blue means low firing rate.
So here you see the activity of all of these cells. Again, the B cells respond, aligned to time 0, both for the soft boundaries and the hard boundaries. For the E cells, firing is mostly restricted to the hard boundaries and not the soft boundaries.
Here, getting to Carlos's question: we were interested in what happened at the onset of the clip and the offset of the clip, again, reasoning that if we were recording the activity of a neuron in the retina, we might also see large changes due to these transitions.
So here are the responses of the B and E cells when, instead of aligning them to the boundaries, we align them to the onset of the clip. And basically, nothing happened here. This is now the activity aligned to the offset of the clips.
Again, basically throughout the visual system, there are many neurons that respond very strongly to the onset of a stimulus, as well as to the offset of the stimulus. And, again, we don't see any obvious response from these cells to the clip onset or the clip offset, in stark contrast to the responses aligned to these internal boundaries during the movie.
OK. So I have to say that this was quite surprising to me, because I would contend that the onset of the movie is also some sort of episodic boundary: something happened. In fact, I would argue that it is a change in the narrative as well. And similarly, I would make the same argument about the offset of the clip.
So I don't have a good explanation. This is the observation. These neurons do not seem to care too much or respond very strongly to the onset and offset of the clips, much to my surprise and the surprise of all of us and the surprise of all the reviewers that actually commented on this paper.
But somehow these neurons do not respond to this. There are lots of other neurons. It's not that we are completely unable to detect responses to the onset and offset of the clip. There are lots of other neurons that do respond quite vigorously to the onset and offset of the clip, just not these ones.
OK. So next. Yeah. So just to show the dynamics of the response, I just wanted to point out that on average-- and, again, these are non-overlapping populations of cells, the E cells and the B cells. And on average, when we compare the responses to both types of boundaries, the boundary cells seem to respond earlier compared to the-- the B cells seem to respond earlier than the E cells.
OK. So here is the latency. The peak firing time for the E cells is almost 100 milliseconds later than the peak firing of the B cells. Again, both of these numbers, 300 milliseconds and 197 milliseconds, are aligned to the boundary. So the last point I want to make is-- remember, I told you that we conducted these memory tests.
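One common way to read a peak latency off boundary-aligned spikes is to bin them into a trial-averaged histogram and take the center of the bin with the highest rate. The sketch below illustrates that idea only; the 50 ms bin width, the 0 to 1 second window, the function names, and the toy trials are assumptions, and the talk does not describe how the latencies it quotes were actually computed.

```python
def psth(trials, window=(0.0, 1.0), bin_width=0.05):
    """Trial-averaged firing rate (spikes/s) per bin.

    `trials` is a list of trials, each a list of boundary-aligned spike times (s).
    """
    lo, hi = window
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int((t - lo) // bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return [c / (len(trials) * bin_width) for c in counts]


def peak_latency(rates, window_start=0.0, bin_width=0.05):
    """Center (s) of the bin with the highest trial-averaged rate."""
    peak_bin = max(range(len(rates)), key=rates.__getitem__)
    return window_start + (peak_bin + 0.5) * bin_width


# Invented toy trials (NOT study data): spike times in seconds after the boundary.
toy_trials = [
    [0.12, 0.21, 0.22],
    [0.21, 0.23, 0.61],
    [0.22, 0.41],
]
toy_rates = psth(toy_trials)
toy_latency = peak_latency(toy_rates)
print(f"toy peak latency: {toy_latency * 1000:.0f} ms")
```

Comparing this number between two populations of cells is one way to express a statement like "one group peaks about 100 milliseconds later than the other"; in practice the result depends on bin width and smoothing.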
So after they watched all the video clips, we asked people what they remember in the scene recognition task, as well as the time discrimination task. So we can ask whether the activity of these neurons is predictive of whether the subjects will get it right or wrong for any one video clip. So here's the example B cell that I showed you before.
And now I'm going to separate the trials into those trials that are subsequently correct or subsequently incorrect in the scene recognition task. OK. So here it is.
Remember that subjects were close to 80% correct. So there are many more trials in the correct category compared to the incorrect category. And quite remarkably, the firing that we saw aligned to these boundary events was much more prominent in the correct trials and almost absent in the incorrect trials.
So here's the average firing rate. Again, here you see the increase in firing rate in response to the soft boundaries and the hard boundaries compared to the no boundary condition. And the empty bars here correspond to the incorrect conditions, where you see that there is a much weaker increase in firing rate with respect to the no boundary condition.
So somehow the activity of these neurons during the encoding of the movie, that is, while viewing the movie, was correlated with subsequent memory performance in the scene-recognition task. This was not the case for the E cells. But in the case of the E cells, what Jie realized was that there was a correlation between the timing at which the neurons fired with respect to the ongoing theta oscillations and behavioral performance.
So some of you may know very well that there has been extensive work, especially in the hippocampus, but also in other regions within the medial temporal lobe, showing that the specific phase at which a spike occurs with respect to the ongoing oscillations in the local field potentials in the theta frequency band can be used to decode information. Specifically, in the rodent literature, this phase has been strongly correlated with the position of the animal and has been used to decode navigational cues for the animal. So what Jie did was the following: from the same microwires, in addition to high-pass filtering and getting the spikes, we can low-pass filter the data, get the local field potentials, filter those local field potentials in the theta frequency band, align each spike to the ongoing local field potential oscillations, and compute the phase of the spike with respect to the theta oscillations.
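The last steps of that pipeline (isolate theta in the local field potential, then read off the phase at each spike time) can be sketched in plain Python. This is an illustration under stated assumptions, not the method used in the study: the talk does not specify the filter, the band edges, or how phase was extracted. Here the band is assumed to be 4 to 8 Hz, the band-limiting and analytic signal are done with a direct Fourier sum (a stand-in for a proper band-pass filter plus Hilbert transform), and phase 0 degrees is taken to be the peak of the oscillation, so 180 degrees is the trough. All names and the toy signal are hypothetical.

```python
import cmath
import math


def theta_analytic_signal(lfp, fs, band=(4.0, 8.0)):
    """Band-limit `lfp` to `band` (Hz) and return its analytic signal.

    Uses a direct DFT over the in-band positive frequencies only, which is
    O(n * bins) and meant for short demo segments, not long recordings.
    """
    n = len(lfp)
    analytic = [0j] * n
    for k in range(1, n // 2):                      # positive frequencies only
        freq = k * fs / n
        if not (band[0] <= freq <= band[1]):
            continue
        # DFT coefficient of the LFP at this frequency.
        coef = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(lfp))
        # Doubling positive frequencies and dropping negative ones
        # gives the analytic signal of the band-limited LFP.
        for i in range(n):
            analytic[i] += 2 * coef * cmath.exp(2j * math.pi * k * i / n) / n
    return analytic


def spike_phases_deg(spike_times, lfp, fs, band=(4.0, 8.0)):
    """Theta phase in degrees [0, 360) at each spike time (s from segment start)."""
    analytic = theta_analytic_signal(lfp, fs, band)
    phases = []
    for t in spike_times:
        idx = int(round(t * fs))                    # nearest LFP sample
        if 0 <= idx < len(analytic):
            phases.append(math.degrees(cmath.phase(analytic[idx])) % 360.0)
    return phases


# Toy LFP (invented): 2 s at 1 kHz, a 5 Hz "theta" cosine plus a 40 Hz component.
fs = 1000.0
toy_lfp = [math.cos(2 * math.pi * 5 * i / fs) + 0.5 * math.cos(2 * math.pi * 40 * i / fs)
           for i in range(2000)]
toy_spike_times = [0.05, 0.10, 0.30]                # seconds
toy_phases = spike_phases_deg(toy_spike_times, toy_lfp, fs)
print([round(p, 1) for p in toy_phases])
```

In the toy signal the 40 Hz component falls outside the assumed band and is discarded, so the spikes at 0.10 s and 0.30 s, which sit on troughs of the 5 Hz cosine, come out near 180 degrees. With real recordings one would normally use an established signal-processing library for the filtering and phase extraction.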
So that's what is shown here. So on the y-axis, now you have the phase of the spike as a function of time with respect to the boundary as well. And we see that there was this concentration of spikes somewhat near the 180 degree phase. And this effect was, again, most prominent for the correct trials versus the incorrect trials in the time discrimination task, but not in the scene recognition task.
So this histogram that you are seeing here is a distribution of the phase of the spikes with respect to the theta oscillations. If there is no consistent information in the timing of the spikes, you would expect to see a uniform distribution, which is what you see here for the no boundary condition, and which is also what you see here essentially for all the incorrect trials. However, in the correct trials for the soft boundary case and the hard boundary case, there's a more predominant concentration of the spikes near the 180 degree phase.
So the phase of the spike with respect to the ongoing theta oscillations correlates with whether subjects will subsequently get the time discrimination task correct, but not with the scene-recognition task. So we quantify this by a typical metric, which is the mean resultant length. Basically, you can plot all the phases in a circle, add up all of those, and compute the mean resultant length. If all the phases are distributed randomly, the resultant of this vector addition will be very small.
If all the phases are perfectly aligned, these vectors will add up. And then we'll have a large mean resultant length. And that's what we're showing here on the left.
Again, in the case of the no boundary condition, we see that the mean resultant length is basically close to 0. In the incorrect trials, here the empty bars, they are also pretty low. And there's a stronger mean resultant length, both for the soft boundaries as well as for the hard boundaries.
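The mean resultant length described above is the length of the average of unit vectors pointing at each spike phase, that is, the magnitude of (1/N) times the sum of exp(i * phase). A minimal sketch follows; the function name and the toy phase lists are illustrative and are not data from the study.

```python
import cmath
import math


def mean_resultant_length(phases_deg):
    """Length of the mean unit vector over a list of phases given in degrees.

    Returns a value between 0 (phases spread evenly around the circle)
    and 1 (all phases identical).
    """
    if not phases_deg:
        raise ValueError("need at least one phase")
    total = sum(cmath.exp(1j * math.radians(p)) for p in phases_deg)
    return abs(total) / len(phases_deg)


# Toy phases (invented): perfectly aligned versus evenly spread.
toy_aligned_mrl = mean_resultant_length([180.0] * 8)
toy_spread_mrl = mean_resultant_length([0.0, 90.0, 180.0, 270.0])
print(round(toy_aligned_mrl, 3), round(toy_spread_mrl, 3))
```

Spikes that all fall at the same theta phase give a value of 1, while phases spread evenly around the circle cancel out and give a value near 0, which is the contrast drawn between the correct-trial and no-boundary conditions.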
AUDIENCE: Gabriel, and just to confirm is the phase at all aligned to the stimulus? How does one get that alignment of the phase and the boundary as well?
GABRIEL KREIMAN: Yeah. So this is showing, given a spike, we can ask where it is with respect to the boundary. So that's the x-axis. And then at the same time for that spike, we can say, what's the phase of that spike with respect to the LFP?
So that's what is being shown here. So, for example-- so this is a spike that happened, I don't know, 250 milliseconds before the boundary. And it had a phase of, I don't know, 300 degrees. So here we're showing all the spikes that happen between minus 0.5 and 1 second. And the y-axis is showing that.
AUDIENCE: OK. Thank you. And, Gabriel, are these spikes the aggregate of several experiments, or is this one? So are we looking, for each of these panels, at all the spikes pooled from all the experiments?
GABRIEL KREIMAN: So if you look at these-- so this is a single-- this is a single example. So there's no aggregate in here. OK. All right.
AUDIENCE: So this is the time elapsed from the presentation of the stimulus?
GABRIEL KREIMAN: Time elapsed-- so this is a single neuron, single microwire looking at the time from the boundary. I'm not sure what you mean by this. This is the boundary.
All the spikes are aligned to this boundary event. And this is the phase. So then to quantify all of this, just to put a number to all of these, we compute this mean resultant length.
So this is just going to give you one number for this neuron, indicating for each one of these different conditions how much alignment there is of the spike with respect to the theta phase. And then here we're averaging across the different cells to show the mean resultant length for all the B cells for each of the different conditions. Does that answer your question?
AUDIENCE: Yeah. It's just that, when I look at the plot, especially the correct for the hard boundary, that resembles the phase precession plots that we usually get when rodents are walking through a place field. It might just be coincidence, but there's this seeming progression as time advances that you get at a smaller phase.
GABRIEL KREIMAN: Yeah. I think let me see if I can annotate this. So first of all, I should say that all of this is directly inspired by [INAUDIBLE] work and the work in rodents, of course. I think what you're talking about is this basically.
AUDIENCE: Right. Exactly. Yeah.
GABRIEL KREIMAN: OK. Yeah. Yeah. So, yeah, I think that that's right.
So in this case, there is no movement. So in your case, you would say this is as the animal is moving in the arena or in the maze. You see this type of relationship.
So in this case, there is no physical movement. So you could argue perhaps that there's some sort of movement in cognitive space, whatever that is. So I agree.
I think this is quite interesting. Yeah. Is that the comment you were making?
AUDIENCE: Yeah. Yeah. I'm trying to think exactly about the conditions, right. There's no movement. And that's why my initial question was in reference to the stimulus, again, thinking about position for us, but just trying to think about--
GABRIEL KREIMAN: Again, so the time 0 here is the boundary event. So that's what we're aligning everything to. So we could align all of this also to the onset of the clip and the offset of the clip. We may have done that. I don't actually remember if we have or not.
So we are always aligning to the boundary events. I would be happy to talk more about this. All of this [INAUDIBLE] directly inspired by the work in rodents for sure. Yeah.
So that's mostly what I wanted to show you in terms of the data. There are a few other analyses that I didn't want to show. If anybody's interested, I'll be happy to share the paper, which is now on bioRxiv.
So just to summarize, these event boundaries enhance the accuracy of recognition of nearby events. So the closer you are to the boundary, the easier it is to remember the events. The hard boundaries also show-- lead to an impairment in the memory for the temporal order of the events.
So when we have two events that are disconnected from each other, it's hard to remember the order when there was a hard boundary in between. We showed that there are some boundary cells that detect sharp transitions between adjacent events. We don't think that these boundary cells are purely visual neurons, because we don't see them responding at the onset of the clip or the offset of the clip.
And we also cannot quite predict their activity directly from properties of contrast, color, shape, et cetera. Perhaps even more strikingly, we have these event cells that detect sharp transitions when the events are disconnected from each other. And we think that these may be candidates-- to go back to the very beginning of my presentation, these may be candidates to detect this and segment our continuous narrative, our continuous life stream, into discrete events.
So these event cells do respond during these disconnected transitions, but not during all transitions. And then the firing of both the B and the E cells correlates with the subjects' memory performance. In the case of B cells, there was a correlation between their activity, their firing rate, and performance in the scene-recognition task. In the case of the E cells, the timing of their activity with respect to the ongoing oscillations correlated with subsequent performance in the time-discrimination task.
OK. That's all I wanted to say. And, again, just to emphasize one more time, all of this work was done by Jie. And I am just being an ambassador here and describing her work, because unfortunately she couldn't be here today.