Efficient representation, learning, and planning through abstraction: clustering cognitive spaces into submaps
September 15, 2021
Brains, Minds and Machines Seminar Series
Abstract: Episodic memory involves fragmenting the continuous stream of experience into discrete episodes. Not coincidentally, the hippocampus, which plays a central role in both episodic memory and spatial navigation, represents large spatial environments in a fragmented way even when explored in a continuous trajectory. In non-spatial and non-memory contexts too, humans report sudden contextual re-anchoring or re-orientation when reading garden path sentences (“Time flies like an arrow, fruit flies like a banana.") or watching a movie with viewpoint changes. In this talk, I will describe a theory for the online and real-time generation of fragmented representations and contextual re-anchoring from continuous experience that resemble those obtained by principled but offline and computationally complex information-based algorithms. The resulting fragmentations closely match those observed from neural recordings in animals navigating through complex environments. I will discuss the utility of map fragmentation, as a form of state abstraction that enables representation fidelity, flexible and rapid learning through reuse of existing fragments, and many-fold improvements in the ability to plan and navigate through complex environments relative to more global representations.
TOMMY: --grid cells are a component of the navigation system of mammalian brains. I will not mention all the awards she got, but I will say that I think what she has been doing is how computational neuroscience should be done.
Too often in these times of machine learning and deep learning, researchers take off the shelves model networks developed by engineers who do not have any idea of the brain and then hope that they will explain data from the brains and neuroscience without even asking whether their components make any sense biologically.
This is dangerous. I think personally it's extremely unlikely that a model developed to leverage gradient descent and GPUs magically would be exactly what the brain does and what evolution has discovered. That's my personal opinion, but I think it's even more dangerous to be just fitting data without critical experiments at every step of the modeling, something that Ila's been doing in all her work.
This love affair with deep networks reminds me of the love affair that neuroscience-- actually, vision scientists had 30 years ago with Fourier channels. Some of us remember that, and the people who were really pushing this view were good biologists who were not theoretically sophisticated and fell in love with a very narrow piece of mathematics. So much so that they wanted everything to be explained by it. So at the time, it was Fourier channels; now it's deep networks.
Ila is a new member of CBMM. We are welcoming her to be part of CBMM, and I hope she'll be instrumental in steering CBMM towards its new focused goal, which is really the science of intelligence. We leave the science and engineering of intelligence, which was the original vision of CBMM, to the Quest. And I would like CBMM to focus on the science of intelligence, that is, understanding the computations the brain is doing and the circuits, the neural circuits, underlying them.
Ila, great to have you.
ILA FIETE: Thanks so much for that, Tommy. That was fantastic, and thanks to all of you for being here. This is my first in-person seminar in 18 months, and I'm really honored to be giving the first CBMM seminar back in person as well. So thanks, Tommy, for both the introduction and the invitation, and it's very much a pleasure to be part of the CBMM community.
And I had to make a tough decision when deciding what to talk about today because there are some really, I think, interesting-- Speaking about the science of intelligence. --directions at the nexus of neuroscience, and intelligence, and machine learning, and cognitive science that various people certainly in my group are interested in that I won't have a chance to talk about. And if you want to talk about some of those things, my lab members are over here, many of them. So shout-out to them.
The things I won't be talking about are the spontaneous emergence of modularity in systems. So bottom-up pressures for modularization, and how with very little training data, you can get spontaneous modularity, how that can help with compositional learning. And also questions about biologically plausible learning rules and proofs that there are alternatives to back propagation that can do pretty efficient learning that may be implemented by the brain.
So there are many projects that I'm sorry I could only talk about the one today, but I hope that in the future, I have a chance to discuss with many of you. So what I will talk about today is a question that we came to first by looking at grid cells, but I think it applies to many different domains of cognitive science.
And it's this question of generating abstractions in the brain and specifically the question of how we take our continuous stream of experience, which is just a flow of experience, and segment it into things that we think about in memory like episodes, how we structure spaces, and how, in general, we construct these structured representations of the world around us. And then how we use those structured representations and segmented representations to do efficient things in learning and planning.
So that's what I want to tell you about today, and here's the first observation, which comes from neuroscience and specifically from grid cells that led us to think in some of these directions. So this is an example. What you see on the left are these spike plots of the response of a grid cell as an animal explores an open arena over there, and here as an animal runs through this hairpin structured maze. And this is actually the same grid cell that's recorded in these two environments.
So it's the same grid cell recorded in this open arena environment as well as in this hairpin maze environment, and what you can see here is it looks like a canonical grid cell. It's got these periodic responses. But if you look at the response here in this hairpin maze, I think what you can clearly see is an alternation in the patterns of activation.
So that in alternate arms, it has a certain pattern of activation, and in the other alternate arms, it has this other pattern of activation. So even though the animal-- This is a rat running through the space in a continuous way. It's making this continuous exploration trajectory through the space, and the space itself is a continuous space. Nevertheless, it pulls up these alternating representations for these two arms.
And now this is a two-room environment connected by a hallway. The two rooms are conceptually very, very similar to one another, so there's not any local landmarks or other things that make them very distinguishable. And so the animal can run around here and then exit through this narrow passage into this hallway, and then come back into the second room, and so on.
And it freely runs around. There's no task in any of these. The animal just runs back and forth foraging for food. And what you see here, I think, is that if you look carefully, you can see that whatever the grid pattern here is in this right side room, it's the same. Even the phase relative to, say, the right wall is the same in this left room. So the two grid patterns are identical.
So it looks like what the animal is doing is it's representing these two spaces with the same map, but if you look at the pattern in the hallway, it's not a simple continuation of the pattern in each of these rooms. It's something else altogether. So it looks like there's a disjoint representation, and you can also see that here between these two rooms, it's not a continuation of the pattern in the right room that's in the left room. It's actually just a reuse of that pattern from the right room and the left room.
So that's what seems to be going on, and our interpretation of this is that in the neural space, the animal is constructing some neural representation or map of this arena over here. And so the blue is this neural space. Whereas here, it's constructed two different representations for the even arms and the odd arms, and it's just doing this transition.
It's jumping between these two maps that it's got internally in its brain, and it's transitioning between those two. Whereas in this two-room hallway environment, it's constructed a room map and then a hallway map, and then it goes from the room map, to the hallway map, and then back to the room map when it goes from the right room to the hallway to the left room. So it seems like it's basically doing these transitions.
So the question, though, that this brings up is how does the brain do this? How does it choose when to make these fragments? Considering that it's just continuously moving through these spaces, how does it decide to lay down these different maps? And also, why does it do that?
So one of the famous stories about grid cells is that they are maintaining this locally Euclidean representation of space, and they systematically update their phase as the animal moves around within a space. And so they're performing a velocity or path integration mechanism which allows them to do things like construct novel shortcuts through paths and do all the things that you can do with vector integration.
But once you've broken that, once you've made a fragmentary map-- So you've introduced this discontinuity in the representation, it's no longer possible to do those shortcuts across that discontinuity. So if the purpose of the grid cells is to do this shortcut path integration-type function, then why are there these breaks? So these are the two questions that come up.
And now I want to just point out that these questions or these observations about fragmentation and how we represent the world are not just confined to grid cells. They occur in all kinds of domains in cognitive science. And this is literature that I'm only learning about, but I was fascinated to learn about studies in which subjects are shown a movie of a person just doing something relatively routine-- in this case, folding laundry; in this other case, building something with LEGO.
So these are just stills from the movie. But each one is an unfolding movie, and then of course, there are different episodes of different movies. Then it turns out that viewers will tend to describe-- If you ask them, they'll say OK, this movie had these parts, or segments, or there were these events, and this is where I might lay down some event boundaries. And it turns out that from human to human, the reports of where the event boundaries lie are largely consistent with each other.
And moreover, if you look at the neural signature-- So if you don't ask them to explicitly report an event, but you just instead look implicitly at whatever the neural signatures are of segmentation, there are large signals that can be picked up in fMRI or in other recording modalities that signal the presence of segmentation of this continuous movie over here.
So basically, in various different fields, these things that I'm calling submaps or these fragments of these maps, these broken maps or fragmentations, are called schema or frames in cognitive science. And basically, a schema is an entire frame of understanding the world in the current context, and/or situation, or location, and this schema brings into play everything.
So once you put yourself into one schema, you are bringing in a baggage of all kinds of stuff, like semantic information, information about probabilities, information about relationships, and you can use all that baggage, if you will, as a way to do fast inferences about things that are ambiguous in the world.
So if you can only partially see some occluded hand movement, and if you know that the person is in the process of folding laundry, you can make some inferences about what that hand movement corresponds to. Whereas if that person, you knew they were doing LEGO, you might have a completely different inference for the same movement, the same partially occluded movement.
So clearly, there is a real utility to having these schemas because you can bring to bear all your past knowledge on the current context of situations. So it seems like we do a lot of that, and what I'm showing here is I'm just showing you another example from the spatial domain.
So schemas can also be useful, or these submaps can be useful, because in the world, there might be often structures where you walk through the hallway, for example, of our building. This is not a hallway of our building, but it looks like a hallway of our building here where many of the rooms that go off the hallways look very similar to one another.
And again, if you want to do a task, a job-- If I were the custodian, I might have to remove the trash from all those rooms. And I roughly would know where the trash can is in all those rooms. So again, I would want to situate myself in the same schema or the same map each time I enter each of those rooms.
So here's another example of partitioning a continuous stream of experience and reanchoring yourself in a whole other framework or schema as evidence rolls in or in time as things go along. So when people are watching movies with viewpoint changes, someone walks into a building, then people will report an event boundary. Similarly when you parse sentences.
So here's an example that Roger Levy, my colleague, loves to use. And I just like the example a lot, so I'm using it. I'm stealing it from Roger. "The woman brought the sandwich from the kitchen tripped." So once you see the word tripped, you realize that your parsing of the sentence was a little bit off.
It's not that the woman is bringing the sandwich from the kitchen, it's not a subject, verb, object sentence, but it's actually-- It's the woman who brought the sandwich from the kitchen tripped, so the tripped reanchors your whole frame for your ability to understand that sentence.
Another fun example is this other garden path sentence. Time flies like an arrow. Fruit flies like a banana. So again, it drives a change in your schema or your map in trying to understand what's going on. So yet another example of how we segment the world, and then insert ourselves, and use those pieces to bring in all of this other information that we have about that segment of the world.
So what are some hypotheses for what are the main uses of these schema? So we already talked about interpretation of ambiguous or incomplete data based on this previously acquired information about each schema or each submap. And also, you could learn new maps. You could say this is an entirely new situation. I should learn a new schema or a new submap for it.
Another thing that you would do in working with submaps or schemas is to learn something about the transition structure between submaps or schemas. So if you actually segment the world, a continuous stream of experience, into chunks, those chunks are now a form of abstraction. And now it's possible not just to learn temporal ordering or transition structure at the fine scale, but to learn about transition structure at a coarser scale. So you're able to do things much more abstractly, like planning and organization, in terms of these submaps.
And so when I talk about structured representations, what I mean is both a set of submaps and their anchors in the world which include the relationships of states within the submaps and also the relationship between different submaps and the global connectivity structure of the submaps. So that's what I mean by a fully structured representation. You've partitioned the world into submaps, and you've learned something about their connectivity and topology.
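As a minimal sketch of what planning over such a structured representation could look like (the submap names and the connectivity graph below are my own toy example, not the model from the talk): first plan a coarse route over the submap connectivity graph, and only afterward plan the fine-grained trajectory inside each submap.

```python
from collections import deque

def coarse_plan(submap_graph, start, goal):
    """Breadth-first search over the submap-level graph: returns the
    sequence of submaps to traverse, ignoring the fine-grained states
    inside each submap (those get planned locally afterward)."""
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in submap_graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy version of the two-rooms-plus-hallway environment from the talk.
submap_graph = {
    "room_right": ["hallway"],
    "hallway": ["room_right", "room_left"],
    "room_left": ["hallway"],
}
```

The point of the abstraction is that the coarse search touches only three nodes here, no matter how many fine-grained states each room contains.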
So I want to try to squeeze in four different parts to the story, and this is work done through some really, really fun collaborations with some of my students and postdocs, Sugandha Sharma and Sarthak Chandra, and Mirko Klukas, also a postdoc in my group. The latter two projects are also collaborations with Leslie Kaelbling and Josh Tenenbaum, and also two members of Josh's lab, [? Aaron ?] Curtis and Marta Kryven, and also Yilun Du from Leslie and Tomás's labs.
So this has been a lot of fun to work together, and I hope I can do it some justice in my descriptions. And by the way, all of this is really new. None of this stuff that I'm talking about is published. So it's pretty unpolished, but I figured this is a little bit in the spirit maybe of fostering some discussion within CBMM.
So the first thing I want to tell you about is actually let's just talk about single item memories in the hippocampal-entorhinal system. And so here the thing that I want to convey is that the notion of structure here is coming from the prestructured and very rigid geometric grid states that seem to be conserved across environments, across time in animals. And so there isn't a lot of change or plasticity in the grid cell states themselves.
So those are prestructured states, and I want to describe how having these prestructured states can help with novel architectures for memory. And then I'll talk about three other things. So I'm going to talk about how we can then form these segments or build these submaps, or build these fragmentations of continuous experience and try to come up with a theory that is consistent with existing data. So we'll talk about a possible theoretical model for generating fragmented submaps and comparing them with data.
The third thing I'll talk about is the leveraging, these learned submaps, with their learned transitions to do large-scale hierarchical planning. And finally, I'll talk about a little bit about how we can reuse existing submaps to do much faster compositional learning and again planning in the real world. So let's just start first with talking about structured grid states and how that can lead to a new model or architecture for memory.
So here is an architecture for memory that's inspired by the circuit of the hippocampal-entorhinal circuit in the medial temporal lobes. So the idea here is that we've got-- Let's start with grid cells, and then we can go to the abstract version. So place cells receive input from grid cells in the entorhinal cortex, and they also receive-- Let's assume that they also receive from the lateral entorhinal cortex direct high-level sensory data. And the sensory data are the things that we want to store in memory.
So these are the things that are arbitrary inputs, and these are the things that are prestructured states. So our memory architecture consists of recurrent projections within grid cells that maintain these structured states, and then recurrent connectivity between grid and place cells, where there's a random projection from grid cells to place cells.
So basically, we can think of these structured patterns going through a dimension-expanding random projection into place cells, and then a learned projection back to grid cells that enforces consistency between whatever the random projection drives in the place cells and the corresponding grid cell pattern that drove that pattern in the place cells.
So this back-projection is learned. So there's a recurrent loop here between grid and place cells, and then in addition, we assume learned connections from place to sensory cells and from sensory cells back to place cells. So basically, place cells, or the hippocampus, we can think of as a conjunction of these structured grid states and learned projections from the sensory stream.
And the grid cell states, we can think of as a large but compact dictionary of prestructured states. So one property of grid cells is that they have multiple different modules. Each module has some set of states, and in combination across all the modules with different periods, the total number of states is the product of the states in each of the modules. So it grows exponentially with the number of modules.
So this is a representation whose number of states grows exponentially with the number of modules, so it's a pretty compact representation for a large number of states. This is a large network, and this network can be larger still. So more abstractly, we're considering a memory system that has a small high-level network that generates fixed labels with a modular structure, then some kind of dense code over here built by random projection from these structured labels, and then these are the inputs that we attempt to put into memory.
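The counting argument above can be made concrete in a couple of lines (the module sizes here are hypothetical, chosen only to illustrate the product-versus-sum scaling):

```python
import math

def num_grid_states(module_sizes):
    """Distinct combinatorial grid states: the product of the number of
    phases per module, i.e. exponential in the number of modules, while
    the neuron count only grows roughly as the sum of the module sizes."""
    return math.prod(module_sizes)

sizes = [5, 7, 9, 11]   # hypothetical number of phases in each module
```

With these four modules, `num_grid_states(sizes)` is 3465 combinatorial states from a code whose size scales like `sum(sizes)`, which is only 32.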
The dynamics of this network, the updating, work like this: you have an input. The input comes in, and there's learning that happens. The only learning that happens is the one-shot learning from features to hidden states and back.
And now if you give a partial input, the way we present it is we present the partial input and then update the hidden states here in this network with one pass. And then there's a recursion between the label network and the hidden state network. So this is a recursion up to a fixed point, and whatever fixed point is reached is then used to reconstruct the features.
So that's the structure of this memory system, and I'll just assert-- and show you in the next slides-- that this architecture yields a high-capacity and robust associative memory. And moreover, unlike existing models of associative memory, it yields a memory continuum, which means that this model of associative memory will give you a trade-off between the number of memories and the amount of detail that is stored for each memory.
So this is in direct contrast with Hopfield networks that have basically perfect recall of every memory and then catastrophic drop-off. And it's very different from human memory where we're really terrible at memorizing high-dimensional detail. We can't memorize white noise on a TV set, but Hopfield networks will tell you that as long as you have few enough white noise patterns on a TV set, you can memorize all of them. And then if you have one more than that, it's over.
Whereas what we really probably can do is remember things in reasonable detail, and as we overload our memory, we just have a degradation in the quality of the recall. So let me just walk through then some of the main results. So let's just focus on the first part of this memory system, which is actually the structured part of it.
So we've got just this system that takes grid states and then makes a random projection to this place cell layer and then has learned projections back. So this system, what's going on here, again, is like the modules are each an attractor network. Each of the states is a fixed point of the dynamics within a module, but the modules are not connected to one another.
So in fact, a cross-module state-- that combinatorial state-- is not itself a fixed point of the dynamics, but coupling with the hippocampal network makes the whole thing a recurrent network whose states are fixed points. And in fact, the capacity of that code, as you can see on this log-linear plot, is growing exponentially.
And this is not just the number of patterns, but the number of actual stable fixed points that are patterns, growing exponentially with the number of grid modules, even for a relatively small number of place cells. And here's the dependence on the number of place cells. So with something like 100 place cells, it's possible to have a memory for 1,000 distinct grid cell patterns.
So 1,000 of the grid cell patterns are now fixed points of the dynamics of this recurrent system. Now, this is not a violation of Hopfield capacity results because these are not arbitrary patterns. These are structured patterns. So yes, you can have order of magnitude more patterns that you store than number of neurons in the place cell network, but it's because these are not arbitrary patterns. They're structured patterns.
So here is the structured patterns from grid cells. Here is the randomly projected driven patterns in place cell networks. These look really random, but if you actually look at the correlation matrix, you can see that the correlation matrix really echoes the correlation matrix of the input over here. So there's a lot of structure in the system.
So this is not yet an associative memory because we're not putting in anything that's arbitrary or user-defined. So now we're going to hook in the sensory input. So we're now hooking in that last part, which is we have this network over here. We've got the sensory input, then the hippocampal layer, and then these structured label layers.
And so what I'm showing you here now is the capacity plot for this network. So this is showing, as you put more and more patterns into the network, how well the network recalls the correct pattern: starting from a noisy point, what is the overlap of the recovered pattern with the true pattern?
So an overlap of one means it's perfect recall. An overlap of less than one means that this is corrupted recall. So what you can see is that these are the standard Hopfield networks, and they do a great job up to some point at which point they crash. And so this is your very hard threshold Hopfield networks.
And this is an alternative architecture in green. This is just a simple autoencoder network that can be used as a memory network. So this is beautiful work by Caroline Mueller and others here at MIT that characterized the performance of an autoencoder network that bites its own tail as a memory network. Interestingly, this memory network also performs with similar Hopfield-like dynamics, where it does well, and then at some point, it just crashes.
And on the other hand, this network here has this graceful degradation of the overlap with the correct pattern even as you stuff many more patterns into it than are possible to recall perfectly. And so the interpretation here is the standard Hopfield networks, as well as this network, can do a great job. So these are all the different patterns that are stored as fixed points here, and as long as you're well under some number of patterns, you do a perfect job recalling all the patterns.
But in general, for Hopfield networks, if you start with an incomplete or corrupted pattern, and you have stuffed in more patterns than the capacity, then the system will move you to a whole other basin of a different fixed point, a different pattern. But in this network, the degradation seems to be that you just move a bit further away, but within the same basin.
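The hard threshold being described can be reproduced with a minimal Hopfield network (the sizes and loads below are illustrative, not from the talk): below the classic ~0.14N capacity, recall from a corrupted cue is perfect; well above it, the stored patterns stop being recoverable at all.

```python
import numpy as np

rng = np.random.default_rng(1)

def hopfield_train(patterns):
    """Standard Hebbian outer-product rule, diagonal zeroed."""
    N = patterns.shape[1]
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, x, steps=20):
    """Synchronous sign-threshold dynamics."""
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1.0
    return x

def overlap(a, b):
    """Normalized dot product: 1.0 means perfect recall."""
    return float(a @ b) / len(a)

N = 200
few = rng.choice([-1.0, 1.0], size=(3, N))    # load far below ~0.14 N
many = rng.choice([-1.0, 1.0], size=(60, N))  # load far above capacity
```

Flipping 10% of a cue's bits and running the dynamics recovers the stored pattern exactly in the low-load case, while in the overloaded case the state wanders off to a spurious fixed point: exactly the all-or-nothing behavior being contrasted with graceful degradation.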
So it's an interesting and different behavior over here. And if, instead of overlap, you look at mutual information curves, you see very similar performance. So the summary over here is that this memory architecture is something different. It's based on recurrence, and it has qualitatively different behavior, which might be more similar to the associative memory that we have in animals and humans.
So the next thing I want to tell you about is actually what motivated this talk: the observation of these fragmented maps of space, and whether there are simple theories or models for the creation of fragments that can reproduce neurophysiology data.
So here is, again, what you saw earlier, and the question is can we articulate a model or a theory that can reproduce the fragmentations that have been observed here? So we proposed a simple hypothesis, which is that online-- So first of all, when we go through the world, we go through and have to generate representations in real time.
We don't go through the world, and then store all our experience in some buffer, and then later segment the world or segment our experience. And in fact, the neurophysiology data seems to be consistent with this too. When an animal's first exploring an environment, they already generate segmentations of the environment in an online way as they're first moving through it. So whatever model we build must be an online model that allows these segmentations to happen in real time as you're experiencing the world continuously.
So the model has two components. So the first component is a short-term memory-based prediction. So the high-level view is that when we move through the world and make predictions about what we expect to see next, and when predictions are poor, that's when it's time to lay down to make a fragment, a fragmentation. So this is an online fragmentation based on short-term predictability or surprisal.
So for concreteness, here's an agent moving through this two-room environment. The past trajectory is this line, and then the agent's current location is given by this dot. And this is the observation model, so the agent observes the world around it. And in this case, it's got omnidirectional observation, so it has a 360 view. That's not that important.
So the agent makes its observations. We're just assuming a really simple occupancy-map-type observation where, basically, the agent can observe, in a self-centered frame, what distance all the walls in the environment are relative to its current position.
Those of you who know about spatial representations will recognize this as something like a boundary vector cell-type representation of the world. So this is the agent, and this is the current observation of the agent at that location. It can see this part of the environment. And we assume that the agent has a short-term memory, and the short-term memory consists of recent past observations that the agent has made over its trajectory. And it adds new observations to that short-term memory by shifting them according to its movement.
So it's moved. It can integrate its velocity and compute how far it's shifted, and then it will add what it's seeing now to the appropriate location in short-term memory. So this is the short-term memory. It's just a simple linearly filtered average of the past observations, and that's what it looks like. And you can see a lag in the short-term memory because it still has some of the stuff seen earlier in the trajectory.
So the difference between what's predicted from the short-term memory and the observation tells you how much of the current observation is explained or not explained. So we can make a plot of how much of the current observation is explained by the short-term memory.
And when that dips below some threshold value-- And in this case, what we do is not just when it dips below the threshold value, but it comes back above threshold. So when the predictability has restabilized, then we make the decision to fragment the map. So these are our remapping events. So in other words, comparison of observation with prediction from short-term memory leads to fragmentations.
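A stripped-down version of this fragmentation rule could look like the following sketch (I use cosine similarity as the "explained" signal and drop the velocity-based shifting of observations for simplicity; the leak rate and threshold values are arbitrary choices of mine):

```python
import numpy as np

def fragment_stream(observations, alpha=0.3, threshold=0.7):
    """Online fragmentation from surprisal: keep a leaky average of recent
    observations as the short-term prediction; when the match between
    prediction and the current observation dips below threshold and then
    recovers, emit a fragmentation (remapping) event at the recovery step."""
    stm = observations[0].astype(float)   # short-term memory
    below, events = False, []
    for t, z in enumerate(observations[1:], start=1):
        match = stm @ z / (np.linalg.norm(stm) * np.linalg.norm(z) + 1e-9)
        if match < threshold:
            below = True                  # prediction is failing
        elif below:
            events.append(t)              # predictability has restabilized
            stm = z.astype(float)         # re-anchor the short-term memory
            below = False
        stm = (1 - alpha) * stm + alpha * z
    return events
```

On a synthetic stream whose observations switch abruptly, as when passing through a doorway, the event fires a couple of steps after the switch, once the short-term memory has caught up, which mirrors the dip-then-recover rule described above.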
And this is very, very similar to theories out in the cognitive science literature-- event segmentation theory, where events are segmented based on predictability. The luxury we have is that because we're looking in the spatial domain, there's a lot of nice neuroscience-- detailed grid cell representations and things that we can compare with.
So the second component of the model is now you've made-- So you've made a fragmentation decision based on surprisal or predictability, and now the question is now that you've made that decision, what do you do with that decision to fragment? So you need to now pull up a map, and the question is which map should you pull up for this new-- or submaps should you pull up for this now that you have chosen to switch maps?
And so this model involves a long-term memory. So you have a long-term memory buffer that includes past observations-- z refers to observations. So this is just a simple lookup table with a list of past observations, and these past observations are linked together with the past estimates of position at each observation time. So you've got observation and position.
So this is the full table of long-term memory, and it can fall off from the end. So it can be a finite buffer for long-term memory. And again, this can be instantiated by the hippocampal-entorhinal circuit that I just talked about in the first part of the talk. But for simplicity here, we're just going to keep it as a table.
I'll come back to this little part in just a moment. And so now given a new observation, zt, you query. You look through the memory and ask does this zt, is it similar to any of the observations I've made in the past? And also, is the estimated position, x of t, is that similar to any of the corresponding positions that I've visited in the past?
And then based on those similarities, you make a score. And then with a probability proportional to the match, you retrieve one of those existing maps. And with some finite probability, you actually create a brand new map that has not been created before. So this is a little bit reminiscent of a Chinese restaurant process, or a distance-dependent Chinese restaurant process.
So there's actually nice work from [? Jone ?] Sanders, Matt Wilson, and Sam Gershman on explaining when you might choose to pull up a different map in the same environment, using this Chinese restaurant process type of approach, and also [? Rylan ?] [? Schafer ?] from my group has done some theoretical work on online inference with Chinese restaurant processes. So just a shout-out for that.
So anyway, so this is the idea. So you pull up an existing map that you've seen before based on similarity of your current observation position estimate, and you build a new map with some probability. So now that's the model, and we can now run it. We can take it for a test drive and see what kinds of neural responses it predicts in different environments.
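A toy version of this retrieval step might look like the following. Everything here is a hypothetical parameterization (scalar observations and positions, exponential similarity, a constant weight for the new-map option); it only illustrates the sample-proportional-to-match, create-new-with-finite-probability logic described above.

```python
import math
import random

def retrieve_map(z_t, x_t, long_term_memory, new_map_weight=1.0, rng=random):
    """Map-retrieval sketch: score each stored map by how similar the
    current observation z_t and position estimate x_t are to entries
    recorded under that map, then sample a map with probability
    proportional to its score; a constant weight reserved for a brand
    new map gives a Chinese-restaurant-process-like flavor.

    long_term_memory: list of (observation, position, map_id) tuples."""
    scores = {}
    for z, x, map_id in long_term_memory:
        # similarity of both observation and position (hypothetical kernel)
        sim = math.exp(-abs(z_t - z)) * math.exp(-abs(x_t - x))
        scores[map_id] = scores.get(map_id, 0.0) + sim
    new_id = max(scores, default=-1) + 1
    scores[new_id] = new_map_weight      # finite probability of a new map
    # sample a map id with probability proportional to its score
    total = sum(scores.values())
    r = rng.random() * total
    for map_id, s in scores.items():
        r -= s
        if r <= 0:
            return map_id
    return new_id
```

With a memory that strongly matches one stored map, a low random draw retrieves that map; a high draw lands in the new-map slice and creates a fresh fragment.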
So we can just make some synthetic complex and organic environments, and here is the agent trajectory. So it starts here, and then it moves through, and then the agent ends up over here. And now we're just plotting that prediction signal or explained curve, and you can see, again, that there are going to be some fragmentation decisions here. And these fragmentation decisions are happening at the red points shown over here.
And so that's what's going on. And if you look at the neural representational space-- so this is the space of grid cell states-- then what's going on is that the system is jumping from some representations in this gray region of the high-dimensional neural space to this region and then jumping back.
So basically, there are actually four major regions in the representational space between which this agent jumps. And these are different grid cells of different scales, and you can see these are the predicted [INAUDIBLE] curves.
So even though it looks like it could be consistent with a single uniform grid cell tiling of the space, really, in this model, there are actually four map fragments that lead to it, corresponding to these four different regions of the room. One, two, three, and four. These four spaces.
So this biologically plausible rule is producing-- I want to make the point in the next slide that fine, it does some segmentation. But first of all, how does it compare to neurophysiology? And also, how does it compare to maybe some more normative algorithms for clustering?
Because we can think about segmentation as clustering. It's a clustering problem. You see the world, and then you want to cluster similar things together and dissimilar things into different clusters. So we can also apply offline clustering algorithms and compare how well this online, real-time clustering does against those.
So what you're seeing up here is real-time clustering, this algorithm for clustering in real time with short-term and long-term memory. And this is an offline clustering algorithm, the well-known DBSCAN algorithm. So you can see that, run in the same environments, the clusters that are produced-- and these are the red parts where there's a fragmentation-- look quite similar to one another.
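For concreteness, here is a minimal pure-Python DBSCAN over 1-D points, just to highlight the offline character of the comparison algorithm: it needs all the points up front, unlike the online rule above. The eps and min_pts values are illustrative, and real usage would be over 2-D positions.

```python
def dbscan(points, eps=1.0, min_pts=3):
    """Minimal 1-D DBSCAN sketch: density-based offline clustering.
    Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points))
                     if abs(points[j] - points[i]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1          # noise (a cluster may still claim it later)
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(len(points))
                           if abs(points[k] - points[j]) <= eps]
            if len(j_neighbors) >= min_pts:     # j is also a core point
                queue.extend(k for k in j_neighbors if labels[k] is None)
    return labels
```

Two well-separated groups of points come out as two clusters, and an isolated point is labeled noise.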
And then to the question of the comparison with neurophysiology. Here are the results that the model produces when run in this hairpin-like maze. And you can see that basically, when the model agent runs through the hairpin maze starting left to right--
In other words, there's a head direction cue that points upwards that way. Then it chooses two distinct maps, and it just transitions between these. And it just goes across, and you can see that in the rate maps. Whereas if the agent actually runs in the opposite direction-- so you can think about that as the agent going right to left in the maze, or we can just think about flipping the whole thing over-- either way, it's a different pair of two maps that shows up, and indeed, in the experiments, the animals do the same thing.
And it turns out that we can run this in a few different environments, and we get similar fragmentations in the agent as seen in animals' grid cell responses. So now I want to try to say what we can predict for the neuroscience beyond reproducing the fragmentations that have been observed. And how does this contrast with maybe other existing algorithms for spatial representation or structure learning?
So here are the responses of three model cells in this model, and these two cells have a similar scale, C1 and C2. And cell C3 has a different period. So this is from a different grid module, and these are both from the same grid module. So what is going on here?
First of all, just to make the point: this region we'll call R for right room, and this we'll call H for hallway. So this is for a single cell. If you look at the cross-correlation of its response in the right room versus the hallway-- the origin of the cross-correlation would be right at the center of the circle-- you can see that these maps are not in register because there isn't a peak at the origin.
So clearly, there is a fragmentation between the right room and the hallway in the cell's response. But what's very interesting here is that if you now look at the cross-correlation between cell one and cell two within the right room, they have some offset.
These two cells are offset in phase. You can see that in their firing rate maps. They're offset in phase, so they don't have a peak at the origin, and they have a phase offset relative to each other of some amount shown by this pattern over here.
And now if you look at these same two cells and look at the cross-correlation of their responses in the hallway, then you see that the cross-correlation is, again, it's the same exact cross-correlation in the hallway as it was in the right room. In other words, the cell-cell relationships are unchanged, completely unchanged here, by construction. Because both maps are fragmenting at exactly the same place.
So this map is fragmenting in exactly the same place, and the grid cells are like this unitary grid network. So this is what's going on over here. And so the reason I bring that up is actually to draw a contrast.
So first of all, experimental data show that cell-cell correlations between grid cells are preserved across environments and across environment fragments. Even when there are fragmentations in the responses of single cells, the cell-cell relationships do not change, and that's what we're seeing here: there's a fragmentation per cell, but across cells, the cell-cell relationships remain.
So that's a fundamental property of the data, and we can look at other ways of doing structure learning of the world. So two popular ways to do this are the successor representation and the graph Laplacian. The successor representation has been used in lots of reinforcement learning work, introduced by Peter Dayan, and Sam Gershman has applied it to grid cells and so on.
And so the idea there is that you construct a transition matrix, and then you can construct a discounted transition matrix that you add up over time. And then you compute the eigenvectors of that, and the eigenvectors have been suggested to be a model for grid cells.
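As a sketch of that construction: for a transition matrix T and discount gamma, the successor representation is M = Σ_k gamma^k T^k, and the eigenvectors of M are what has been proposed as grid-cell-like responses. The pure-Python truncated power series below just builds M itself; matrix size, discount, and truncation length are illustrative choices.

```python
def successor_representation(T, gamma=0.9, n_terms=200):
    """Successor representation sketch: accumulate the discounted
    transition matrix M = sum_k gamma^k T^k as a truncated power
    series, using plain lists of lists as matrices."""
    n = len(T)
    # k = 0 term: gamma^0 * T^0 = identity
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    Tk = [row[:] for row in M]          # running power T^k
    g = 1.0                             # running discount gamma^k
    for _ in range(n_terms):
        # Tk <- Tk @ T
        Tk = [[sum(Tk[i][m] * T[m][j] for m in range(n)) for j in range(n)]
              for i in range(n)]
        g *= gamma
        for i in range(n):
            for j in range(n):
                M[i][j] += g * Tk[i][j]
    return M
```

For a uniform two-state chain, every power of T is T itself, so M works out to I + (gamma / (1 - gamma)) T, and each row of M sums to 1 / (1 - gamma).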
So for example, this is one eigenvector of the-- I forget if this is the successor representation or the graph Laplacian, but they're the same under this random exploration policy. So anyway, it's either one. So you get one predicted grid cell response over here, another predicted grid cell response over here.
And what you see here is that if you look at the cross-correlation between cell one and cell two in the left room, then in the left room, they are out of phase. So they don't have a peak at the center. But in the right room, if you look at the cross-correlation, you can see that the responses are the same, and so they have a peak at the center.
So under the successor representation model, there is actually a change in the cell-cell relationships based on the environment. So that is not quite consistent with the grid cell data. And also, these models don't predict something like a consistent and coherent breaking point between events. If it's just eigenvectors, they're diffuse, and different eigenvectors have changes and breaks at different points, so there isn't a unique location that defines a fragment.
So onto the question about neural mechanisms further. The segmentation story that I've told you about has been largely abstract. I haven't given you a detailed circuit model, but it turns out that it's possible, again going back to that same model of grid cells, place cells, and sensory cells-- so that same memory network framework I showed you before.
It turns out that if you have a surprisal signal that inhibits the connections from grid cells to place cells-- basically, it reduces the structured input coming into the place cells and thereby proportionally increases the relative influence of the sensory cells on the hippocampal cells-- that induces a complete shift in the representation of this network.
So you get this discontinuous shift. So grid cells are doing this integration thing. They're driving smaller shifts and continuous shifts in the representation, but if the grid cell inputs are suppressed by a surprisal signal, then if there is a substantially different observation cue from the sensory side, it will rearrange the representation and produce the dynamics of remapping.
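A cartoon of that gating, with an entirely hypothetical parameterization (the divisive form and the gain constant are made up for illustration), is that surprisal suppresses the grid-to-place weight, so the sensory term comes to dominate the total drive to a place cell:

```python
def place_drive(grid_drive, sensory_drive, surprisal, gate_gain=5.0):
    """Gating sketch: surprisal divisively suppresses the weight from
    grid cells to place cells (hypothetical form), raising the
    relative influence of the sensory input on the hippocampal cells."""
    grid_weight = 1.0 / (1.0 + gate_gain * surprisal)   # suppressed by surprisal
    return grid_weight * grid_drive + sensory_drive
```

With zero surprisal the grid input passes at full weight; with surprisal of one it is cut to a sixth here, so a substantially different sensory cue can re-anchor the representation.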
So this mechanistic model of circuit mapping and remapping can actually produce some of the same dynamics that I showed you more abstractly a few slides earlier. So this is an agent that's moving from room one to room two, this is the position of the agent as it's moving, and this is the location in the abstract neural space.
And this is the representation in room one, and you can see it's smoothly incrementing as the agent's moving through by path integration. It moves through the doorway where the surprisal is high, and each curve here is a different path.
It's a different run of a different agent through the space, and then you see that there's a big jump in the representation space. The green line, for example, comes here. It jumps to here and then increments smoothly from there on. In other words, it's possible even with a neural circuit to get these consistent, smooth changes within spaces and then big fragmentations across spaces.
So the prediction is that there should be fragmentation decision cells or doorway cells in the brain, and moreover, these cells should drive remapping by suppressing the weights that go from grid cells to the place cell network and to do that across modules. So that's a prediction for neuroscience from this.
Similarly, in this model, if the agent goes from room one to room two, then turns around within room two and goes back to room one, these are the points of moving from one to two where there's large surprisal. So once again, we're seeing what we saw earlier: the representation increments smoothly and then makes a big transition.
And then the agent turns around, and then it decrements back to the same representation it had at the entrance to the room and then remaps again at the doorway. But it goes back to the original first representation. So this model is pulling up the same map for room one again that it had earlier.
So there are many functional advantages for segmentation in the context of spatial navigation. One thing is that if you're building a map, a local map, of space, and if the space is one large space, then building a map involves computing internal estimates of your position as you move around the space. And because velocity cues are noisy, you accumulate path integration errors.
And so the maps become misaligned and not internally self-consistent, and that's the whole problem with simultaneous localization and mapping. So in fact, if you can just take a space and map local regions of it without making big loops and trajectories that cause path integration errors to accumulate, it's possible to build accurate local maps. And then you can connect them with other accurate local maps.
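The build-up of path integration error can be illustrated with a one-line noise model: integrating independent velocity noise over n steps gives a root-mean-square position error that grows roughly like the square root of n, which is why short, loop-free local maps stay accurate. The noise level and trial count below are arbitrary choices for the illustration.

```python
import random

def path_integration_rms_error(n_steps, noise=0.05, n_trials=2000, seed=0):
    """Integrate noisy per-step velocity estimates and return the
    root-mean-square position error over many trials; for independent
    Gaussian noise this grows like noise * sqrt(n_steps)."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_trials):
        err = 0.0
        for _ in range(n_steps):
            err += rng.gauss(0.0, noise)    # per-step velocity noise
        total_sq += err * err
    return (total_sq / n_trials) ** 0.5
```

Quadrupling the trajectory length roughly doubles the accumulated error, so mapping small fragments keeps each local map's error bounded.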
So that's one of the solutions to the build-up of path integration error in large spaces. And not coincidentally, in the field of simultaneous localization and mapping, segmented SLAM is a strategy that has been very successful. So these segmentations give you an abstraction, these chunks or rooms, and now one can imagine planning trajectories at the scale of that abstraction, which is what I'll show you next.
So this is the part where we can say, well, given that we have these submaps that we can generate on the fly as an agent first explores a room, how can they be used for solving hard problems? So this is a set of simulations where an agent can run through either of these organic environments of the type that I've shown you before. And the triangle and the square represent start and goal locations within the environment.
So you've got a start location, which is always the square, and the goal location, which is the triangle. And what we consider is a planning problem, and the planning problem is that if the agent is started off at the start location and needs to end up at the target location, how to compute a path through the space?
So an efficient planning algorithm that's used in the engineering and planning literature is RRT. So these are rapidly-exploring random tree algorithms. These algorithms are our control or baseline; they're typically used in high-dimensional state spaces for planning. And so if you have just the full global representation of the space, and you do the path planning at the global level, then these are the trajectories that are built according to this efficient RRT algorithm.
But if you apply the algorithm to now work on the level of submaps-- So once within a submap, you do the RRT within a submap to find the way to the exit of that submap, and then you plan on the connectivity between submaps. Then these are the trajectories you get. We can do the same thing in 3D realistic environments. Same story here.
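The two-level idea, planning a coarse route over the submap connectivity graph and only running the detailed planner inside each small fragment, can be sketched with a plain breadth-first search standing in for RRT at the coarse level. The graph and map names are made up; a full version would call the within-submap planner along each leg of the returned sequence.

```python
from collections import deque

def plan_submap_sequence(submap_graph, start_map, goal_map):
    """Coarse-level planning sketch: breadth-first search over the
    submap connectivity graph returns a shortest sequence of submaps
    from start to goal (or None if unreachable). Detailed planning
    then only has to search inside each small fragment."""
    frontier = deque([[start_map]])
    seen = {start_map}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal_map:
            return path
        for nxt in submap_graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None
```

Because each detailed search is confined to one small fragment, the cost scales with the size of the fragments and the submap graph rather than with the whole environment, which is the source of the speedup described next.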
And what we find is that planning on global maps can take a lot more time, or a lot more steps, than planning on these fragmented maps. So if we look at the mean values-- this axis is the distance between the start and the target location, and this is how much time the planning takes, or how many steps the planned trajectory is-- then you can see that it's much longer in the global case than in the fragmented case, and it's the same story here.
And in fact, you can see that the difference here is actually, in this case, orders of magnitude. And here, it's already [? platfold. ?] So there's a big advantage to this kind of planning on submaps. So again, going back to neural circuits-- the planning algorithm part was, again, abstract.
But back to hippocampal representation, the idea is that the grid cells are constructing locally Euclidean subgroups within small spaces where you can do shortcuts and other things. Then maybe CA1 learns the topological connectivity structure between these submaps, and CA3 place cells can learn things about the temporal relationships between the paths you can take between them on the global scale, to do more global planning at that level.
SPEAKER 3: Can you explain [INAUDIBLE] for this?
ILA FIETE: I guess this is our hypothesis for how grid cells, CA1, and CA3 interact. I think this is consistent with all the evidence out there, but I wouldn't say that there has been an experiment that directly tests this hypothesis. So yeah.
Let me see. Do I have three minutes, Tommy?
ILA FIETE: Three or four minutes? So in the last three or four minutes, I want to go to the last part. So now we've seen how you can construct submaps, what the algorithm to generate them could be, and that they produce segmentations consistent with biology. We've also seen how, in a single environment, constructing distinct submaps can be useful for planning.
But now I want to make the case that actually, there's a utility to reusing submaps. So we've seen that in some environments like that two room with the hallway environment, the same submap is reused twice in both of those rooms. So what is the utility of that?
So the idea here, the motivation, is that if you go to an unfamiliar city, a new city that you haven't visited before, many components of the city are still familiar pieces of what you've seen before. I shop at Costco. Costco in Toronto will have the same layout as Costco in Boston.
And so I'll go in. I'll know pretty much where to go to find the toilet paper, especially in COVID. And you go to a roundabout, and once you've navigated a roundabout or two, you pretty much know how it goes. You know the structure of the roundabout, and you know how to navigate it. And so this is the idea that even new things are composed of maybe familiar parts that you've seen before.
So similarly, if you want to understand, again, the Costco example, or you want to know in a hotel what the rooms look like: once you've seen one room, you pretty much know all the other rooms on the floor on the same side. They'll look the same.
So the idea then, our hypothesis here, is that humans can use adaptable and compositional strategies for mapping, and they can use these for faster learning and for fast planning. And so the idea is that not only if we learn, say, this environment, then maybe we can also deal with navigating through a version of this environment where we enter it from the side here rather than from down here. Because then it's a rotated version of the environment.
Maybe once I know this environment, I can also very quickly figure out how this layout is because it's just a reflected version. And similarly, I could just consider removing a wall, and have shortcuts, and I can understand how to do things. Or maybe if there's a repeated structure, I could use it.
So this is a hypothesis that humans can very quickly use these submaps and do tasks very rapidly in those. And so this is something that Sue has been testing in human subjects, so I'm going to skip the slide. And so I just want to very quickly just show you the experiments, and I'm going to just stop there because the experiments are so pretty. I can't not show you.
So what Sue does is that she has designed a bunch of environments. So these are initial environments that she runs subjects in just to get them familiar with the virtual environment. So these are trajectories of subjects navigating through. This is a top-down view, but really, the subjects have a 3D view from a Unity environment.
And so they just go through this just to see what it's like to navigate through the environments. Occasionally, there will be rewards. This black is hardly visible, I'm sorry, but that's a reward over here. And so the subjects can go around, and see what a reward looks like, and go get it. So this is just familiarization.
And now once they've done just that minimal amount of familiarization, subjects are now put into these series of environments, and these are their first trajectories through this environment. I'd just like to point out a couple of them because they're so interesting.
So look at this one. And I should just point out the reward, for those of you who can't see it: there's a black dot, which is a reward location, in the left corner of every other room here.
So what the subject does is they come through here, they look in this room, they see the black dot, and they get the reward. And then they come in here, look around, and there's no black dot. And then they come in here, look around, they see the black dot. And they look in here. No black dot.
At this point, they've already learned that the black dot is on the left, because they're already looking left here. And at this point, they've figured out that it's actually in every other room, and so now they start skipping. So it's pretty cool.
That's their first pass through the environment, and it's a similar story here. So they come through here. They look on both sides. There's a black dot only on the right side, in these right rooms, and nothing in the left rooms. So the subject comes, sees there's nothing in the left room, goes to the right room, gets a dot, comes here, sees nothing in the left room, something in the right room, and then they just go boom, boom, boom, and so on.
So now in collaboration with Aiden and Sue, we've been working on models of this behavior which involves reusing these submaps. So I'm going to just very quickly just show you some of the animation from this model. So this is one of the models that seeks to reproduce the behavior. This is a model that has no spatial submapping structure, and so it just does a naive exploration.
So the black is spaces where the agent can move, red is obstructed areas where it cannot go, and yellow are the reward locations. So the agent goes through, looking for rewards, and finds one there.
It missed one of the rewards up top, because the rewards were always in the top rooms. So that's without this map reuse strategy. And here is now an agent that has the spatial priors, as maps that it's constructed. So it found that one reward there, and it very rapidly goes through and gets each one in a very efficient way. So it doesn't even bother with the downstairs rooms right now.
So those are some of the preliminary results here. This is human behavior: how much of the space you have to explore to be able to get all the rewards in the space. So big is bad, and small is good. And here you can see that human performance is pretty efficient, and this distributional model that also includes the reuse of submaps does pretty well.
So that's it. I will just finish here by saying that there are many ways in which we use the world. We use transitions in the world to segment it, and then we can use these segments to do lots of efficient things, like memory and planning.
So thanks so much for your attention, and these are various names of my group members and collaborators who worked on these projects that I told you about. And of course, all the members of the group. So thanks.