Tutorial: Recurrent neural networks for cognitive neuroscience
August 30, 2021
Guangyu Robert Yang
Robert Guangyu Yang, MIT
In this hands-on tutorial, we will work together through a number of coding exercises to see how RNNs can be easily used to study cognitive neuroscience questions. We will train and analyze RNNs on various cognitive neuroscience tasks. Familiarity with Python and basic knowledge of PyTorch are assumed.
There will be a 30-minute lecture followed by a workshop using Google Colab notebooks. Attendees are free to attend any length of the tutorial. Materials and code to complete the workshop will be made available the day of the event.
Robert (Guangyu) Yang is an Assistant Professor in the MIT Department of Brain and Cognitive Sciences (BCS), with a joint appointment in the EECS Department in the Schwarzman College of Computing (SCC). He received his B.S. in physics from Peking University and his Ph.D. in neuroscience from New York University working with computational neuroscientist Dr. Xiao-Jing Wang. During his Ph.D., he studied how distinct types of inhibitory neurons in the brain can coordinate the information flow across brain areas. In another line of work, he studied how the same artificial neural network can accomplish many cognitive tasks. He was a postdoctoral research scientist in the Center for Theoretical Neuroscience at Columbia University’s Zuckerman Institute. Currently, he studies how artificial neural networks can become more powerful by incorporating neural architectures discovered in the brain.
Exercises with explanations but no answers
Exercises with explanations and answers
GUANGYU ROBERT YANG: So today the idea is we'll go through some tutorials together. So essentially, the way it's going to work is I will talk for 20 minutes. And I'll show you some slides and show you some code.
But then really, the majority of the time would be, hopefully, you would be able to do some exercises. And we have nine exercises. And it's OK if you don't have a lot of experience with Python and PyTorch, because these exercises are separated from easy to hard ones. And we have nine great TAs who would be able to help you with the exercises. And they would be stationed in relevant breakout rooms that you can-- so and you can also talk to other students who are working on the same problem in the same breakout room.
OK, so today, really, I want to show you a practical guide of using recurrent networks to study some cognitive neuroscience questions. And in particular, I want to show you how easy it is to do it, how straightforward. It's, in fact, embarrassingly straightforward, that I think once you know how to do it, it's almost as simple as, like, applying PCA, which is very, very common now.
So but before we go into the technical things, so I just have three slides to talk about some high-level conceptual ideas. By the way, there is a YouTube talk where I go through these slides in more detail. And you can check it out if you want to.
So first is, why do we care about using artificial neural networks? So I mean, of course, if you do deep learning, if you're an engineer, there is a good reason, because these networks are powerful. And they can do stuff. But for neuroscientists, cognitive scientists, we really care about understanding brains and minds. So why should we care about these networks?
I mean, one is you can use them for data analysis. That's very, very useful. But for many modelers like us, you want these networks not just as tools, but also really as computational models of brains or different parts of brains. And they're convenient compared to traditional models in several ways. One is that they help you model complex behavior more easily. So we'll see some examples.
In the past, if you have a complex behavior that you want to model, then you have to build and design a very clever model to do it, whereas here, you can let-- use some machine learning methods to train a network. The machine learning methods in themselves may not be realistic, biologically speaking. But the end result, like, the network you get after you train the network could conceivably be relevant to the brain. There is no guarantee, but it could be.
And these neural networks are very complex. And they can be very complex. And that can be a downside. They're hard to interpret. But on the upside, they help us better understand why the brain has complex neural activity. If you record from prefrontal areas, they're pretty complex.
And finally, they give a way to think about problems from an optimization perspective. So instead of-- just besides thinking of how a network is doing a task or what is the network doing, we can think about why the network is doing something the way it is, because in a neural network, often, you don't build in the solution. You design the architecture. You design the optimization method. And you design the objective.
And so some people call it the deep learning perspective. It's also very similar to the evolutionary perspective, which is in biology. And it's just a different way of thinking about the problem. And it can be satisfying.
OK, so in particular, so here, today, I will talk about recurrent neural networks for cognitive neuroscience. And this really has a very long tradition. So it dates back at least to the '80s.
But there is this iconic paper from Mante, Sussillo that I recommend you to check out if you're interested. If you haven't read it and if you're interested in this business, really, this is probably the most relevant paper. And you can see that these works usually have three components where you define a problem, a learning problem. It can be a cognitive task.
It's usually a cognitive task. And then you define an architecture, which is usually a recurrent neural network. So these are rate-based networks that are initially randomly connected.
And then, you would train them with some machine learning techniques, some stochastic gradient descent. But really, you can do whatever you want. But it's often very convenient to use some stochastic gradient descent based method to train these networks, which means to adjust the connection weights. And after training, they can do the task.
And then you can do your favorite data analysis that you would do on actual brain data. So you can apply it to the network. And then you can do side-by-side comparison between the data and network.
So today what we will do is I will first show you some code where we would define a network, define a learning problem, and train it with some gradient-based method. And then we will do PCA. So hopefully I will do all that in the next 15 minutes or so. And so you would see that it's not, like, crazy to do it.
And this is really a classical paradigm, where you have a single task, single RNN, back propagation. Some of the exercises would extend beyond it. I have this talk where all I talk about is how do you extend beyond it. But today, we will start with this classical paradigm.
And of course, many other people have worked on this. So we have done-- together with Francis Song and Xiao-Jing Wang, we have done a little bit of work in this area as well. So we essentially played that same game, but for a different task, so a perceptual decision-making task.
You can train a network on this task and then compare-- do some analysis and compare it with the analysis that people did on actual data. And then you can do that for other tasks as well. So today we will actually do-- yeah, so we will do this perceptual decision-making task, or very stylized simplification of it.
But it's nowhere near the complexity that animals actually experience. They are not just doing the task. They also are receiving high dimensional input about the world and the room. But we have to start somewhere.
So like I mentioned earlier, I have the talk. Also I wrote a primer with Xiao-Jing that you can check out if you're interested in more technical details. Lots of people have written about using deep networks to help model the brain. So you can check these out as well, which are interesting.
OK, now let's-- for the next 15 minutes, I will show you essentially this code, just give you a quick tour so you can get started with the exercise more easily. We will provide a link. Maybe someone can put the link of the Colab notebook or the GitHub IPython notebook in the chat so everyone can see. So first, well, you install some packages and some just common packages-- by the way, you can run this notebook in Google Colab. You don't have to install anything locally.
First let's define a recurrent neural network. So if you learn machine learning in, like, a class, so they would teach you things like the Elman recurrent network and LSTM. LSTM is most commonly used. And we're not going to worry about the internal mechanism of LSTM.
Usually, really, the most important thing in much of this work is to know the shape of your input and output. And then the internal mechanism, of course you want to understand it. But the first thing is to know the shape.
So what is the shape that goes into a neural network? So here we just define an LSTM. It's a one liner in PyTorch. And the shape is this. It's essentially a tensor that the first dimension is sequence length, so which is, for example, we show a sequence of inputs. It's like a time series of inputs.
This, you can think of as different time points in the trial. And so this is a sequence length. There are multiple trials. It doesn't have to be one trial. So sequence length-- and then you have batch size, which usually, you can show the network, essentially multiple examples or multiple trials simultaneously.
So you can think of it as the network just simultaneously independently processing all these different examples. And you can set batch size to 1, if you want. Then you have kind of a more classical way to do it in, like, neuroscience. And then input size is, you can think of it as the number of neurons that represent the input, or just more abstractly, the dimension of the input. So it's a 3D tensor: sequence length, batch size, input size.
And then if we run this input through the network, it will generate some output. And you can see, if we run this cell, you can see that the output shape is also sequence length, batch size. But it transformed the input size to a hidden size. And this hidden size is what we set here.
This is essentially the number of neurons in the LSTM. And you can see this is the output. This is the network activity of the LSTM. And it transformed a sequence of input to a sequence of output.
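The shape bookkeeping described above can be checked with a few lines of PyTorch. This is a minimal sketch, not the notebook's exact cell: the sizes (20 time steps, batch of 16, 5 inputs, 10 hidden units) are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from the notebook)
seq_len, batch_size, input_size, hidden_size = 20, 16, 5, 10

# A one-line LSTM. By default PyTorch expects input shaped
# (seq_len, batch_size, input_size).
rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

inputs = torch.rand(seq_len, batch_size, input_size)
outputs, (h_n, c_n) = rnn(inputs)

print("input shape: ", tuple(inputs.shape))   # (seq_len, batch_size, input_size)
print("output shape:", tuple(outputs.shape))  # (seq_len, batch_size, hidden_size)
```

The output keeps the sequence and batch dimensions and replaces the input dimension with the hidden size, which is the number of units in the LSTM.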
So many recurrent networks can be thought of this way. And sequence to sequence, there are many, many one liners. You can generate LSTM this way. This is an Elman RNN, Gated Recurrent Unit, GRU, or even multi-head attention that is used in transformers.
And they are very popular in natural language processing. It's sequence to sequence. So they all generate output that is sequence length, batch size, and some dimension.
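As a rough illustration of the "many one liners" point, the sketch below runs the same random input through several standard PyTorch sequence layers and records the output shapes. The sizes and the choice of two attention heads are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions)
seq_len, batch_size, input_size, hidden_size = 20, 16, 5, 10
inputs = torch.rand(seq_len, batch_size, input_size)

# Each recurrent layer is a one-liner with the same input convention
layers = {
    "Elman RNN": nn.RNN(input_size, hidden_size),
    "GRU": nn.GRU(input_size, hidden_size),
    "LSTM": nn.LSTM(input_size, hidden_size),
}

shapes = {}
for name, layer in layers.items():
    outputs, _ = layer(inputs)  # second return value is the final state
    shapes[name] = tuple(outputs.shape)

# Multi-head attention keeps the embedding dimension unchanged, so here the
# input already has hidden_size features (self-attention: query = key = value).
attention = nn.MultiheadAttention(embed_dim=hidden_size, num_heads=2)
x = torch.rand(seq_len, batch_size, hidden_size)
attn_output, _ = attention(x, x, x)
shapes["Multi-head attention"] = tuple(attn_output.shape)

for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

All four produce a tensor shaped (sequence length, batch size, some dimension), which is what makes them interchangeable at the level of input and output shapes.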
But today we will use not these recurrent neural networks that are used in machine learning. We will use something called a continuous time recurrent neural network, which is essentially kind of the simplest network, but in continuous time. So the way it works is you first define a kind of a continuous time dynamical system this way.
So essentially, this is how the activity of a bunch of neurons changes over time. Well, if you do nothing, it will decay back to zero. That's what this means. And but it's also driven by the recurrent connectivity and also the external input. And then there is some bias that, you can think of it as threshold. And then there is the nonlinearity that allows the network to do more complicated things, which here, we will just use ReLU-- Rectified Linear Unit.
And of course, when you do it in a-- well, a simple way to simulate this continuous time network in our computers is to discretize this network in time using the Euler method. So essentially, you set a time step delta t. And then you can just, like, think of it as writing this as, instead of dr dt, it's delta r over delta t. And then you multiply two sides by delta t. And then you divide it by tau. Then you get your delta r.
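The slide equation is not reproduced in the transcript. A standard form consistent with the description (decay to zero, recurrent drive, external input, bias, ReLU nonlinearity) would be the following, where the symbol names are assumptions chosen for this sketch:

```latex
\tau \frac{d\mathbf{r}}{dt}
  = -\mathbf{r}(t)
  + f\!\left( W_{\mathrm{rec}}\,\mathbf{r}(t) + W_{\mathrm{in}}\,\mathbf{x}(t) + \mathbf{b} \right),
\qquad f(z) = \max(0, z)
```

Here r is the vector of unit activities, x the external input, W_rec and W_in the recurrent and input weights, b the bias, and tau the time constant. With no drive from the f term, r decays exponentially to zero.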
So well, any case, this is-- after doing that, this is what you get. This is the activity at the next time step. It should follow this. And then you can do some simple transformation. And you get this.
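The steps just described can be written out as follows. This is a sketch that assumes the continuous-time form tau dr/dt = -r + f(W_rec r + W_in x + b), with symbol names chosen for illustration and alpha defined as delta t over tau:

```latex
\begin{aligned}
\tau \frac{\Delta \mathbf{r}}{\Delta t}
  &\approx -\mathbf{r}(t) + f\!\left( W_{\mathrm{rec}}\,\mathbf{r}(t) + W_{\mathrm{in}}\,\mathbf{x}(t) + \mathbf{b} \right) \\
\Delta \mathbf{r}
  &= \frac{\Delta t}{\tau} \left[ -\mathbf{r}(t) + f\!\left( W_{\mathrm{rec}}\,\mathbf{r}(t) + W_{\mathrm{in}}\,\mathbf{x}(t) + \mathbf{b} \right) \right] \\
\mathbf{r}(t + \Delta t)
  &= \mathbf{r}(t) + \Delta \mathbf{r}
   = (1 - \alpha)\,\mathbf{r}(t) + \alpha\, f\!\left( W_{\mathrm{rec}}\,\mathbf{r}(t) + W_{\mathrm{in}}\,\mathbf{x}(t) + \mathbf{b} \right),
\qquad \alpha = \frac{\Delta t}{\tau}
\end{aligned}
```

The last line says that the new activity is a weighted average of the old activity and the new drive, with alpha setting how fast the network forgets.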
So here is the code for this network. So the important thing is this recurrence function, which is where you run the network for one time step-- and on your own time, you can check that these two lines are exactly this line. So there is a one-to-one mapping here. And this recurrent network is used the same way as these networks. So here is an example.
Well, actually, you can also compose it together with an output layer. So here, I have a bigger network that takes this continuous time recurrent network and then adds a fully connected output layer. So then the output of the recurrent network would get processed by this fully connected layer.
And then you generate an output. And then you can call this whole network. So you set the input size, you set the hidden size, output size. And then you call it. And then you see that it transforms an input that is shaped sequence length, batch size, input dimension to sequence length, batch size, output dimension neurons here, which is 10, as we said.
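A minimal sketch of such a network in PyTorch is given below. It is written for this transcript rather than copied from the notebook, so the class and attribute names are assumptions, as are the default time step of 20 ms and time constant of 100 ms. The two lines of `recurrence` correspond to the discretized update r(t + dt) = (1 - alpha) r(t) + alpha f(...).

```python
import torch
import torch.nn as nn


class CTRNN(nn.Module):
    """Sketch of a continuous-time RNN discretized with the Euler method."""

    def __init__(self, input_size, hidden_size, dt=20.0, tau=100.0):
        super().__init__()
        self.hidden_size = hidden_size
        self.alpha = dt / tau  # Euler step relative to the time constant
        # The two biases add up to a single effective bias term
        self.input2h = nn.Linear(input_size, hidden_size)  # input weights
        self.h2h = nn.Linear(hidden_size, hidden_size)     # recurrent weights

    def recurrence(self, x_t, hidden):
        """Run the network for one time step."""
        drive = torch.relu(self.input2h(x_t) + self.h2h(hidden))
        return (1 - self.alpha) * hidden + self.alpha * drive

    def forward(self, inputs, hidden=None):
        # inputs: (seq_len, batch_size, input_size)
        if hidden is None:
            hidden = torch.zeros(
                inputs.shape[1], self.hidden_size, device=inputs.device
            )
        outputs = []
        for x_t in inputs:  # loop over time steps
            hidden = self.recurrence(x_t, hidden)
            outputs.append(hidden)
        # (seq_len, batch_size, hidden_size), plus the final hidden state
        return torch.stack(outputs, dim=0), hidden


rnn = CTRNN(input_size=5, hidden_size=10)
inputs = torch.rand(20, 16, 5)
outputs, final_hidden = rnn(inputs)
print(tuple(outputs.shape))
```

Because the module returns the full activity sequence and the final state, it can be dropped in wherever `nn.RNN` or `nn.LSTM` would be called with the same input shape.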
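The composition with an output layer might look like the sketch below. To keep the block self-contained, PyTorch's built-in `nn.RNN` with a ReLU nonlinearity stands in for the continuous-time network; the class name `RNNNet` and the sizes other than the output size of 10 are assumptions.

```python
import torch
import torch.nn as nn


class RNNNet(nn.Module):
    """Recurrent core followed by a fully connected output layer (sketch)."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Stand-in recurrent core; a continuous-time RNN module with the
        # same (outputs, hidden) return convention could be used instead.
        self.rnn = nn.RNN(input_size, hidden_size, nonlinearity="relu")
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        rnn_activity, _ = self.rnn(x)   # (seq_len, batch, hidden_size)
        out = self.fc(rnn_activity)     # (seq_len, batch, output_size)
        return out, rnn_activity


net = RNNNet(input_size=5, hidden_size=20, output_size=10)
inputs = torch.rand(20, 16, 5)
out, rnn_activity = net(inputs)
print("input: ", tuple(inputs.shape))
print("output:", tuple(out.shape))
```

Returning the recurrent activity alongside the output is a convenience for later analysis, since the hidden activity is what gets compared with neural data.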
So this is a network that we will use. And you can customize it. So one good thing about writing this network out explicitly like this is that you can customize it more easily. We will see some examples.
So next, we need to define a cognitive task. And because today the main point is about training these networks, I won't go into details about the cognitive task. So I have contributed to this little package called neurogym where you can just get some like predesigned tasks or environment.
And so here, we get this task called PerceptualDecisionMaking-v0. And you can set some parameters. Like, the discretization time step should be 20 milliseconds.
And the stimulus period should be 1,000 milliseconds. And then you can make the environment using OpenAI Gym. And this is all boilerplate. So you don't have to remember it. And if you're interested, this is, like, the actual task just written out explicitly.
So now let's visualize the environment so we better understand what it's doing. Again, this is a very simplified representation of perceptual decision making. So what it does is it-- so x-axis here is time. And here, this is showing the input for two trials.
So in the first trial, we essentially have three input neurons. And the first input neuron represents fixation. So it tells you, when it's up, it tells you that you should fixate. And then when it's off, it tells you that you should make a decision.
And then there are two stimuli represented by two stimulus neurons. And so one of them essentially is higher on average. But as you can see, it can be pretty hard to tell, because there is a lot of noise. So it essentially forces the network to integrate information over time, because you cannot really make a robust decision based on a single timestep.
So you have these two stimulus neurons. And you have to determine which one is higher. And for example, let's say in the second trial, we can see that, OK, this one is higher. So and this is the action that the network should be taking. And you can ignore the blue one, because it's just a random agent choosing things.
So the green is the ground truth. So what you can see is that the ground truth-- and this is a target that we will use to train the network. The ground truth for most of the trial is just, you should be fixating, because fixation input is up.
And then you should be fixating. And then when the fixation input is off, you should choose one of the two choices. And here in this case, you should choose this one, which corresponds to this stimulus. It's a very, very simple task, but it does force the network to integrate. And you can do more complicated tasks, if you want.
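To make the trial structure concrete, here is a NumPy sketch that generates one stylized trial of this kind. It is not neurogym's implementation: the function name, the period lengths, the mean levels, and the noise level are all assumptions. The 50 stimulus steps correspond to 1,000 ms at a 20 ms time step.

```python
import numpy as np


def make_trial(rng, n_fix=5, n_stim=50, n_dec=5, coherence=0.2, noise_std=1.0):
    """Return (inputs, target) for one stylized decision-making trial.

    inputs: (T, 3) array with columns [fixation, stimulus 1, stimulus 2]
    target: (T,) integers, 0 = fixate, 1 or 2 = chosen stimulus
    """
    n_steps = n_fix + n_stim + n_dec
    choice = int(rng.integers(1, 3))  # correct answer, 1 or 2

    inputs = np.zeros((n_steps, 3))
    inputs[: n_fix + n_stim, 0] = 1.0  # fixation cue is on until the decision

    # The correct stimulus has the higher mean; both are noisy
    means = np.full(2, 0.5 - coherence / 2)
    means[choice - 1] = 0.5 + coherence / 2
    stim = slice(n_fix, n_fix + n_stim)
    inputs[stim, 1:] = means + noise_std * rng.standard_normal((n_stim, 2))

    target = np.zeros(n_steps, dtype=np.int64)
    target[n_fix + n_stim:] = choice  # respond only after fixation turns off
    return inputs, target


rng = np.random.default_rng(0)
inputs, target = make_trial(rng)
print("inputs:", inputs.shape, "target:", target.shape)

# Evidence from one time step is unreliable; the time average is much cleaner
stimulus = inputs[5:55, 1:]
print("single-step guess:", int(np.argmax(stimulus[0])) + 1)
print("time-averaged guess:", int(np.argmax(stimulus.mean(axis=0))) + 1)
print("ground truth:", int(target[-1]))
```

With noise this large relative to the difference in means, a single time step often points to the wrong stimulus, which is why the task pushes a network toward integrating over time.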
Now we have this task. Let's convert it to a supervised learning problem. A supervised learning problem is where you have input and target output. And the network should try to generate output, given an input that is similar to the target output.
And you can also train this task with reinforcement learning, which is where you get a reward if you're doing something right. But we're not going to talk about that today. And it's substantially more complicated.
So for supervised learning, so this is, again, boilerplate. So you don't have to worry about it. You can just copy it.
And we get a data set object. And the way you use it is you just, like, call it like this. And then every time you call it, it will generate a new set of input and target.
And remember that the most important thing is shape. So I keep like printing out shape. So the shape of the input is sequence length, batch, and dimension.
So this is very familiar. This is what we have been seeing. It's what the network likes to take.
And then the target here is sequence length and batch. There is no more dimension. The reason is these are all integers.
So this is an-- so example, like, sequence of targets. So you can see, it's all zero, which corresponds to the fixation. And then it's one, which corresponds to the choice, one of the choices.
So you see that-- so this is a target that we're going to give it to a network. So now we have a network. We have a task. And we just need to train it.
Now, if you have done any PyTorch, for example, like, tutorial, you would recognize the following code. This is essentially the same code that you would use to train [INAUDIBLE] or whatever for supervised learning. So here, we just have a little train_model file-- a train_model function that takes a network and takes a data set. So the network, we initialize it here. So it has 128 neurons in it.
And then we will use this Adam optimizer which is, like, a powerful variant of stochastic gradient descent. And we will use a cross entropy loss which really is just about making the target similar-- the output similar to the target. And now we will train the network for 1,000 steps.
And well, you need to do some boilerplate code where you convert the output of this data set which is a NumPy array. So you convert it to PyTorch. And then this is all boilerplate PyTorch.
So you have your optimizer. You have to set the gradient to 0. Otherwise, it keeps accumulating the gradient. And then you run your input through the network. And you generate some output.
Here you do have to reshape it, because otherwise, it won't fit into your loss function. So you reshape it. And then you just give it, give the output and the target to your loss function. You generate the loss. And then this step would essentially compute the gradient through back propagation. And then you just take a step, an optimizer step, all very standard stuff.
So now you have trained the network. And now here is the remarkable thing. It's really fast. Like 20 seconds, and you're pretty much done. I mean, you can keep training it, but this is, as we will see next, this is actually pretty good.
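The training steps just listed can be sketched as below. This is a self-contained approximation, not the notebook's code: `sample_batch` is a hypothetical stand-in for the dataset object, the network uses PyTorch's built-in `nn.RNN` rather than the continuous-time network, and the learning rate and the 200 steps used in the demo (instead of 1,000) are assumptions chosen to keep it quick.

```python
import numpy as np
import torch
import torch.nn as nn


class Net(nn.Module):
    """Stand-in network: built-in RNN plus a linear readout."""

    def __init__(self, input_size=3, hidden_size=128, output_size=3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        activity, _ = self.rnn(x)
        return self.fc(activity), activity


def sample_batch(rng, seq_len=60, batch_size=16, n_dec=5, noise_std=1.0):
    """Hypothetical dataset call: returns NumPy inputs and integer targets."""
    choices = rng.integers(1, 3, size=batch_size)       # 1 or 2 per trial
    inputs = np.zeros((seq_len, batch_size, 3), dtype=np.float32)
    inputs[:-n_dec, :, 0] = 1.0                          # fixation cue
    means = np.full((batch_size, 2), 0.4)
    means[np.arange(batch_size), choices - 1] = 0.6      # correct one is higher
    noise = noise_std * rng.standard_normal((seq_len - n_dec, batch_size, 2))
    inputs[:-n_dec, :, 1:] = means + noise
    target = np.zeros((seq_len, batch_size), dtype=np.int64)
    target[-n_dec:] = choices                            # respond at the end
    return inputs, target


def train_model(net, rng, n_steps=1000, lr=0.001):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    losses = []
    for _ in range(n_steps):
        inputs, labels = sample_batch(rng)
        inputs = torch.from_numpy(inputs).float()         # NumPy to PyTorch
        labels = torch.from_numpy(labels.flatten()).long()

        optimizer.zero_grad()                  # otherwise gradients accumulate
        output, _ = net(inputs)
        # Flatten time and batch so the loss sees (n_samples, n_classes)
        output = output.reshape(-1, output.shape[-1])
        loss = criterion(output, labels)
        loss.backward()                        # back propagation
        optimizer.step()                       # one optimizer step
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
net = Net()
losses = train_model(net, np.random.default_rng(0), n_steps=200)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The reshape matters because the cross entropy loss expects one row per sample: the (sequence length, batch, classes) output is flattened in the same time-major order as the flattened targets.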
So you will train it. But this is telling us loss. This is not performance. We want to know the performance. So here, I have some code that we can use.
What it does, is essentially, it will compute the performance. So again, this is a binary decision task. So the chance level is 50%. And the network is doing 86%, which is respectable, especially given that we only trained it for 20 seconds.
So essentially the way it works here-- this is, again, a lot of boilerplate that I won't go into details. But you just like go through the trial one by one. And then you feed each trial to the network.
You get the action. And then you compare it with the ground truth. You compare it with the ground truth here. And then you get whether or not it's correct.
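The trial-by-trial comparison can be sketched as follows. The function name and the convention of reading the decision from the most active output unit at the last time step are assumptions; the four hand-written "network outputs" are made-up numbers used only to exercise the function, not results from a trained network.

```python
import numpy as np


def accuracy_from_outputs(outputs, ground_truth):
    """Fraction of trials where the final action matches the ground truth.

    outputs: list of (n_steps, n_actions) arrays, one per trial
    ground_truth: list of integer correct actions, one per trial
    """
    correct = []
    for trial_output, gt in zip(outputs, ground_truth):
        # Decision = most active output unit at the last time step
        action = int(np.argmax(trial_output[-1]))
        correct.append(action == gt)
    return float(np.mean(correct))


# Made-up outputs for four two-step trials (action 0 = fixate)
outputs = [
    np.array([[2.0, 0.0, 0.0], [0.1, 0.7, 0.2]]),    # chooses 1
    np.array([[2.0, 0.0, 0.0], [0.2, 0.3, 0.5]]),    # chooses 2
    np.array([[2.0, 0.0, 0.0], [0.1, 0.6, 0.3]]),    # chooses 1
    np.array([[1.0, 0.0, 0.0], [0.9, 0.05, 0.05]]),  # keeps fixating
]
ground_truth = [1, 2, 2, 1]

performance = accuracy_from_outputs(outputs, ground_truth)
print("performance:", performance)  # 0.5 for these made-up trials
```

A trial where the network keeps fixating at the end counts as incorrect under this convention, which is one reason performance and loss can tell different stories.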
You can also get the activity this way. So we can plot the activity next. So in the last three minutes, I'll just show you how to plot the activity, some code for it. I mean, again, you don't have to understand every line, because a lot of this is just, like, boring kind of reshaping things or like putting everything so that they have the-- putting things together so that they have the right shape.
But really, the important thing is we're just calling from scikit-learn, which is an incredibly useful package, PCA. And then we have put our activity together. And we just fit our activity using PCA. And boom, that's it.
Then before PCA, we have all the time points and neurons. And then we essentially map from high-dimensional neural activity space to some low-dimensional PC space which is two dimensional. So we can go from this high-dimensional space to this two-dimensional space. Then we can plot it.
So but we won't plot all the time points together, because that's going to be hard to see. So what we will do here is instead plot individual trials. So essentially, we will take each trial, the activity of each trial, transform it with a PC projection that we just computed, and then just plot it. And we will plot it according to the ground truth. So you have some that are red, some that are blue.
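The fit-then-project procedure can be sketched with scikit-learn as below. The activity here is synthetic (noise plus a choice-dependent ramp along a random direction), generated only so the block runs on its own; it is not activity from the tutorial's trained network, and the variable names and sizes are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_trials, n_steps, n_neurons = 100, 60, 128

# Synthetic stand-in for recorded network activity: each trial ramps along
# a fixed "choice axis", in a direction set by that trial's ground truth.
choice_axis = rng.standard_normal(n_neurons)
choice_axis /= np.linalg.norm(choice_axis)
ramp = np.linspace(0.0, 1.0, n_steps)[:, None]
ground_truth = rng.integers(0, 2, size=n_trials)

activity_dict = {}
for i in range(n_trials):
    sign = 1.0 if ground_truth[i] == 1 else -1.0
    drift = 5.0 * sign * ramp * choice_axis
    activity_dict[i] = drift + 0.5 * rng.standard_normal((n_steps, n_neurons))

# Put all trials and time points together: (n_trials * n_steps, n_neurons)
activity = np.concatenate(list(activity_dict.values()), axis=0)

# Fit PCA once on everything, then project each trial separately
pca = PCA(n_components=2)
pca.fit(activity)
trajectories = {i: pca.transform(a) for i, a in activity_dict.items()}

# Each trajectory is (n_steps, 2) and could be plotted colored by ground truth
end_pc1 = np.array([trajectories[i][-1, 0] for i in range(n_trials)])
print("trajectory shape:", trajectories[0].shape)
print("mean final PC1, choice 0:", end_pc1[ground_truth == 0].mean())
print("mean final PC1, choice 1:", end_pc1[ground_truth == 1].mean())
```

Fitting on all time points but plotting per trial keeps a single shared coordinate system, so trajectories from different trials are directly comparable.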
And this is what you see in this PC space. So this is, like, 100 trials. This is just three example trials. So what you can see here-- I mean, again, if you try to play with this yourself, you can see that not every network gives you the exact same PC.
I mean, actually, there are some commonalities and some differences. But one thing that is very common is that the blue and the red are separated in this PC space. So if the ground truth is that you should choose left, then they tend to occupy this part of the space. And if the ground truth is right, they tend to occupy this part of the space. So black is where they will start.
And so if you want to look at it more carefully, you can look at how individual time periods are represented in this space. But I'm just going to stop here. And what I want to show you, really, what I want you to take away today is that-- so you look at this paper, I mean, big Nature paper. I mean, really, I mean, they did a lot of hard work, because this is a very hard thing for a monkey to learn.
And they designed some clever analysis. But right now the tools are so simple, straightforward that we can really reproduce a large part of this. I mean, it's not the same task, but I also have the same task coded up in neurogym.
You can reproduce a large part of the results in just like a few minutes of training, and like, a couple lines of code, maybe like 300, 400 lines of code. So yeah, so today hopefully you would get some hands-on experience and build some confidence that this is really approachable. So we have nine exercises. Again, you should pick ones that you think are reasonable. You can start with simple ones and move on to difficult ones.
So the way that I hope that this would work is that we spend about one hour coding. And people just go into different rooms and work on the problems at their own pace. And then for the last 30 minutes of this workshop, hopefully we can get some student volunteers who would like, each would spend a few minutes, talk about the main ideas, how did they solve each exercise.
And if people just end up being too shy, we can have the TAs, because we have one TA for each exercise. So we can have the TAs show us their solution. So we'll spend the last 30 minutes on that. So with that, I'll stop.