The neural architecture of language: Integrative modeling converges on predictive processing [video]
October 25, 2021
Graduate student Martin Schrimpf (MIT/CBMM) and Prof. Evelina Fedorenko (MIT) discuss their latest publication in PNAS on predictive computational modeling and neural processing comparisons.
[MUSIC PLAYING] MARTIN SCHRIMPF: My name is Martin Schrimpf. I'm a grad student in Jim DiCarlo's lab in the MIT Brain and Cognitive Sciences Department. In Jim's lab, we mainly focus on modeling the primate ventral stream, which is the part of our brains that does visual object recognition. And we've developed some tools and methods that we thought would generalize beyond just vision.
So in this work, we try to take those tools and push them to something that is maybe a bit more cognitive, namely human language processing. In vision, and sensory areas more broadly, we've recently had a lot of success using deep neural networks. These are networks that are trained with backpropagation and have many layers that transform the input-- in this case, pixels-- into representations from which they can then perform object recognition. Those deep networks are able to predict neural representations as well as behavioral output to a reasonable first extent.
EVELINA FEDORENKO: Yeah. So I guess this was now about three years ago when Martin and Josh Tenenbaum started discussing the idea of applying the kinds of approaches that have been highly successful in vision to the domain of language, and I thought it was absolutely crazy. I was certain that this would not work.
But on the other hand, I was also kind of getting stuck in my research, because it was unclear to me how to go beyond descriptive verbal hypotheses about what these brain regions that we've been studying are doing. And of course, ultimately, I want to understand what representations they build as we process input, and what computations they perform.
MARTIN SCHRIMPF: So in any kind of brain-modeling work, I think there are always two sides. On the one side, you have the models. In this case, these are transformer-based models, such as GPT and BERT, as well as LSTMs and GloVe from the natural language processing community-- so basically, models from machine learning, or AI more broadly.
Now, on the other side, there are the experimental data-- brain recordings. In this case, fMRI and ECoG recordings collected as subjects listen to sentences and stories. So what we then do is essentially run the same experiment on the model, by presenting the same kinds of stimuli-- sentences and stories-- to the models, and then capture internally what kinds of representations they build.
And then we test how similar those representations internal to the model are to the neural recordings that we obtained experimentally. That gives us a score that tells us how close the models are to predicting those data. And in this case, what we found is that, remarkably, the best models are actually pretty close to the experimental noise ceiling on a lot of the data sets that we tested.
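The comparison pipeline described above can be sketched roughly as follows. This is a minimal illustration with synthetic data standing in for model activations and brain recordings; the variable names and the simple ridge-regression readout are assumptions for illustration, not the paper's exact procedure (which uses cross-validated regularized regression and an estimated noise ceiling).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: activations of one model layer for 200 stimuli
# (e.g. sentences), and responses of 50 voxels to the same stimuli.
n_stimuli, n_features, n_voxels = 200, 64, 50
model_activations = rng.normal(size=(n_stimuli, n_features))
# Fake "brain" data: a linear readout of the activations plus noise.
true_weights = rng.normal(size=(n_features, n_voxels))
neural_data = (model_activations @ true_weights
               + rng.normal(scale=5.0, size=(n_stimuli, n_voxels)))

def brain_score(X, Y, alpha=1.0, train_frac=0.8):
    """Fit a ridge regression from model features X to neural responses Y
    on a train split; return the mean Pearson correlation between
    predicted and observed responses on the held-out split."""
    n_train = int(len(X) * train_frac)
    X_tr, X_te = X[:n_train], X[n_train:]
    Y_tr, Y_te = Y[:n_train], Y[n_train:]
    # Closed-form ridge solution: W = (X'X + alpha*I)^-1 X'Y
    W = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(X.shape[1]),
                        X_tr.T @ Y_tr)
    Y_pred = X_te @ W
    # Correlate prediction with observation per voxel, then average.
    corrs = [np.corrcoef(Y_pred[:, v], Y_te[:, v])[0, 1]
             for v in range(Y.shape[1])]
    return float(np.mean(corrs))

score = brain_score(model_activations, neural_data)
print(f"held-out predictivity: {score:.2f}")
```

Because the synthetic "brain" here is a noisy linear function of the activations, the held-out correlation comes out well above zero; in the real analysis the score is compared against the noise ceiling of the recordings.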
EVELINA FEDORENKO: In contrast to earlier attempts to model human language processing, which have typically taken a single model and tested how well it captures behavioral or neural responses, here, Martin took a whole range of high-performing language models and looked for trends, asking why some models perform better than others in capturing human neural responses.
And a critical result that came out of this is that models that do better at predicting an upcoming word in a text, which is a critical training objective for these models, also best capture human neural data. And it's not just the case that bigger models do better on this test, because you can take a random word embedding of the same dimensionality, and that does quite poorly, suggesting that it's something about the architecture-- the structure-- of the model.
And it's not just any language task that predicts the fit to human neural data. For example, judging how grammatically well formed sentences are does not lead to a better fit to human neural data. So that suggests that perhaps optimizing for predictive representations is a shared objective of both biological and artificial language models.
MARTIN SCHRIMPF: One thing I found fascinating with this work and the results we have is that we really discovered a lot of relationships that I find quite crucial. So one: models that are better at predicting the next word are also better able to predict neural responses in human brains. But then the models that are better able to predict neural responses are also better models of behavior-- in this case, self-paced reading times. And then again, models that are better at predicting the next word also better predict human reading times.
So this really brings three things together: a normative task, neurons, as well as behavior-- because really, we don't study neurons for the sake of neurons, but rather because they give rise to behavior. Granted, this is a fairly simple behavior, but going forward, we're planning to take a lot of different neural data sets as well as behavioral data sets from labs worldwide and put them together on a platform-- like we already have Brain-Score for vision, maybe there's going to be a Brain-Score for language.
And then make this accessible to the community and work with the community to really constrain and guide the next generation of models of human language processing.
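The trend described above, that better next-word predictors are also better predictors of human data, is at its core a correlation across models between two scores. A toy illustration follows; the model names are real, but the numbers are made up for the sketch and are not the paper's results.

```python
import numpy as np

# Hypothetical per-model scores (illustrative values only):
# next-word-prediction performance vs. fit to neural data ("brain score").
models = ["GloVe", "LSTM", "BERT", "GPT-2"]
next_word_perf = np.array([0.20, 0.35, 0.55, 0.70])
brain_scores = np.array([0.15, 0.30, 0.60, 0.75])

# Pearson correlation across models: a strong positive value is what
# would support the claim that the next-word objective tracks fit to
# human neural (and behavioral) data.
r = np.corrcoef(next_word_perf, brain_scores)[0, 1]
print(f"correlation across models: r = {r:.2f}")
```

The same across-model correlation can be computed against behavioral measures such as self-paced reading times, which is how the task-neurons-behavior triangle in the interview is quantified.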
EVELINA FEDORENKO: There is a bunch of exciting directions going forward. So one important thing is to try to make our data better. Even though you'll hear many people say these days that we're drowning in data, the kinds of data that you need for building these precise models of what's happening, as you're processing a particular visual stimulus or a particular sentence, are very different kinds of data than what we've been collecting for the last couple of decades.
And so we're pushing on this front by collecting the highest quality data using intracranial recordings and selecting sentences in ways that would allow us to best discriminate among different models or among layers of a particular model, and so on.
Another important direction is to try to understand why untrained models do so well in capturing human neural responses. This is a surprising finding, but there's now some similar findings that have been observed in vision. So something about the information flow through these units is already well suited for capturing something about how our brains process language.
So one possibility is to now create minimal comparisons of models varying in particular aspects of architecture, to see which features-- which architectural motifs-- are critical for this good predictivity of human neural data. And finally, the ultimate goal is, of course, to build integrated models of the human brain.
And for language specifically, the downstream target of the language system is presumably the system that supports structured thought. And just like there have been advances in natural language processing in developing these very successful language models, there are parallel ongoing efforts to develop models of structured thought-- common sense reasoning, or reasoning about logical inferences, and so on.
Now that we've made some progress in relating language models to human neural responses, hopefully we can do something similar for models of structured thought, and then understand how human language understanding happens in its full complexity.
MARTIN SCHRIMPF: So while this now lays a pretty critical foundation-- having some first models that predict any variance in these data at all and, in this case, even a remarkable amount-- I don't think this means that we're anywhere close to being done. It just means that we're now encouraged to collect even better, or more constraining, data that pushes the models to become even closer matches to the brain's language system.