Using Multimodal DNNs to Study Vision-Language Integration in the Brain

Title: Using Multimodal DNNs to Study Vision-Language Integration in the Brain
Publication Type: Conference Paper
Year of Publication: 2023
Authors: Subramaniam, V, Conwell, C, Wang, C, Kreiman, G, Katz, B, Cases, I, Barbu, A
Conference Name: ICLR 2023
Date Published: 03/2023

We leverage a large stereoelectroencephalography (SEEG) dataset consisting of neural recordings during movie viewing and a battery of unimodal and multimodal deep neural network models (SBERT, BEIT, SIMCLR, CLIP, SLIP) to identify candidate sites of multimodal integration in the human brain. Our data-driven method involves three steps: first, we parse the neural data into discrete, distinct event-structures, i.e., image-text pairs defined either by word onset times or visual scene cuts. We then use the activity generated by these event-structures in our candidate models to predict the activity generated in the brain. Finally, using contrasts between models with or without multimodal learning signals, we isolate those neural arrays driven more by multimodal representations than by unimodal representations. Using this method, we identify a sizable set of candidate neural sites that our model predictions suggest are shaped by multimodality (from 3% to 29%, depending on increasingly conservative statistical inclusion criteria). We note a meaningful cluster of these multimodal electrodes in and around the temporoparietal junction, long theorized to be a hub of multimodal integration.
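
The predict-then-contrast logic described in the abstract can be illustrated with a minimal sketch. The abstract does not state which regression method, score, or fold scheme the authors use, so everything here is an assumption made for illustration: ridge regression, cross-validated Pearson correlation, contiguous folds, and the hypothetical names `ridge_fit_predict` and `cross_validated_score`. The feature matrices and the electrode response are synthetic random data, not the SEEG recordings or real model activations, and the printed numbers are not results from the paper.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit closed-form ridge regression on the training events, predict the test events."""
    mean_x = X_train.mean(axis=0)
    mean_y = y_train.mean()
    Xc = X_train - mean_x
    n_dims = Xc.shape[1]
    # Solve (X^T X + alpha * I) w = X^T y for the regression weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_dims), Xc.T @ (y_train - mean_y))
    return (X_test - mean_x) @ w + mean_y


def cross_validated_score(X, y, n_folds=5, alpha=1.0):
    """Pearson correlation between held-out predictions and the recorded response."""
    n_events = len(y)
    folds = np.array_split(np.arange(n_events), n_folds)  # contiguous folds (assumption)
    preds = np.empty(n_events)
    for test_idx in folds:
        train_mask = np.ones(n_events, dtype=bool)
        train_mask[test_idx] = False
        preds[test_idx] = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], alpha)
    return np.corrcoef(preds, y)[0, 1]


# Synthetic stand-ins (assumption): one row per event-structure (image-text pair).
rng = np.random.default_rng(0)
n_events, n_dims = 200, 10
multimodal_feats = rng.standard_normal((n_events, n_dims))  # e.g. a multimodal model's activations
unimodal_feats = rng.standard_normal((n_events, n_dims))    # e.g. a unimodal model's activations

# A toy electrode whose response is, by construction, driven by the multimodal features.
electrode = multimodal_feats @ rng.standard_normal(n_dims) + 0.1 * rng.standard_normal(n_events)

score_multi = cross_validated_score(multimodal_feats, electrode)
score_uni = cross_validated_score(unimodal_feats, electrode)
contrast = score_multi - score_uni  # positive: better predicted by the multimodal features

print(f"multimodal r = {score_multi:.3f}, unimodal r = {score_uni:.3f}, contrast = {contrast:.3f}")
```

In this toy setup an electrode would be flagged as a candidate multimodal site when the contrast is reliably positive. Deciding what counts as reliable requires statistical inclusion criteria of the kind the abstract mentions, which this sketch does not implement.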


CBMM Relationship: 

  • CBMM Funded