Aran Nayebi, ICoN Postdoctoral Fellow at MIT
Deep neural networks trained on high-variation tasks (“goals”) have had immense success as predictive models of the human and non-human primate visual pathways. More specifically, a positive relationship has been observed between model performance on ImageNet categorization and neural predictivity. Past a point, however, improved categorization performance on ImageNet does not yield improved neural predictivity, even between very different architectures. In this talk, I will present two case studies, one in rodents and one in primates, that demonstrate a more general correspondence between self-supervised learning of visual representations relevant to high-dimensional embodied control and gains in neural predictivity.
In the first study, we develop the (currently) most precise model of the mouse visual system and show that self-supervised, contrastive algorithms outperform supervised approaches in capturing neural response variance across visual areas. By “implanting” these visual networks into a biomechanically realistic rodent body to navigate to rewards in a novel maze environment, we observe that the artificial rodent with a contrastively optimized visual system obtains more reward across episodes than its supervised counterpart. The second case study examines mental simulations in primates: we show that self-supervised video foundation models that predict the future state of their environment in latent spaces capable of supporting a wide range of sensorimotor tasks align most closely with human error patterns and macaque frontal cortex neural dynamics. Taken together, our findings suggest that self-supervised learning of visual representations that are reusable for downstream Embodied AI tasks may be a promising way forward to study the evolutionary constraints of neural circuits in multiple species.