Aligning deep networks with human vision will require novel neural architectures, data diets and training algorithms
Date Posted:
February 24, 2025
Date Recorded:
February 11, 2025
Speaker(s):
Thomas Serre, Brown University
Brains, Minds and Machines Seminar Series
Description:
Abstract: Recent advances in artificial intelligence have been mainly driven by the rapid scaling of deep neural networks (DNNs), which now contain unprecedented numbers of learnable parameters and are trained on massive datasets, covering large portions of the internet. This scaling has enabled DNNs to develop visual competencies that approach human levels. However, even the most sophisticated DNNs still exhibit strange, inscrutable failures that diverge markedly from human-like behavior—a misalignment that seems to worsen as models grow in scale.
In this talk, I will discuss recent work from our group addressing this misalignment via the development of DNNs that mimic human perception by incorporating computational, algorithmic, and representational principles fundamental to natural intelligence. First, I will review our ongoing efforts in characterizing human visual strategies in image categorization tasks and contrasting these strategies with modern deep nets. I will present initial results suggesting we must explore novel data regimens and training algorithms for deep nets to learn more human-like visual representations. Second, I will show results suggesting that neural architectures inspired by cortex-like recurrent neural circuits offer a compelling alternative to the prevailing transformers, particularly for tasks requiring visual reasoning beyond simple categorization.
PRESENTER: Welcome to this talk hosted by CBMM and the Quest. I'm very happy to have Thomas Serre here, because this is kind of a homecoming-- I don't know if you recognize the building after-- how many years? He was a PhD student and a postdoc here back in 2006 or '07. And now he's a professor at Brown University with a joint appointment in the Department of Cognitive, Linguistic and Psychological Sciences and also the Department of Computer Science.
And I think he's Associate Director of the Center for Computational Brain Science at Brown, and so on and so forth. And he also holds, let's see, something else in France that I'm not sure about-- an International Chair in Artificial Intelligence. When I mentioned it to him, he did not seem to recognize it.
And anyway, the work Thomas did back then was in the good old days in which a model in neuroscience-- HMAX, in the hands of Thomas-- was as good as or better than computer vision models. And it was probably one of the first really successful deep networks in human vision. It was a model built by trying to imitate the ventral stream, which I'm sure we'll hear more about from him.
But as I said, the interesting thing was that at that time neuroscience was really leading computer science. And now it's quite different, and I think we'll hear something about it from Thomas in the talk. The title is too long, but it has to do with deep networks and human vision. And let's welcome Thomas.
[APPLAUSE]
THOMAS SERRE: Well, thank you so much, Tommy, for the introduction. Thank you all for the kind invitation and for being here. It's always a little bit of a party or, as Tommy mentioned, a homecoming, being able to speak here. Many of my old mentors-- or not so old, but they were mentors when I was a graduate student-- are here in the room. It's always a nice experience to be able to catch up with old mentors, old friends.
And with the novelty that now some of my former graduate students from Brown are moving to postdocs at MIT and Harvard. And so it feels like it's a big old family. All right.
So as Tommy pointed out already, this is too long of a title. I grayed out part of the title because, as always, last night I realized there was no way I would have enough time to cover everything I was excited to talk about. So I decided to cut the part of our work related to the development of computational models and novel neural architectures, and just give you a preview of what I would have told you about.
As you know, the main engine behind most of the latest and greatest developments in AI is the transformer architecture. It's difficult, even squinting your eyes, to figure out analogies between modules and operations in transformers and neural circuits. So we've been focusing on the development of more biologically realistic neural architectures.
A lot of it is based on integrating cortical feedback or cortical-like feedback mechanisms into modern deep neural networks. I'm not going to tell you much about it, but I do think that transformers are not the ultimate neural architecture, neither for AI nor for neuroscience.
All right. So today I'm going to be telling you instead about another line of research in my group, which I'm particularly excited about, which has to do with our attempt at reverse engineering the data diets and general learning principles that would allow us to develop deep neural network models that are better models of human vision.
So maybe some of you in the audience are interested in furthering the development of deep learning models for AI. Our own focus is really on trying to steer those models towards being better models of human vision.
So being at MIT, the mecca of computational modeling and computational modeling in vision, I think I can skip much of my typical introduction and briefly skim through the history of computational models. There's a long history of so-called hierarchical feedforward models of the ventral stream. You can probably trace the origins all the way back to the work of McCulloch and Pitts in the late '40s, the work of Hubel and Wiesel in the '50s and '60s, fast forwarding to maybe-- I don't know if Tommy agrees-- Steve Grossberg in the '70s, Fukushima in the '80s, and then a number of people, obviously, including Tommy and colleagues, towards the development of computational models of the ventral stream of the visual cortex throughout the '90s up to 2010.
So if we look at what has been happening in the field of AI, as I mentioned earlier, there is, of course, a deep connection between modern-day AI systems and the more traditional, classical computational neuroscience models of vision. If we look at the general set of neural operations that those two sets of architectures use, there's a lot of overlap: things like dot products, rectification nonlinearities, normalization, pooling-- including max pooling, which came out of the HMAX model developed in Tommy's lab and that Tommy alluded to earlier-- and the notion of a feature hierarchy.
But as much as the design of early computational neuroscience models was largely constrained by neuroscience principles, with the explicit goal of constraining model architectures-- parameters of the model in terms of receptive field sizes, number of connections, number of layers, et cetera-- to be really reflective of the underlying anatomy and physiology, I think it's also fair to say that AI and neuroscience have, in that sense, started to diverge today, where modern deep neural networks have somewhat eased that constraint. And I would say the key driver of the development of deep learning systems today is largely task optimization rather than biological constraints.
So here's one example of task optimization. I'm sure you're all familiar with the ImageNet data set, a 1,000-way image classification task. Here's a little overview of the research done between 2012 and 2022, sourced from Papers With Code-- I just exported it yesterday. And what you see here is how the state of the art-- the leading neural architecture for any particular year-- fared on this ImageNet data set.
And so back in 2012, when AlexNet was first introduced, we had an accuracy on ImageNet of about 65%, I guess, somewhere around there. This is the top-1 accuracy, I should point out. So this is 1,000-way classification; chance is about 1 out of 1,000.
And there are issues with this metric. I'll allude to some of the issues in a little bit. But for now, higher accuracy here means a stronger, better-performing model. And so you can see that year after year, architecture after architecture, the accuracy of these models has been increasing almost steadily.
Now, trying to identify one particular mechanism or principle that has been driving this progress is quite hard. If you look at trends, it's pretty clear that those deep neural networks are getting deeper and deeper. AlexNet is probably comparable in terms of number of stages to your own visual system, the ventral stream, somewhere around half a dozen or so visual areas.
The latest and greatest here are probably equivalent to hundreds of layers of processing. So certainly depth is one factor. But I don't think this is the only factor. Architecture after architecture, there's a number of mechanisms that have been introduced, clever heuristics, clever tricks that have allowed these networks to learn much more efficiently.
Examples include breaking down fairly large convolutional kernels into smaller ones, all the way down to a one-by-one convolution, which is literally the smallest convolution you can do. Residual connections are another example. We've also built, over the years, better and better optimizers and all kinds of schedulers and so on and so forth.
So it's never entirely clear how much of these gains derive from true innovations in the space of neural network architectures versus the various smaller heuristics. Because I had promised Tommy, as a pet project-- as a way to gauge how much of the progress is due to architectures versus everything else-- I asked a few of my students to go back to the good old HMAX developed in Tommy's lab many, many years ago, a little bit before AlexNet.
So this is a fairly shallow deep neural network by today's standards. It's also not that wide because back then we were constrained by the availability of compute. And certainly we didn't have GPUs.
And just to give a ballpark of how much gain we've made: if you just take the standard HMAX and you optimize it on ImageNet with backprop and class supervision, you get pretty bad results. If AlexNet is at about 65%, we're somewhere around-- I think we're at about 40%, so not great.
But as I mentioned, we were constrained back then to a fairly narrow neural network. The input is effectively four kernels, four orientations times the number of scale bands. So if we just allow it to have more input filters, like AlexNet, we get a pretty big boost. We get somewhere comparable to AlexNet.
And then if we start including one-by-one convolutions-- and I won't bore you by going one step at a time-- but if we start integrating many of the additional heuristics and tricks that people have used to improve learning and reach better local minima with those architectures, we can get significantly higher. I think the latest we can get here is about 75%, and I should point out that, for reasons I'm not going to get into, we are not using data augmentation. So this is really competitive with respect to a lot of the state-of-the-art architectures.
We're not reaching the top. Not to say that the architecture doesn't matter, but the point that I'm trying to make here is that you can still go back to a fairly old architecture, leverage the latest and greatest in deep learning, and get reasonable results. We're still working on adding those feedback mechanisms and getting better accuracy.
So where are humans on that curve? As I mentioned, there are a number of issues with the way accuracy is measured on ImageNet. In particular, there's the fact that often there is more than one object in the image, so the top-1 accuracy is problematic.
You might be penalizing a system for making correct predictions. But I think the best attempt that I've seen at really putting an actual number on a human baseline on ImageNet is work that came out of Berkeley a few years ago, where they took a very large subset and trained subjects. They even recruited experts, et cetera.
And they came up with a multilabel metric, which is difficult to calibrate against the top-1 accuracy. You'll have to trust me on that. But essentially, if you were to put a human accuracy here, you'd be around here. And in fact, the claim in this paper is that a fixed ResNeXt-- the paper was from 2020--
--a fixed ResNeXt is at about human level. So essentially, to a first approximation, all the models that are above this line are above human level. I'm not aware of any recent attempt at evaluating how close the state of the art in computer vision is to human level.
So I asked a graduate student of mine to do a back-of-the-envelope calculation. We took a library of pretrained deep neural network models in PyTorch called timm, if any of you are familiar with the toolbox. There are something around 1,000 or so models, which encompass most of the representative modern deep neural networks.
We didn't want to run 1,000 models, so we restricted ourselves to representative ones. We have all kinds of CNNs, transformers, different kinds of training and pretraining. We took a subset of 367 of them and counted how many of those 367 fall within the human confidence level.
And we find that about 14% of them are within it. 4% of those networks are actually at superhuman level, so they outperform humans on this 1,000-way image classification.
So it's pretty clear that the progress has been significant. We now have AI systems, computer vision systems, that are on par with, if not better than, human accuracy on this task-- what most of us would have considered an important milestone for computer vision.
Now, if, like me, you care about understanding the visual system, just because you have an artificial vision system that matches the performance of a biological system obviously doesn't mean that they share the same visual strategy. It's possible that they achieve the same level of performance by simply leveraging completely different visual strategies.
So the question I'm going to be trying to answer in the next few minutes is how relevant those AI models are for understanding human vision. And for that, I'm going to be defining a metric that's going to allow us to benchmark the alignment between human vision and those models. So let me start briefly with the easy part, which is evaluating the visual strategies leveraged by deep neural networks.
There's an entire field of AI known as the field of xAI, or explainability. There's a gazillion methods that have been proposed. I'm not going to give you a tutorial on those methods. I'm sure many of you are familiar with them.
But what we're going to be using here as a way to describe the visual strategy of those models is what are called attribution methods. The goal of those methods is to produce importance maps of this kind for individual images-- maps that indicate what parts of an image are driving decisions by the neural network.
As I said, there are many methods that have been developed. Here we're going to be focusing on perhaps the simplest one. I should point out that all the results that I'm going to be showing are robust to the actual method used. We're going to be using the saliency method, and a very simple way to describe this method is that it's essentially using the gradient of the output of the model with respect to the input image.
So if you know what a derivative does, it tells you how sensitive the output of a function is to some small changes around a particular value of the input. And so here we're going to be evaluating how sensitive the output of the model for the correct class label is in response to minimal perturbations of individual pixels. So that's the saliency method we're going to be using.
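To make that concrete, here is a minimal PyTorch sketch of the saliency computation just described; the model and input names are placeholders for whatever classifier and image you use.

```python
import torch

def saliency_map(model, image, class_idx):
    """Gradient of one class logit with respect to the input pixels.
    `model` is any torch.nn.Module classifier; `image` is a (1, C, H, W) tensor."""
    model.eval()
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)
    # Back-propagate only the logit of the (correct) class of interest.
    logits[0, class_idx].backward()
    # Collapse channels so we get one importance value per pixel.
    return image.grad.abs().max(dim=1).values.squeeze(0)  # (H, W)
```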
Just to put in a plug for the toolbox we used: one of my former graduate students, Thomas Fel, has developed the Xplique toolbox, which, to my knowledge, is the most complete today. And this is what we've used for this set of experiments.
Now, it's easy-- or somewhat easy-- to characterize the visual strategy used by deep neural network models. We're going to have to do the same for human observers, which is, obviously, a slightly different and more complex task. We have the field of psychophysics, and without giving you a full history of the field, there's, obviously, a variety of methods that have been described and proposed for characterizing the important features leveraged by human observers when classifying objects.
I was personally very influenced by a paper from Shimon Ullman and collaborators from a few years ago, and I know that many of you are familiar with the work, the MIRCs. I think this is quite elegant, and the method literally allows one to identify the minimal features that are both sufficient and necessary for human observers to still correctly recognize the class label of the image.
Now, the method is beautiful. The challenge is that just to characterize and identify a particular visual feature, the method requires tens of thousands of trials. And so clearly, since our goal here is to scale up this characterization to the entire ImageNet, we couldn't afford 10,000 trials per image. So we came up with a very coarse simplification of Shimon's method.
We developed a game which we call Click Me. So Click Me is a little bit of a misnomer because there is very little clicking involved. But the premise of the game is that you, the player, would be told that you are a teacher. And you have to decide-- you are given an image, an image of a dog here. You have the class label.
And you have to decide what part of the image to paint. And you are told that there is a student somewhere else, and the student starts from a blank screen. Actually, I'm realizing, seeing Pavan in the audience, there is a connection here with the RISE method that he developed many years ago as well.
But the idea is that the student will start from a blank screen. And then part of the image will get revealed gradually over time. What gets revealed is decided by the teacher.
The goal of the game is for the teacher to score as many points as possible. The number of points is determined by how fast the student recognizes the image. Initially we had pairs of human players, a human playing with a human. The problem with this is that it's hard to collect a lot of data, again, because you need to figure out ways to pair human players.
And so we realized that the student actually didn't matter too much, and that we could either have something making random guesses, just to keep the game entertaining, or we could pass the image-- part of the image-- to a deep neural network. And so in practice, we cheat half the time. I won't get into the details; this is pretty old work from 2019, and it's all published.
I'm also happy to answer questions. But the gist of this is that we are passing much bigger patches to the deep neural networks. We're also making the life of the deep neural network as easy as possible so that the game is interactive, and so players feel like there is a strategy, even though, to be honest, there is very little strategy to the game.
So that's the game. All right. So we get those click maps, and we get potentially multiple maps for every image across many subjects. We can then average those click maps, and we get these heat maps.
So here you see a few representative examples. Just to highlight a couple of points: for animate categories, like the animals here, human observers tend to select facial components-- things around the eyes, the nose, the mouth. There is also a consistent selection of features for inanimate things like vehicles. We find that for vehicles, a lot of the time, the wheels are selected, or the front grilles, et cetera.
Again, I'm not going to give you all the numbers, but you'll have to trust me. We've done extensive analysis measuring the reliability of those maps, correlating half of the subjects versus the other half. We've also compared human subjects playing with humans versus with computers. Again, we are confident that people are not just randomly clicking. There is an underlying visual strategy, and those maps are reliably replicable.
So in the first version of the game a few years ago, we recruited about 12,000 participants over about six months, and that allowed us to collect about half a million clicking maps, which, when you average them and collapse them per image, covered essentially a big chunk of the ImageNet validation set. As I mentioned earlier, you can swap the human with the DNN; you get the same results.
And we've done all kinds of post-hoc analysis to make sure that, again, those are not just random clicks. We've also done psychophysics with a completely different set of participants. We evaluate them in rapid visual categorization experimental paradigms, but where we only show part of the image-- only the features corresponding to the hotspots of those heat maps.
And we find that with only a handful-- literally, I think something around 1% to 4% of the pixels-- if those pixels are selected from the hotspots, people are reliably able to classify images presented rapidly. If we do that with random selection, or with something akin to a bottom-up saliency algorithm driving the selection, we find that people need at least one order of magnitude more pixels.
So the method is not perfect. We can go over the limitations. But I think in general we're fairly confident that there is signal there.
That said, this was still a subset of ImageNet. We had about 200,000 images. Last year, we received funding from NSF to scale up the effort. So we resumed the game.
We have now been running the game for a few months. We already have twice as many participants, and we have about 99% of ImageNet covered. I think we're just a few months shy of completing the full data collection.
Just to give you a quick live demo, this is what the game looks like. So you are given an image. You are given the class label. And you have to decide what part to reveal.
So I would probably reveal this section here. So I've made points against the AI, and so on and so forth. You see how the game plays? Coral reef. I'm going to select this here.
Here it looks like it's harder for the AI, I guess. All right. So this is the game. You can scan the QR code and become a participant and win prizes.
All right. So I'm going to show you here a side-by-side comparison between humans and deep neural network models. Here are representative images from ImageNet, and here are the corresponding Click Me maps.
And here are representative architectures-- ViT, MLP-Mixer, ResNet, SimCLR, et cetera-- and the corresponding heat maps. And I think I don't have to show you a lot of complex statistics. It's pretty clear already, just from eyeballing the human versus machine maps, that the two learning systems attend to very different parts of the image, very different pixels.
In general, we find that the heatmaps from deep neural networks tend to be pretty scattered. They tend to leverage a lot of the cues from the background, often much more so than the foreground.
All right. So I'm going to be using a measure of alignment here, which is going to be a very simple measure of correlation-- for individual images, the correlation between the human map and the attribution map from the computer vision algorithm. We can do that for every image, average the scores across the entire data set, and that gives us one feature alignment score.
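A minimal sketch of that alignment score, assuming each map is a 2-D array of the same size; the choice of Spearman rank correlation here is one reasonable option, not necessarily the exact statistic used in the talk.

```python
import numpy as np
from scipy.stats import spearmanr

def feature_alignment(model_maps, human_maps):
    """Mean correlation between model attribution maps and human Click Me maps,
    computed image by image and then averaged over the data set."""
    scores = []
    for m, h in zip(model_maps, human_maps):
        rho, _ = spearmanr(m.ravel(), h.ravel())  # rank correlation of the two maps
        scores.append(rho)
    return float(np.mean(scores))
```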
So what I'm showing you here is on the x-axis is the ImageNet classification accuracy. Each point here corresponds to an architecture. The blue dots are CNNs. The purple dots are transformers. Here on the y-axis you get a measure of alignment between the models and humans.
And so you see that the trend was initially going in the right direction. When I go from left to right, I'm literally going from earlier, older models toward more recent and more performant models. Initially, the trend was clear: better models led to better alignment with humans.
But you see that with the latest and greatest models-- potentially those models that are approaching or outperforming humans-- the trend is reversing. The models are getting more and more misaligned with humans, suggesting that the way they are solving this image categorization task has less and less to do with the visual strategy used by human observers.
Now, this doesn't necessarily mean that there's a fundamental limitation preventing these architectures from learning human-like visual features. To demonstrate this, we developed a machine learning method that constrains those deep neural networks while they are being trained to recognize or classify images: we add an extra loss where we are, in a sense, brute-forcing them. We're literally forcing the strategy that they are using-- through their attribution maps-- to be as close as possible to the Click Me maps derived from humans.
So the networks still learn to recognize objects. But while they learn to recognize objects, we're not letting them freely choose which pixels to leverage to drive their decision; rather, we force them to rely on the human features, the features derived from the human clicking maps. And the approach works.
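Here is a minimal sketch of what such a training loss could look like, assuming Click Me maps resized to the image resolution; the published harmonization procedure uses a more careful multi-scale comparison of the maps, and the weighting term lam is a made-up hyperparameter.

```python
import torch
import torch.nn.functional as F

def harmonization_loss(model, images, labels, click_maps, lam=1.0):
    """Cross-entropy plus a penalty pulling the model's saliency maps
    toward the human Click Me maps (a rough sketch, not the exact recipe)."""
    images = images.clone().requires_grad_(True)
    logits = model(images)
    ce = F.cross_entropy(logits, labels)

    # Saliency maps, built with create_graph=True so the attribution
    # penalty itself can be back-propagated through the network.
    correct_logits = logits.gather(1, labels[:, None]).sum()
    grads, = torch.autograd.grad(correct_logits, images, create_graph=True)
    sal = grads.abs().amax(dim=1)                                  # (B, H, W)

    # Normalize both maps and penalize their mismatch.
    sal = sal / (sal.amax(dim=(1, 2), keepdim=True) + 1e-8)
    cm = click_maps / (click_maps.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return ce + lam * F.mse_loss(sal, cm)
```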
We are certainly scaling up the approach, and we are trying to get a system that would, out of the box, apply this method-- which we call harmonization-- to arbitrary neural networks. On this slide, I just have a handful of neural networks. But you see here the byproduct of this harmonization procedure with the yellow dots, and you see that in general the procedure works. So we know how to do optimization, I guess, or how to minimize our losses.
In general, starting from any of these points, we're able to break this Pareto front and get neural networks that are very significantly better aligned with human observers. A small added benefit that I had not anticipated is that, in general, those arrows tend to go up and a little bit to the right, suggesting that there is an improvement-- not a very large one, but an improvement nonetheless-- in their classification accuracy. That means we don't necessarily have to trade off image classification accuracy against alignment with humans; in general, we get a good trade-off between the two.
And just to give you a few qualitative examples: you see here this harmonized ViT. This was the original ViT, where the attribution maps are all over the place. After we do this procedure, you see that the network has learned, spot on, to latch onto human-like visual features.
We've also run additional psychophysics. We can look at the ability of the model to recognize images that have been masked to different degrees, again based on the heat maps, the clicking maps. Humans are in black, and the harmonized ViT is almost perfectly lining up with humans. I should point out that this is the amount of the image that's being shown to the subjects in our rapid categorization task versus their performance. And you see that the harmonized ViT agrees almost perfectly with human observers, unlike the original ViT, which is much less efficient in the way it takes up visual information.
We've done additional work; I'm going to go through it very briefly and just flash a few salient results to highlight the benefits of this type of harmonization procedure, where we force those networks to adopt a more human-like visual strategy. There's a group at Harvard that I think most of you are familiar with, led by Marge Livingstone, where Marge and collaborators have been developing a method that allows them to collect something very similar to the heat maps that I've shown you for humans and deep neural networks, but now derived from neural data.
In particular, these recordings were made from the IT cortex, and the idea is relatively simple. Rather than collecting neural population responses for an entire image, they consider fairly large images and then essentially measure neural responses for different patches of the image, literally, by shifting the image under the array so that it's the closest approximation you can get to one of those heat maps.
And you see examples here where they just collapse all the neural responses. You see that in general, in IT cortex, when you present stimuli that include living objects, the neurons will typically fire much more strongly on facial components than on the background. And similarly, when you have inanimate objects, the responses tend to focus on the more salient and dominant objects.
So this is a perfect setup for evaluating our harmonized models. Again, these are models that never saw neural data before; they were just harmonized based on those human psychophysics data. And again, similar to the curve I showed you earlier, every dot here is a different deep neural network, with ImageNet accuracy for those networks on the x-axis. We then derive a neural predictivity score, borrowed from the Brain-Score tool set developed here at MIT by Jim DiCarlo's group, to evaluate the agreement between the computational models and the neural data.
And you see that, again, in general, we find a similar Pareto front to the one I described earlier for the human data: more performant models are now becoming worse models at predicting neural data. After we harmonize with the human psychophysics data, we find that the differences are not as significant as we find for humans, but we are consistently able to break this Pareto front and improve the fit to the monkey electrophysiology data.
One last example of this harmonization method and its application. I decided to include these two slides because of all the work that's being done here at MIT in Jim DiCarlo's lab. We have been looking at the question of adversarial attacks on deep neural networks-- how aligned are they with, how well do they target, the visual features that are important for humans?
And so if you look back at the old days of deep neural networks and convolutional neural networks, here's an example we had. And I should point out that the adversarial attacks here are magnified so that they can be visible; in practice, they are of much smaller amplitude.
But here's an example of an early, brittle DNN, where you have a very small perturbation that's typically not even noticeable to the naked human eye. And here, because the features that are targeted have very little to do with the features that matter for humans to recognize images, we find that typically those perturbations will be very hard to detect by human observers. Next is an example of unaligned robustness.
I'm sure you're familiar with all the work that has been done on adversarial training; there are explicit ways to make deep neural networks more robust to adversarial attacks. And so here you see the outcome of such training, where the argument is that, in this example, we end up with a model that exhibits an unaligned robustness, in the sense that the network is now much more robust. The magnitude of the attack needs to be increased for the network to be fooled, so this is a demonstration of the strength and the robustness of the network.
It's harder to trigger an attack. But again, in this case, this is misaligned with human vision, because the features that are being targeted are again spread all over the image and are not the features that are important for human observers.
And then that's what we are aiming for here: an example of aligned robustness, where in theory this network is not only more robust to adversarial attacks-- the magnitude of the attack needed is quite large and noticeable-- but on top of it, the attack targets the kinds of visual features that are important for humans, which makes those attacks much more noticeable.
And so that's essentially what I'm summarizing here on this plot, where on the x-axis we have a measure of alignment-- a correlation between the actual pattern of the perturbation of the attack and our clicking maps-- which tells us how well the attack lines up with features that are important for humans. On the y-axis is a measure of robustness, a perturbation tolerance, where higher is better.
This is the minimum norm needed to fool the network. So the higher, the more robust the network is. Again, every dot is a deep neural network; this is from the timm toolbox that I mentioned earlier.
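As a rough illustration of those two axes, here is a sketch that estimates a perturbation tolerance by binary-searching the smallest single-step (FGSM-style) attack that flips the prediction, and then correlates the attack pattern with the human Click Me map; the actual benchmark presumably uses a stronger minimum-norm attack, and pixel values are assumed to live in [0, 1].

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def perturbation_tolerance(model, image, label, click_map, eps_max=1.0, iters=20):
    """Minimum FGSM step size that fools the model on one image, plus the
    correlation between that perturbation and the human Click Me map."""
    image = image.detach().clone().requires_grad_(True)    # (1, C, H, W)
    F.cross_entropy(model(image), label).backward()
    direction = image.grad.sign()                           # FGSM attack direction

    lo, hi = 0.0, eps_max
    with torch.no_grad():
        for _ in range(iters):                               # binary search on epsilon
            mid = 0.5 * (lo + hi)
            adv = (image + mid * direction).clamp(0, 1)
            if model(adv).argmax(1).item() != label.item():
                hi = mid                                      # fooled: try smaller
            else:
                lo = mid                                      # not fooled: go bigger

    delta = (hi * direction).abs().amax(1).squeeze(0)         # per-pixel attack magnitude
    rho, _ = spearmanr(delta.flatten().cpu().numpy(),
                       click_map.flatten().cpu().numpy())
    return hi, rho   # tolerance (higher = more robust), alignment with human features
```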
Again, we see a similar Pareto front here for standard convolutional and transformer networks. We find an interesting breaking of the Pareto front here via robust networks: you see that these pink dots are networks that were trained to be robust. In yellow are models that were harmonized-- so, again, forced to leverage human-like visual representations.
What is interesting with the harmonization is that we are not able to get quite the level of adversarial robustness that you would get if you trained the network with one of those adversarial training methods. But there is a significant improvement; we're able to break the Pareto front.
And not only are we able to get models that are more robust, but you can also see that the alignment with the Click Me maps is improving, suggesting that, again, the harmonization procedure works. The networks are now targeting visual features that are important for human vision.
All right. So this is a lengthy part one. I've tried to ask the question of whether AI models are aligned with human vision. I've shown you several examples-- human feature alignment, neural data predictivity, and adversarial attacks-- where I tried to make the case that we are hitting a Pareto front: as models become more and more accurate at ImageNet classification, the alignment along all of these different metrics is now worsening.
At the same time, I've shown you that we can leverage the deep learning toolbox, collect psychophysics data at scale, and then force the alignment on the neural networks. We can simultaneously optimize them for image classification while keeping them aligned with human vision.
The fact that we have so far been able to harmonize all the architectures that we've tried-- which span different CNNs, transformers, et cetera-- suggests that the misalignment problem has probably little to do with the underlying neural architectures. As I said, I don't necessarily believe that CNNs and transformers are, in their details, biologically plausible. But since we're able to align all of these architectures with human data, that suggests that the limitations are probably not coming from the architecture.
Our hypothesis was that the limitations come from the training-- that they come from the data diets used to train those networks, which are static ImageNet images, and perhaps as well from the underlying optimization, the tasks that are being optimized to train those networks. So the point that I'm going to try to make in the remainder of this talk is that we need to rethink the data diets we feed our deep neural networks and reverse engineer learning principles that will spontaneously lead to deep neural networks that are better aligned with human vision.
So obviously, we're not the first ones to make such a proposal. There has been work on the development of ecologically more valid data sets, starting with the CAD/CAMs from back in the days when I was a grad student. There's also a number of data sets that have been collected to reflect the kinds of visual data diets that babies experience, which are typically continuous transformation sequences.
So here's an example of some of the data we actually collected a few years ago at Brown with a former colleague of mine, Dima Amso, where you see an 18-month-old wearing a portable eye tracker being presented with a box of toys for the first time. And what you see here is the baby experiencing for the first time novel objects.
The kid is able to manipulate, move their heads. And so rather than getting IID samples from an ImageNet-like data set here, the baby is able to experience smooth continuous transformations of objects.
We're not going to go quite all the way to baby data, and I think part of the issue is that there is very little control we can get over such baby data. Instead, we're going to try to develop an alternative, I guess, to ImageNet. And so our starting point has been the CO3D data set, which is a publicly available data set that [INAUDIBLE] collected from people capturing video data-- people moving around representative everyday objects with their iPhone.
What we're going to do here, for better control, is leverage some of the latest and greatest modern computer graphics methods, which are also based on deep learning. Some of you have probably heard about NeRFs and Gaussian splatting. What we're going to be doing here is starting from these iPhone videos, but then rerendering them so that we can generate smooth transformation sequences-- things like rotations around the objects, translations, zooming in and zooming out.
And I should point out that one key limitation of this data set-- and we're hoping that the data set will continue to grow. Right now it's only about 60 or so classes of objects. So this is very small with respect to ImageNet. But this is a starting point nonetheless.
All right. So we need better data diets. And the other point that I made is that we probably need to rethink the tasks that we are using to optimize our deep neural networks. There's, again, a lot of work that has been done on the topic; some of it dates back to the '80s. Some of you might have heard about slow feature analysis and its derivatives, which were developed to train artificial neural networks to learn better invariances from transformation sequences of the kind that I just showed you.
There's also related work in the space of predictive coding. Gabriel Kreiman and colleagues have done a lot of work in that space; you might have heard about PredNet. There's also recent work from Brenden Lake's group, as well as Yamins's and DiCarlo's groups, where they have been adapting some of these self-supervised learning architectures and evaluating their plausibility as a pretraining stage for deep neural networks.
I should also point out that there's a large body of work in self-supervised learning right now in computer vision; this is probably the hottest topic in computer vision. I'm not going to have time to give you a full review of the field, but in general, you might have heard about masked autoencoders.
Those were developed in the context of natural language processing. This is how BERT was initially trained: you give an input sequence of words, and then some words are hidden. The network has to learn to fill in the blanks-- to predict a probability distribution over all possible words for any one of the blanks. ChatGPT is yet another alternative to this, where rather than hiding individual words, the model is trained to predict the next immediate word, which is a form of autoregressive training.
So I'm going to put a lot of things under this masked autoencoder umbrella. But the basic idea is the same, whether it's in NLP or vision: we're going to be hiding part of the image, and, as pretraining, we're going to require the network to learn to fill in the blanks.
The space is large. If you look at the literature in computer vision, it's daunting; every other day there is a new method. But in general, one way to describe, at a high level, what's happening is that, rather than words, we're going to be looking at sequences of frames from these video input sequences. And depending on how different parts of the image get masked, one can formulate different objective losses.
So here's an example, which is the original version of this masked autoencoder. Again, here you have a sequence of frames, and then you can just hide some patches distributed uniformly across space and time. Here we're asking the networks to reconstruct what's hidden behind the blanks.
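Here is a minimal sketch of that kind of pretraining step, assuming a generic encoder and decoder (both placeholders, and the decoder's signature is made up): patches are hidden uniformly across space and time, and the loss is reconstruction error on the hidden patches only.

```python
import torch
import torch.nn.functional as F

def masked_autoencoder_step(encoder, decoder, frames, patch=16, mask_ratio=0.75):
    """One masked-autoencoder-style step on a clip of frames (B, T, C, H, W)."""
    B, T, C, H, W = frames.shape
    # Cut each frame into non-overlapping patches: (B, T*h*w, C*patch*patch).
    patches = frames.unfold(3, patch, patch).unfold(4, patch, patch)
    patches = patches.permute(0, 1, 3, 4, 2, 5, 6).reshape(B, -1, C * patch * patch)

    n = patches.shape[1]
    keep = int(n * (1 - mask_ratio))
    idx = torch.rand(B, n).argsort(dim=1)          # random patch order per clip
    visible_idx, masked_idx = idx[:, :keep], idx[:, keep:]

    gather = lambda ind: torch.gather(
        patches, 1, ind[..., None].expand(-1, -1, patches.shape[-1]))
    latent = encoder(gather(visible_idx))          # encode only the visible patches
    pred = decoder(latent, masked_idx)             # predict the hidden ones (placeholder API)
    return F.mse_loss(pred, gather(masked_idx))    # reconstruction loss on masked patches
```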
And so the network, in order to solve this task, will have to perform a form of spatiotemporal interpolation, leveraging whatever it can from spatial and temporal dependencies. An extreme version of that, which is essentially what a lot of the work in computer vision right now is doing: although people are trying to leverage video data, often videos are used just as something to sample individual frames from.
So essentially, learning from this video is literally equivalent to just learning from single images. What we're after here is really to encourage networks to leverage motion information and all the good things-- all the spatiotemporal dependencies-- that are naturally present when we experience transformation sequences.
So the variant of this MAE that we suspected would help, which in the following we will call autoregressive, is as follows. Rather than hiding random patches across space and frames, we're going to show a sequence of frames where everything is revealed, and then we ask the network to autoregressively predict the next frame. So this is quite literally the translation of what ChatGPT was trained to do, from language to vision.
And then, just as a point of clarification, one can play this game by training the model to reconstruct at the pixel level-- trying to reconstruct or predict what the patch is. Or you can do that in latent space: you learn a high-level representation, and you compare at that representation level.
You can use the prediction error between the incoming signal and the network's prediction to drive learning in the network. Again, for the neuroscientists in the audience who are familiar with the notion of predictive coding theory in neuroscience, you can think of this as one instantiation of that theory.
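A minimal sketch of that autoregressive variant in latent space, assuming a frame-level encoder and a causal predictor module (both names are placeholders); detaching the target latent is an extra assumption borrowed from common self-distillation setups.

```python
import torch
import torch.nn.functional as F

def autoregressive_step(encoder, predictor, frames):
    """Next-frame prediction in latent space for a clip (B, T, C, H, W):
    the prediction error on the upcoming frame's latent drives learning."""
    B, T = frames.shape[:2]
    latents = torch.stack([encoder(frames[:, t]) for t in range(T)], dim=1)  # (B, T, D)

    loss = 0.0
    for t in range(T - 1):
        pred = predictor(latents[:, : t + 1])            # causal: only past frames
        loss = loss + F.mse_loss(pred, latents[:, t + 1].detach())
    return loss / (T - 1)
```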
So I should point out that these are very preliminary results. I'm going to show you results for one architecture. We're trying to extend those results to a whole bunch of architectures to demonstrate the generality of our results.
So here's a bunch of models, like we've done before. This is object classification accuracy, as before, on the x-axis and feature alignment on the y-axis. But now this is for the CO3D data set. So we collected novel data from human subjects; we know exactly what features are diagnostic in those video sequences presented as frames. And then we evaluate the accuracy of models on the data set.
So as promised, we picked one arbitrary model. We took a small ViT because it was manageable. And here we start from a model that has been pretrained. If you just take a pretrained model and do a linear decoding on this CO3D data set, you get an accuracy of about 0.85 and a human alignment of somewhere around 0.2.
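For reference, linear decoding from a frozen backbone looks roughly like the sketch below; the regularization, feature choice, and classifier used in the actual experiments may well differ.

```python
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def linear_probe(backbone, train_loader, test_loader, device="cuda"):
    """Extract features from a frozen pretrained backbone, then fit a
    linear classifier on top and report its test accuracy."""
    backbone.eval().to(device)

    def featurize(loader):
        feats, labels = [], []
        for x, y in loader:
            feats.append(backbone(x.to(device)).flatten(1).cpu())
            labels.append(y)
        return torch.cat(feats).numpy(), torch.cat(labels).numpy()

    Xtr, ytr = featurize(train_loader)
    Xte, yte = featurize(test_loader)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    return clf.score(Xte, yte)  # top-1 accuracy of the linear readout
```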
Now, you can try to do fine-tuning. So the question we are trying to answer here is: can we figure out, reverse engineer, the right kind of loss that's going to yield better alignment and potentially better accuracy, starting from this base model and moving towards a better model? So here is an example of the representation derived from a model that has been fine-tuned.
So we started from an ImageNet-pretrained network and are trying to move in the direction that maximizes image classification on CO3D. And you see that the fine-tuning works; we get an improvement in image classification accuracy on CO3D, which makes sense.
But you see that there is no real improvement in alignment. So driving those networks towards more class-specific or class-diagnostic features does not help the alignment between these models and humans.
Here we're introducing this masked autoencoder. Remember, this is the example where we either sample frames and ask the networks to just fill in the blanks on single frames, or we randomly distribute the blanks across the spatiotemporal sequence. And we find that here things are not going very well. We are keeping similar alignment, but the classification accuracy actually goes down. So this MAE does not seem like a good way to approximate human vision.
Here's a slight alternative-- maybe too much detail-- but rather than distributing those patches uniformly, we're using tubes, where there's a certain spatiotemporal coherence in the mask that's being presented. And here we see slightly better alignment but also a drop in classification accuracy.
And we find that when we train in this ChatGPT style, with autoregressive models that have to predict the next frame, either in pixel space or in latent space, we can really get a significant improvement in the alignment with humans. That means this form of autoregressive learning ends up shifting the learned visual representations towards features that are closer to those selected by human observers when they solve this task. Again, this is just one network; we need to scale that up.
But I think the good news is that we're also finding that our results translate to ImageNet. What I showed you was for this CO3D data set. Now we can take the same network and evaluate, again, all the different training we've done on CO3D, and we find that it actually does generalize to ImageNet.
So with the proper self-supervised learning, we can improve ImageNet classification accuracy as well as the alignment with humans derived from ImageNet, which really shows that we're not just overfitting on the data set. We're really learning a novel visual representation that generalizes to a completely different data set, including static image data sets.
So as I said, this is one model; the jury is still out. I'm still hopeful that we're going to be able to generalize these results to arbitrary architectures. But I think this is promising.
Just to give you a quick visualization of what those features look like-- so it's pretty noisy. This is the attribution map derived from a linear probe on the standard ImageNet-pretrained ViT-Small. As expected, you see that the features are all over the place. The network integrates features from the foreground, the background, and everywhere else.
When we do this autoregressive self-supervised learning, we find that the network is much more stable and seems to be hitting the right kind of features. I want to think that those are features that are related to nonaccidental properties, potentially features that are more stable across changes in viewpoints through the self-supervised learning. And we have more examples again.
All right. So just to conclude this part very briefly, I tried to make the case-- and I showed you initial results suggesting-- that we need to rethink the data diets and learning objectives used to train our deep neural networks, at least if we want to improve the alignment between those networks and human vision. I've shown you that self-supervised learning from transformation sequences, to predict future sensory signals-- something akin to predictive coding in neuroscience-- yields visual representations that are better aligned with humans.
I think it's fair to say that we still have very little qualitative understanding. We know that there are quantitative improvements, but qualitatively it's not entirely clear how different those learned representations are. I would like to be able to tell you-- and certainly we're working on this-- I would love to tell you, well, we have better tuning for features that correspond to nonaccidental properties, the kinds of things that are stable across changes in viewpoint, or that we're able to capture the kinds of junctions that we know are important for 3D vision in humans, et cetera.
But I don't have those results yet, and I hope next time I come, I'll be able to present some of them. For the final two minutes, which I want to use for concluding: I think we are finding some improvements, certainly, by forcing alignment between deep neural networks and humans and by optimizing different kinds of loss functions. But I think there is still something missing in the way we evaluate these models.
I've long been convinced that evaluating the ability of these models to categorize 2D images is not the way to really calibrate the abilities of those models against human observers. There's still a feeling, I think, in the field that deep neural networks are, to a large extent, processing 2D images as 2D textures and are missing some aspects of 3D human vision-- certainly some aspects of the 3D shape representations that we know are important for biological vision. But I think it's fair to say that at the moment we don't really have good tests, either on the neurophysiology side of things or on the machine learning side, that would really challenge those neural networks to exhibit human-like 3D vision.
So we've been working hard, trying to come up with such a test. I've been discussing and chatting with colleagues who have been working on 3D vision for much longer than I have. And so last year I met with a colleague of ours, Zeke [INAUDIBLE] who's on the West Coast, and I asked Zeke, what do you think would be a test to really challenge these neural networks?
And so Zeke made an interesting proposal, and I think most of you will be familiar with this visual perspective taking task. This is a task that was proposed by Piaget somewhere in the late '40s. There are many versions of the task. This is one version.
If you're familiar with visual perspective taking, or VPT, this is VPT-1. So this is the version of the task that is presumably still thought to be primarily visual, as opposed to some of the more involved forms of visual perspective taking that seem to involve some form of cognition. So here the task is pretty natural.
I think Piaget assessed that kids as young as four to six years of age were able to solve the task. With improvements to the experimental paradigm, I think more recent research has shown that kids are able to solve this version of visual perspective taking by one year of age.
And so here's the task. You can show an image of this kind and ask whether they think the teddy bear is able to see the house or the cross there. And so that requires the subject to change their perspective from an egocentric point of view, which is literally what they get from 2D images, to an allocentric representation, where they're going to have to, quote unquote, "put themselves in the shoes of the bear" and figure out whether there's an occluder between the bear and the house.
There's no occluder, so the answer is yes for the house, and no for the cross because there is a mountain in between. All right. So we adapted the task to make it amenable to computer vision.
This is back to our NeRF-rendered 3D data set. So we have those graphics 3D models. And so we can, again, embed arbitrary objects within those scenes. I know that the contrast on these images is not great, so I don't know how well you can see it.
But here on the images we're showing a camera in green-- I hope you can appreciate there's an arrow here and a field of view-- and there's a red ball. Again, those are embedded in the 3D scene.
And so the question for the networks and the human subjects is whether they think the camera can see the red ball. So here are examples of a "no" answer, because there is an occluder here, and this is an example of a "yes" answer, because the ball is visible. And just to have a positive control, on the same set of images, completely counterbalanced, we also formulated a version of the task where we just ask the models and the subjects: which one of these two objects is closer to the viewer?
So in this case, green is closer to me than red, and so on and so forth. This is a relative depth task, which is still egocentric: we have to make a judgment as to which one is closer to us, unlike visual perspective taking, which requires a change in coordinate system.
We again took a battery of over 300 ImageNet-optimized models and evaluated them. And here are the results we are getting. So again, each dot is a deep neural network-- 300 of them. ImageNet accuracy is on the x-axis. These are the results for the depth-ordering task. Human level here is derived from about 30 subjects.
So you see that humans are right under 80%; chance would be 50%. And you see that some of the networks that are best performing on ImageNet are able to reach human level or even exceed it. An interesting trend here, which I didn't expect, is that there is actually a positive correlation between the classification accuracy of these models on ImageNet and their ability to predict this relative depth.
So what is interesting here is that those deep neural networks can see depth-- they can solve the relative depth task. And it looks like, through optimization for image categorization or object recognition, they are able to capture cues that are somewhat correlated with the judgments needed for this relative depth task.
Bad news for us is that our autoregressive models, which did so well in aligning with humans, actually don't do so well on this task; they are middle of the pack. Interestingly, if you take your favorite LLM-- Claude 3, GPT, Gemini, we tested them all-- they don't do very well on this depth-ordering task.
And we are doing everything by the book: chain-of-thought prompting, and we give them the same exact 20 samples as we give to humans. So these systems are not able to solve the task here.
The most performant ones are typically-- I have the legend on the next slide, but I think these are transformers that were probably trained on extra data with self-supervision. So that's the state of the art for image classification on ImageNet.
So here's what we get for VPT. Interestingly, for humans, we get even higher accuracy than we were getting for relative depth-- we get almost 90% for humans. And this is the accuracy for all the deep neural networks: not a single one of them is able to solve the task, and most of them are really around chance level.
To be fair, there is a trend. So it's not like they are completely at chance. There is a positive trend between the accuracy on ImageNet and the accuracy on VPT. But the trend is relatively weak.
So of course, when I started giving talks and showing this to deep neural network practitioners, I was getting a lot of pushback. And so just to-- oh, sorry. And the autoregressive model is actually doing slightly better in comparison, but not great. And again, none of the LLMs are able to solve the task.
So I was getting pushback, and people were like, well, who knows? Maybe we don't know what we are doing. So we wanted to show that it's not that they cannot learn it.
So here we just fine-tuned those models. We took a set of images, and rather than doing just a linear readout, we actually fine-tuned the models on the task. So this is the original accuracy, and this is what happens after we fine-tune those models.
And sure enough, if we fine-tune them, the top ones can actually reach human-level accuracy. You see that several models now perform on par with or better than humans on both tasks. However, after we fine-tune them, we again make a small, subtle change to the data set that we use.
Here we are really testing a leading assumption in the field: the assumption that for humans to solve the task, they use a strategy known as line of sight-- literally, you estimate the direction of the agent's gaze, draw a line of sight, and then figure out whether there's an obstacle along the way or not.
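As an illustration, that line-of-sight strategy can be written down as a simple geometric check; the sketch below approximates occluders as spheres, which is a simplification of the actual 3D scene geometry, and the coordinates in the example are made up.

```python
import numpy as np

def can_see(camera, target, occluders, step=0.01):
    """Walk along the segment from the camera to the target and test
    whether any (center, radius) sphere occluder blocks the sight line."""
    camera, target = np.asarray(camera, float), np.asarray(target, float)
    for t in np.arange(0.0, 1.0 + step, step):
        point = camera + t * (target - camera)          # point along the sight line
        for center, radius in occluders:
            if np.linalg.norm(point - np.asarray(center, float)) < radius:
                return False                            # the sight line is blocked
    return True

# e.g. a laptop-sized sphere sitting between the camera and the ball:
print(can_see((0, 0, 0), (2, 0, 0), [((1, 0, 0), 0.3)]))  # False
```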
And so after we fine-tune the models, we evaluate them on a very controlled scene, where we have a single object. And then we just move the camera and the ball step by step, so that it starts from a "yes" answer, then at some point there's an occlusion behind the laptop, and then we recover again.
When we do this simple manipulation, where we control for everything, we find that-- again, these are very similar data to the ones used in the training set-- and yet the accuracy of the models goes back to chance.
And it's not like the models do not understand the task, because when we look at the pixels that are used to drive their decision, we find that they actually latch onto the camera and the object. So they understand that there is some meaning there. But they are just not able to extrapolate and implement this line-of-sight algorithm.
So OK, I'll just leave you with one final thought. I think we haven't solved vision yet. I think this is interesting, and I'm certainly eager to hear feedback on this visual perspective taking task. I think there is a big distinction, for deep neural networks and humans, between egocentric and allocentric tasks. Our own hypothesis is that we need to extend self-supervised learning beyond the realm of passive learning agents, where videos are just passively streamed to and learned from by those neural networks, to a more active kind of learning, where the network, if not fully embodied, has to imagine how the scene looks under different kinds of occlusions in the scene.
My guess is that these are the kinds of self-supervised learning approaches that will yield better representations for solving those allocentric tasks. But this is still a hypothesis. I'm running out of time, so I'll just leave you with final acknowledgments to thank all the collaborators, the lab, and our funders. And I'm happy to take questions if there are any. Thank you.
[APPLAUSE]