CBMM Weekly Research Meeting: Explaining human-level visual recognition as deep inverse graphics

October 14, 2014 - 4:00 pm to 5:30 pm
Speakers: Ilker Yildirim and Tejas Kulkarni

Research Thrust: Development of Intelligence, CBMM Thrust 1

Abstract:

In recent years, computational vision has made remarkable progress thanks to powerful feed-forward architectures that build classifiers for individual scene elements and learn features automatically from data. Compared to humans, however, these architectures struggle with occlusion, pose and lighting variability, and extreme scale variations. These difficulties support the common intuition that closing the gap between computational models of vision and human performance will require architectures that go beyond bottom-up computational elements. Building on this observation, we systematically evaluated humans and computational models of different architectures on a variant of the Visual Turing task for face analysis. We tested how invariant people’s judgments of facial identity are to lighting and pose variability in a same/different judgment task, and we tested common bottom-up architectures on the same task. We also developed and tested an inverse graphics model of face perception, which integrates a Convolutional Neural Network (CNN) with a top-down generative model. We found a gap of about 20% between people and the best-performing feed-forward model. In contrast, our inverse graphics model achieved human-level recognition performance (both perform at 78%). More importantly, our model accounts for people’s judgments beyond matching their overall performance: the latent variables inferred by the inverse graphics model capture the variability in subjects’ responses much better than the best-performing baseline model does. Our experiment suggests that charting people’s performance on such challenging and highly relevant tasks can lead to fruitful combinations of top-down generative models with bottom-up computational pipelines. We hope that such computational and behavioral insights will lead to new ways of investigating the neural bases of generative models of vision in the brain.
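For readers unfamiliar with the same/different judgment task mentioned in the abstract, the following is a minimal illustrative sketch of how a model could be scored on it. It is not the speakers' method: the vectors stand in for whatever representation a model produces for each face image (for example, CNN features or inferred latent variables), and the cosine-similarity rule, the threshold value, the function names, and the toy trials are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def judge_same(u, v, threshold=0.9):
    """Hypothetical decision rule: answer 'same' when the two
    representations are similar enough. The threshold is an assumption."""
    return cosine_similarity(u, v) >= threshold

def accuracy(trials, threshold=0.9):
    """Fraction of trials where the judgment matches the ground truth.
    Each trial is (representation_a, representation_b, is_same_identity)."""
    correct = sum(
        judge_same(u, v, threshold) == is_same for u, v, is_same in trials
    )
    return correct / len(trials)

# Toy trials with made-up 2-D representations (not real data).
trials = [
    ([1.0, 0.0], [0.9, 0.1], True),     # similar vectors, same identity
    ([1.0, 0.0], [0.0, 1.0], False),    # orthogonal vectors, different identity
    ([0.5, 0.5], [0.6, 0.4], True),     # similar vectors, same identity
    ([1.0, 0.2], [0.2, 1.0], False),    # dissimilar vectors, different identity
    ([1.0, 0.0], [0.95, 0.05], False),  # similar vectors but different identity
]

print(accuracy(trials))
```

The last toy trial is one the rule gets wrong: two different identities whose representations happen to be close. Errors of this kind are what separate one representation from another on the task.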

Details

Date: October 14, 2014
Time: 4:00 pm to 5:30 pm
Venue: MIT: McGovern Institute Seminar Room, 46-3189
Address: 43 Vassar Street, MIT Bldg 46, Cambridge, MA 02139, United States