Computational Models of Human Social Interaction Perception

Leader: Leyla Isik

Humans perceive the world in rich social detail. In just a fraction of a second, we not only detect the objects and agents in our environment, but also quickly recognize agents interacting with each other. The ability to perceive and understand others’ social interactions develops early in infancy [1] and is shared with other primates [2]. Recently, we identified a region in the human posterior superior temporal sulcus (pSTS) that represents both the presence and the nature (help vs. hinder) of social interactions [3]. However, the neural computations carried out in this region are still largely unknown. We propose comparing two different types of computational models (feedforward convolutional neural networks, CNNs, and generative models) to address this question.

Project 1: Computational models of social interaction detection

Can purely feedforward computational models detect the presence of social interactions, and if so, do they do so in a manner that matches neural data? We will test CNNs trained in three different ways: on an object recognition task (ImageNet), on an action recognition task (Moments in Time dataset), and on a social interaction detection task (Moments in Time dataset). For each model, we will evaluate both its task performance and its match to existing MEG data. Based on preliminary data [4], we hypothesize that a purely feedforward network will be able to detect the presence of social interactions. We further hypothesize that the network trained on social interactions will not only perform best, but also provide the best match to our neural data. This finding would explain why humans and primates need dedicated cortical territory to recognize social interactions, and provide a computational description of the neural representations in the pSTS social interaction region. In particular, these results would show that social interaction detection is well approximated by feedforward computations.
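One common way to quantify a model's match to MEG data is representational similarity analysis: compare the pattern of pairwise stimulus dissimilarities in a network layer to the pattern in the neural response. The sketch below is a minimal illustration of that comparison, not our actual pipeline; the arrays are random placeholders standing in for real CNN activations and MEG sensor patterns.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(features):
    """Representational dissimilarity matrix over stimuli:
    1 - Pearson correlation between each pair of feature vectors."""
    return 1.0 - np.corrcoef(features)

def model_brain_similarity(model_feats, brain_feats):
    """Spearman correlation between the upper triangles of the two RDMs."""
    a, b = rdm(model_feats), rdm(brain_feats)
    iu = np.triu_indices_from(a, k=1)   # off-diagonal upper triangle only
    return spearmanr(a[iu], b[iu]).correlation

# Toy example: 20 stimuli; hypothetical CNN layer activations, and a "MEG"
# pattern constructed as the same representation plus measurement noise.
rng = np.random.default_rng(0)
cnn_layer = rng.normal(size=(20, 512))
meg_pattern = cnn_layer + 0.1 * rng.normal(size=(20, 512))
similarity = model_brain_similarity(cnn_layer, meg_pattern)
```

A network whose layer-wise similarity to the MEG response exceeds that of the other two training regimes would support the hypothesis that social interaction training shapes pSTS-like representations.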

Project 2: Computational models of social interaction understanding

In contrast to detecting a social interaction, we hypothesize that understanding the nature of a social interaction (e.g., helping vs. hindering) will not be well approximated by feedforward models. Distinguishing help from hinder will likely require generative models that take into account information about agents’ goals, beliefs, and the world around them [5]. This project is joint work with Josh Tenenbaum, Nancy Kanwisher, Tobias Gerstenberg, and Tomer Ullman. We will build on generative models developed in the Tenenbaum lab that use information about the physics of the environment and an agent’s actions to make moral judgements [5, 6] (e.g., is an agent putting forth a lot of effort to hinder worse than one putting forth only a little?). We will ask whether these models predict human moral judgements. This modeling and behavioral work will pave the way for important follow-up neuroimaging experiments with the same stimuli. Our prior fMRI work showed that the pSTS represents helping vs. hindering. Does it also make more fine-grained moral judgements, and are the computations carried out in this region similar to those in our models?
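To make the effort question concrete, the toy sketch below illustrates one simple cost-sensitive account (an illustration only, not the Tenenbaum-lab models): the judgement of a hindering agent scales with the effort the agent invests, so a hard-working hinderer is judged worse than a lazy one, and a helper earns praise rather than blame. All names and numbers here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One observed action by an agent toward another agent's goal."""
    effort: float        # physical cost expended (e.g., force applied)
    goal_effect: float   # +1 advances the other's goal, -1 hinders it

def blame(actions, effort_weight=1.0):
    """Toy judgement: blame for hindering grows with invested effort;
    helping yields negative blame (praise)."""
    return sum(-a.goal_effect * (1.0 + effort_weight * a.effort)
               for a in actions)

# A high-effort hinderer is judged worse than a low-effort one.
lazy_hinderer = [Action(effort=0.2, goal_effect=-1.0)]
hard_hinderer = [Action(effort=2.0, goal_effect=-1.0)]
helper = [Action(effort=1.0, goal_effect=+1.0)]
ordering = blame(hard_hinderer) > blame(lazy_hinderer) > 0 > blame(helper)
```

The full generative models additionally infer effort and intent from the physics of the scene rather than taking them as given; the behavioral question is whether such graded, effort-sensitive predictions track human judgements.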

I am requesting funding for two MIT Master of Engineering (MEng) students in the IAP and spring terms to carry out these projects. Project 1 will be led by Elizabeth Eastman, an MEng student already working with me. We plan to hire a second MEng student for Project 2. These projects will run between now and August 31, 2018. We expect to make significant progress developing the above models and comparing them to behavioral data and existing neural data in that time.


  1. Hamlin et al., Nature 2007.
  2. Sliwa and Freiwald, Science 2017.
  3. Isik et al., PNAS 2017.
  4. Isik et al., CCN 2018.
  5. Ullman et al., 2009.
  6. Sosa et al., 2018.