Neurally-plausible mental-state recognition from observable actions
Project Leader: Julian Jara-Ettinger
People’s ability to understand other people’s actions, predict what they’ll do next, and determine whether they are friends or foes relies on the ability to infer their mental states. This capacity, known as Theory of Mind, is a hallmark of human cognition: it is uniquely human, it involves reasoning about agents interacting with objects in space, and it requires interpreting observable data (namely, people’s actions) by appealing to unobservable concepts (beliefs and desires). As part of CBMM, I have developed computational models that capture with quantitative accuracy how we infer other people’s competence and motivation, how we determine who is knowledgeable and who is ignorant, how we predict what people might do next, and how we decide whether they are nice or mean [1]. While these models mimic the computations behind mental-state reasoning, their implementations rely on biologically implausible architectures and carry a high computational overhead because inference is sampling-based. This project consists of two parallel studies. The first examines how visual search supports rapid goal recognition. The second combines probabilistic programming with deep neural networks [2] to develop faster Theory of Mind models that can be used to generate testable neural hypotheses about how Theory of Mind is implemented in the brain.
A long history of empirical and computational research shows that the expectation that agents navigate efficiently in space is a building block of action understanding [3-4]. Computational models determine efficiency through model-based reinforcement learning planners that operate over an abstract representation of the environment. An alternative, however, is that predictions based on an assumption of efficient action are computed directly through saccades. In simple everyday events, such as grasping and reaching, people’s saccades may be instrumental in determining what an agent is reaching towards, rather than being the consequence of top-down goal attribution. A set of simple grasping and reaching events will be recorded and annotated using PoseNet [5] and an object tracker to capture arm, hand, and finger locations in space. We will vary whether the target objects are directly within arm’s reach, and whether they lie on a straight projection of the hand movement or not (due to obstacles). Participants’ eye movements will be tracked in a goal-attribution task, and their saccades will be compared to two models: a top-down computational model of goal attribution, in which saccades are the output of reward inference, and a bottom-up computational model, in which saccades are used to project where an agent is moving and to generate hypotheses about the agent’s rewards.
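The contrast between the two accounts can be made concrete with a minimal sketch. It is an illustration under strong simplifying assumptions, not the project's actual models: space is a 2D plane without obstacles, efficiency is measured by Euclidean detour, the prior over goals is uniform, and the function names, the `beta` rationality parameter, and the example objects are all hypothetical.

```python
import math

def goal_posterior(start, current, goals, beta=2.0):
    """Top-down sketch: Bayesian goal attribution under an efficiency assumption.

    The partial path start -> current is scored for each candidate goal by its
    detour relative to the straight-line path, with likelihood exp(-beta * detour).
    A uniform prior over goals is assumed.
    """
    scores = {}
    for name, g in goals.items():
        detour = (math.dist(start, current) + math.dist(current, g)
                  - math.dist(start, g))
        scores[name] = math.exp(-beta * detour)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def projected_target(prev, current, goals):
    """Bottom-up sketch: extrapolate the hand's heading and return the object
    whose direction deviates least from that heading (no reward inference)."""
    heading = math.atan2(current[1] - prev[1], current[0] - prev[0])

    def deviation(g):
        angle = math.atan2(g[1] - current[1], g[0] - current[0])
        diff = abs(angle - heading) % (2 * math.pi)
        return min(diff, 2 * math.pi - diff)

    return min(goals, key=lambda name: deviation(goals[name]))

# Hypothetical scene: a hand moving from the origin along the x-axis.
goals = {"cup": (10.0, 0.0), "phone": (0.0, 10.0)}
posterior = goal_posterior((0.0, 0.0), (3.0, 0.0), goals)
guess = projected_target((0.0, 0.0), (3.0, 0.0), goals)
print(posterior, guess)
```

In an obstacle-free scene like this one the two sketches agree, which is why the proposed manipulation matters: when an obstacle forces a curved reach, a straight-line projection and an efficiency-based inference can favor different targets.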
Next, in close collaboration with Ilker Yildirim (CBMM; Cocosci), we will use the generative models in state-of-the-art Theory of Mind models to synthesize training data for a deep neural network that inverts action planning. A similar approach was recently explored with promising initial results [6] but with poor generalizability across environments. We diverge from the classical approach in three ways. First, rather than attempting end-to-end POMDP inversion, we will train inversion separately for different components of the planning process, in the style of [7]. Second, we will enhance our approach with a second net trained to recover parameters that determine a sampling distribution over beliefs and reward functions (i.e., producing something akin to a salience map of high-probability locations where an agent’s rewards may be and of the types of assumptions the agent may be acting under). Finally, we will supplement inference with a final network that implements a forward planner, which can be used to revise and refine inferences. Together, these steps will enable us to test the biological plausibility of an inverse planning account of Theory of Mind and to generate testable neural hypotheses for future research.
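The core idea of training an inverse model on data synthesized from a generative planner can be sketched in a few lines. This is a toy illustration only: the environment is a hypothetical one-dimensional corridor with a goal at each end, the planner is a Boltzmann-rational step policy, and a logistic regression stands in for the deep network. It does not implement the component-wise inversion, the sampling-distribution network, or the forward-planner network described above, and all names and parameter values are assumptions.

```python
import math
import random

def sample_episode(rng, beta=1.5, steps=3, length=9):
    """Generative model: sample a goal, then a short noisy-rational trajectory.

    Returns (actions, goal), where actions are -1 (left) or +1 (right) and
    goal is 0 (left end of the corridor) or 1 (right end).
    """
    goal = rng.choice([0, 1])
    target = 0 if goal == 0 else length - 1
    pos = rng.randint(3, length - 4)  # start near the middle of the corridor
    actions = []
    for _ in range(steps):
        # Boltzmann-rational choice: utility is -beta * distance after the step.
        u_left = -beta * abs((pos - 1) - target)
        u_right = -beta * abs((pos + 1) - target)
        p_right = 1.0 / (1.0 + math.exp(u_left - u_right))
        a = 1 if rng.random() < p_right else -1
        pos = min(max(pos + a, 0), length - 1)
        actions.append(a)
    return actions, goal

def predict(model, actions):
    """Amortized inference: P(goal = right | actions) in one forward pass."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, actions)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_inverse_model(data, epochs=20, lr=0.1):
    """Fit the stand-in inverse model (logistic regression) by SGD."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict((w, b), x) - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

rng = random.Random(0)
train = [sample_episode(rng) for _ in range(2000)]  # synthesized training data
test = [sample_episode(rng) for _ in range(500)]    # held-out episodes
model = train_inverse_model(train)
accuracy = sum((predict(model, x) > 0.5) == bool(y) for x, y in test) / len(test)
print(f"held-out goal-recognition accuracy: {accuracy:.2f}")
```

Once trained, the inverse model replaces sampling-based inference with a single cheap evaluation, which is the speed advantage this line of work aims for. Whether that advantage survives a change of environment is exactly the generalization question raised above.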
[1] Jara-Ettinger, J., Schulz, L. E., & Tenenbaum, J. B. (in review). The naïve utility calculus supports quantitative productivity, flexibility, and explanatory depth in commonsense psychology.
[2] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.
[3] Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naïve theory of rational action. Trends in Cognitive Sciences, 7(7), 287-292.
[4] Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B. (2016). The naïve utility calculus: Computational principles underlying commonsense psychology. Trends in Cognitive Sciences, 20(8), 589-604.
[5] Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., & Murphy, K. (2017, January). Towards accurate multi-person pose estimation in the wild. In CVPR (Vol. 3, No. 4, p. 6).
[6] Rabinowitz, N. C., Perbet, F., Song, H. F., Zhang, C., Eslami, S. M., & Botvinick, M. (2018). Machine Theory of Mind. arXiv preprint arXiv:1802.07740.
[7] Yildirim, I., Freiwald, W., & Tenenbaum, J. (2018). Efficient inverse graphics in biological face processing. bioRxiv, 282798.