Human visual experience contains not only physical properties such as shapes and objects, but also a rich understanding of others' mental states, including intentions, beliefs, and desires. This ability is known as "Theory of Mind (ToM)" in social and developmental psychology. In my research, I combine human psychophysics with a computational model that implements ToM using algorithms and techniques from computer vision, robotics, and artificial intelligence. Intentions are represented by a Hierarchical Task Network, a type of temporal parse graph describing the hierarchical structure of actions and plans. Human beliefs are represented as a spatial parse tree characterizing the hierarchical structure of all objects, actions, and their physical relations. By observing human actions and the 3D world, the model infers intentions and beliefs by reverse-engineering the decision-making and action-planning processes in a human’s mind under a Bayesian framework. This cognitively motivated model illustrates (a) how to achieve human-like explanations and predictions of social scenes; (b) how to recognize flexible human actions with little training; and (c) how to produce semantic representations of a social scene that connect vision with other domain-general processes, including language and common-sense reasoning.
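The idea of inferring an intention by inverting a planning process under a Bayesian framework can be illustrated with a toy sketch. This is not the model described above: it replaces the Hierarchical Task Network and parse trees with a flat set of candidate goals in a one-dimensional corridor, and it assumes a softmax-rational agent. The goal names, positions, the `BETA` parameter, and the function names are all hypothetical choices made for illustration.

```python
import math

# Hypothetical 1-D corridor: each candidate intention is a goal position.
GOALS = {"coffee": 0, "printer": 9}
ACTIONS = {"left": -1, "stay": 0, "right": 1}
BETA = 2.0  # assumed rationality parameter; higher means closer to optimal


def action_likelihood(action, position, goal_pos, beta=BETA):
    """P(action | position, goal) for a softmax-rational agent.

    The utility of an action is the negative distance to the goal
    after taking that action.
    """
    def utility(a):
        return -abs(position + ACTIONS[a] - goal_pos)

    weights = {a: math.exp(beta * utility(a)) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def infer_intention(start, observed_actions, prior=None):
    """Posterior over goals given a start position and observed actions."""
    prior = prior or {g: 1.0 / len(GOALS) for g in GOALS}
    log_post = {g: math.log(p) for g, p in prior.items()}
    position = start
    for action in observed_actions:
        # Bayes' rule in log space: add the log-likelihood of each step.
        for goal, goal_pos in GOALS.items():
            log_post[goal] += math.log(
                action_likelihood(action, position, goal_pos)
            )
        position += ACTIONS[action]
    # Normalize, subtracting the maximum for numerical stability.
    peak = max(log_post.values())
    unnorm = {g: math.exp(lp - peak) for g, lp in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


posterior = infer_intention(start=4, observed_actions=["right", "right", "right"])
```

After three steps to the right from the middle of the corridor, the posterior in this sketch favors the `printer` goal over `coffee`, since those actions are far more probable for an agent heading toward position 9.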
Social cognition is at the core of human intelligence. It is through social interactions that we learn. We believe that social interactions drove much of the evolution of the human brain. Indeed, the neural machinery of social cognition comprises a substantial proportion of the brain. The greatest feats of the human intellect are often the product not of individual brains, but of people working together in social groups. Thus, intelligence cannot be understood without understanding social cognition. Yet we have no theory or even taxonomy of social intelligence, and little understanding of the underlying brain mechanisms, their development, or the computations they perform. Here we bring developmental, computational, and cognitive neuroscience approaches to bear on a newly tractable component of social intelligence: nonverbal social perception (NVSP), the ability to discern rich, multidimensional social information from a few seconds of silent video.