Modeling Human Goal Inference as Inverse Planning in Real Scenes


[Figure: one person helping another reach a block]

Project Leaders: Tao Gao & Marta Kryven

A major challenge for current AI systems is correctly interpreting human behaviour and interacting with people in natural settings, in ways that people consider intuitive, productive, and rewarding. Unlike AI, young children effortlessly engage in productive social interactions. By 9 months of age, infants can infer the goals of others (Woodward, 1998) and interpret the actions of other agents as driven by rational intentions (Gergely et al., Cognition, 1995), and by 14 months of age children can recognize that someone is trying to obtain an out-of-reach object and will push the object toward them (Warneken & Tomasello, Infancy, 2007). Such basic goal inference is far beyond what current AI can do, because AI lacks commonsense psychology, or `Theory of Mind' (Premack & Woodruff, Behavioral and Brain Sciences, 1978).

We formalize `social common sense' as a planning engine embedded in a general probabilistic framework. The inputs of the planning engine are probabilistic representations of mental states (e.g. beliefs, intentions, and utilities), and the output is a probabilistic planning policy that translates into a series of actions. Running the planning model forward, an AI can generate rational actions and plans; running the model backwards, an AI can infer the mental states of another agent, recovering the inputs that could have generated the observed plans, and anticipate what humans will do next. Using a combination of forward planning and backward inference, an AI agent can work with a human operator as a team.
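The forward direction can be sketched as a Boltzmann-rational ("noisily rational") policy: the agent prefers actions that bring it closer to its goal, with a rationality parameter controlling how reliably it does so. The grid-world simplification and all names below are illustrative assumptions, not the project's actual planner.

```python
import math

def reach_policy(pos, goal, beta=2.0):
    """Boltzmann-rational policy over unit moves on a 2D grid.

    Each action's utility is the negative Euclidean distance to the
    goal after taking it; beta controls how close to optimal the
    agent acts (beta -> infinity recovers a deterministic planner).
    Hypothetical sketch, not the project's 3D reaching model.
    """
    actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    utils = [-math.dist((pos[0] + dx, pos[1] + dy), goal)
             for dx, dy in actions]
    z = sum(math.exp(beta * u) for u in utils)
    return {a: math.exp(beta * u) / z for a, u in zip(actions, utils)}

# An agent at the origin heading for a goal to its right mostly moves right:
policy = reach_policy((0, 0), (5, 0))
```

Sampling repeatedly from such a policy generates the rational-but-variable action sequences that the forward model is meant to produce.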

As a first step, we need to develop an inverse-planning model that correctly predicts human goal inference under various conditions. We develop this model using continuous goal-inference scenarios, in which human actors reach for target objects among many distractors. Human observers watching such videos usually need little evidence to identify the actor's goal. Across scenarios we manipulate the physical structure of the scene, the positions of the human actors, and the presence of obstacles that influence action planning. We incorporate cognitive models of human action planning and navigation, Bayesian inverse inference, and Theory of Mind, in order to make predictions that match the inferences of human observers quickly and with very little training.
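Under the same Boltzmann-rational assumption, inverse planning scores each candidate goal by how probable the observed actions are under that goal's forward policy, and updates a prior by Bayes' rule. The grid world, candidate goals, and function names below are hypothetical simplifications of the continuous 3D setting.

```python
import math

def policy(pos, goal, beta=2.0):
    # Boltzmann-rational action distribution toward a candidate goal
    # (illustrative stand-in for the forward planning engine).
    acts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    u = [-math.dist((pos[0] + dx, pos[1] + dy), goal) for dx, dy in acts]
    z = sum(math.exp(beta * x) for x in u)
    return {a: math.exp(beta * x) / z for a, x in zip(acts, u)}

def infer_goal(trajectory, goals):
    """Posterior over goals given (position, action) observations:
    P(goal | traj) proportional to P(goal) * prod_t P(a_t | s_t, goal)."""
    post = {g: 1.0 / len(goals) for g in goals}          # uniform prior
    for pos, act in trajectory:
        for g in goals:
            post[g] *= policy(pos, g)[act]               # likelihood
        total = sum(post.values())
        post = {g: p / total for g, p in post.items()}   # renormalize
    return post

# Two rightward steps already favour the goal on the right
# over the one above, mirroring how little evidence observers need:
posterior = infer_goal([((0, 0), (1, 0)), ((1, 0), (1, 0))],
                       goals=[(5, 0), (0, 5)])
```

Because the posterior is updated after every observed action, the same computation yields the incremental, online goal judgments that the psychophysical experiments probe.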

Our project has four components: (a) a computer vision system that models the actor's actions in a 3D scene; (b) a psychophysical experiment measuring the goal inferences made by human observers; (c) a planning engine that generates human reaching actions in the 3D model; and (d) a Bayesian inverse-planning model that infers human goals based on the planning engine.