%0 Conference Paper %B Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI) %D 2023 %T Zero-shot linear combinations of grounded social interactions with Linear Social MDPs %A Ravi Tejwani %A Yen-Ling Kuo %A Tianmin Shu %A Bennett Stankovits %A Dan Gutfreund %A Joshua B. Tenenbaum %A Boris Katz %A Andrei Barbu %X

Humans and animals engage in rich social interactions. It is often theorized that a relatively small number of basic social interactions give rise to the full range of behavior observed. But no computational theory explaining how social interactions combine has been proposed before. We do so here. We take a model, the Social MDP, which is able to express a range of social interactions, and extend it to represent linear combinations of social interactions. Practically, for robotics applications, such models can now express not just that an agent should help another agent, but goal-centric social interactions. Perhaps an agent is helping someone get dressed, while preventing them from falling, and is happy to exchange stories in the meantime. How an agent responds socially should depend on what it thinks the other agent is doing at that point in time. To encode this notion, we take linear combinations of social interactions as defined in Social MDPs, and compute the weights on those combinations on the fly depending on the estimated goals of other agents. This new model, the Linear Social MDP, enables zero-shot reasoning about complex social interactions, provides a mathematical basis for the long-standing intuition that social interactions should compose, and leads to interesting new behaviors that we validate using human observers. Complex social interactions are part of the future of intelligent agents, and having principled mathematical models built on a foundation like MDPs will make it possible to bring social interactions to every robotic application.
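
As an illustration only (not the authors' implementation; all names below are hypothetical), the core idea can be sketched as computing interaction weights from a belief over the other agent's goals and then taking a weighted sum of basic social-interaction rewards:

def combined_social_reward(state, action, interaction_rewards,
                           goal_posterior, goal_to_weights):
    """Toy sketch of a Linear-Social-MDP-style reward.

    interaction_rewards: dict mapping interaction name -> reward fn R(s, a)
    goal_posterior:      dict mapping hypothesized goal -> probability
    goal_to_weights:     dict mapping goal -> per-interaction weights
    """
    # Expected weight on each basic interaction under the current goal belief,
    # recomputed on the fly as the belief about the other agent changes.
    weights = {name: 0.0 for name in interaction_rewards}
    for goal, p in goal_posterior.items():
        for name, w in goal_to_weights[goal].items():
            weights[name] += p * w
    # Linear combination of the basic social interactions.
    return sum(w * interaction_rewards[name](state, action)
               for name, w in weights.items())

For example, as the belief shifts from "getting dressed" toward "about to fall", the weight on a helping interaction can shift toward one that prevents the fall.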

%B Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI) %8 02/2023 %G eng %0 Generic %D 2022 %T Incorporating Rich Social Interactions Into MDPs %A Ravi Tejwani %A Yen-Ling Kuo %A Tianmin Shu %A Bennett Stankovits %A Dan Gutfreund %A Joshua B. Tenenbaum %A Boris Katz %A Andrei Barbu %X

Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basic interactions that underlie microsociology: cooperation, conflict, coercion, competition, and exchange. The result is a robotic agent capable of executing social interactions zero-shot in new environments; like humans, it can engage socially in novel ways even without a single example of that social interaction. Moreover, the judgments of these Social MDPs align closely with those of humans when considering which social interaction is taking place in an environment. This method both sheds light on the nature of social interactions, by providing concrete mathematical definitions, and brings rich social interactions into MDPs, a mathematical framework that has proven natural for robotics.
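
A minimal sketch of the underlying idea, stated here only for illustration and not as the paper's exact formulation: an agent's social reward adds its own physical reward to a weighted, arbitrary function of the other agent's estimated reward.

def social_reward(own_reward, estimated_other_reward, social_weight, f=lambda r: r):
    # Toy Social-MDP-style reward: the agent's own physical reward plus a
    # weighted, arbitrary function f of the other agent's estimated reward.
    #   social_weight > 0 with f = identity -> cooperation-like behavior
    #   social_weight < 0 with f = identity -> conflict-like behavior
    # Different choices of f and weight are meant to suggest how other
    # interactions (coercion, competition, exchange) could be expressed.
    return own_reward + social_weight * f(estimated_other_reward)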

%2

https://hdl.handle.net/1721.1/141363

%0 Generic %D 2021 %T AGENT: A Benchmark for Core Psychological Reasoning %A Tianmin Shu %A Abhishek Bhandwaldar %A Chuang Gan %A Kevin A Smith %A Shari Liu %A Dan Gutfreund %A Elizabeth S Spelke %A Joshua B. Tenenbaum %A Tomer D. Ullman %B Proceedings of the 38th International Conference on Machine Learning %8 07/2021 %0 Journal Article %J arXiv %D 2020 %T ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation %A Chuang Gan %A Jeremy Schwartz %A Seth Alter %A Martin Schrimpf %A James Traer %A Julian De Freitas %A Jonas Kubilius %A Abhishek Bhandwaldar %A Nick Haber %A Megumi Sano %A Kuno Kim %A Elias Wang %A Damian Mrowca %A Michael Lingelbach %A Aidan Curtis %A Kevin Feigleis %A Daniel Bear %A Dan Gutfreund %A David Cox %A James J. DiCarlo %A Josh H. McDermott %A Joshua B. Tenenbaum %A Daniel L K Yamins %X

We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. With TDW, users can simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments. TDW has several unique properties: 1) real-time, near photo-realistic image rendering quality; 2) a library of objects and environments with materials for high-quality rendering, and routines enabling user customization of the asset library; 3) generative procedures for efficiently building classes of new environments; 4) high-fidelity audio rendering; 5) believable and realistic physical interactions for a wide variety of material types, including cloths, liquids, and deformable objects; 6) a range of "avatar" types that serve as embodiments of AI agents, with the option for user avatar customization; and 7) support for human interactions with VR devices. TDW also provides a rich API enabling multiple agents to interact within a simulation and return a range of sensor and physics data representing the state of the world. We present initial experiments enabled by the platform around emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, multi-agent interactions, models that "learn like a child", and attention studies in humans and neural networks. The simulation platform will be made publicly available.

%B arXiv %8 07/2020 %G eng %U https://arxiv.org/abs/2007.04954 %9 Preprint %0 Generic %D 2020 %T ThreeDWorld (TDW): A High-Fidelity, Multi-Modal Platform for Interactive Physical Simulation %A Jeremy Schwartz %A Seth Alter %A James J. DiCarlo %A Josh H. McDermott %A Joshua B. Tenenbaum %A Daniel L K Yamins %A Dan Gutfreund %A Chuang Gan %A James Traer %A Jonas Kubilius %A Martin Schrimpf %A Abhishek Bhandwaldar %A Julian De Freitas %A Damian Mrowca %A Michael Lingelbach %A Megumi Sano %A Daniel Bear %A Kuno Kim %A Nick Haber %A Chaofei Fan %X

TDW is a 3D virtual world simulation platform built on state-of-the-art video game engine technology.

A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis, and physics simulation; and b) the Controller, an external Python interface used to communicate with the Build.

Researchers write Controllers that send commands to the Build, which executes those commands and returns a broad range of data types representing the state of the virtual world.
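
As a rough illustration of this command/response pattern, here is a minimal sketch based on the public tdw Python package; command names, the model name, and exact signatures are assumptions and should be checked against the current TDW documentation.

from tdw.controller import Controller

c = Controller()               # launches / connects to the Build
object_id = c.get_unique_id()

# Send a list of commands; the Build executes them and returns output data
# (a list of byte arrays) describing the state of the virtual world.
resp = c.communicate([
    {"$type": "create_empty_environment"},
    c.get_add_object(model_name="iron_box",   # model name is illustrative
                     object_id=object_id,
                     position={"x": 0, "y": 0, "z": 0}),
])

c.communicate({"$type": "terminate"})          # shut down the Build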

TDW provides researchers with high-fidelity rendering and audio synthesis, realistic physics for a wide variety of material types, a customizable library of objects and environments, a range of embodied "avatar" agents, support for VR interaction, and a Python API for controlling simulations and collecting sensor and physics data.

TDW is being used on a daily basis in multiple labs, supporting research that sits at the nexus of neuroscience, cognitive science and artificial intelligence.

Find out more about ThreeDWorld on the project website using the link below.

%8 07/2020 %U http://www.threedworld.org/ %1

ThreeDWorld on Github - https://github.com/threedworld-mit/tdw

%0 Conference Proceedings %B Neural Information Processing Systems (NeurIPS 2019) %D 2019 %T ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models %A Andrei Barbu %A David Mayo %A Julian Alverio %A William Luo %A Christopher Wang %A Dan Gutfreund %A Joshua B. Tenenbaum %A Boris Katz %X

We collect a large real-world test set, ObjectNet, for object recognition, with controls where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls: confounds that are removed from the data to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and that perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance relative to their performance on other benchmarks, due to the controls for biases. Controls make ObjectNet robust to fine-tuning, which yields only small performance increases. We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images) and, by design, does not come paired with a training set, in order to encourage generalization. The dataset is both easier than ImageNet (objects are largely centered and unoccluded) and harder (due to the controls). Although we focus on object recognition here, data with controls can be gathered at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways, thus providing valuable feedback to researchers. This work opens up new avenues for research in generalizable, robust, and more human-like computer vision, and in creating datasets where results are predictive of real-world performance.
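
The reported drop can be measured by running an ImageNet-pretrained classifier over ObjectNet images. A hedged sketch of such an evaluation follows; this is not the authors' code, and the directory layout, the label-mapping placeholder, and the torchvision version are assumptions.

import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes the ObjectNet classes that overlap ImageNet have been arranged into
# class folders; the folder-index -> ImageNet-index mapping distributed with
# the dataset is represented here by a placeholder dict.
dataset = datasets.ImageFolder("objectnet_imagenet_overlap/", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=False)
folder_to_imagenet = {}  # placeholder, e.g. {0: 417, 1: 455, ...}

model = models.resnet50(weights="IMAGENET1K_V1").eval()  # torchvision >= 0.13

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        targets = torch.tensor([folder_to_imagenet.get(int(t), int(t)) for t in labels])
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"Top-1 accuracy: {correct / total:.3f}")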

%B Neural Information Processing Systems (NeurIPS 2019) %C Vancouver, Canada %8 11/2019 %G eng