%0 Journal Article %J arXiv %D 2023 %T NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants %A Puig, Xavier %A Shu, Tianmin %A Tenenbaum, Joshua B. %A Torralba, Antonio %X

In this work, we study how to build socially intelligent robots to assist people in their homes. In particular, we focus on assistance with online goal inference, where robots must simultaneously infer humans' goals and how to help them achieve those goals. Prior assistance methods either lack the adaptivity to adjust helping strategies (i.e., when and how to help) in response to uncertainty about goals, or the scalability to conduct fast inference in a large goal space. Our NOPA (Neurally-guided Online Probabilistic Assistance) method addresses both of these challenges. NOPA consists of (1) an online goal inference module combining neural goal proposals with inverse planning and particle filtering for robust inference under uncertainty, and (2) a helping planner that discovers valuable subgoals to help with and is aware of the uncertainty in goal inference. We compare NOPA against multiple baselines in a new embodied AI assistance challenge: Online Watch-And-Help, in which a helper agent needs to simultaneously watch a main agent's actions, infer its goal, and help it perform a common household task faster in realistic virtual home environments. Experiments show that our helper agent robustly updates its goal inference and adapts its helping plans to the changing level of uncertainty.
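
To make the inference loop in the abstract concrete, here is a minimal Python sketch of particle filtering over candidate goals. Everything in it (the propose_goals and likelihood callables, the resampling scheme, entropy as the uncertainty signal) is an illustrative assumption, not the authors' implementation.

    # Hypothetical sketch of online goal inference via particle filtering,
    # in the spirit of the abstract above; all names are assumptions.
    import math
    import random
    from collections import Counter

    def infer_goal_online(observations, propose_goals, likelihood,
                          n_particles=100):
        """Maintain a particle set over candidate goals and reweight it
        as new observations of the main agent's actions arrive.
        Goals must be hashable so the posterior can be tallied."""
        # Seed particles from (e.g., neurally generated) goal proposals.
        particles = [random.choice(propose_goals()) for _ in range(n_particles)]
        weights = [1.0 / n_particles] * n_particles

        for obs in observations:
            # Inverse-planning step: how likely is this observed action
            # under each hypothesized goal?
            weights = [w * likelihood(obs, g)
                       for w, g in zip(weights, particles)]
            total = sum(weights) or 1e-12
            weights = [w / total for w in weights]

            # Resample to avoid particle degeneracy.
            particles = random.choices(particles, weights=weights,
                                       k=n_particles)
            weights = [1.0 / n_particles] * n_particles

        # Posterior over goals and its entropy, a proxy for the
        # uncertainty a helping planner could condition on.
        posterior = Counter(particles)
        probs = [c / n_particles for c in posterior.values()]
        entropy = -sum(p * math.log(p) for p in probs)
        return posterior, entropy

A planner built on top of this could, for example, help only with subgoals shared by most surviving particles while the entropy is high, and commit to goal-specific help once it drops.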

%B arXiv %8 01/2023 %G eng %U https://arxiv.org/abs/2301.05223 %0 Conference Proceedings %B 14th European Conference on Computer Vision %S Lecture Notes in Computer Science %D 2016 %T Ambient Sound Provides Supervision for Visual Learning %A Owens, Andrew %A Isola, P. %A McDermott, Josh H. %A Freeman, William T. %A Torralba, Antonio %K convolutional networks %K sound %K unsupervised learning %X

The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
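
As a rough illustration of the training setup the abstract describes, here is a hedged PyTorch sketch: a small CNN regresses a fixed-length statistical summary of a video's audio track from a single frame. The architecture, the regression loss, and the 64-dimensional summary are assumptions for illustration; the paper's actual network and target representation may differ.

    # Assumed setup: predict a statistical summary of ambient sound
    # from a video frame, using the sound track as free supervision.
    import torch
    import torch.nn as nn

    class FrameToSoundStats(nn.Module):
        def __init__(self, n_stats=64):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(64, n_stats)

        def forward(self, frame):              # frame: (B, 3, H, W)
            z = self.features(frame).flatten(1)
            return self.head(z)                # predicted sound statistics

    # The labels come "for free" from each clip's audio, so no manual
    # annotation is needed; placeholder tensors stand in for real data.
    model = FrameToSoundStats()
    frames = torch.randn(8, 3, 128, 128)
    target_stats = torch.randn(8, 64)          # precomputed audio summaries
    loss = nn.functional.mse_loss(model(frames), target_stats)
    loss.backward()

After training on such pairs, the convolutional features can be evaluated on recognition tasks, which is the transfer experiment the abstract reports.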

%B 14th European Conference on Computer Vision %C Cham %P 801-816 %8 10/2016 %@ 978-3-319-46447-3 %G eng %U http://link.springer.com/10.1007/978-3-319-46448-0 %R 10.1007/978-3-319-46448-0_48 %0 Conference Paper %B Conference on Computer Vision and Pattern Recognition %D 2016 %T Visually Indicated Sounds %A Owens, Andrew %A Isola, P. %A McDermott, Josh H. %A Torralba, Antonio %A Adelson, Edward H. %A Freeman, William T. %8 06/2016 %G eng