Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning [video]
November 23, 2020
Authors Kelsey Allen, Kevin Smith, and Joshua Tenenbaum describe their newly published paper in PNAS about a new cognitive model that learns to use tools like humans do.
[MUSIC PLAYING] KEVIN SMITH: Tool use is one of the defining characteristics of an intelligent species like humans. Every culture we know of uses objects to shape the environment, culminating in the tools and machines we use to build massive cities, fly in planes, or communicate instantly across the world. And while most of the tools we use have been created and passed down from other people, the spark of innovation to repurpose objects to serve as tools exists in all of us. We could easily see how to use a rock as a hammer or put an old book under a table leg to stabilize it, even if those aren't typical uses.
These capabilities come so easily to us that we often forget how complex these behaviors are. Despite the universality in people, only a handful of other animals, just a few great apes, whales, and birds, use objects in this way. And we tend to think of these as some of the most intelligent behaviors that other species display.
For example, consider the problem this crow faces. She wants to get the food floating in the water in this tube. But in order to reach it, she needs to raise the water level. And for that, she needs to know that heavy objects can displace water, and then what kind of objects would be heavy enough to do this.
This reasoning process is impressively complex and requires not just knowing how our actions will affect the world, but also picking out the right actions and in the right sequence to get the food.
KELSEY ALLEN: In this work, we set out to better understand what makes people such capable tool users. We designed the virtual tools game, a novel task that requires creative physical problem solving similar to how we think people might use a new tool. In this game, people see a 2D scene with a goal, like getting the red ball into the green area. To accomplish this goal, people must choose one of three tools from the side of the screen and then place it somewhere in the level to launch, stop, or even support other objects.
The catch is that people can only use a single tool in a single place to solve the problem. But they can try as many times as they want. And this way, they can learn what happens after each attempt and potentially update their plans accordingly. We made a number of levels for the virtual tools game which were designed to test different kinds of physical concepts, such as indirectly launching a goal object by hitting another object in the scene, supporting an object to potentially prevent it from falling, or even tipping and opening objects, among many others.
We hypothesized that people need three critical capabilities to solve these kinds of physical problems. First, an object-oriented prior that guides the initial actions to those that will make a difference in the scene; second, an ability to imagine the effects of their actions before taking them; and third, a way of rapidly updating those strategies when their current attempts fail. We captured these components in the Sample Simulate Update model, or SSUP, as we call it for short.
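[Editor's illustration, not part of the video.] The three components described above (sample from an object-oriented prior, simulate before acting, update after a failed attempt) can be sketched as a loop. This is a toy under strong assumptions, not the authors' SSUP implementation: the "scene" is a single number line, an action is one placement position, the internal simulator is just a noisy distance to the goal, and the update is a simple move-toward-the-best-attempt rule standing in for whatever update the paper uses. The names `simulate` and `ssup_toy` and all parameters are hypothetical.

```python
import random


def simulate(x, target, rng, noise=0.3):
    """Noisy internal 'physics': imagined distance from the goal for placement x."""
    return abs(x + rng.gauss(0.0, noise) - target)


def ssup_toy(object_positions, target, tol=0.25, max_attempts=50,
             n_imagined=5, seed=0):
    """Return (solved, attempts_used, last_action) for a 1D placement toy."""
    rng = random.Random(seed)
    # Object-oriented prior: centre the initial policy on an object in the scene.
    mu = rng.choice(list(object_positions))
    sigma = 1.0
    best_error = float("inf")
    action = mu
    for attempt in range(1, max_attempts + 1):
        # Sample: draw candidate placements from the current policy.
        candidates = [rng.gauss(mu, sigma) for _ in range(n_imagined)]
        # Simulate: imagine each candidate and keep the most promising one.
        action = min(candidates, key=lambda x: simulate(x, target, rng))
        # Act: try it in the "real" world, which is noise-free in this toy.
        error = abs(action - target)
        if error <= tol:
            return True, attempt, action
        # Update: recentre the policy on the best real attempt so far
        # and narrow the search slightly (assumed rule, for illustration only).
        if error < best_error:
            best_error, mu = error, action
        sigma = max(0.2, sigma * 0.9)
    return False, max_attempts, action


if __name__ == "__main__":
    print(ssup_toy(object_positions=[0.0, 4.0], target=2.5))
```

Each pass through the loop corresponds to one attempt in the game: many imagined placements, one real placement, and a small change to where the next samples come from.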
KEVIN SMITH: If people's skill with tools is based on these three capabilities, then we would expect SSUP to behave like our participants, which is, in fact, what we found. When people typically solve a level in just a few attempts, SSUP usually finds the answer quickly, too. On levels that people find more difficult, SSUP also takes longer to find a solution.
When we looked at individual levels, we found that SSUP didn't just solve them at similar rates to people, but also in similar ways. The SSUP model often starts a level by trying one of a variety of different actions which match the distribution of ways that people start that level. Similarly, the SSUP model predicts that people should solve the level in particular ways, which often describe the solutions that people do in fact find.
People's performance can't be explained by simple models. Even deep networks, which have achieved strong performance in other games like Atari, don't generalize to new levels in the virtual tools game if they weren't explicitly trained on them. Instead, it seems that people have a good intuitive understanding of how their actions will affect the objects in each level, which they then use to solve the game generally.
KELSEY ALLEN: So what does this tell us about human cognition and physical reasoning? We suggest that people are such amazing tool users because of their incredible ability to make use of very rich, internal physical simulation engines to rapidly update their beliefs about what kinds of actions are likely to be successful. If we want to build machines that are similarly flexible in their physical reasoning, they will require both more structured policies and better physical models than what is currently the norm.
Looking towards the future, we plan to use the virtual tools game as a platform for studying other aspects of tool cognition, both at a larger scale and across a much broader range of scenarios than has been possible in the past. This is an exciting step toward understanding the computational and cognitive mechanisms that have allowed the human species to develop tools as simple as a rock-based hammer to those as complex as an airplane.
JOSHUA TENENBAUM: I'm excited about this work for many reasons. But one of them is how it relates to the broader context of current research in artificial and natural intelligence. One of the AI developments that everybody's been hearing about, and that has been very exciting in the last few years, is the advances in reinforcement learning. Systems that can learn from the mistakes that they make capture the intuition that we know is very important in human and animal learning, what we call "trial and error learning."
But there is a very big difference between the way trial and error learning works in today's artificial reinforcement learning systems and what you can see humans do, like in this task that we've been studying here, this virtual tools game. So today's AI systems in reinforcement learning might learn from thousands, or millions, or even billions of examples and experiences, mistakes that they make. And while that can be very powerful, it also is just much slower than what we see in human trial and error learning.
So we're really excited that here we have a task where we can study experimentally how humans learn from just one or a few mistakes and can very quickly get better. And we even have a computational model, the SSUP model, that starts to capture that. Looking forward, this might even lead to advances in machine systems, artificial reinforcement learning that can learn as quickly and as flexibly as humans.