Sometimes it can be better to train a robot in an environment that differs from the one where it will be deployed.
Adam Zewe | MIT News
A home robot trained to perform household tasks in a factory may fail to effectively scrub the sink or take out the trash when deployed in a user’s kitchen, since this new environment differs from its training space.
To avoid this, engineers often try to match the simulated training environment as closely as possible to the real world where the agent will be deployed.
However, researchers from MIT and elsewhere have now found that, despite this conventional wisdom, sometimes training in a completely different environment yields a better-performing artificial intelligence agent.
Their results indicate that, in some situations, training a simulated AI agent in a world with less uncertainty, or “noise,” can enable it to perform better than a competing AI agent trained in the same noisy world used to test both agents.
The researchers call this unexpected phenomenon the indoor training effect.
“If we learn to play tennis in an indoor environment where there is no noise, we might be able to more easily master different shots. Then, if we move to a noisier environment, like a windy tennis court, we could have a higher probability of playing tennis well than if we started learning in the windy environment,” explains Serena Bono, a research assistant in the MIT Media Lab and lead author of a paper on the indoor training effect.
The researchers studied this phenomenon by training AI agents to play Atari games, which they modified by adding some unpredictability. They were surprised to find that the indoor training effect consistently occurred across Atari games and game variations.
They hope these results fuel additional research toward developing better training methods for AI agents.
“This is an entirely new axis to think about. Rather than trying to match the training and testing environments, we may be able to construct simulated environments where an AI agent learns even better,” adds co-author Spandan Madan, a graduate student at Harvard University.
Bono and Madan are joined on the paper by Ishaan Grover, an MIT graduate student; Mao Yasueda, a graduate student at Yale University; Cynthia Breazeal, professor of media arts and sciences and leader of the Personal Robotics Group in the MIT Media Lab; Hanspeter Pfister, the An Wang Professor of Computer Science at Harvard; and Gabriel Kreiman, a professor at Harvard Medical School. The research will be presented at the Association for the Advancement of Artificial Intelligence Conference.
Training troubles
The researchers set out to explore why reinforcement learning agents tend to have such dismal performance when tested on environments that differ from their training space.
Reinforcement learning is a trial-and-error method in which the agent explores a training space and learns to take actions that maximize its reward.
The team developed a technique to explicitly add a certain amount of noise to one element of the reinforcement learning problem called the transition function. The transition function defines the probability an agent will move from one state to another, based on the action it chooses.
If the agent is playing Pac-Man, a transition function might define the probability that ghosts on the game board will move up, down, left, or right. In standard reinforcement learning, the AI would be trained and tested using the same transition function.
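In code, perturbing a transition function might look like the following minimal sketch, assuming a Gymnasium-style Atari environment: with some probability, the agent’s chosen action is swapped for a random one, making state transitions less predictable. This is one simple way to inject transition noise, offered as an illustration of the idea rather than the authors’ implementation.

```python
import random

import gymnasium as gym


class NoisyTransitionWrapper(gym.Wrapper):
    """With probability `noise`, execute a random action instead of the
    chosen one. This is one simple way to make the environment's
    transitions stochastic; the paper's exact perturbation may differ."""

    def __init__(self, env, noise=0.1):
        super().__init__(env)
        self.noise = noise

    def step(self, action):
        if random.random() < self.noise:
            # The executed action may differ from the chosen one, so the
            # next state is no longer fully determined by the agent's choice.
            action = self.action_space.sample()
        return self.env.step(action)
```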
Following this conventional approach, the researchers added noise to the transition function during both training and testing and, as expected, it hurt the agent’s Pac-Man performance.
But when the researchers trained the agent with a noise-free Pac-Man game, then tested it in an environment where they injected noise into the transition function, it performed better than an agent trained on the noisy game.
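A hypothetical sketch of that comparison appears below, using the wrapper from earlier. Here `train_agent` simply returns a random policy as a stand-in for a real reinforcement learning algorithm, and `ALE/MsPacman-v5` is an assumed environment ID; the point is the protocol, which trains one agent in the clean game and one in the noisy game, then tests both under the same noise.

```python
import gymnasium as gym


def train_agent(env):
    # Stand-in for a real RL training routine (e.g. DQN); it just
    # returns a random policy so the sketch runs end to end.
    return lambda obs: env.action_space.sample()


def evaluate(agent, env, episodes=10):
    """Average episode return of `agent` in `env`."""
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(agent(obs))
            total += reward
            done = terminated or truncated
    return total / episodes


clean_env = gym.make("ALE/MsPacman-v5")  # assumed environment ID
noisy_env = NoisyTransitionWrapper(gym.make("ALE/MsPacman-v5"), noise=0.1)

indoor_agent = train_agent(clean_env)    # "indoor": trained without noise
outdoor_agent = train_agent(noisy_env)   # trained directly on the noisy game

# Both agents are tested in the same noisy environment. The indoor
# training effect is the finding that the clean-trained agent can
# outperform the agent trained on the noisy game itself.
print(evaluate(indoor_agent, noisy_env))
print(evaluate(outdoor_agent, noisy_env))
```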
“The rule of thumb is that you should try to capture the deployment condition’s transition function as well as you can during training to get the most bang for your buck. We really tested this insight to death because we couldn’t believe it ourselves,” Madan says...
Read the full story on the MIT News website.