Finding Friend and Foe in Multi-Agent Games [video]

Finding Friend and Foe in Multi-Agent Games [video]

Date Posted:  December 20, 2019
Date Recorded:  December 16, 2019
CBMM Speaker(s):  Max Kleiman-Weiner Speaker(s):  Jack Serrino
  • All Captioned Videos
  • Publication Releases

[MUSIC PLAYING]

MAX KLEIMAN-WEINER: The human social world is filled with all kinds of hidden social relationships, whether it's a friendship or an alliance. For machines to exist in that world and collaborate with us, they need to be able to make sense of it. They need to be able to figure out who's friend and who's foe.

The AI community has had many recent successes in building agents that can play games with people-- so things like Go, chess, poker, and even more recent multi-agent games where there's teams, such as DotA or Quake. But in none of these games is the challenge of figuring out who's on your team at all part of the game. So in all of these settings, you're either playing by yourself, or you know who your team is from the very beginning.

We started to study a different kind of game, a game called The Resistance-- Avalon. It's a board game traditionally played with a group of friends of five or more people.

JACK SERRINO: I really, really was attracted to Avalon simply because it was a game that I knew quite well, and I knew there was a large depth of strategy. And so I knew for a fact that it would be difficult for an AI to tackle a game like this.

MAX KLEIMAN-WEINER: And in this game, you don't know who's on your team. There is a good team and a bad team. And the bad guys know who's a bad guy and who's good, but the good guys don't know who's good or who's bad. So the challenge in this game, for the good guys, is to figure out who's bad. And the bad guys have to act in ways where the good guys can't figure that out.

So the whole challenge of the game is this kind of social inference task. Inspired by human theory of mind, we were able to endow our algorithm with a computational mechanism from distinguishing friend from foe. In brief, the agent thinks, what would a good player do, and what would a bad player do? So it thinks in a forward direction, how might different players act? And then it can go backwards and figure out, based on how someone acted, what kind of player was that? Were they good or were they bad?

JACK SERRINO: We used an algorithm called Counterfactual Regret Minimization, also known as CFR. CFR operates similar to self-play, where it imagines what would happen in many, many possible future scenarios, and updates its actions iteratively to improve its play. CFR has been applied in lots of other scenarios, like poker, to great success.

But one big challenge of applying it to Avalon is this concept of hidden motives. In poker, it's very clear that your opponent is just trying to increase the number of chips that they're holding, whereas in Avalon, it's not quite clear if your opponent is acting to win for the good team or acting to win for the bad team.

MAX KLEIMAN-WEINER: To make this algorithm scale, though, to the full game, which had over 10 to the 50 states, required using deep learning to more heuristically prune the search, allowing this algorithm to work in the full game of Avalon. We were able to evaluate our agent, not just in a bunch of simulations, but actually had it play online with real people. And we found that our algorithm outperformed both humans as collaborators and as competitors.

JACK SERRINO: You always would prefer to have the bot on your team than a fellow human. Our bot, in this way, is a better collaborator and better competitor. It's able to win in situations where humans aren't able to win. And it's able to prevent loss in ways that humans aren't always able to.

This work that we did actually furthers the field in situations where you're not quite sure who to trust. So maybe in the future algorithms similar to this can help you find who's truly on your side in normal, everyday social interactions.

MAX KLEIMAN-WEINER: Our work's a small step towards the grand challenge of making the human world, with all of its social complexity, computable for machines.

[MUSIC PLAYING]

Description: 

Co-authors Max Kleiman-Weiner and Jack Serrino discuss their latest publication where they created a new algorithm. The algorithm that the team developed, dubbed DeepRole, has three components. First, it plays against itself, iteratively, with variation, boosting the likelihood of actions that are beneficial to each player, improving slowly. This has worked well in other games, such as Go and poker. Second, it implements deductive reasoning, based on observed actions, and tracks the belief about the assignment of roles (spy or non-spy) given the history of actions. This inference, a computational mechanism to distinguish friend and foe from actions, is a type of Theory of Mind. It was Serrino who, as an Avalon enthusiast, brought detailed insight into how people actually play the game, which helped build the inferential computation. The space of Avalon actions is too large for brute-force search, so in the third element of the algorithm (as in successful poker AI) a deep neural network is used to prune the search tree.

Associated Research Module: