A new algorithm wins at multi-player, hidden-role games.
Kenneth I. Blum | Center for Brains, Minds and Machines
In the wilds of the schoolyard, alliances and conflicts shift every day amid the screams and laughter. How do people choose friend and foe? When should they cooperate? A large body of research has examined the emergence of cooperation through competitive games, which serve as controlled laboratories for exploring interpersonal dynamics and decisions. MIT scientists have now opened a new frontier in the study of cooperation: they have developed a computer program that plays a multi-player, hidden-role game called The Resistance: Avalon at a superhuman level. The board game, and its online variants, pits loyal allies against covert, subversive spies.
In recent years, deep neural networks have conquered chess, Go, and poker. But these purely adversarial games fail to capture a real-world challenge: cooperating to compete while uncertain about who is friend and who is foe. Avalon, played with five or more players, hinges on identifying the spies hidden in the group. In this Manichaean world, the spies know who is a spy and who is not, and they act to prevent the non-spies from figuring any of it out.
At the outset, two of the five players are secretly and randomly assigned to be spies, known only to each other. Play proceeds over a series of quests. In each round, one player proposes a team for a quest, and the full group votes on it; if a majority rejects the proposal, the next player proposes a team, and if five consecutive proposals fail, the spies win. When a team is approved, its members secretly vote on the quest's outcome; only spies can vote to sabotage, and a sabotaged quest fails. Rounds continue until one side accumulates three quest results: three successes favor the non-spies, three failures the spies. There is a further wrinkle. One spy is randomly assigned the role of Assassin, and one non-spy is randomly and secretly assigned the role of Merlin, who knows all of the role assignments. If the non-spies manage three successful quests but the Assassin then correctly guesses Merlin's identity, the spies win after all.
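The win conditions above can be condensed into a small scoring sketch. This is an illustrative reconstruction, not the researchers' code; the function name and data representation are hypothetical.

```python
def avalon_winner(quest_results, assassin_guess, merlin):
    """Decide the winner from a sequence of quest outcomes plus the
    endgame assassination attempt.

    quest_results: booleans in order of play (True = quest succeeded).
    assassin_guess, merlin: player identifiers.
    """
    successes = failures = 0
    for succeeded in quest_results:
        if succeeded:
            successes += 1
        else:
            failures += 1
        if failures == 3:   # three failed quests: spies win outright
            return "spies"
        if successes == 3:  # three successes: the Assassin gets one shot
            return "spies" if assassin_guess == merlin else "non-spies"
    raise ValueError("neither side has three quest results yet")
```

Note the asymmetry in the endgame: even after three successful quests, a correct Assassin guess flips the result, so `avalon_winner([True, True, True], "bob", "bob")` returns `"spies"`.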
As co-first author Max Kleiman-Weiner—a postdoc at MIT affiliated with the Center for Brains, Minds, and Machines (CBMM) and a postdoctoral fellow at the Harvard Data Science Initiative and the Center for Research on Computation and Society—puts it, “We know that cooperation has been this incredibly powerful force for humanity. It’s allowed us to achieve together what no individual could achieve on his or her own.” For intelligent machines to integrate smoothly with us in the world, he continues, “they need to be able to figure out who’s friend and who’s foe.”
Joining Kleiman-Weiner (who also earned his PhD at MIT in Computational Cognitive Science) on the paper are co-first author Jack Serrino, MIT BS ’18 and MEng ’19, both in Electrical Engineering and Computer Science; senior author Joshua B. Tenenbaum, Professor of Brain and Cognitive Sciences at MIT and researcher in the Center for Brains, Minds, and Machines and the Computer Science and Artificial Intelligence Laboratory (CSAIL); and David Parkes, the George F. Colony Professor of Computer Science at Harvard and Co-director of the Harvard Data Science Initiative.
The algorithm that the team developed, dubbed DeepRole, has three components. First, it plays against itself iteratively, with variation, gradually boosting the probability of actions that benefit each player; this self-play approach has worked well in other games, such as Go and poker. Second, it implements deductive reasoning based on observed actions, tracking a belief about the assignment of roles (spy or non-spy) given the history of play. This inference, a computational mechanism for distinguishing friend from foe by their actions, is a type of theory of mind. Serrino, an Avalon enthusiast, brought detailed insight into how people actually play the game, which helped shape the inferential computation. Because the space of Avalon actions is too large for brute-force search, the third element of the algorithm (as in successful poker AIs) uses a deep neural network to prune the search tree.
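The deductive component can be pictured as Bayesian filtering over the ten possible spy assignments in a five-player game. The sketch below is a minimal illustration of that idea under assumed names and a toy likelihood model, not the paper's implementation.

```python
from itertools import combinations

def spy_assignments(num_players=5, num_spies=2):
    # All candidate spy pairs: C(5, 2) = 10 in the five-player game.
    return list(combinations(range(num_players), num_spies))

def update_beliefs(beliefs, likelihood):
    """One deductive step: re-weight each candidate spy assignment by how
    consistent the latest observed action is with it, then renormalize.
    A likelihood of zero rules an assignment out entirely."""
    posterior = {pair: p * likelihood(pair) for pair, p in beliefs.items()}
    total = sum(posterior.values())
    return {pair: p / total for pair, p in posterior.items()}

# Uniform prior over the ten assignments.
prior = {pair: 1 / 10 for pair in spy_assignments()}

# Suppose a quest with team {0, 1} was sabotaged: at least one of players
# 0 and 1 must be a spy, so assignments containing neither get zero weight.
posterior = update_beliefs(
    prior, lambda pair: 1.0 if 0 in pair or 1 in pair else 0.0
)
```

After this observation the three assignments excluding both players are eliminated and the seven survivors each hold probability 1/7. In the full system, such beliefs about who is friend and who is foe inform how the program evaluates candidate actions.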
The team tested the algorithm in online play, in which all of the human players knew which of the five players were human and which were DeepRole. When humans play, as a board game or online, there is a good deal of chatter, attempts at persuasion, and lying. There were no restrictions on chat usage for the human players in the mixed games, but DeepRole was not endowed with any natural language capability, so it remained “mute” and disregarded all messages.
The result was that DeepRole competed successfully against human Avalon enthusiasts and other algorithms. Playing online, with the algorithm fixed after prior training on self-play, “our program outperformed humans as collaborators and as competitors,” said Kleiman-Weiner. In games with four humans and one algorithm, the humans fared better, on average, with DeepRole as a teammate (whether spy or non-spy) than with another human Avalon player. DeepRole cooperates to compete—and it does so more effectively than humans.
Kleiman-Weiner describes the work as a “small step towards the grand challenge of making our human world, with all of its social complexity, computable for machines.” “Humans cooperate at a scale and scope that’s unprecedented in the natural world,” he says. He and his collaborators have created an algorithm that exhibits some of that social complexity, achieving cooperation, competition, and cooperation to compete. The algorithm follows Alexander Pope’s dictum: “Make use of ev'ry friend—and ev'ry foe.”