Reflexive Theory-of-Mind Reasoning in Games
Date Posted:
December 2, 2014
Description:
Prof. Jun Zhang, Department of Psychology and Department of Mathematics, University of Michigan, Ann Arbor
Theory-of-mind (ToM) is the modeling of mental states (such as belief, desire, knowledge, perception) through recursive ("I think you think I think ...") reasoning in order to plan one's action or anticipate others' actions. Such reasoning forms the core of strategic analysis in the game-theoretic setting. Traditional analysis of rational behavior in games of complete information is centered on the axiom of "common knowledge," according to which all players know something to be true, know that all players know it to be true, know that all players know all players know it to be true, etc. Such an axiom requires recursive modeling of players to the full depth, and seems to contradict human empirical behavior revealed in the behavioral game literature. Here, I propose that such deviation from normative analysis may be due to players building predictive mental models of their co-players based on experience and context, without necessarily assuming a priori full rationality and common knowledge, rather than due to any lapse in "instrumental rationality" whereby players (and co-players) translate the predictions from their mental models into optimal choice. I investigate this mental-model account of theory-of-mind reasoning by constructing a series of two-player, sequential-move matrix games, all terminating in a maximum of three steps. By carefully designing payoff matrices, the depth of recursive reasoning (i.e., first-order ToM versus second-order ToM) can be contrasted based on participants' choice behavior in those games. Empirical findings support the idea that depth of ToM recursion (related to perspective-taking) and instrumental rationality (rational application of belief-desire to action) constitute separate processes.
JUN ZHANG: OK, thank you. Thank you, Tommy, for giving me this opportunity to give a lecture on one of my research topics. And I'm very glad to come here and talk with many of my colleagues, friends, and students here. That quote Tommy just read-- he apparently looked up my website, which has yet to be updated-- is actually taken from Jerome Busemeyer.
He said that life is complex because it has both real and imaginary parts. I have another quote, I think, on my website which I like, which talks about modeling, actually. The story goes that trains in Austria usually run late. So one day a passenger waiting for the train in the station complained about this to the station master. He said, if the trains are always late, why do you have to print the timetable? What's the use of the timetable?
And then the station master looked at him and said, OK, without the timetable, how do you know the train is late? I think this is quite an interesting remark in terms of the way models and data interact. We build models, to some extent, to serve as a kind of benchmark, a way of thinking about the data, to think about the underlying process. So with that, I would like to just now talk about, as Tommy said, my background, coming from somebody who first studied theoretical physics in China, then got a PhD in neuroscience, and is now in a psychology department with a goal of modeling psychological processes, modeling the mind.
And the approach which I have taken is to look at the mind as a kind of computational device which runs a kind of software on neural hardware. OK, it's a particular kind of view on computational intelligence. I therefore have this motto of mind, machine, and mathematics. I've done a variety of work, but today the topic I'll be talking about is the so-called theory of mind.
The notion of theory of mind has a particular kind of meaning in developmental psychology, or in psychology in general, having evolved as a core domain for cognition. Here's a quote from Henry Wellman's book. Basically, he says that, well, we have this lay view about, say, our perception, which forms the basis of our belief.
And we have basic emotions underlying neurophysiology which give rise to our desire. And then based on our belief and desire, we take actions. And then we get reactions from others and from our environment. And therefore from perception, belief, emotion, desire, action, and so forth, we have a generic kind of a theory about this belief, desire, psychology.
And this emerges in the study of human cognition, developmental cognition, the development of cognition. Very early on in a child's life, he or she will need to acquire this kind of general model of the notion of a mind. So theory-of-mind type reasoning, that is to say reasoning about beliefs and desires and actions, can be looked at from a more formal perspective via so-called game theory.
Well, game theory emerges out of economics. It's a framework for modeling interpersonal, strategic interaction. So the basic ingredients of a game are that you have a set of players, a set of players who individually can take actions. And based on the joint actions of these individual players, there is an outcome. So in other words, an outcome is a result of the combined action of all the players.
Now, given an outcome, different players can have different values or payoffs for these outcomes. So the outcomes are determined by the actions of all the players, and the resulting outcome is valued, possibly differently, by all the different players. So you have these players, outcomes, and payoffs, which are the basic ingredients for a game.
Now, in terms of solutions for the game, the formal game-theoretic solutions of the game, people have various notions. One, for instance, is the notion of best reply, which involves modeling the other person's action and trying to devise a best response to the actions of others. The idea of equilibrium is that somehow the joint reasoning of the players leads to a state in which no individual would like to deviate from that state of affairs. I'm going to get into more details later on.
In the economic studies of games, there are some fundamental assumptions being made. And these assumptions are the axioms of common knowledge. That is, each player knows the structure of the game, knows the players, knows their actions, and knows the payoffs of the game. And this is all common knowledge.
Now being common knowledge, that is to say that players know the structure, the nature, the process of the game and know that other players also know the strategies, the payoffs of the game, and know that other players know that they know the game structure and so forth. So this is a part of the common knowledge. So you know something. You know others know something. And you know that others know that you know something, et cetera. So it forms a common knowledge.
And this axiom of common knowledge is very important. It plays an important role, for instance, for that notion of the Nash equilibrium, for the equilibrium idea. And then along with that is the notion of the axiom of rationality. In other words, players tend to act rationally. Now, here's a little bit of trouble about this notion of the rationality in the case of the games. And this is one of the questions that we are going to address, a part of this notion of the rationality.
What we submit, what we argue, is that while we can well define the notion of so-called instrumental rationality-- that is to say, once we have a model of the game situation, a model of the other players' actions in the game, then we can act rationally with a best response, a best action, out of our model-- nevertheless, this model itself, the modeling of others in a game setting, is a much more involved process. It involves a recursive kind of reasoning, which is yet to be explored.
So let me just give a detailed exposition of these concepts for the case of, say, the prisoner's dilemma game, and the notion of the Nash equilibrium in that game. So the prisoner's dilemma game is shown on the right. You have two players. Each can choose to be selfish or altruistic. And the numbers in the cells represent the payoffs to these individuals, with a smaller number meaning less value and a larger number meaning more or higher value.
OK, so five is more than three and more than one, more than zero. So now if both players act altruistically, say, for both players you can get three points. But if one player acts selfishly and the other person acts altruistically, the player who plays selfish would get five points, whereas the person who played altruistically would get zero points. And if both play selfishly, then they both get one point.
OK, so this is the standard setting for the so-called prisoner's dilemma game. This game has puzzled many, many people. If you do an analysis of what players would do, what people would do in this kind of setting, the game-theoretic analysis-- for instance, the notion of the Nash equilibrium-- basically says that people would choose a state of affairs in which nobody wants to unilaterally deviate.
So in other words, if somehow the players have chosen a particular strategy, knowing that other players have chosen that strategy, no player has the incentive to unilaterally deviate-- in other words, to change if the other player will not change. So let's take a look at this. For instance, the three-three cell. That cell would not be a Nash equilibrium because, if we are in the three-three cell, the row player would change to the other strategy, the selfish strategy, thereby gaining more points: a three becomes a five.
And likewise, the column player would also want to deviate from his or her choice to get a five. So three-three, the cooperative solution for this prisoner's dilemma game, is not a Nash equilibrium. Whereas the one-one, the selfish-selfish, or the non-cooperative solution, is a Nash equilibrium, because nobody would like to deviate unilaterally without damaging him- or herself, without getting a lower payoff. OK, so that's the notion of the Nash equilibrium.
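The unilateral-deviation check just described can be sketched in a few lines of Python. Only the payoff numbers come from the matrix above; the cell labels and dictionary layout are my own:

```python
from itertools import product

# Payoff matrix of the prisoner's dilemma above:
# payoffs[(row_choice, col_choice)] = (row_payoff, col_payoff)
A, S = "altruistic", "selfish"
payoffs = {
    (A, A): (3, 3), (A, S): (0, 5),
    (S, A): (5, 0), (S, S): (1, 1),
}

def is_nash(row, col):
    """A cell is a Nash equilibrium if neither player gains by
    unilaterally deviating while the other holds their choice fixed."""
    r_pay, c_pay = payoffs[(row, col)]
    if any(payoffs[(r, col)][0] > r_pay for r in (A, S)):
        return False  # row player would deviate
    if any(payoffs[(row, c)][1] > c_pay for c in (A, S)):
        return False  # column player would deviate
    return True

equilibria = [cell for cell in product((A, S), repeat=2) if is_nash(*cell)]
print(equilibria)  # [('selfish', 'selfish')] -- only the non-cooperative cell survives
```

Running the check over all four cells confirms the argument in the talk: three-three fails because either player gains a five by deviating, and only one-one survives.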
One way to explain or to solve this prisoner's dilemma game is to say, OK, use this Nash equilibrium-based idea, which is basically a mutual best response based on a theory-of-mind kind of modeling of what others would do-- that is to say, whether others would stick with their choice. This, no unilateral deviation, is the foundation for the Nash equilibrium. Now, extensive psychological research has shown that this notion of the Nash equilibrium is not valid with actual humans when they're engaged in those games.
So this is showing that one-one would be the Nash equilibrium in this case. Nobody would want to unilaterally deviate. Now, another notion in solving the game is the notion of the so-called dominant strategy.
A dominating strategy is a strategy such that, regardless of what others do, it's better for the player to choose one strategy versus the other, one action versus the other. So now in this case, if you look at the choices of the two players here, you soon realize that selfish-- the one-one cell, the selfish-selfish-- is actually the dominating strategy for both the row and column players. In other words, for the players in this game, you do not need to know what the other person would have chosen.
It is always better for you to choose selfish. Why? Because if the other person chooses selfish, of course, you should choose selfish. But if the other person chooses altruistic, you should also choose selfish because it gets you higher points for this game. So the one-one solution, the non-cooperative solution, is actually very, very stable because it is a combination of the dominating strategies of the two players.
A dominating strategy is one that does not rely on a modeling of the other. You really don't even need to extensively model the other player. Just by this analysis, you are always better off choosing the dominating strategy. So for the prisoner's dilemma game, the question is when you can actually get out of this non-cooperative solution-- when cooperation in the prisoner's dilemma game can be individually rational, rational to the players themselves, as an instrumentally rational kind of strategy.
Well, it turns out that the only way for that to happen is when you are not playing this as a single-shot game. And you have to assume there's a probability of continuation. There is a probability of continued interaction. So the nonzero probability of continued interaction is a necessary condition for this prisoner dilemma game to evolve a cooperative solution.
So basically in those cases, a rational player would maximize the total expected payoff, taking into account the probability of continuation, of continued interaction. What happens is that when that is taken into account, then you can basically transform the payoff values of your actions. And what happens in this case is that your play, your actual choice in the current game, not only will give rise to some immediate payoff but also will influence the other person's choice in future rounds of the same game.
So therefore, your action not only impacts your immediate reward, your immediate gain, but also impacts how others would act in future games. So therefore, cooperation can arise. It turns out that after a rigorous analysis of this sort, one can show that the sufficient condition for this to happen is that the continuation probability has to exceed a certain threshold.
So to rationally solve the prisoner's dilemma game, one requirement is that you have continued interaction, or a presumed probability or expectation of continued interaction. So that shows that all these cooperative solutions in the prisoner's dilemma game rely on that kind of expectation, which may be neurobiologically grounded in the brain through evolution, for instance. So that is a rational solution for cooperation.
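The threshold can be made concrete. The talk doesn't name a specific repeated-game strategy, so the sketch below assumes the textbook grim-trigger analysis, with the temptation, reward, and punishment payoffs taken from the matrix above:

```python
# Grim-trigger analysis of the repeated prisoner's dilemma (an assumed
# strategy; the talk only states that a threshold exists).
# T: temptation (5), R: mutual cooperation (3), P: mutual defection (1).
T, R, P = 5, 3, 1

# With continuation probability d, cooperating forever yields R / (1 - d),
# while defecting once yields T now plus d * P / (1 - d) thereafter.
# Cooperation is rational iff
#   R / (1 - d) >= T + d * P / (1 - d),
# which rearranges to d >= (T - R) / (T - P).
threshold = (T - R) / (T - P)
print(threshold)  # 0.5 for these payoffs

def cooperation_sustainable(d):
    return R / (1 - d) >= T + d * P / (1 - d)

print(cooperation_sustainable(0.6))  # True
print(cooperation_sustainable(0.4))  # False
```

So with the payoffs used in this talk, cooperation becomes individually rational once the expected continuation probability exceeds one half.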
Now, I'm not going to expand on this type of analysis of the prisoner's dilemma game and how cooperation arises in it. I have some theoretical results, a theoretical paper, on this. But rather, today I'm going to talk about a related issue: recursive depth in theory-of-mind reasoning, recursion in theory-of-mind reasoning. OK, so to explain what that is, I'm going to give you an example, the so-called p-beauty contest game.
Now suppose we have, say, the audience here. Suppose I asked you to submit a number between 0 and 100. You're going to write it on a piece of paper and hand it to me. I'm going to collect all the papers, and I'm going to take the average of all the numbers that you submitted. And then I'm going to multiply that average by 2/3, so I get some number.
After multiplying the average by 2/3, I get some number. Now, the rule of the game is that whoever submits the number closest to 2/3 of the average, without exceeding it, wins the game. OK, so everybody gets a chance to submit a number between 0 and 100. You submit any number. To make it simple, you can maybe just submit an integer.
And I'm going to take the average of these and multiply by 2/3, and that's my target number. You are going to shoot for the target number. Whoever gets closest to that number wins, say, a big prize. I actually ran this for my mathematical psychology class, saying that whoever wins really can get a boost in their grade. First, think about this. What number would you submit?
Suppose this is a very important kind of consequence for you. What number would you submit? I want you to think maybe for one minute, a couple minutes maybe.
AUDIENCE: 30.
JUN ZHANG: So that's a 30 here. Right, 30? OK. OK, so actually let's go through some of the reasoning that you might have in thinking about what number to submit. OK, now, in order to win this game, I'm trying to figure out what to submit. So everyone will submit between 0 and 100, but I'm going to take the average and apply 2/3 to it. So the target number, 2/3 of the average, cannot be more than 67.
Assuming everybody submits 100, average 100. 2/3 of that is 67. So there's no way I can win by submitting anything above 67 because I need to be below my target number. So I should not submit anything above 67. But now realizing that maybe the person that sits next to me thinks the same thing, they realize that, too, so they think nobody will submit anything above 67. So then if everybody submits 67, I'm going to take 2/3 of 67, which is 44.
So maybe I should submit, like, 44. And then I realize that other people may realize the same thing. So I can [INAUDIBLE] to redo my calculations and so forth. So I submit 29 and so forth. If you keep doing this, very soon you will find out the best thing to submit is zero. So this is an example of recursive reasoning, this recursion, theory-of-mind recursion, because you think about what others will think. You think about what others think that you will think, and so forth.
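The iterated reasoning above is just repeated multiplication by 2/3, which a short sketch makes plain (the rounding here is continuous rather than the integer-by-integer version in the talk, so the later levels differ by a point or so):

```python
# Level-k reasoning in the 2/3-average game: a level-0 anchor of 100
# (everyone submits the maximum), with each deeper level best-responding
# by taking 2/3 of the previous level's number.
guess = 100.0
for level in range(1, 6):
    guess *= 2 / 3
    print(f"level {level}: {round(guess)}")
# Prints roughly 67, 44, 30, 20, 13 -- shrinking geometrically
# toward the equilibrium answer of zero.
```

Each extra level of "I think you think ..." multiplies the best submission by 2/3 again, which is why unbounded recursion drives the answer to zero.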
Well, I actually ran this experiment with my class, and others have run this experiment. It turns out that people essentially never submit zero-- very few people. There are people who submit zero, but not many. But you can argue, well, maybe the number they think about is 50. That's in the middle, and then 2/3 of that is, like, 33. Maybe that's where the 33 comes about.
It turns out that such an experiment has been run on subjects in various contexts. They ran this with some college students in Germany, and the average number they get is, like, 35. And they ran it with some Caltech students, who averaged 24. That's one level deeper. And then they have run other, more prestigious kinds of subject groups, like readers of the Financial Times in London.
They actually gave out a real prize for that. So they opened up a window for, say, two weeks for submission. And then they tallied the results and gave out an actual monetary reward to the person who did the best. It turns out that for that group, the numbers submitted are mostly between, like, 24 and 35. OK, so that's the mode. I mean, there is a whole spectrum of numbers being submitted, but the mode indicates that people actually do no more than, like, a second or third level of recursion.
Or even maybe, depending on your interpretation of data, it can be between 1 and 2, if you think everybody starts out with 50. So this level of recursion, of how deep people go into it in real, social interactions and so forth, is of interest here. In my first studies, I looked into the literature about how they measure the depths of recursive reasoning.
So it turns out the existing paradigm for measuring this kind of recursive depth is through so-called dominance-solvable games, through iterated removal of dominated strategies in those games. And what they do is run a few games, but with many, many subjects. So for these games, you can argue that if a subject is faced with many, many different subjects in doing this kind of reasoning, then they will have a model of the strategic sophistication of the population.
So that may be the reason why people don't even go to, like, 0, in this p-beauty contest game. Well, think about this. Even though you think theoretically the equilibrium solution is zero, you may not think that other people may have thought about that. Or maybe there's a distribution of the strategic sophistication that leads you to say, oh, maybe I'll just get like 24 or 35 or something. So that's the explanation for that.
So even though you are able to reason in great depth, maybe you're just not doing that, because your model of the general population is such that there's a distribution of the depths of this recursion. To study this depth, or order, of recursion in an individual, dynamic social-interaction setting, we propose a paradigm in which we have a series of trial-unique games with a diagnostic payoff structure that allows us to differentiate this order or depth of recursion.
And this paradigm has since been adopted with some modification by a variety of other groups to probe this depth of recursion. And I'm going to explain to you what that paradigm is. Now, the paradigm works in the following way.
So we have a game. It's a three-step game. We have two players, player one and player two. And the game starts at A; it can move to B, to C, and then to D. There are four cells. And the numbers, as before, represent the payoffs to the players, where the first number goes to player one and the second number goes to player two.
And player one controls the first and the third move, and player two controls the second move. So this is a sequentially played game, a sequential game. So you start out, say, in cell A. And player one controls the first move. Player one can decide whether to stay in cell A and collect the reward, collect the points, or decide to move on to cell B and then pass the control to player two.
So player one decides to move to cell B, and it's player two's turn to decide whether to stay at cell B and collect the reward or to move the piece to cell C and let player one have the final say-- so they think about whether to stay there or to move. If player two then decides to move to cell C, then player one has another chance to either stay at C or move to D. And that's it.
OK, so whichever player decides to stop, the game ends in that cell, and each player collects their respective payoff. And as you can see, the numbers differ across these four cells for the two players. And they also differ from game to game. The payoff matrix is trial-unique in the sense that players encounter it only once in the whole experiment.
So in this sequential-move game, there are a maximum of three steps, with four possible outcomes. If you draw it out in a game-tree diagram, you can see that player one controls the first and third moves, and player two controls the second move. And these are the payoff values, and so forth. We instruct the players such that the game is non-cooperative, in the sense that the goal is to earn as many points as possible for themselves and not to worry how much the other, their opponent or co-player, earns.
So the questions we ask our subjects are as follows. We ask them two questions in the sequence. First, we ask the subject, what would player two do if the game progresses to cell B? So what would player two do if the game progresses to cell B? And then we ask them the question, what would player one do in cell A, given the answer to question one?
So the first question is basically modeling what would happen, and the second question is translating that model into a rational action. So let's take this as an example. We have these payoff numbers. I want you to look at this and see how you would answer these questions. So now, what would player two do in cell B with this payoff matrix? Whether player two would move or would stay-- OK, so that's the question.
So there are people who think they should move. Oh, OK, so we do have move and stay, right? OK. So now if, say, we're in cell B now, whether player two should move or should stay-- so you think, well, because player two already has a three, but there's a potential of getting a four if he or she moves, then if you think further, player one had the final chance of deciding whether to stay in cell C or move away from cell C.
So one ought to think about whether it makes sense for player one to move away from cell C or to stay at cell C on the final move. OK, so in this case, player one would get a one by staying in C and get a two in D. So if the game had progressed to cell C, player one would want to move to D. So therefore player two would not want to move away from cell B, because there is no way the game can end in cell C.
But you can see already in this type of reasoning, in answering the question, you invoke a kind of theory-of-mind reasoning. So you can reason, on one hand, that player two would think the payoff in C is better than in B, so he or she should move that way. But on the other hand, if player two thinks about what player one would do-- moving from C to D-- then player two should not move.
So you already see this kind of recursive reasoning. So in one case, a myopic kind of a reasoning would say player two would move because you can get a four. But a more predictive reasoner would say player two would not move away from B because player one here would want to move from C to D if he or she has a chance.
So this level of recursion can be revealed as to how you answer that particular question. And then given that answer, you can see what would be the rational thing to do for player one in cell A. So what would be a rational thing to do? Well, you just need to translate that model into the action.
So these are the two questions we ask. This particular game is a diagnostic game-- diagnostic in the sense that depending on the level of recursion, whether our subject is reasoning myopically or predictively, they give opposite answers to the question of whether player two would move from B or not. So it's diagnostic.
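The backward-induction reasoning for this diagnostic game can be sketched as follows. The payoffs used here partly follow the example in the talk (player one gets 1 in C and 2 in D; player two gets 3 in B and 4 in C); cell A's payoffs and player two's payoff in D are made-up illustrative values, since the talk doesn't state them:

```python
# Backward induction over the three-step sequential game.
# payoffs[cell] = (player one's payoff, player two's payoff).
# Cell A and player two's payoff in D are assumed for illustration.
payoffs = {"A": (2, 2), "B": (1, 3), "C": (1, 4), "D": (2, 1)}
mover = {"A": 0, "B": 1, "C": 0}    # player one moves at A and C, player two at B
nxt = {"A": "B", "B": "C", "C": "D"}

def solve(cell):
    """Outcome cell if play reaches `cell` and everyone reasons predictively."""
    if cell == "D":                  # terminal cell: no choice left
        return "D"
    p = mover[cell]
    continuation = solve(nxt[cell])  # where moving on eventually ends up
    move_pay = payoffs[continuation][p]
    stay_pay = payoffs[cell][p]
    return continuation if move_pay > stay_pay else cell

print(solve("C"))  # "D": player one moves on (2 > 1)
print(solve("B"))  # "B": player two stays, since moving really ends in D (1 < 3)

# A myopic player two compares only B and C directly: 4 > 3, so "move" --
# the opposite of the predictive answer, which is what makes this game diagnostic.
myopic_p2_moves = payoffs["C"][1] > payoffs["B"][1]
print(myopic_p2_moves)  # True
```

The diagnostic property falls out directly: the myopic comparison at B says "move," while the full recursion says "stay."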
Now, compare with this game here. We have another set of payoff values here. So in this case, a myopic player two-- well, if you think myopically, when player two is in cell B, then they should not move because otherwise they will get a one, decreasing their points. Also if you think deeply or predictively, that means player one will not move away from C to D because there is a two over a one.
So for that reason, you should not move. These two answers, whether you're engaged in myopic reasoning or predictive reasoning, would give rise to the same answer. So this is a game that is non-diagnostic. It is non-diagnostic. So we have classes of games which on one hand are diagnostic and on the other hand are non-diagnostic. But for the non-diagnostic games, we use them as a way to serve as our so-called catch trials because we want the subjects to engage in reasoning.
And if somehow they are not paying attention to the payoff numbers, they are just randomly choosing and so forth, they will get those wrong. So we use them as catch trials. Our block of the games would mix the diagnostic games with a non-diagnostic game, using the non-diagnostic games as catch games, catch trials.
So we have diagnostic and non-diagnostic games. And we have four strategically distinct types of diagnostic games, based on the payoff values. There are four different kinds of games. And we counterbalance everything in terms of the predictions-- yes, they will move away; no, they will not move away. We also counterbalance against risk-attitude heuristics, like "I would move if I started with a lower payoff, and I wouldn't move if I started with a higher payoff." We make everything counterbalanced. So in this case, the player, as I said, may either play myopically or predictively at cell B. And player one's model of player two can either be myopic or predictive. And player one's choice at cell A would depend on this model.
Now in the actual experiment, we assign either player one or player two-- one of them of course is our subject. But the other is an experimental confederate. We instruct the confederates to perform according to our instructions, and these are the experimental manipulations.
We start with 24 training games to familiarize the subjects with the game. These have very simple payoff structures, so the subject has no problem deciding what player two would do in cell B, move or not move. Then we have two test blocks, each with 20 games: 16 diagnostic games interleaved with four catch-trial, or non-diagnostic, ones.
And all games are unique, with distinct payoff structures, so subjects encounter each one only once in the experiment. And to avoid heuristics, we put all the games that start with a two and all the games that start with a three in separate blocks, to avoid an inference from risk attitude.
So we have a total of 64 games, all presented in a fixed order. Now, subjects are assigned as either player one or player two. So it's a between-subjects design. The reason we do different assignments is that we want to test so-called perspective taking, because we ask the same kind of question, but we want the subjects to be in different shoes, to assume the role of player one or player two, in this experimental manipulation.
The opponent is always an experimental confederate, and the games are actually played out by computer. We actually have a confederate come in. They are introduced to each other, interact with our subjects, and then go to different rooms to play the game.
So subjects are asked to answer these two following questions in a fixed order. So the first question is, what's player two's optimal strategy at cell B? And question two is player one's optimal strategy at cell A. Yes?
AUDIENCE: Do they have infinite time?
How much time? Do they have x amount of time?
JUN ZHANG: They can think as much time as they want. But once they are familiarized with the game, it took them, like, a few seconds to do it. And these are the intro psych subject pool. And we recruit from them. They always want to get out of the experiment as quick as possible. We allow about one or two hours for the experimental blocks.
So the first question is about the optimal strategy. This is a reasoning question. When the subject is assigned as player one, this is what we call an anticipation question, from a third-person perspective. So we ask whether the opponent would move if the game progressed to cell B, and whether you will move away from cell A. That is question two.
On the other hand, if the subject is assigned as player two-- and this is really their planning for their move in cell B-- and then we ask whether you will move if the game progresses to cell B and whether it is smart for the opponent to move away from cell A. So it's the same question, but phrased slightly differently to make the subjects reason differently, to take different perspectives based on their assignment.
The first question is referred to as the theory of mind, or the [INAUDIBLE] question. And the second question is an instrumental rationality question. For data analysis, we exclude unmotivated, confused subjects from our data analysis, based on their performance on the training games and also on the catch trials. So after this exclusion, we have 28 subjects playing player one and 36 subjects playing as player two. It's a between-subject design.
And then to score their choices on the first question, a myopic choice is scored as zero and a predictive choice as one. To score the second question, on instrumental rationality, if their choice is consistent with their theory-of-mind model, then we score that as no rationality error, zero. Otherwise, we score it as one, a rationality error. So rationality performance is scored with respect to the theory-of-mind model that they have.
And scores for four successive games are averaged. We call that a game set. It gives rise to a predictive score in increments of 0.25, up to 1.0. So if they act predictively for all four games of a game set, they get a score of one. Otherwise, you just see the proportion of times they act predictively. So now, here's the data.
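The scoring scheme can be sketched like this, with a made-up choice sequence standing in for a subject (1 = predictive answer, 0 = myopic answer):

```python
# Average the 0/1 predictive codings over successive sets of four games
# (a "game set"), giving a predictive score per set.
def game_set_scores(choices, set_size=4):
    return [sum(choices[i:i + set_size]) / set_size
            for i in range(0, len(choices), set_size)]

# Hypothetical subject: mostly myopic early, predictive later.
choices = [0, 0, 1, 0,  0, 1, 1, 1,  1, 1, 1, 1]
print(game_set_scores(choices))  # [0.25, 0.75, 1.0]
```

A rising sequence of game-set scores like this is exactly the learning pattern reported below.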
So this is the distribution of the predictive score for the subjects when they are assigned as player one, on the left, and as player two, on the right. The different shades represent the predictive score, and the height of each bar represents the proportion of subjects, totaling 100%.
As you can see, in both cases the subjects start out with a relatively low predictive score; that is to say, they play relatively myopically. Very few people have a score of, say, one at the very beginning. And gradually, as the games progress, the distribution changes: the number of people who score higher and higher grows.
And this is the case both for the subjects as player one and the subjects as player two. So in both cases, through their interaction with the same opponent, they are learning something; they are enhancing their model.
Now, in the data I'm showing you, the confederate always acts predictively. This is the data averaged across the entire population: the previous slide showed the distribution of subjects over the various predictive scores, and this is the average across that whole population.
As you can see, the predictive scores increase with continued interaction with the opponent. As player one, they increase, and toward the end they reach a score of about 0.64 or 0.65. As player two, they reach a much higher score, around 0.9.
The player-two condition is a case where the subjects are putting themselves in the shoes of the other person. Effectively, they are reasoning with one level of recursion in answering the first question. Now compare this pattern with their rationality score; the rationality question asks what player one would do in the first cell.
It turns out that there's not much difference between player one and player two in the rationality score, that is, in instrumental rationality, in applying their theory-of-mind model to come up with the optimal choice. So there's not much difference by assigned role. There are four game sets, then a break, then a second block; the rationality error decreases slightly in the second block, but there is no difference between the player-one and player-two assignments.
Furthermore, we measured response times for the subjects engaging in this task, that is, the time it takes them to answer each of the two questions.
On the left are the subjects assigned as player one. This is their time to answer the first question, and these two bars are the time to answer the second question. Now, we sorted these answers according to whether the answer is consistent with a myopic model or a predictive model, in other words, whether the answer reflects predictive, deeper reasoning or myopic, shallow reasoning.
The hypothesis is that if you engage in recursive theory-of-mind reasoning, you add basically one more step, so it will take you longer to reach the conclusion, and therefore the reaction time should be longer. This is borne out: as you can see, predictive reasoning takes a few seconds longer than myopic reasoning. Compare this with the second question, on instrumental rationality, that is, converting your prediction of what happens in cell B into what you should do in cell A.
There, the reaction time for the first [INAUDIBLE] shows no difference between answers based on a myopic versus a predictive model. The same pattern occurs when the subject is assigned as player two. So basically, one extra step of recursion costs a few more seconds. This reaction-time data supports the idea that they are in fact engaging in this kind of recursive, deep-versus-shallow reasoning.
Next, we looked at statistics on the performance of individual subjects as the games progress. It turns out that some subjects may start out reasoning myopically, but somewhere in the middle they have a kind of aha moment, and they switch to a predictive mode of reasoning.
So we measured the switch point from their choice pattern, the time at which they made this switch to become predictive. We did this both for subjects assigned as player one and subjects assigned as player two. So we looked at the switch time.
It turns out that these switch-time dynamics, going from myopic to predictive, do not differ across the role assignment. In other words, the acquisition of this recursive thinking, the moment it occurs to them that they should think one more step ahead, is independent of the subjects' role assignment.
But role does matter for whether a subject converts at all. It turns out that, in the end, if the subjects are assigned as player one, only 43% of them convert and acquire this kind of deep reasoning, whereas if they are assigned as player two, 64% of the subjects convert.
So the conversion ratio differs by role assignment, indicating that this change of perspective, in other words asking the subjects to act as player two, does help them reason predictively. Now, there is a caveat to this interpretation: another way to interpret this pattern of data could be that the subjects did not realize the game has a final, third step.
So maybe this has to do with a reasoning horizon, or decision horizon, how far ahead you look. When they are reasoning about what player two would do from cell B to C, they may not have reasoned far enough ahead to consider what happens from C to D. So this pattern could also be consistent with a change or realization of the decision horizon, or reasoning horizon.
But nevertheless, the difference between these two numbers clearly shows that there is a benefit for reflexive reasoning from perspective taking. With a perspective switch, you are more likely to engage in the deeper level of recursion, and this is almost by definition of the recursive level.
OK, we also looked at the effect of the opponent's strategy, that is, how our experimental confederate's actions impact the subject's theory-of-mind model. In this case, shown at the top, the opponent does not switch: they either consistently play myopically or consistently play predictively. We want to see how our subjects respond to this kind of player.
When the opponent plays consistently predictively, our subjects gradually catch up on average; they increase their estimate of the opponent's theory-of-mind level, mirroring the opponent's actual behavior. On the other hand, if the opponent acts consistently myopically, the model stays at the lower level. This means subjects are able to dynamically adjust their model of their opponent throughout the experiment.
Now, this is when we actually have the opponent switch strategy, from myopic to predictive or from predictive to myopic. During the first block, the opponent acts predictively; that's this data here. So in the first block, our subjects have to model them as predictive.
But during the second block, we instruct the opponent to switch and act myopically. As a result, the subject's model of the opponent also switches: you can see the predictive score go down. This is to be contrasted with the case where the opponent starts out myopic in the first block and becomes predictive in the second. There you see an increase, and there is a crossover between these [INAUDIBLE].
This data shows that subjects are dynamically constructing and adjusting their theory-of-mind model of the opponent, and their predictions mirror the way the opponent acts in these games. OK, so to conclude: we investigated depths of theory-of-mind reasoning. Subjects seem to start out with a default myopic model, but they are able to modify it through dynamic interaction with the opponent.
And perspective affects the likelihood of engaging in this predictive reasoning: there is a cost to taking a third-person perspective compared with a first-person perspective. But perspective taking does not affect the time it takes to acquire the predictive model.
The reaction-time data for the recursive depths is consistent with subjects actually reasoning with depths of recursion. On the side of instrumental rationality, performance on the second question shows that the rationality error is not affected by a change of perspective, and also not affected by a change in the opponent's strategy.
This seems to suggest that depth of ToM recursion and instrumental rationality may constitute two separate modules, or two separate processes, in theory-of-mind reasoning. So to conclude my presentation, I'll give the motto of the day: "We more readily account for others' reaction to an action we plan than we realize that others, when planning their action, may have already accounted for our possible counter-reaction."
This is in the style of my favorite, the Sorting Hat in the Harry Potter story. These days I'm watching that with my son, and I really love this kind of thing. So: we more readily account for others' reaction to an action we plan than we realize that others, when planning their action, may have already accounted for our possible counter-reactions. OK, thank you very much. That's the end.
[APPLAUSE]
Yes? There's one. Yes?
AUDIENCE: How do you make sure the subjects are clear about the rules of the game? The learning curve may involve, like, a subject gradually learns about [? the rules ?] of the game rather than really [INAUDIBLE].
JUN ZHANG: Right, good question. We have 24 training games. Before they play all the main games, they play 24 training games in which the payoffs are very simple. For instance, the payoffs for player two across B, C, D might be one, two, three. So if they understand the rules of the game, they should answer correctly.
So we gave them the training games and looked at their performance in, say, the last eight of them. We also coupled that with the catch trials. These are what we used to screen out subjects. Right, so, yes. Yes?
AUDIENCE: I have a p-beauty contest question. You said that it wasn't typically any more than three or four orders [? of recursion ?] [INAUDIBLE]. Has there been any sort of correlation between, say, the size of the group that's asked the question [INAUDIBLE]?
JUN ZHANG: Yeah, that's a good question, but I'm not aware of, I don't know much about, the [INAUDIBLE] literature on the size dependency of this. The empirical answer for the level is normally around two to three. But it depends on what you count as the zeroth level and the first level, because you could say maybe everybody submits, like, 50 instead. So then the zeroth level would really be 33, because it's 2/3 of that. So there are some arguments; you can always be one level off. But normally the argument has been two to three. And in our games it's even less; we are only investigating one to two steps of recursion. Yes?
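For concreteness, the level labeling ambiguity just mentioned can be seen in a small iteration, assuming the standard p-beauty contest with p = 2/3, guesses in [0, 100], and a uniform-random anchor at 50 (the function name is mine, for illustration):

```python
# Level-k best responses in the p-beauty contest (p = 2/3, guesses in [0, 100]).
# Whether the anchor 50 counts as "level 0" or the first best response (33.3)
# does is exactly the labeling ambiguity: every count can shift by one.

def level_k_guesses(p=2/3, anchor=50.0, depth=5):
    """Iterate best responses: each level guesses p times the previous level."""
    guesses = [anchor]
    for _ in range(depth):
        guesses.append(p * guesses[-1])
    return guesses

print([round(g, 1) for g in level_k_guesses()])
# [50.0, 33.3, 22.2, 14.8, 9.9, 6.6]
```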
AUDIENCE: [INAUDIBLE] tell people about the recursion and then [INAUDIBLE]. I don't know if you do. Do they add more levels of recursion to their thinking?
JUN ZHANG: Ah, so in other words, whether they learn to be recursive. It's a good question, and I have been thinking about using these as training games for recursion. There's one issue we need to resolve first: why people start out with, say, the myopic model of the opponent.
Maybe it's an economy-of-effort thing: they don't work hard enough at first, then gradually realize and adjust, and so forth. Or it could be that this is a rather abstract notion, and the payoff numbers give a very abstract [INAUDIBLE]. What happens if we give a very concrete reasoning paradigm instead, just as in the Wason selection task?
So there's a difference between running an abstract reasoning game versus a concrete reasoning game, right? We have studied running subjects by actually giving them stories, a cover story for the three-step reasoning. A typical example would be an application game: you decide whether to apply to a college, the college decides whether or not to accept you, and then you decide whether or not to go.
This is a very typical three-stage game. The applicant controls the first and last steps, and the college controls the second step. You can give a variety of payoff structures in terms of the desirability of all possible outcomes; for instance, a college might want more students to apply and be rejected, so that its rejection ratio is higher.
Or the college can have a preference against a person it really doesn't want to come. And the applicant can have relative rankings over the outcomes, based on their preferences: you apply and get rejected, or you get accepted. We give these scenarios, have people reason through them, and see whether there's any difference from the abstract reasoning task.
We don't have the results yet, but this is the direction we are testing. I think the question about using these games to train recursive reasoning is very interesting, and we hope this set of games can be useful toward that goal.
AUDIENCE: In the history of theory of mind, there have been [INAUDIBLE] the average onset of a working theory of mind falls in the development of a child between five and seven years of age. But if a child is taught to play games at earlier ages, does it or does it not accelerate the development of a working theory of mind? Do you have an opinion on that?
JUN ZHANG: Yeah, so in the typical developmental literature, the age they pin down is between three and four. But of course, that is as evidenced by false-belief tasks and so forth; it does not involve recursion [INAUDIBLE]. So I'm not aware of developmental literature on recursion, or at what age it emerges.
That would be an age where training would actually be possible, because they can understand the structure and the instructions. So, yes, one thing to try is to have them reason through these games and see whether that helps. And in fact, there are groups applying this to children.
As I mentioned earlier, our paradigm has been adopted by a group in the Netherlands, Dr. Verbrugge's group, which is using it with children for training. They are devising a concrete game of this sort, a three-step game, which they are actually running with children. There are some other groups as well, but those are running with adults; I think that first group is running with children. Yes?
AUDIENCE: Did you find in the data any evidence of social preferences? So for example, like, a person might-- even though you told them to only look at their own payoffs, they might prefer where both the agents get three rather than passing it so that they can get four. But the other person will only end up with one. Does that explain any of the myopic behavior?
JUN ZHANG: So this is a good question about whether our subjects are playing as we instructed, that is, playing non-cooperatively. That's number one. The second point is that, because there is prolonged interaction, there is always the possibility of signaling: they may play the first few games a certain way to signal to the other person that they are playing this way, so the other person can react in kind.
So that's a possibility. We checked a few heuristics about non-cooperative play [INAUDIBLE]; for instance, if there is a higher number in the ending cell D of a game, whether subjects go there by heuristic, because everybody wants to go there. So we checked some of these heuristics, but we haven't systematically checked the answers with respect to the second part.
But we did, again, use the catch trials, the catch games, to make sure they are not doing anything like that; if they were, we would likely spot them in the catch trials, and the subject would be excluded. In the exit questionnaire, we asked them about strategy and so forth, and they didn't mention signaling the opponent.
We had one manipulation of the apparent intelligence of the confederate. In that manipulation, our confederate comes in under one of two conditions: one intelligent, one not as intelligent. The intelligent confederate comes in a minute late and apologizes to the subject: I'm late because I was just tutoring math students and the session ran long. And the person carries a mathematics textbook and so forth.
Then, when sitting down and interacting with the subject, the confederate says, OK, I'm a member of the chess club in the honors college, and so forth [INAUDIBLE]. In the other condition, the person carries a supermarket tabloid, apologizes that the calculus tutoring session ran too long, and says they find calculus very difficult.
And if asked what they want to do, the person says, you know, I just want to hang around; I haven't declared a major. So we ran this manipulation and then asked for ratings of intelligence, friendliness, and so forth. It turns out the manipulation did work in affecting people's ratings of intelligence; I was actually surprised that such a simple manipulation showed up when we asked them to rate the other person.
But it didn't affect depth of reasoning at all. There was no effect on recursive depth from this manipulation, in either direction.
PRESENTER: All right, there are refreshments outside, and you can continue the discussion outside. Thank you.
JUN ZHANG: OK, thank you very much. Thank you. Thank you.