Non-Cooperative, N-Person Games

Updated 8/29/05.



The discussion of the Prisoner's Dilemma is adapted from Poundstone.
Some of the examples presented on this page were adapted from Casti. 
The discussion of the stag hunt was adapted from Syrms.

Equilibrium Revisited

Recall that equilibrium (sometimes called Nash equilibrium) in a two-person game means that neither player has an incentive to unilaterally defect.  The minimax theorem states that for two-person, zero-sum games there exists a mixed strategy that is an equilbrium; a corollary to this theorem states that such an equilibrium solution is also a minimax solution.  How does this extend to games with more than two players?  What happens if the game is no longer zero sum?  What happens if the game is played multiple times?  In this section, we'll discuss Nash's theorem which states that for every N-person, non-cooperative game there exists an equilibrium solution.  The catch is that there can be more than one equilibrium solution, and that the equilibrium solution may differ from some of the other possible solution concepts.  This leads to some interesting games.

Let's begin by reviewing the Prisoner's Dilemma game.


Prisoner's Dilemma

In the (BYU-ized) PD problem, two individuals are brought to the honor code office for a suspected violation (they allegedly threw tortillas at a BYU football game).  The two individuals are placed in separate rooms and are interrogated.  Each individual can either tell the police that they threw tortillas or they can deny any involvement.  For historical reasons, we identify their choices as Cooperate or Defect, meaning that they cooperate with their fellow tortilla thrower and keep silent, or they defect to the police by telling what they've done.  (When we discuss this in class, please be aware that I will probably say that the individual confesses.  Please be aware that confession, which starts with C, is tanatmount to defection and not to cooperation.  Please do not think that the "C" terminology stands for confession.)  As the students think through the consequences that they might experience, they identify the following:
 

Prisoner's Dilemma Consequences

P1/P2

Cooperate

Defect

Cooperate

Both get minimal fines

P1 gets suspended, P2 gets wrist slapped

Defect

P2 gets suspended, P1 gets wrist slapped

Both get medium fines

In the table, the rows correspond to P2's (Prisoner 2's) possible choices, and the columns correspond to P1's choices.  The entries in the table indicate the consequences that each prisoner receives.  For example, the entry in the middle row, right column says that if P2 cooperates (by saying nothing) and if P1 defects (by telling the police) then P2 gets suspended from school and P1 gets a ceremonial slap on the wrist.

We could construct a utility for each of these choices, but instead we'll just rank order them with 4 being the most preferred consequence and 1 being the least preferred.  The ordered pair (x,y) represents the preference (x) for P1 and (y) for P2.  Thus, the payoff matrix is given by
 

Prisoner's Dilemma Payoffs

P1/P2

Cooperate

Defect

Cooperate

(3,3)

(1,4)

Defect

(4,1)

(2,2)

For example, the entry in the middle row, right column says that if P1 cooperates and P2 defects then P2 receives its most preferred consequence and P1 receives its least preferred consequence.

We now review how choices should be made given this payoff matrix using a maximin solution.
 

Maximin

Let's take P2's perspective for a minute.  As I look at my choices to cooperate or defect, I can evaluate the worst case scenario.
 

Prisoner's Dilemma: P2's worst case payoff

P1/P2

Cooperate

Defect

Worst Case

1

2

Thus, the worst thing that can happen when I cooperate is that I receive my least preferred consequence (getting suspended).  The worst thing that can happen when I defect is that I my second to least preferred consequence (a medium fine).  Being a fairly conservative prisoner, I decide that the thing I should try to do is try to maximize the worst case payoff.  T(This worst case payoff is often called the security level of the choice.) his means that I should choose to defect.  Choosing the action that maximizes my worst case payoff (or, equivalently, choosing the action that minimizes my worst case cost) is known as the maximin (minimax) solution.  For the PD problem, when I use this strategy the maximin choice is defection.
 

Nash Equilibrium

But there is an alternative to using maximin.  Rather than considering the worst case individual scenario, I can instead look at the situation from the other agent's perspective.  Pretend that you are P1, and let's do a thought experiment.  Pretend that P2 is in the room with you and that you two are deciding which action to take.  As you look at the consequences, you observe that when both of you cooperate then you are both better off than if you both defect (which results when you both use the maximin strategy).  P2 then leaves the room, and you get to thinking.  You observe that if P2 cooperates then there is an incentive to defect; defecting increases your payoff from 3 to 4.  While mulling this temptation, you are faced with a frightening realization: if there is an incentive for you to defect then there might also be an incentive for P2 to defect.  You reevaluate the cooperative solution and notice that by defecting, P2 increases its payoff from 3 to 4 and your payoff drops from 3 to 1.  Bummer!

Given that you suspect P2 might defect, you start to decide if you should risk playing the cooperative solution.  You notice that if P2 defects then you can improve your payoff from 1 to 2 by defecting too.  Thus, if P2 cooperates you have an incentive to defect, and if P2 defects you have an incentive to defect.  It appears that the cooperative solution you reached with P2 when you were together in the same room is too big of a risk, and you decide to defect.

Feeling a little bit guilty for being disloyal, you start thinking about what would happen if you both defected.  You notice that if you two had both agreed to defect then nothing that P2 could do would increase P2's payoff.  In fact, just the opposite would happen.  If you had agreed to defect and then P2 changed its mind, then P2's payoff would decrease from 2 to 1.  There is no incentive for P2 to change.  Similarly, the "both defect" joint solution offers you no good reason to change your mind either since switching from defection to cooperation makes your payoff drop.  The "both defect" joint solution is in a state of equilibrium because neither player can increase its payoff by changing its mind.

Nash's theorem states the following (taken from Casti): Any N-person, noncooperative game (zero-sum or nonzero-sum) for which each player has a finite number of pure strategies has at least one equilibrium set of (mixed) strategies.  Fortunately, the proof for the minimax theorem extends trivially to such problems; Unfortunately, the corollary which states that the equilibrium solution is also a minimax solution is not true.
 

Using the Prisoner's Dilemma

The prisoner's dilemma is probably the most studied game ever.  This is partly because the dilemma is engaging --- it makes people feel uncomfortable to know that the rational thing to do leads to worse results that playing the Pareto optimal solution.  It is also because it models a lot of interesting situations. In this section, we will discuss two such situations.

The Bomb.
In Poundstone's book, "Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb" he describes how the prisoner's dilemma could be used to model the cold war.  (Incidentally, this book has the best reference of the origins of the game that I have found.  I recommend this book highly.)  Once both the Soviet Union and the United States had the atomic bomb, and before the proliferation of hydrogen bombs that could be delivered by missiles, there was a large national debate about whether the United States should pre-emptively bomb the Soviet Union.  The choices available to the United States were to preemptively bomb or not.  The same choices were available to the Soviet Union.  In some of the debates, if the United States bombed the Soviet Union without the Soviet Union responding, then the US would receive their most preferred outcome (the destruction of the USSR) and the Soviets their least preferred.  If the bombing roles were reversed, then the US received their least preferred option and the Soviets received their most preferred.  If both bombed, both received their second-least preferred outcome (a weakened USA and USSR, but no regret about having been completely destroyed).  If neither bombed, then both received their second-most preferred option (the status quo where neither country was destroyed).  This payoff matrix is precisely the prisoner's dilemman payoff matrix.  According to game theory, the rational thing to do is to bomb.

Self Control.
In a book by Howard Rachlin, "The Science of Self Control" he presents the problem of self control as a prisoner's dilemma.  In this dilemma, the two agents are actually the same person at two different points in time.  For concreteness, suppose that player 1 is defined as the person at time 1, and player 2 is the person some time later.  The person is given a choice at time 1 that will impact the set of choices available to him or her at time 2.  After the player makes a choice at time 1, they receive a reward, but they also determine what choices will be available to them at time 2.  The problem of self control is that people will often opt for a near-term payoff (a payoff at time 1) that is high, at the expense of a farther off payoff (a payoff at time 2).  A complete discussion of this problem is beyond the scope of this class (and requires and understanding of repeated play games), but it is interesting that self control can be modelled using a prisoner's dilemma-like game.
 

Free Rider

Poundstone presents a multi-agent extension of the prisoner's dilemma that he refers to as "Free Rider."  I quote him [page 126]:

    The [free rider] name to the dilemma confronting public transit riders.  It's late at night, and there's no one in the subway station.  Why not just hop over the turnstiles and save yourself the fare?  But remember, if everyone hopped the turnstiles, the subway system would go broke, and no one would be able to get anywhere.
    It is the easiest thing in the world to rationalize hopping the turnstiles.  What's the chance that your lost fare will bankrupt the subway system?  Virtually zero.  The trains run whether the cars are empty or full.  In no way does an extra passenger increase the system's operaing expenses.  Etc., etc., etc. --- but if everygody thingks this way ...

The choices in this dilemma are pay and ride free.  The consequences include spending money and losing public transit.  Can you construct a payoff matrix?
 

Multi-Agent Social Dilemma

A student of mine constructed a prisoner's dilemma-like game with multiple agents and more than two actions.  (He started working on this problem in CS670, and eventually completed a master's thesis and published two papers in top tier conferences on the topic.)   This game is a generalization of the free-rider's dilemma as well as the tragedy of the commons.   If you'd like to know more about this work, click here to access his paper published in ICML 2003.


Battle of the Sexes

For this problem, the maximin solution and the Nash equilibrium coincide.  Do they always?  Unfortunately, for N-person, non-zero-sum games the answer is no.  Consider the famous game of Battle of the Sexes.  In the game, a husband and wife must independently decide on a date activity.  The husband would prefer one form of entertainment, say fishing, and the wife would prefer another form of entertainment, say shopping for clothes.  Although both have their most preferred activity, both prefer being together to being alone.
 

Battle of the Sexes Consequences

husband/wife

Shopping

Fishing

Fishing

Though the husband gets to go fishing and the wife shopping, they are apart; both say "Ehh."

The husband gets to spend time with his wife and go fishing.  The wife gets to spend time with her husband.

Shopping

The wife gets to spend time with her husband and go shopping.  The husband gets to spend time with his wife.

The husband goes shopping and the wife goes fishing; both say "Yuck!"

By deciding to go fishing, the wife is defecting to her husband, and by deciding to go shopping, the husband is defecting to his wife.  Using the cooperate/defect terminology and using 4 as the most preferred choice and 1 as the least preferred, the payoffs for this game are as follows:
 

Battle of the Sexes Payoffs

P1/P2

Cooperate

Defect

Cooperate

(2,2)

(4,3)

Defect

(3,4)

(1,1)

Maximin vs. Equilibrium

For this game, the maximin solutions for both the husband (P1) and the wife (P2) are to cooperate.  The bad news is that neither of the partners is particularly content with this solution; if either defectted then both would be better off.  In fact, either consequence that results when one cooperates and the other defects dominates (in the Pareto-optimal sense) the maximin solution, and both of these consequences are in equilibrium (since neither player benefits by unilaterally changing his/her mind).  Unfortunately, with no way to communicate the players are left with making independent choices to try and reach an equilbrium.  Additionally, using a mixed strategy does not help their chances.  In fact, the expected payoffs for two independent mixed strategies are pretty bad; if both flip an unbiased coin that they use to make their choice for them then they both receive an average (expected) payoff of only 2.5 --- not much better than the minimax value.  For your information, the set of possible payoffs for all possible combinations of mixed strategies are illustrated below.


Four More Games

Let's look at two more games and try to figure out what happens when we apply maximin and equilibrium concepts.

Chicken

In the game of chicken, two idiots get in their cars and drive straight toward each other.  Each has the option of swerving to avoid a colliion and being called a "chicken" (Cooperate) or continuing to drive toward the other vehicle (Defect).  Neither player wants> to die, so the least preferred option is for both to defect.  If both players chicken out, then both are embarassed, but neither dies and neither wins.  The most preferred option for me is to continue to drive and have you chicken out.  Chickening out is worse than both of us chickening out but better than both of us dying.  The payoff for this game is as follows:

Chicken Payoffs

P1/P2

Cooperate

Defect

Cooperate

(3,3)

(2,4)

Defect

(4,2)

(1,1)

What are the equilibrium pairs for this game?  Just like the battle of the sexes, there are two equilibrium strategies resulting in the payoffs (4,2) or (2,4).  Can you see why they are equilibrium?  What is the maximin solution?  It is for both to cooperate.  Are there any Pareto optimal strategies?

A multi-agent variant of Chicken is discussed by Poundstone.  He calls the game the Volunteer's Dilemma.  I quote [page 201-202]:

    You're sitting at home one night and the lights go out.  You look outside and see that all the lights in the neighborhood are out.  The electric company will send someone out to fix th problem --- provided, that is, someone calls to tell them about it.  Should you call?  Naw --- let someone else do it.
    In a volunteer's dilemma, someone has to take a chore that will benefit everyone.  It doesn't matter who does it --- but everyone's in trouble if no one does it.
    In two-person chicken, it is desirable that one player "volunteer" to swerve for the common good.  When on one volunteers, both get stung.  When both volunteer, each player kicks himself for not driving straight.

Poundstone goes on to discuss an experiment conducted by a magazine that created a volunteer's dilemma.  I recommend reading this, because it is really interesting.

Leader

In the game of leader, two drivers want to merge into traffic from opposite sides of an intersection.  The problem is that if they both wait (cooperate) until the other driver goes first then they are both delayed because nobody makes progress.   However, if they both pull into the intersection (defect) at the same time then they may collide.  If  the leader pulls out, then the follower may still be able to merge with minimal delay.  The payoffs are

Leader Payoffs

P1/P2

Cooperate

Defect

Cooperate

(2,2)

(3,4)

Defect

(4,3)

(1,1)

What is the minimax solution?  What strategies are in equilibrium?  What strategies are nondominated?

Deadlock

The game of deadlock is a social dilemma.  It has the following payoffs:
 

Deadlock Payoffs

P1/P2

Cooperate

Defect

Cooperate

(2,2)

(1,4)

Defect

(4,1)

(3,3)

Why do you think this game is considered a social dilemma?  What is the Nash equilibrium?  What is the minimax solution?  Which actions are Pareto Optimal?

Stag Hunt

The final game is stag hunt, and it is also a social dilemma.  It has the following payoffs:
 

Stag Hunt Payoffs

P1/P2

Cooperate

Defect

Cooperate

(4,4)

(1,3)

Defect

(3,1)

(2,2)

Why do you think this game is considered a social dilemma?  What are the Nash equilibria?  What is the minimax solution?  Which actions are Pareto Optimal?


Brian Skyrms wrote a really interesting book about the Stag Hunt game The Stag Hunt and the Evolution of Social Structure Cambridge University Press, 2004.  In the book, he argues that the stag hunt is a better type for social conflict that the prisoner's dilemma.  The key to his argument is to notice that both (4,4) and (2,2) are Nash equilibria to the game, but that the (2,2) solution is also produced when both players play a maximin strategy. Moreover, (4,4) is the unique Pareto optimal solution to the game.


There is a tension between the "Pareto pull" toward (4,4) and the "maximin pull" toward (2,2).  Clearly, the (4,4) solution is better for both players than the (2,2) solution, but the (2,2) solution exposes the players to less risk.  I strongly recommend this book.

 


Oh, to be able to Communicate!

Shouldn't chicken, leader, and battle of the sexes be easier if we could communicate before we played the game.  By communication, I mean some form of exchange that influences us toward an equilibrium or cooperative solution rather than toward a substandard maximin solution.  This communication can take many forms including side payments, threats, etc., and we will need to be careful to prevent them from changing the game.  We'll discuss these issues in another lecture.