Updated 8/29/05.
The
discussion of the Prisoner's Dilemma is adapted from Poundstone.
Some of the examples presented on this page were adapted from
Casti.
The discussion of the stag hunt was adapted from Syrms.
Recall that equilibrium (sometimes called Nash equilibrium) in a two-person game means that neither player has an incentive to unilaterally defect. The minimax theorem states that for two-person, zero-sum games there exists a mixed strategy that is an equilbrium; a corollary to this theorem states that such an equilibrium solution is also a minimax solution. How does this extend to games with more than two players? What happens if the game is no longer zero sum? What happens if the game is played multiple times? In this section, we'll discuss Nash's theorem which states that for every N-person, non-cooperative game there exists an equilibrium solution. The catch is that there can be more than one equilibrium solution, and that the equilibrium solution may differ from some of the other possible solution concepts. This leads to some interesting games.
Let's begin by reviewing the Prisoner's Dilemma game.
In the (BYU-ized) PD problem, two individuals are
brought to
the honor code office for a suspected violation (they allegedly threw
tortillas
at a BYU football game). The two individuals are placed in
separate rooms
and are interrogated. Each individual can either tell the police
that
they threw tortillas or they can deny any involvement. For
historical
reasons, we identify their choices as Cooperate or Defect,
meaning that they cooperate with their fellow tortilla thrower and keep
silent,
or they defect to the police by telling what they've done. (When
we
discuss this in class, please be aware that I will probably say that
the
individual confesses. Please be aware that confession, which
starts with
C, is tanatmount to defection and not to cooperation. Please do
not think
that the "C" terminology stands for confession.) As the
students think through the consequences that they might experience,
they
identify the following:
|
Prisoner's Dilemma Consequences |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
Both get minimal fines |
P1 gets suspended, P2 gets wrist slapped |
|
Defect |
P2 gets suspended, P1 gets wrist slapped |
Both get medium fines |
In the table, the rows correspond to P2's (Prisoner 2's) possible choices, and the columns correspond to P1's choices. The entries in the table indicate the consequences that each prisoner receives. For example, the entry in the middle row, right column says that if P2 cooperates (by saying nothing) and if P1 defects (by telling the police) then P2 gets suspended from school and P1 gets a ceremonial slap on the wrist.
We could construct a utility for each of these choices, but instead
we'll
just rank order them with 4 being the most preferred consequence and 1
being
the least preferred. The ordered pair (x,y) represents
the
preference (x) for P1 and (y) for P2. Thus, the
payoff
matrix is given by
|
Prisoner's Dilemma Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(3,3) |
(1,4) |
|
Defect |
(4,1) |
(2,2) |
For example, the entry in the middle row, right column says that if P1 cooperates and P2 defects then P2 receives its most preferred consequence and P1 receives its least preferred consequence.
We now review how choices should be made given this payoff matrix
using a
maximin solution.
Let's take P2's perspective for a minute. As
I look at
my choices to cooperate or defect, I can evaluate the worst case
scenario.
|
Prisoner's Dilemma: P2's worst case payoff |
||
|
P1/P2 |
Cooperate |
Defect |
|
Worst
Case |
1 |
2 |
Thus, the worst thing that can happen when I cooperate is that I
receive my
least preferred consequence (getting suspended). The worst thing
that can
happen when I defect is that I my second to least preferred consequence
(a
medium fine). Being a fairly conservative prisoner, I decide that
the
thing I should try to do is try to maximize the worst case
payoff. T(This
worst case payoff is often called the security level of the
choice.) his
means that I should choose to defect. Choosing the action that
maximizes
my worst case payoff (or, equivalently, choosing the action that
minimizes my
worst case cost) is known as the maximin (minimax) solution. For
the PD
problem, when I use this strategy the maximin choice is defection.
But there is an alternative to using maximin. Rather than considering the worst case individual scenario, I can instead look at the situation from the other agent's perspective. Pretend that you are P1, and let's do a thought experiment. Pretend that P2 is in the room with you and that you two are deciding which action to take. As you look at the consequences, you observe that when both of you cooperate then you are both better off than if you both defect (which results when you both use the maximin strategy). P2 then leaves the room, and you get to thinking. You observe that if P2 cooperates then there is an incentive to defect; defecting increases your payoff from 3 to 4. While mulling this temptation, you are faced with a frightening realization: if there is an incentive for you to defect then there might also be an incentive for P2 to defect. You reevaluate the cooperative solution and notice that by defecting, P2 increases its payoff from 3 to 4 and your payoff drops from 3 to 1. Bummer!
Given that you suspect P2 might defect, you start to decide if you should risk playing the cooperative solution. You notice that if P2 defects then you can improve your payoff from 1 to 2 by defecting too. Thus, if P2 cooperates you have an incentive to defect, and if P2 defects you have an incentive to defect. It appears that the cooperative solution you reached with P2 when you were together in the same room is too big of a risk, and you decide to defect.
Feeling a little bit guilty for being disloyal, you start thinking about what would happen if you both defected. You notice that if you two had both agreed to defect then nothing that P2 could do would increase P2's payoff. In fact, just the opposite would happen. If you had agreed to defect and then P2 changed its mind, then P2's payoff would decrease from 2 to 1. There is no incentive for P2 to change. Similarly, the "both defect" joint solution offers you no good reason to change your mind either since switching from defection to cooperation makes your payoff drop. The "both defect" joint solution is in a state of equilibrium because neither player can increase its payoff by changing its mind.
Nash's theorem states the following (taken from Casti): Any
N-person,
noncooperative game (zero-sum or nonzero-sum) for which each player has
a
finite number of pure strategies has at least one equilibrium set of
(mixed)
strategies. Fortunately, the proof for the minimax theorem
extends
trivially to such problems; Unfortunately, the corollary which states
that the
equilibrium solution is also a minimax solution is not true.
The prisoner's dilemma is probably the most studied game ever. This is partly because the dilemma is engaging --- it makes people feel uncomfortable to know that the rational thing to do leads to worse results that playing the Pareto optimal solution. It is also because it models a lot of interesting situations. In this section, we will discuss two such situations.
The Bomb.
In Poundstone's book, "Prisoner's Dilemma: John von Neumann, Game
Theory,
and the Puzzle of the Bomb" he describes how the prisoner's dilemma
could
be used to model the cold war. (Incidentally, this book has the
best
reference of the origins of the game that I have found. I
recommend this
book highly.) Once both the Soviet Union and the United States
had the
atomic bomb, and before the proliferation of hydrogen bombs that could
be
delivered by missiles, there was a large national debate about whether
the
United States should pre-emptively bomb the Soviet Union. The
choices
available to the United States were to preemptively bomb or not.
The same
choices were available to the Soviet Union. In some of the
debates, if
the United States bombed the Soviet Union without the Soviet Union
responding,
then the US would receive their most preferred outcome (the destruction
of the
USSR) and the Soviets their least preferred. If the bombing roles
were
reversed, then the US received their least preferred option and the
Soviets
received their most preferred. If both bombed, both received
their
second-least preferred outcome (a weakened USA and USSR, but no regret
about
having been completely destroyed). If neither bombed, then both
received
their second-most preferred option (the status quo where neither
country was
destroyed). This payoff matrix is precisely the prisoner's
dilemman
payoff matrix. According to game theory, the rational thing to do
is to
bomb.
Self Control.
In a book by Howard Rachlin, "The Science of Self Control" he
presents the problem of self control as a prisoner's dilemma. In
this
dilemma, the two agents are actually the same person at two different
points in
time. For concreteness, suppose that player 1 is defined as the
person at
time 1, and player 2 is the person some time later. The person is
given a
choice at time 1 that will impact the set of choices available to him
or her at
time 2. After the player makes a choice at time 1, they receive a
reward,
but they also determine what choices will be available to them at time
2.
The problem of self control is that people will often opt for a
near-term
payoff (a payoff at time 1) that is high, at the expense of a farther
off
payoff (a payoff at time 2). A complete discussion of this
problem is
beyond the scope of this class (and requires and understanding of
repeated play
games), but it is interesting that self control can be modelled using a
prisoner's
dilemma-like game.
Poundstone presents a multi-agent extension of the prisoner's dilemma that he refers to as "Free Rider." I quote him [page 126]:
The [free rider] name to the
dilemma confronting public transit riders. It's late at night,
and
there's no one in the subway station. Why not just hop over the
turnstiles and save yourself the fare? But remember, if everyone
hopped the turnstiles, the subway system would go broke, and no one
would be
able to get anywhere.
It is the easiest thing in the world to
rationalize
hopping the turnstiles. What's the chance that your lost fare
will
bankrupt the subway system? Virtually zero. The trains run
whether
the cars are empty or full. In no way does an extra passenger
increase
the system's operaing expenses. Etc., etc., etc. --- but if
everygody
thingks this way ...
The choices in this dilemma are pay and ride
free.
The consequences include spending money and losing public
transit. Can
you construct a payoff matrix?
A student of mine constructed a prisoner's dilemma-like game with multiple agents and more than two actions. (He started working on this problem in CS670, and eventually completed a master's thesis and published two papers in top tier conferences on the topic.) This game is a generalization of the free-rider's dilemma as well as the tragedy of the commons. If you'd like to know more about this work, click here to access his paper published in ICML 2003.
For this problem, the maximin solution and the
Nash
equilibrium coincide. Do they always? Unfortunately, for
N-person,
non-zero-sum games the answer is no. Consider the famous game of Battle
of the Sexes. In the game, a husband and wife must
independently
decide on a date activity. The husband would prefer one form of
entertainment, say fishing, and the wife would prefer another form of
entertainment, say shopping for clothes. Although both have their
most
preferred activity, both prefer being together to being alone.
|
Battle of the Sexes Consequences |
||
|
husband/wife |
Shopping |
Fishing |
|
Fishing |
Though the husband gets to go fishing and the wife shopping, they are apart; both say "Ehh." |
The husband gets to spend time with his wife and go fishing. The wife gets to spend time with her husband. |
|
Shopping |
The wife gets to spend time with her husband and go shopping. The husband gets to spend time with his wife. |
The husband goes shopping and the wife goes fishing; both say "Yuck!" |
By deciding to go fishing, the wife is defecting to her husband, and
by
deciding to go shopping, the husband is defecting to his wife.
Using the
cooperate/defect terminology and using 4 as the most preferred choice
and 1 as
the least preferred, the payoffs for this game are as follows:
|
Battle of the Sexes Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(2,2) |
(4,3) |
|
Defect |
(3,4) |
(1,1) |
For this game, the maximin solutions for both the husband (P1) and the wife (P2) are to cooperate. The bad news is that neither of the partners is particularly content with this solution; if either defectted then both would be better off. In fact, either consequence that results when one cooperates and the other defects dominates (in the Pareto-optimal sense) the maximin solution, and both of these consequences are in equilibrium (since neither player benefits by unilaterally changing his/her mind). Unfortunately, with no way to communicate the players are left with making independent choices to try and reach an equilbrium. Additionally, using a mixed strategy does not help their chances. In fact, the expected payoffs for two independent mixed strategies are pretty bad; if both flip an unbiased coin that they use to make their choice for them then they both receive an average (expected) payoff of only 2.5 --- not much better than the minimax value. For your information, the set of possible payoffs for all possible combinations of mixed strategies are illustrated below.

Let's look at two more games and try to figure out what happens when we apply maximin and equilibrium concepts.
In the game of chicken, two idiots get in their
cars and
drive straight toward each other. Each has the option of swerving
to
avoid a colliion and being called a "chicken" (Cooperate) or
continuing to drive toward the other vehicle (Defect). Neither
player
wants> to die, so the least preferred option is for both to
defect. If
both players chicken out, then both are embarassed, but neither dies
and
neither wins. The most preferred option for me is to continue to
drive
and have you chicken out. Chickening out is worse than both of us
chickening out but better than both of us dying. The payoff for
this game
is as follows:
|
Chicken Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(3,3) |
(2,4) |
|
Defect |
(4,2) |
(1,1) |
What are the equilibrium pairs for this game? Just like the battle of the sexes, there are two equilibrium strategies resulting in the payoffs (4,2) or (2,4). Can you see why they are equilibrium? What is the maximin solution? It is for both to cooperate. Are there any Pareto optimal strategies?
A multi-agent variant of Chicken is discussed by Poundstone. He calls the game the Volunteer's Dilemma. I quote [page 201-202]:
You're sitting at home one night
and the lights go out. You look outside and see that all the
lights in
the neighborhood are out. The electric company will send someone
out to
fix th problem --- provided, that is, someone calls to tell them about
it. Should you call? Naw --- let someone else do it.
In a volunteer's dilemma, someone has to take a
chore
that will benefit everyone. It doesn't matter who does it --- but
everyone's
in trouble if no one does it.
In two-person chicken, it is desirable that one
player
"volunteer" to swerve for the common good. When on one
volunteers, both get stung. When both volunteer, each player
kicks
himself for not driving straight.
Poundstone goes on to discuss an experiment conducted by a magazine that created a volunteer's dilemma. I recommend reading this, because it is really interesting.
In the game of leader, two drivers want to merge
into
traffic from opposite sides of an intersection. The problem is
that if
they both wait (cooperate) until the other driver goes first then they
are both
delayed because nobody makes progress. However, if they
both pull
into the intersection (defect) at the same time then they may
collide.
If the leader pulls out, then the follower may still be able to
merge
with minimal delay. The payoffs are
|
Leader Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(2,2) |
(3,4) |
|
Defect |
(4,3) |
(1,1) |
What is the minimax solution? What strategies are in equilibrium? What strategies are nondominated?
The game of deadlock is a social dilemma. It
has the
following payoffs:
|
Deadlock
Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(2,2) |
(1,4) |
|
Defect |
(4,1) |
(3,3) |
Why do you think this game is considered a social dilemma? What is the Nash equilibrium? What is the minimax solution? Which actions are Pareto Optimal?
The final game is stag hunt, and it is also a
social dilemma.
It has the following payoffs:
|
Stag
Hunt Payoffs |
||
|
P1/P2 |
Cooperate |
Defect |
|
Cooperate |
(4,4) |
(1,3) |
|
Defect |
(3,1) |
(2,2) |
Why do you think this game is considered a social
dilemma? What are the Nash
equilibria? What is the minimax
solution? Which actions are Pareto
Optimal?
Brian Skyrms wrote a really interesting book about the Stag Hunt game The
Stag Hunt and the Evolution of Social Structure
Cambridge University Press, 2004. In the book, he argues that the
stag hunt is a better type for social conflict that the prisoner's
dilemma. The key to his argument is to notice that both (4,4) and
(2,2) are Nash equilibria to the game, but that the (2,2) solution is
also produced when both players play a maximin strategy. Moreover,
(4,4) is the unique Pareto optimal solution to the game.
There is a tension between the "Pareto pull" toward (4,4) and the
"maximin pull" toward (2,2). Clearly, the (4,4) solution is
better for both players than the (2,2) solution, but the (2,2) solution
exposes the players to less risk. I strongly recommend this book.
Shouldn't chicken, leader, and battle of the sexes
be easier
if we could communicate before we played the game. By communication,
I mean some form of exchange that influences us toward an equilibrium
or
cooperative solution rather than toward a substandard maximin
solution.
This communication can take many forms including side payments,
threats, etc.,
and we will need to be careful to prevent them from changing the
game. We'll
discuss these issues in another lecture.