In the (BYU-ized) PD problem, two individuals are brought to the
honor
code office for a suspected violation (they allegedly threw tortillas
at
a BYU football game). The two individuals are placed in separate
rooms and are interrogated. Each individual can either tell the
police
that they threw tortillas or they can deny any involvement. For
historical
reasons, we identify their choices as Cooperate or Defect,
meaning that they cooperate with their fellow tortilla thrower and keep
silent, or they defect to the police by telling what they've
done.
(When we discuss this in class, please be aware that I will probably
say
that the individual confesses. Please be aware that confession,
which
starts with C, is tantamount to defection
and not to cooperation.
Please do not think that the "C" terminology stands for
confession.)
As the students think through the consequences that they might
experience,
they identify the following:
|
|
|
|
|
|
Both get minimal fines | P1 gets suspended, P2 gets wrist slapped |
|
|
P2 gets suspended, P1 gets wrist slapped | Both get large fines |
In the table, the columns correspond to P2's (Prisoner 2's) possible choices, and the rows correspond to P1's choices. The entries in the table indicate the consequences that each prisoner receives. For example, the entry in the middle row, right column says that if P1 cooperates (by saying nothing) and if P2 defects (by telling the police) then P1 gets suspended from school and P2 gets a ceremonial slap on the wrist.
We could construct a utility for each of these choices, but instead
we'll just rank order them with 4 being the most preferred consequence
and 1 being the least preferred. The ordered pair (x,y)
represents
the preference (x) for P1 and (y) for P2. Thus, the
payoff matrix is given by
|
|
|
|
|
|
|
|
|
|
|
|
For example, the entry in the middle row, right column says that if P1 cooperates and P2 defects then P2 receives its most preferred consequence and P1 receives its least preferred consequence. Throughout the remainder of this semester, I will try to follow the rule that player 1's choices will be listed on rows and player 2's choices will be listed in columns. Furthermore, I will try to follow the rule that payoffs will be represented as (u1(a,b),u2(a,b)) where a is player 1's choice, b is player 2's choice, u1(a,b) is the payoff received by player 1 (hence the subscript 1) when P1 plays a and P2 plays b, and u2(a,b) is the payoff received by player 2 (hence the subscript 2) when P1 plays a and P2 plays b
We can now turn attention to how choices should be made given this
payoff
matrix.
|
|
|
|
|
|
|
|
Thus, the worst thing that can happen when I cooperate is that I
receive
my least preferred consequence (getting suspended). The worst
thing
that can happen when I defect is that I receive my
second to least preferred consequence
(a large fine).
Being a fairly conservative prisoner, I decide that
the thing I should try to do is try to maximize the worst case
payoff.
(This worst case payoff is often called the security level of
the
choice.) This
means that I should choose to defect. Choosing the action
that maximizes my worst case payoff (or, equivalently, choosing the
action
that minimizes my worst case cost) is known as the maximin (minimax)
solution.
For the PD problem, when I use this strategy the maximin choice is
defection.
The notion of a best response is very useful to help understand
other
solution concepts from multi-agent choice. For example, the
maximin
solution can be viewed as the best-response (i.e., the maximal payoff)
when I believe that the other player will always be able to choose the
thing that hurts me the most.
Given that you suspect P2 might defect, you start to decide if you should risk playing the cooperative solution. You notice that if P2 defects then you can improve your payoff from 1 to 2 by defecting too. Thus, if P2 cooperates you have an incentive to defect, and if P2 defects you have an incentive to defect. It appears that the cooperative solution you reached with P2 when you were together in the same room is too big of a risk, and you decide to defect.
Feeling a little bit guilty for being disloyal, you start thinking about what would happen if you both defected. You notice that if you two had both agreed to defect then nothing that P2 could do would increase P2's payoff. In fact, just the opposite would happen. If you had agreed to defect and then P2 changed its mind, then P2's payoff would decrease from 2 to 1. There is no incentive for P2 to change. Similarly, the "both defect" joint solution offers you no good reason to change your mind either since switching from defection to cooperation makes your payoff drop. The "both defect" joint solution is in a state of equilibrium because neither player can increase its payoff by changing its mind.
In terms of the notion of best response, a Nash equilibrium is a set
of solutions where every player's choice is a best response to every
other
player's choice. In other words, the equilibrium solution is
made up of a set of individual choices that make up a joint action.
For this problem, the maximin solution and the Nash equilibrium
coincide.
Do they always? We'll settle this question when we talk about the
minimax theorem.
Pareto optimality (or Pareto efficiency) is a way to identify which
options are acceptable when we cooperate. (Some of you will
recall
how in CS470 we talked about the concept of non-domination in a
multi-objective
optimization setting. Pareto optimality is the embodiment of
non-domination
in a multi-agent setting.) The idea behind Pareto optimality is
that
some joint solutions should obviously be avoided because they are bad
for
everyone. We can illustrate as follows. For each joint choice, create a
plot of the payoff that each agent receives. Let one axis correspond to
the payoff received by P1, and let the other axis correspond to the
payoff
received by P2. For example, if both P1 and P2 choose to defect, then
both
P1 and P2 receive a payoff of 2. This is indicated by the black dot in
the lower left of the figure.

Repeating this process for each joint choice, we finish creating the
plot. As we examine this plot, we quickly observe that the "both
cooperate"
decision has a higher payoff for both agents (the choice (C,C)
dominates
the (D,D) choice). Thus, from a joint perspective, we should
eliminate
the decision for both of us to defect. In general, we should eliminate
any joint action if an alternative joint action exists which has a
higher
payoff for both agents. For the PD problem, the (D,D) action is
inferior to the (C,C) action so (D,D) is not Pareto
optimal.
Unfortunately, every other joint action is Pareto efficient. How do we
handle this situation? There are many ways to address this issue, but
for
our purposes we'll settle the question by appealing to interaction
protocols.
This will be the main topic in Chapter 5.
There is not a good interpretation of best-response for the notion
of
Pareto optimal. It is interesting to note that although
Pareto optimality does not coincide with a best response solution from
an individual perspective, the Pareto concept is closely tied to a set
of solutions that maximizes the weighted sum of individual
utilities. A weighted sum of individual utilities is often
refered to as a social welfare utility, so Pareto optimality is closely
tied to social welfare. Note, however, that the concept of a fair
social welfare is not included --- just the maximum weighted sum
whether or not the weightings are fair.
|
|
|
|
|
|
|
|
|
|
|
|
Look at P1's choice to play C. First, note that when P2 plays D then P1 receives a payoff of 4 units when it plays C and 2 units when it plays D. Thus, against P2's choice of D, P1's choice should be C. Now, note that when P2 plays C then P1 receives a payoff of 3 units when it plays C, and 1 unit when it plays D. Thus, agains't P2's choice of C, P1's choice should be C. We can therefore say that C dominates D for player 1. To see if you understand this concept, ask yourself if there exists a dominant strategy for P2? (No. Why?) How would you test for dominance if P1 had 100 choices and P2 had 2 choices?
In terms of the notion of best-response, a strategically dominant solution is a solution which is a best response for every possible choice made by the other players.
The prisoner's dilemma has a strategically dominant solution for
each
player. What are they?
There are many different ways to formulate the notion of satisficing, but the easiest to understand was proposed by Herbert Simon. (Incidentally, Simon was the person who originally introduced the notion of satisficing into decision theory and economics.) In Simon's formulation, an agent maintains an aspiration level, which we will denote by a. An agent is satisfied, or satisficed, if the utility produced by a joint choice meets or exceeds the aspiration level:
u1(a1,a2) >= aiIf this is true for all agents, then a satisficing equilibrium has been obtained.
In the Prisoner's Dilemma, any outcome can be satisficing depending on what aspiration levels are chosen. Although allowing any outcome to be a valid solution seems nonsensical, it starts being useful if the aspiration level is chosen according to past experience. Later in the semester, we will show that if aspirations start high and then decay as agents play the Prisoner's Dilemma game repeatedly, eventually agents will tend to converge to mutual cooperation as the satisficing equilibrium.