Solution Concepts

Updated 9/8/04.


Prisoner's Dilemma

Recall that the primary difference between single agent and multiple agent decision making is that consequences depend on the actions of both agents.  Each agent identifies its preferences and identifies its corresponding utility function. The trick is to figure out how to make a decision if these utility functions are known to all agents. There are three qualitatively different approaches to making these decisions: minimax/maximin, Nash equilibria, and Pareto optimal. To discuss these three concepts, let's look at the celebrated Prisoner's Dilemma (PD) problem.   As we look at the problem, we will introduce a standard terminology.

In the (BYU-ized) PD problem, two individuals are brought to the honor code office for a suspected violation (they allegedly threw tortillas at a BYU football game).  The two individuals are placed in separate rooms and are interrogated.  Each individual can either tell the police that they threw tortillas or they can deny any involvement.  For historical reasons, we identify their choices as Cooperate or Defect, meaning that they cooperate with their fellow tortilla thrower and keep silent, or they defect to the police by telling what they've done.  (When we discuss this in class, please be aware that I will probably say that the individual confesses.  Please be aware that confession, which starts with C, is tantamount to defection and not to cooperation.  Please do not think that the "C" terminology stands for confession.)  As the students think through the consequences that they might experience, they identify the following:
 

Prisoner's Dilemma Consequences
P1/P2
Cooperate
Defect
Cooperate
Both get minimal fines P1 gets suspended, P2 gets wrist slapped
Defect
P2 gets suspended, P1 gets wrist slapped Both get large fines

In the table, the columns correspond to P2's (Prisoner 2's) possible choices, and the rows correspond to P1's choices.  The entries in the table indicate the consequences that each prisoner receives.  For example, the entry in the middle row, right column says that if P1 cooperates (by saying nothing) and if P2 defects (by telling the police) then P1 gets suspended from school and P2 gets a ceremonial slap on the wrist.

We could construct a utility for each of these choices, but instead we'll just rank order them with 4 being the most preferred consequence and 1 being the least preferred.  The ordered pair (x,y) represents the preference (x) for P1 and (y) for P2.  Thus, the payoff matrix is given by
 

Prisoner's Dilemma Payoffs
P1/P2
Cooperate
Defect
Cooperate
(3,3)
(1,4)
Defect
(4,1)
(2,2)

For example, the entry in the middle row, right column says that if P1 cooperates and P2 defects then P2 receives its most preferred consequence and P1 receives its least preferred consequence.  Throughout the remainder of this semester, I will try to follow the rule that player 1's choices will be listed on rows and player 2's choices will be listed in columns.  Furthermore, I will try to follow the rule that payoffs will be represented as (u1(a,b),u2(a,b)) where a is player 1's choice, b is player 2's choice, u1(a,b) is the payoff received by player 1 (hence the subscript 1) when P1 plays a and P2 plays b, and u2(a,b) is the payoff received by player 2 (hence the subscript 2) when P1 plays a and P2 plays b

We can now turn attention to how choices should be made given this payoff matrix.


Maximin

Let's take P1's perspective for a minute.  As I look at my choices to cooperate or defect, I can evaluate the worst case scenario.
 
Prisoner's Dilemma: P1's worst case payoff
P1/P2
Cooperate
Defect
Worst Case
1
2

Thus, the worst thing that can happen when I cooperate is that I receive my least preferred consequence (getting suspended).  The worst thing that can happen when I defect is that I receive my second to least preferred consequence (a large fine).  Being a fairly conservative prisoner, I decide that the thing I should try to do is try to maximize the worst case payoff.  (This worst case payoff is often called the security level of the choice.) This means that I should choose to defect.  Choosing the action that maximizes my worst case payoff (or, equivalently, choosing the action that minimizes my worst case cost) is known as the maximin (minimax) solution.  For the PD problem, when I use this strategy the maximin choice is defection.


Best Response

Suppose for a moment that I am P1 and that I know what P2 will choose.  Given that I know P2's choice, I can search through all of my choices and find the option that is the best response to her choice.  For the prisoner's dilemma, if P2 chooses to cooperate then the option that maximizes my payoff is to defect.  Similarly, if P2 chooses to defect then the option that maximizes my payoff is to defect.  Suppose that instead of knowing exactly which choices P2 was going to make, I know only that P2 will play cooperate, say, 40% of the time and defect 60% of the time.  I can find the expected utility for cooperation (.4*3+.6*1 = 1.8) and the expected utility for defection (.4*4+.6*2 = 2.8), which tells me that my best response to P2's strategy is to defect.

The notion of a best response is very useful to help understand other solution concepts from multi-agent choice.  For example, the maximin solution can be viewed as the best-response (i.e., the maximal payoff) when I believe that the other player will always be able to choose the thing that hurts me the most.


Nash Equilibrium

But there is an alternative to using maximin.  Rather than considering the worst case individual scenario, I can instead look at the situation from the other agent's perspective.  Pretend that you are P1, and let's do a thought experiment.  Pretend that P2 is in the room with you and that you two are deciding which action to take.  As you look at the consequences, you observe that when both of you cooperate then you are both better off than if you both defect (which results when you both use the maximin strategy).  P2 then leaves the room, and you get to thinking.  You observe that if P2 cooperates then there is an incentive to defect; defecting increases your payoff from 3 to 4.  While mulling this temptation, you are faced with a frightening realization: if there is an incentive for you to defect then there might also be an incentive for P2 to defect.  You reevaluate the cooperative solution and notice that by defecting, P2 increases its payoff from 3 to 4 and your payoff drops from 3 to 1.  Bummer!

Given that you suspect P2 might defect, you start to decide if you should risk playing the cooperative solution.  You notice that if P2 defects then you can improve your payoff from 1 to 2 by defecting too.  Thus, if P2 cooperates you have an incentive to defect, and if P2 defects you have an incentive to defect.  It appears that the cooperative solution you reached with P2 when you were together in the same room is too big of a risk, and you decide to defect.

Feeling a little bit guilty for being disloyal, you start thinking about what would happen if you both defected.  You notice that if you two had both agreed to defect then nothing that P2 could do would increase P2's payoff.  In fact, just the opposite would happen.  If you had agreed to defect and then P2 changed its mind, then P2's payoff would decrease from 2 to 1.  There is no incentive for P2 to change.  Similarly, the "both defect" joint solution offers you no good reason to change your mind either since switching from defection to cooperation makes your payoff drop.  The "both defect" joint solution is in a state of equilibrium because neither player can increase its payoff by changing its mind.

In terms of the notion of best response, a Nash equilibrium is a set of solutions where every player's choice is a best response to every other player's choice.  In other words, the equilibrium solution is made up of a set of individual choices that make up a joint action.

For this problem, the maximin solution and the Nash equilibrium coincide.  Do they always?  We'll settle this question when we talk about the minimax theorem.


Pareto Optimal

Both minimax and Nash equilibrium solutions are pretty pessimistic; they frame the problem as a competitive game between my interests and your interests.  This competition is not really healthy because they both produce a solution with both of us defecting, and both of us therefore receiving are next to least favorite outcome.  Clearly, the payoff matrix indicates that we'd both be better off if we'd just cooperate.

Pareto optimality (or Pareto efficiency) is a way to identify which options are acceptable when we cooperate.  (Some of you will recall how in CS470 we talked about the concept of non-domination in a multi-objective optimization setting.  Pareto optimality is the embodiment of non-domination in a multi-agent setting.)  The idea behind Pareto optimality is that some joint solutions should obviously be avoided because they are bad for everyone. We can illustrate as follows. For each joint choice, create a plot of the payoff that each agent receives. Let one axis correspond to the payoff received by P1, and let the other axis correspond to the payoff received by P2. For example, if both P1 and P2 choose to defect, then both P1 and P2 receive a payoff of 2. This is indicated by the black dot in the lower left of the figure.

Repeating this process for each joint choice, we finish creating the plot. As we examine this plot, we quickly observe that the "both cooperate" decision has a higher payoff for both agents (the choice (C,C) dominates the (D,D) choice). Thus, from a joint perspective, we should eliminate the decision for both of us to defect. In general, we should eliminate any joint action if an alternative joint action exists which has a higher payoff for both agents. For the PD problem, the (D,D) action is inferior to the (C,C) action so (D,D) is not Pareto optimal. Unfortunately, every other joint action is Pareto efficient. How do we handle this situation? There are many ways to address this issue, but for our purposes we'll settle the question by appealing to interaction protocols. This will be the main topic in Chapter 5.

There is not a good interpretation of best-response for the notion of Pareto optimal.  It is interesting to note that although Pareto optimality does not coincide with a best response solution from an individual perspective, the Pareto concept is closely tied to a set of solutions that maximizes the weighted sum of individual utilities.  A weighted sum of individual utilities is often refered to as a social welfare utility, so Pareto optimality is closely tied to social welfare.  Note, however, that the concept of a fair social welfare is not included --- just the maximum weighted sum whether or not the weightings are fair.


Strategic Dominance

There is one more solution concept in multi-agent games that needs to be addressed.  This is the concept of an individual action that stretegically dominates (sometimes we will just say dominates) another individual action.  Be careful to distinguish this notion from the notion of Pareto optimality; in the former, one of my actions is better than all of my other actions no matter what action you take, but in the latter, one of our joint actions (your action plus my action) is no worse off than some other possible joint action. 

Let me illustrate with a new example.
 
Payoffs for a Simple Game
P1/P2
C
D
C
(3,2)
(4,4)
D
(1,3)
(2,1)

Look at P1's choice to play C.  First, note that when P2 plays D then P1 receives a payoff of 4 units when it plays C and 2 units when it plays D.  Thus, against P2's choice of D, P1's choice should be C.  Now, note that when P2 plays C then P1 receives a payoff of 3 units when it plays C, and 1 unit when it plays D.  Thus, agains't P2's choice of C, P1's choice should be C.  We can therefore say that C dominates D for player 1.  To see if you understand this concept, ask yourself if there exists a dominant strategy for P2? (No. Why?)  How would you test for dominance if P1 had 100 choices and P2 had 2 choices?

In terms of the notion of best-response, a strategically dominant solution is a solution which is a best response for every possible choice made by the other players.

The prisoner's dilemma has a strategically dominant solution for each player.  What are they?


Satisficing Equilibrium

There are other, non-traditional solution concepts that are relevant for multi-agent games.  One of these solution concepts is the notion of satisficing equilibrium.  The word satisfice means to strive for something that is sufficient.  A satisficing equilibrium occurs when agents have arrived at choices such that the consequence produced by these choices yields utilities that all agents are satisfied with.  It is an equilibrium because a satisficing agent is content with what they have.  If all agents are content, no agent has an incentive to change its actions, which means that the solution is stable and therefore an equilibrium.

There are many different ways to formulate the notion of satisficing, but the easiest to understand was proposed by Herbert Simon.  (Incidentally, Simon was the person who originally introduced the notion of satisficing into decision theory and economics.)  In Simon's formulation, an agent maintains an aspiration level, which we will denote by a.  An agent is satisfied, or satisficed, if the utility produced by a joint choice meets or exceeds the aspiration level:

u1(a1,a2) >= ai
If this is true for all agents, then a satisficing equilibrium has been obtained.

In the Prisoner's Dilemma, any outcome can be satisficing depending on what aspiration levels are chosen.  Although allowing any outcome to be a valid solution seems nonsensical, it starts being useful if the aspiration level is chosen according to past experience.  Later in the semester, we will show that if aspirations start high and then decay as agents play the Prisoner's Dilemma game repeatedly, eventually agents will tend to converge to mutual cooperation as the satisficing equilibrium.