Mixed Strategies and Minimax

Updated 9/13/04.

Two Person, Zero Sum Games

Recall that, although agents share consequences, they may have different preferences because they will frequently have different goals. These different preferences translate into different utilities. In previous notes, we denoted these utilities as u1(a,b) and u2(a,b), where we have dropped the dependence on states. Sometimes, two agents have goals that directly oppose each other. Under such conditions, we can assume that the two utilities are precisely the opposite, hence u1(a,b) = - u2(a,b). We call these games zero sum because u1(a,b) + u2(a,b) = 0 for all choices a for player 1 and all choices b for player 2.

A Case Study: The Battle of the Bismarck Sea

Let's look at an example that can be found in both Luce and Raiffa and in Casti. During World War II, the northern half of the island of New Guinea was controlled by the Japanese, while the Allies controlled the southern half. Intelligence reports indicated that the Japanese were assembling a troop and supply convoy that would move from the port of Rabaul, which lies on the eastern part of the island of New Britain, to Lae, which lies on New Guinea to the west of New Britain. The convoy could take one of two routes: (1) north of New Britain, where rain and bad visibility were predicted, or (2) south of New Britain, where the weather was expected to be fair. It was estimated that the trip would take 3 days on either route.

The allied forces were commanded by General George C. Kenney. His objective was to do as much damage as possible to the Japanese convoy, but he had to find it first. He could start searching either north of New Britain or south. Assume that under stormy conditions (the northern route) it takes one day either to find the convoy or to determine that it took the other route. Assume that under sunny conditions (the southern route) it takes almost no time to find the convoy, but one day to determine that it took the other route. With these assumptions, we can construct the following payoff matrix. Note that only the payoffs for the allied forces, presented as the number of days of bombing, are shown. Since the game is zero sum, the payoff for the Japanese forces is simply the negative of the allied payoff.
 

Payoff Matrix for the Battle of the Bismarck Sea

Allies/Japanese     Sail North     Sail South
Search North        2 days         2 days
Search South        1 day          3 days

This payoff matrix indicates that if the Japanese sail north and the allies search south, then the allies will spend one day determining that the Japanese did not sail south and then spend another day finding the Japanese ships to the north.  This results in only one day of bombing.  Similar reasoning applies to the other entries.

So what do you do? The maximin solution for the allied forces is to search north, and the minimax solution for the Japanese forces is to sail north. Additionally, the "sail and search north" solution is a Nash equilibrium, since neither combatant has an incentive to change its strategy unilaterally. Since every other entry in the payoff matrix comes with an incentive for one side to change strategies, there is no other equilibrium solution. For this problem, the minimax solution and the Nash equilibrium coincide. When this occurs, the resulting payoff is referred to as the value of the game. In our example, 2 days of bombing is the game's value.

You might wonder why this payoff is called the value of the game. The answer lies in the fact that, for zero-sum games that have an equilibrium, the minimax value for the minimizing player and the maximin value for the maximizing player are the same. Thus, finding the minimax value yields a unique number that represents the value of the game. In the proof of the minimax theorem, you will encounter theorem 1, which states that if an equilibrium exists then the minimax value equals the maximin value.
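The maximin, minimax, and equilibrium reasoning above can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the names `payoff` and `pure_saddle_points` are illustrative assumptions, and the matrix entries are the days of bombing from the table above.

```python
# Allied payoffs in days of bombing (values taken from the payoff matrix above).
# Rows = Allied search choice, columns = Japanese route choice.
payoff = [
    [2, 2],  # Search North vs (Sail North, Sail South)
    [1, 3],  # Search South vs (Sail North, Sail South)
]
rows = ["Search North", "Search South"]
cols = ["Sail North", "Sail South"]


def pure_saddle_points(matrix):
    """Return (i, j) cells where neither player gains by deviating alone."""
    points = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [r[j] for r in matrix]
            # The row player maximizes, so the cell must be largest in its column.
            # The column player minimizes, so the cell must be smallest in its row.
            if value == max(column) and value == min(row):
                points.append((i, j))
    return points


row_minima = [min(row) for row in payoff]
col_maxima = [max(r[j] for r in payoff) for j in range(len(cols))]
maximin = max(row_minima)  # best guaranteed payoff for the Allies
minimax = min(col_maxima)  # smallest worst-case loss for the Japanese
maximin_row = rows[row_minima.index(maximin)]
minimax_col = cols[col_maxima.index(minimax)]
saddles = [(rows[i], cols[j]) for i, j in pure_saddle_points(payoff)]

print(maximin_row, maximin)
print(minimax_col, minimax)
print(saddles)
```

Under these assumptions the maximin and minimax values agree at 2, and the only cell that survives the deviation check is (Search North, Sail North).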

As a historical note, these choices were indeed the ones made by the forces; the convoy was sighted about one day after it sailed and the Japanese suffered severe losses. Eventually (1944), the Japanese forces on the island were isolated and, though the force withered, it continued guerrilla warfare until the war ended in 1945.


A Case Study:  Fighters and Bombers

Does the minimax solution always correspond to the Nash equilibrium? Is there always a unique Nash equilibrium? The answer to both of these questions is "sort of." Let's take a look at another example, again drawn from World War II and described by Casti. During air combat with much slower bomber aircraft, fighter airplanes normally adopted a strategy of swooping down on the bombers from the sun. To combat this strategy, the gunners on the bombers would put on their sunglasses and stare into the sun looking for fighters. To counter this, the fighter pilots adopted a second strategy wherein they would attack the bomber by flying straight at it from below. If they weren't spotted, the fighters succeeded, but if they were spotted they were invariably shot down. We can create the payoff matrix for these strategies by using the survival probabilities for the fighter.
 
Payoff Matrix for the Fighter vs. Bomber

Fighter/Bomber     Look Up     Look Down
Sun Attack         0.95        1
Bottom Attack      1           0

For example, if the fighter attacked from the sun and the bomber crew was looking up then the fighter had a 95 percent chance of surviving.  If they attacked from below and the bomber crew was looking down then they had 0 percent chance of surviving.

Is there a Nash equilibrium? Let's consider the options. For (Sun, Up) the fighter payoff is 0.95, but the fighter has an incentive to attack from the bottom because the payoff increases to 1. Thus, (Sun, Up) is not an equilibrium point. For (Bottom, Up) the fighter payoff is 1, but the bomber crew has an incentive to look down because the fighter payoff decreases to 0. Thus, (Bottom, Up) is not an equilibrium point. It is left to you to think through (Sun, Down) and (Bottom, Down). When you are done thinking, you should find that there is no equilibrium solution among the pure strategies.

What value results when the fighter employs maximin and the bomber employs minimax? For the fighter, the maximin solution is to attack from the sun, and the maximin value (in terms of payoff to the fighter) is 0.95. For the bomber, either strategy is a minimax solution, and both minimax values (in terms of cost to the bomber) are 1, corresponding to a maximin payoff of -1. The two values differ, which is another way of seeing that the pure strategies have no equilibrium.
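To make the gap between the two values concrete, here is a small Python sketch, offered as an illustration rather than as part of the original notes. The variable names are assumptions; the numbers are the survival probabilities from the payoff matrix above.

```python
# Fighter survival probabilities (values taken from the payoff matrix above).
# Rows = fighter attack, columns = bomber lookout direction.
payoff = [
    [0.95, 1.0],  # Sun Attack vs (Look Up, Look Down)
    [1.0, 0.0],   # Bottom Attack vs (Look Up, Look Down)
]

row_minima = [min(row) for row in payoff]
col_maxima = [max(row[j] for row in payoff) for j in range(2)]
maximin = max(row_minima)  # best guaranteed payoff for the fighter
minimax = min(col_maxima)  # smallest worst-case cost for the bomber

# A pure equilibrium is a cell that is both the maximum of its column
# and the minimum of its row.
saddles = [
    (i, j)
    for i in range(2)
    for j in range(2)
    if payoff[i][j] == col_maxima[j] and payoff[i][j] == row_minima[i]
]

print(maximin, minimax, saddles)
```

With these numbers the fighter can guarantee 0.95 while the bomber can only hold the fighter to 1, and the list of pure equilibrium cells comes back empty.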

How can we rationally address this problem?  What should the bomber do and what should the fighter do?  In answering this question, we will need to employ the idea of a mixed strategy.


Mixed Strategies

A mixed strategy is one in which a player chooses among its actions at random, taking each action with a specified probability. For example, if the bomber crew looks up 20 out of 21 times and looks down the other time, then what kind of worst case payoff can the crew expect? Let's take a look at some numbers. Suppose, for the moment, that the fighter always uses the "Sun Attack" strategy. Then the expected payoff for the bomber crew is (20/21)(-0.95) + (1/21)(-1) = -0.9524, which is higher than the pure strategy maximin payoff value of -1. If, by contrast, the fighter always uses the "Bottom Attack" strategy, then the expected payoff for the bomber crew is (20/21)(-1) + (1/21)(0) = -0.9524, which is also better than the maximin payoff value. By employing a mixed strategy, one in which one action is taken with a certain probability and another action is taken with the remaining probability, the bomber crew can actually improve its worst case expected payoff from -1 to -0.9524. Incidentally, if the fighter employs the "Sun Attack" strategy on 20 out of 21 sorties and the "Bottom Attack" strategy on 1 out of 21 sorties, then the worst case expected payoff for the fighter rises from 0.95 to 0.9524. What's going on here? It turns out that when we use mixed strategies, we can always find a minimax strategy that is also a Nash equilibrium.
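The expected-payoff arithmetic above can be reproduced with exact fractions. This is a hedged Python sketch, not the notes' own method; the names `bomber_mix` and `fighter_mix` are illustrative assumptions, and 0.95 is written as 19/20 so that no rounding is involved.

```python
from fractions import Fraction

# Fighter survival probabilities as exact fractions (0.95 = 19/20).
# Rows = fighter attack, columns = bomber lookout direction.
payoff = [
    [Fraction(19, 20), Fraction(1)],  # Sun Attack vs (Look Up, Look Down)
    [Fraction(1), Fraction(0)],       # Bottom Attack vs (Look Up, Look Down)
]

bomber_mix = [Fraction(20, 21), Fraction(1, 21)]   # P(Look Up), P(Look Down)
fighter_mix = [Fraction(20, 21), Fraction(1, 21)]  # P(Sun), P(Bottom)

# Fighter's expected payoff for each pure attack against the bomber's mix.
vs_bomber_mix = [sum(q * a for q, a in zip(bomber_mix, row)) for row in payoff]

# Fighter's expected payoff under its own mix against each pure bomber action.
vs_fighter_mix = [
    sum(p * payoff[i][j] for i, p in enumerate(fighter_mix)) for j in range(2)
]

# The bomber's payoff is the negative of the fighter's, so its worst case
# is the negative of the fighter's best response.
bomber_worst_case = -max(vs_bomber_mix)
fighter_worst_case = min(vs_fighter_mix)

print(vs_bomber_mix, vs_fighter_mix)
print(float(bomber_worst_case), float(fighter_worst_case))
```

Every entry works out to exactly 20/21, which is the 0.9524 quoted in the text: against these mixes, neither side's pure choice changes the expected outcome.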

The essence of a mixed strategy is that we perform one action with a certain probability and other actions with certain probabilities. Sometimes mixed strategies work better than the pure minimax strategies because the mixed strategies are better at minimizing (expected) risk.


The Minimax Theorem

The minimax theorem states that for every two-person, zero-sum game, there always exists a mixed strategy for each player such that the expected payoff for one player is the same as the expected cost for the other. Call this common quantity V. Furthermore, V is the best payoff each can expect to receive from the game. More precisely, V represents the expected payoff to player 1 and the expected cost to player 2. We could flip this around and say that -V is the expected cost to player 1 and the expected payoff to player 2. I want you to understand the formal proof of this theorem, so we'll need to cover some formal terminology that will help make our discussion more precise. When we are done with the theorem, we will have developed a rigorous proof based on the Brouwer Fixed Point Theorem. We will describe what this theorem says intuitively, but we won't prove it. Please note that the proof we will be covering is from Luce and Raiffa.

We begin with a formal statement of a two person, zero sum game. Please note that I am trying to stick closely to the terminology presented in previous lectures, but that it is notationally easier to denote P2's set of actions as B instead of A2.  Because I can't get HTML to display all of the symbols I need, please refer to the pdf version of the proof.
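As a rough guide to what the theorem asserts, it can be written compactly in LaTeX. The notation here is an assumption made for illustration and may differ from the pdf: A is the matrix of player 1's utilities, x is a mixed strategy over P1's m actions, y is a mixed strategy over the n actions in B, and \Delta_k denotes the set of probability vectors of length k.

```latex
\max_{x \in \Delta_m} \; \min_{y \in \Delta_n} \; x^{\mathsf{T}} A \, y
  \;=\;
\min_{y \in \Delta_n} \; \max_{x \in \Delta_m} \; x^{\mathsf{T}} A \, y
  \;=\; V,
\qquad A_{ij} = u_1(a_i, b_j).
```

The left-hand side is player 1's maximin expected payoff and the middle expression is player 2's minimax expected cost; the theorem says the two coincide once mixed strategies are allowed.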


A Case Study: Computing Mixed Strategies

It's one thing to know that an equilibrium solution exists among the set of mixed strategy solutions, but it is another thing to find it.  Unlike the pure strategy case where we can blindly search through a finite space of possibilities, there are uncountably infinitely many mixed strategy solutions.  How can we find one?  There are many approaches to this problem, but we won't cover them because they are too general.  We will, however, cover how you can find a solution in a two player game when one player has only two actions (pure strategies) and the other has only three.  Consider the following game taken from Casti.
 
A Payoff Matrix for a Simple Game

P1/P2     b1      b2      b3
a1        0       5/6     1/2
a2        1       1/2     3/4

We'll adopt P1's perspective. Suppose that P1's optimal mixed strategy is to play strategy a1 with probability p and to play a2 with probability 1-p. Then, on average, the payoffs to P1 against the three strategies of P2 are

M(p,b1) = (0)p + (1)(1-p) = 1 - p                against strategy b1
M(p,b2) = (5/6)p + (1/2)(1-p) = 1/2 + (1/3)p     against strategy b2
M(p,b3) = (1/2)p + (3/4)(1-p) = 3/4 - (1/4)p     against strategy b3

Let's plot these expected payoffs as a function of p.

Recall that P1 wants to maximize the minimum payoff that it will receive. For each p, this minimum payoff is the lowest of the three M values. For points to the left of about 0.4, the function M(p,b2) is the smallest. For points to the right of about 0.4, the function M(p,b1) is the smallest. The function M(p,b3) is never the smallest of the three. The maximum of this lower envelope occurs where the two lines intersect, that is, at the point where

M(p,b1) = M(p,b2)
1-p = 1/2 + 1/3p
which occurs when p = 3/8 = 0.375. At this p, P1's expected payoff is at least 1 - 3/8 = 5/8 no matter what P2 does.
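The graphical argument above can also be carried out numerically. The sketch below is one possible way to do it in Python, under the assumption that it suffices to test the endpoints p = 0 and p = 1 together with every crossing of two payoff lines, since the lower envelope of straight lines is concave and piecewise linear. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# P1's payoffs from the matrix above; rows = a1, a2; columns = b1, b2, b3.
payoff = [
    [Fraction(0), Fraction(5, 6), Fraction(1, 2)],
    [Fraction(1), Fraction(1, 2), Fraction(3, 4)],
]


def expected(p, j):
    """M(p, b_j): P1's expected payoff when a1 is played with probability p."""
    return p * payoff[0][j] + (1 - p) * payoff[1][j]


def lower_envelope(p):
    """Worst-case expected payoff for P1 at mixing probability p."""
    return min(expected(p, j) for j in range(3))


# Candidate maximizers: the two endpoints plus every crossing of two lines.
candidates = [Fraction(0), Fraction(1)]
for j, k in combinations(range(3), 2):
    # Line j is payoff[1][j] + slope_j * p, and similarly for line k.
    slope_j = payoff[0][j] - payoff[1][j]
    slope_k = payoff[0][k] - payoff[1][k]
    if slope_j != slope_k:
        p = (payoff[1][k] - payoff[1][j]) / (slope_j - slope_k)
        if 0 <= p <= 1:
            candidates.append(p)

best_p = max(candidates, key=lower_envelope)
value = lower_envelope(best_p)
print(best_p, value)
```

For this matrix the search returns p = 3/8 with a guaranteed expected payoff of 5/8, matching the intersection of M(p,b1) and M(p,b2) found by hand.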

What is the optimal mixed strategy for P2? Can you do this exercise for the Fighters and Bombers game?