Minimax and Alpha-Beta Pruning

Last updated 10/27/06.  

Two Agent Games

In problems with two decision makers (called agents), consequences depend not only on the actions of one agent and the state of the environment, but also on the actions of the other agent.  Agents 1 and 2 share the consequences of their actions, but since they may have differing goals, their preferences and utilities may also differ.  For such games, there is no single solution concept that everyone thinks is best; rather, there are several different kinds of solution concepts.  In this tutorial, we will focus on one of them, known as the minimax or maximin solution.  After discussing the concept, we will discuss an "efficient" algorithm for finding such solutions in a wide class of games.  I put the word "efficient" in quotes because the worst case performance of this algorithm is no better than an exhaustive depth first search.  However, its average case behavior is a lot better.

Let A1={a1,a2,...} denote the set of actions available to agent 1, and let A2={b1,b2,...} denote the set of actions available to agent 2.  Ignore the state of nature (assume that it is constant and known -- other treatments of nature are possible within game theory, but are beyond the scope of this class).  Based on their goals, each agent builds their own utility function.  Let u1 and u2 denote the utility functions for agents 1 and 2, respectively.  Thus, u1(a3,b2) is the utility that agent 1 receives when he or she chooses a3 and agent 2 chooses b2.  Similarly, u2(a3,b2) denotes the utility to agent 2 for the same pair of choices.  The sets of choices and the pair of utility functions define a game.

When u1(a,b)=-u2(a,b) for all a and for all b, the game is called a zero sum game because the sum of the two utility functions is always zero.

Zero sum games are strictly competitive, and therefore lend themselves to a solution concept known either as minimax or maximin.  The essence of a minimax solution methodology is that an agent tries to maximize the worst case payoff that can occur.  Sometimes, rather than expressing a game in terms of utilities or payoffs, the game is instead expressed in terms of costs.  Costs are just the negation of the utilities.  When costs are used instead of utilities, an agent doesn't try to maximize the worst case (since high cost is bad), but rather tries to minimize the worst case cost.  These concepts are just different ways of saying the same thing, so I will use both the term minimax and the term maximin to represent the philosophy of choosing the best among worst case alternatives.
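In symbols, if we define the cost to me as c(a,b) = -u1(a,b), then the two statements pick out the same action, since minimizing a maximum of negated values is the same as maximizing a minimum:

    min_a max_b c(a,b) = - max_a min_b u1(a,b)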

In terms of motivation, it might be helpful to know that the best computer chess, checkers, and backgammon players use a form of minimax solution (supplemented with some other information) to make their decisions.



Maximin in a Simple Game

Consider the following problem.  I am agent 1 with actions ai for i=1,2,3, and you are agent 2 with actions bj for j=1,2,3.  When you choose action b1 and I choose action a2, you get a payoff of u2(a2,b1)=4 and I get a payoff of u1(a2,b1)=-4.  Since the game is zero sum, we need only specify one of these values; for this example, we'll specify the payoff for me.  We denote this payoff as u1(ai,bj), where u1( , ) denotes my payoff.  A payoff of u1(ai,bj) to me corresponds to a loss of u2(ai,bj)=-u1(ai,bj) for you.
 
My choices \ Your choices    b1    b2    b3
a1                           -5    -2     1
a2                           -4     2     2
a3                           -1     3    -3

A simple game.  Payoffs for me (and losses for you) are listed in the cells.

For each potential ai, consider what choice of bj would give me the worst case payoff.  These worst case choices are marked in the following table.  For example, if I were going to take action a1, then the worst case is if you were to take b1, because that would produce a payoff of u1(a1,b1)=-5 for me.  Being a rational player, my best choice is to maximize my worst case payoff (equivalently, to minimize your best possible gain), so I should choose action a3.  This worst case payoff occurs when I choose a3 and you choose b3, yielding u1(a3,b3)=-3.  The table below shows the worst case outcome for each possible choice that I can make.

 

My choices \ Your choices    b1      b2    b3
a1                          [-5]     -2     1
a2                          [-4]      2     2
a3                           -1       3   [-3]

The worst case payoff for each of my choices is marked in brackets.
My task is thus to choose an action that produces the maximum worst case gain.  This action is given by the maximin solution

a* = arg max_ai min_bj u1(ai,bj)

Since payoffs for me are costs to you, the smart thing for you to do is to choose the action that minimizes your worst case loss.  This action is given by the minimax solution

b* = arg min_bj max_ai u1(ai,bj)
Notice that the maximin and minimax solutions are symmetric; if the payoff were defined relative to you instead of me, then I would choose the minimax solution and you would choose the maximin solution.  Note the presence of two terms: maximin value and maximin solution.  The solution is the argument that maximizes the worst case payoff, and the value is the utility that results at that solution, assuming that the opponent chooses the worst case.
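To make the notation concrete, here is a minimal Python sketch (my own illustration, not part of the original notes) that computes the maximin and minimax solutions for the example matrix by brute force:

    # Payoff matrix u1 for the simple game above: rows are my actions
    # a1..a3, columns are your actions b1..b3, entries are payoffs to me.
    u1 = [[-5, -2,  1],
          [-4,  2,  2],
          [-1,  3, -3]]

    # My maximin solution: find each row's worst case payoff (the row
    # minimum), then pick the row whose worst case is largest.
    row_worst = [min(row) for row in u1]                    # [-5, -4, -3]
    a_star = max(range(3), key=lambda i: row_worst[i])      # index 2 -> a3
    print("a* = a%d, maximin value = %d" % (a_star + 1, row_worst[a_star]))

    # Your minimax solution: find each column's worst case loss (the column
    # maximum of my payoff), then pick the column whose worst case is smallest.
    col_worst = [max(u1[i][j] for i in range(3)) for j in range(3)]  # [-1, 3, 2]
    b_star = min(range(3), key=lambda j: col_worst[j])      # index 0 -> b1
    print("b* = b%d, minimax value = %d" % (b_star + 1, col_worst[b_star]))

Running it prints a* = a3 with maximin value -3 and b* = b1 with minimax value -1, matching the marked cells in the table below.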

In the following table, I mark the maximin value for me, the minimax value for you, and the payoff that results when I choose maximin and you choose minimax.  (Clearly, if I know that you are going to choose the minimax solution then I have no incentive to change.  However, if you know that I am going to choose the maximin solution then your best move would be to choose b3.  How to handle situations like this is beyond the scope of this course.)
My choice \ Your choice    b1      b2    b3
a1                         -5      -2     1
a2                         -4       2     2
a3                        (-1)*     3   (-3)**

A simple game.  The maximin value for me (-3, marked **) occurs at (a3,b3), and the minimax value for you (-1, marked *) occurs at (a3,b1).  The cell marked * is also the payoff that results when I choose maximin (a3) and you choose minimax (b1).


Efficient minimax: alpha-beta pruning

Minimax is a useful decision concept, and it is straightforward to find the minimax solution and minimax value when a game is in a simple matrix form like the example above: simply do an exhaustive search through all possibilities.  Finding a minimax solution for a turn-taking game such as chess or checkers (these games are typically in extensive form) requires a little more effort.  The basic algorithm is for me to maximize, over my first move, the minimum over your first move of the maximum over my second move of the minimum over your second move, and so on.  This is a depth first search.

In a depth first search, I would ask myself, "what if I took action aj at time t, my opponent took action bi at time t+1, I took action ak at time t+2, ..." until the game outcome was determined.  Starting at the leaf nodes, and assuming the numbers there are payoffs to me, I would identify the worst case action my opponent would take at his or her final turn, then identify the action that guarantees me the highest payoff given those choices, and so on back up the tree.  This method, though time consuming, allows me to choose the action with the highest worst case payoff after all turns are taken.  The problem with this approach is that it must look at every leaf node in the search, even when there is no hope of those leaf nodes producing acceptable solutions (i.e., no hope that a superior solution will be found by continuing to search).
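Before turning to pruning, it may help to see this plain depth first minimax search in code.  The following is a minimal Python sketch; the encoding of the game tree as nested lists with numeric payoffs (to me) at the leaves is my own choice for illustration:

    # Plain depth first minimax.  An internal node is a list of children;
    # a leaf is a number giving the payoff to the maximizing player (me).
    def minimax(node, maximizing):
        if isinstance(node, (int, float)):   # leaf: the outcome is determined
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # A depth-2 example: I move first (maximizing), you reply (minimizing).
    tree = [[3, 12], [2, 8], [1, 14]]
    print(minimax(tree, True))   # max(min(3,12), min(2,8), min(1,14)) = 3

Note that every leaf is visited, whether or not it could change the answer.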

The alpha-beta pruning algorithm is an alternative that explores a branch only if there is a possibility of it producing a superior alternative.  To illustrate, consider the following game with two players, player 1 and player 2.  Each player takes two turns in the game, and the payoff for player 1 (and the loss for player 2) is shown in the leaf nodes of the graph.  In this game, player 1 goes first.  Thus, the leaf nodes are the payoffs that result when player 1 chooses ai and player 2 chooses bj, following which player 1 chooses ak and player 2 chooses bl, resulting in a payoff to player 1 of u1(ai,bj,ak,bl).

The method of alpha-beta pruning is a depth-first search that finds the same maximin solution as an exhaustive search, but usually with much less work.

[Image of a turn-taking game with two agents.]

In the figure, the green triangles point up, suggesting that they are maximizers (and thus represent player 1's choice points), and the red triangles point down, suggesting that they are minimizers (and thus represent player 2's choice points).  Each node is labeled with a number to help our discussion.  Conventional depth first search starts by expanding down the left branch of the tree, starting with node 30, node 28, etc., until it encounters the payoff of 5 at leaf node 0.  The search continues by expanding the next child of red node 16.  This child returns a value of 1, and the red parent chooses the lower of its two children's values, yielding a value of 1.  The search returns to green node 24 and proceeds to expand red node 17.  Both children are expanded, and red node 17 returns the minimizing value of 1.  Green node 24 selects the value that maximizes the worst case presented by its two children, and returns the value of 1 to red node 28.  Depth first search continues until all leaf nodes are visited, requiring 31 nodes to be expanded in all.

By contrast, alpha-beta pruning only visits a node and its descendants if there is a possibility that the maximin or minimax values will be changed.  Consider the following scenario.  We have expanded all of the descendants of green node 24 and have found the minimax value returned from this node to be 1.  We then return to red node 28 and begin expanding the descendants of green node 25.  Red node 18 returns a minimum value of -1.  Green node 25 then expands red node 19.  When red node 19 looks at node 6 with a payoff of -5, green node 25 learns that the worst case associated with choosing its second action is at most -5, and possibly lower.  Since green node 25 already knows that the worst that can occur if it chooses its first action is -1, there is no need to expand node 7.  We say that node 7 is pruned; it is simply never visited, and we save some time.  In other words, expanding the second child of red node 19 is a waste of time, since the value red node 19 returns can never exceed -5 and therefore can never beat -1.

This example illustrates the essence of alpha-beta pruning.  By keeping a memory of the best that parent nodes can already guarantee, some branches of the tree can be neglected without risk of error.  Stated in words, the essence of alpha-beta pruning is that a maximizing node returns as soon as it learns that it cannot affect the decision of its minimizing parent by continuing further search.  Similarly, a minimizing node returns as soon as it learns that it cannot affect the decision of its maximizing parent.

Check out Russell and Norvig's recursive algorithm for performing alpha-beta pruning.  I had to spend some time reviewing how recursive functions work before I could understand these algorithms, but I've been assured that many of you have spent hours playing with recursive functions.  If not, think about the stack, recall how parameters are placed on the stack and how these values change when the function returns, etc.
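If it helps to have something concrete to experiment with, here is a minimal Python sketch in the spirit of that recursive algorithm (the nested-list tree encoding is my own illustration; see Russell and Norvig for the authoritative version).  Here alpha is the best value the maximizer is already guaranteed along the current path, and beta is the best (lowest) value the minimizer is guaranteed:

    # Alpha-beta pruning.  An internal node is a list of children; a leaf
    # is a number giving the payoff to the maximizing player.
    def alphabeta(node, alpha, beta, maximizing):
        if isinstance(node, (int, float)):   # leaf node
            return node
        if maximizing:
            value = float('-inf')
            for child in node:
                value = max(value, alphabeta(child, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:   # the minimizing parent will never allow this
                    break           # prune the remaining children
            return value
        else:
            value = float('inf')
            for child in node:
                value = min(value, alphabeta(child, alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:   # the maximizing parent will never allow this
                    break           # prune the remaining children
            return value

    tree = [[3, 12], [2, 8], [1, 14]]
    print(alphabeta(tree, float('-inf'), float('inf'), True))   # 3, same as minimax

On this small tree the function returns the same value as plain minimax but never visits the leaves 8 and 14, just as node 7 was skipped in the discussion above.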

The following questions might help you assess your understanding:

  1. How would the shape of the tree need to change to find the maximin value if player 2 were to go first, but otherwise the payoffs were equivalent to those in the figure above?
  2. What needs to change about the implementation so that the minimax solution is returned instead of the minimax value?
  3. Can you use this algorithm on nonzero sum games?
  4. Can you use this algorithm on games with a random component?
  5. Can you use this algorithm even if you can't get all the way to the end of play, by evaluating the strength of a position using heuristics?
  6. Does this algorithm produce the same solution and value as we would have discovered if we had turned the extensive form game into a normal form game and then done an exhaustive search?