Zero-sum games are strictly competitive, and therefore lend themselves
to a solution concept known as either minimax or maximin. The essence
of a minimax solution is that an agent tries to maximize the
worst-case payoff that can occur. Sometimes, rather than
expressing a game in terms of utilities or payoffs, the game is instead
expressed in terms of costs. Costs are just the negation of
utility. When costs are used instead of utility, an agent doesn't
try to maximize the worst case (since maximum cost is bad), but rather
tries to minimize the worst-case cost. These concepts are just
different ways of saying the same thing, so I will use both the term
minimax and the term maximin to represent the philosophy of choosing
the best among worst-case alternatives.
In terms of motivation, it might be helpful to know that the best
computer chess, checkers, and backgammon players use a form of minimax
solution (supplemented with some other information) to make their
decisions.
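To see that maximizing worst-case utility and minimizing worst-case cost pick the same action, here is a minimal sketch in Python. The 2x2 utility table is made up for illustration and is not one of the games in these notes.

```python
# Hypothetical utilities for one agent (an assumption for illustration only):
# rows are the agent's actions, columns are the opponent's actions.
utility = [
    [3, -2],
    [1, -1],
]

# Costs are just the negation of utility.
cost = [[-u for u in row] for row in utility]

# Maximin on utility: pick the action whose worst (smallest) utility is largest.
maximin_action = max(range(len(utility)), key=lambda j: min(utility[j]))

# Minimax on cost: pick the action whose worst (largest) cost is smallest.
minimax_action = min(range(len(cost)), key=lambda j: max(cost[j]))

print(maximin_action, minimax_action)  # 1 1
print(min(utility[maximin_action]), max(cost[minimax_action]))  # -1 1
```

Both computations select the same row, and the worst-case cost is the negation of the worst-case utility.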
Consider the following problem. I am agent 1 with actions
aj for j=1,2,3 and you are agent 2 with actions bi for i=1,2,3. When
you choose action b1 and I choose action a2,
you get a payoff of u2(a2,b1)=4
and I get a payoff of u1(a2,b1)=-4.
Because the game is zero sum, we need only specify one of these values. For
this example, we'll just specify the payoff for me. We denote this
payoff as u1(aj,bi), where u1( , ) denotes my payoff.
A payoff of u1(aj,bi) to me
corresponds to a payoff of u2(aj,bi)=-u1(aj,bi)
for you; my gain is your loss.
Payoffs to me, u1(aj,bi):

| My choices \ Your choices | b1 | b2 | b3 |
|---|---|---|---|
| a1 | -5 | -2 | 1 |
| a2 | -4 | 2 | 2 |
| a3 | -1 | 3 | -3 |
For each potential aj, consider which choice of bi would give me the worst-case payoff. These worst-case choices are highlighted in the following table. For example, if I were going to take action a1, then the worst case would be for you to take b1, because that would produce a payoff to me of u1(a1,b1)=-5. Being a rational player, my best choice is to minimize your best possible payoff, so I should choose action a3 because that action maximizes my worst-case payoff. This worst-case payoff occurs when I choose a3 and you choose b3, yielding u1(a3,b3)=-3. The table below shows the worst-case outcome for each possible choice that I can make.
Payoffs to me, u1(aj,bi), with the worst case in each row marked by brackets:

| My choices \ Your choices | b1 | b2 | b3 |
|---|---|---|---|
| a1 | [-5] | -2 | 1 |
| a2 | [-4] | 2 | 2 |
| a3 | -1 | 3 | [-3] |
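My task is thus to choose an action that produces the maximum worst-case gain. This action is given by the maximin solution, a* = arg max over aj of [min over bi of u1(aj,bi)], which for this game is a3 with a worst-case payoff of -3.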
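The maximin choice can be checked mechanically. The following is a minimal Python sketch using the payoff table from this example; the names `U1`, `worst_cases`, and `maximin` are my own and purely illustrative.

```python
# Payoffs to me, u1(aj, bi): rows are my actions a1..a3,
# columns are your actions b1..b3 (values from the table above).
U1 = [
    [-5, -2,  1],
    [-4,  2,  2],
    [-1,  3, -3],
]

def worst_cases(payoffs):
    """Return my worst-case (minimum) payoff for each of my actions."""
    return [min(row) for row in payoffs]

def maximin(payoffs):
    """Return (row index, value) of the action with the best worst case."""
    worst = worst_cases(payoffs)
    best_row = max(range(len(worst)), key=lambda j: worst[j])
    return best_row, worst[best_row]

row, value = maximin(U1)
print(worst_cases(U1))       # [-5, -4, -3]
print(f"a{row + 1}", value)  # a3 -3
```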
In the following table, I show the maximin value for me in red, the minimax value for you in blue, and the result when I choose maximin and you choose minimax in purple. (Clearly, if I know that you are going to choose your minimax action b1, then I have no incentive to change from a3. However, if you know that I'm going to choose my maximin action a3, then your best move would be to choose b3. How to handle situations like this is beyond the scope of this course.)
[Table not reproduced: the payoff matrix with the maximin, minimax, and combined outcomes color-coded.]
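As a rough check on the parenthetical remark above, this Python sketch computes your minimax action and each player's best response to the other's choice. The helper names are hypothetical; only the payoff values come from the example.

```python
# Payoffs to me, u1(aj, bi); your payoff is the negation.
U1 = [
    [-5, -2,  1],
    [-4,  2,  2],
    [-1,  3, -3],
]

def maximin_row(payoffs):
    """My action with the largest worst-case payoff: (row index, value)."""
    worst = [min(row) for row in payoffs]
    j = max(range(len(worst)), key=lambda k: worst[k])
    return j, worst[j]

def minimax_col(payoffs):
    """Your action that minimizes the most I can get: (column index, value)."""
    worst_for_you = [max(col) for col in zip(*payoffs)]
    i = min(range(len(worst_for_you)), key=lambda k: worst_for_you[k])
    return i, worst_for_you[i]

def my_best_response(payoffs, col):
    """My payoff-maximizing row if I know you will play column col."""
    return max(range(len(payoffs)), key=lambda j: payoffs[j][col])

def your_best_response(payoffs, row):
    """Your best column (lowest payoff to me) if you know I will play row."""
    return min(range(len(payoffs[row])), key=lambda i: payoffs[row][i])

my_row, my_value = maximin_row(U1)        # a3 with -3
your_col, your_value = minimax_col(U1)    # b1 with -1
print(U1[my_row][your_col])               # -1, the joint outcome
print(my_best_response(U1, your_col))     # 2: I stay with a3
print(your_best_response(U1, my_row))     # 2: you switch to b3
```

The maximin value (-3) and the minimax value (-1) differ, which is why knowing my choice gives you a reason to deviate.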
In a depth-first search, I would ask myself, "what if I took action aj at time t, my opponent took action bi at time t+1, I took action ak at time t+2, ..." until the game outcome was determined. Starting at the leaf nodes, and assuming that the numbers there are payoffs to me, I would identify the worst-case action my opponent could take on his or her last turn, then identify my action that guarantees me the highest payoff over these worst cases, and so on back up the tree. This method, though time consuming, would allow me to choose the action with the highest worst-case payoff after all turns are taken. The problem with this approach is that it must look at every leaf node in the search, even when there is no hope that continuing to search will turn up a superior solution.
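The exhaustive depth-first procedure described above can be sketched in a few lines of Python. The tree here is a made-up example (nested lists with numeric leaves giving payoffs to me), not a game from these notes, and the function name is illustrative.

```python
def minimax(node, maximizing):
    """Plain depth-first minimax.

    node is either a number (a leaf payoff to the maximizer) or a list
    of child nodes. Returns (value, number of leaves visited).
    """
    if not isinstance(node, list):
        return node, 1
    values = []
    visited = 0
    for child in node:
        value, count = minimax(child, not maximizing)
        values.append(value)
        visited += count
    best = max(values) if maximizing else min(values)
    return best, visited

# Hypothetical three-ply tree: I move, my opponent moves, I move again.
tree = [
    [[3, 5], [6, 9]],
    [[1, 2], [0, -1]],
]

value, leaves = minimax(tree, maximizing=True)
print(value, leaves)  # 5 8
```

Every one of the 8 leaves is visited, which is the inefficiency that pruning addresses.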
The alpha-beta pruning algorithm is an alternative that explores a
branch only if there is a possibility
of producing superior alternatives. To illustrate, consider the
following game. In the game, there are two players, player 1 and
player 2. Each player
takes two turns in the game, and the payoff for player 1 (and the loss
for player 2) is shown in the leaf nodes of the graph.
In this game, player 1 goes first. Thus, the leaf nodes are the payoffs
that result when player 1 chooses ai and player 2 chooses bj,
following which player 1 chooses ak and player 2 chooses bl,
resulting in a payoff to player 1 of u1(ai,bj,ak,bl).

In the figure, the green triangles point up suggesting that they are maximizers (and thus represent player 1's choice points) and the red triangles point down suggesting that they are minimizers (and thus represent player 2's choice points). Each node is labelled with a number to help our discussion. Conventional depth first search starts by expanding down the left branch of the tree starting with node 30, node 28, etc., until it encounters the payoff of 5 at leaf node 0. The search continues by expanding the next child of red node number 16. This child returns a value of 1, and the red parent chooses the lower of the two solutions yielding a value of 1. The search returns to green node 24, and proceeds to expand red node 17. Both children are expanded, and red node 17 returns the minimizing value of 1. Green triangle 24 selects the value that maximizes the worst case presented from its two children, and returns the value of 1 to red node 28. Depth first search continues until all leaf nodes are visited, requiring 31 nodes to be expanded in all.
By contrast, alpha-beta pruning visits a node and its descendants only if there is a possibility that the maximin or minimax values will be changed. Consider the following scenario. We have expanded all of the descendants of green node 24 and have found the minimax value returned from this node to be 1. We then return to red node 28 and begin expanding the descendants of green node 25. Red node 18 returns a minimum value of -1. Green node 25 then expands red node 19. When red node 19 looks at node 6 with a payoff of -5, green node 25 knows that the worst case associated with choosing action a2 is at most -5 and possibly lower. Since green node 25 already knows that the worst that can occur if it chooses a1 is -1, there is no need to expand node 7. We say that node 7 is pruned; it is simply never visited, and we save some time. In other words, expanding the second child of red node 19 is a waste of time, since red node 19 can return at most -5 and green node 25 will prefer the -1 it is guaranteed by red node 18.
This problem exemplifies the essence of alpha-beta pruning. By keeping a memory of the best that can be done by parent nodes, some branches of the tree can be neglected without risk of error. Stated in words, the essence of alpha-beta pruning is that maximizing nodes return just as soon as they learn that continuing the search will not affect the decision of their minimizing parent. Similarly, minimizing nodes return just as soon as they learn that they will not affect the decision of their maximizing parent.
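Here is a minimal Python sketch of the idea, run on the subtree under red node 28. It is not Russell and Norvig's pseudocode, just one common way to write the recursion. The values at leaves 0, 1, and 6 and the minima at red nodes 17 and 18 come from the discussion above; the remaining leaf values (and their order) are assumptions, since the figure is not reproduced here.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax with alpha-beta pruning.

    Leaves are (node id, payoff to player 1) tuples; internal nodes are
    lists of children. Returns (value, list of leaf ids visited).
    """
    if visited is None:
        visited = []
    if isinstance(node, tuple):
        leaf_id, payoff = node
        visited.append(leaf_id)
        return payoff, visited
    best = -math.inf if maximizing else math.inf
    for child in node:
        value, _ = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)   # best the maximizer can guarantee so far
        else:
            best = min(best, value)
            beta = min(beta, best)     # best the minimizer can guarantee so far
        if alpha >= beta:
            break                      # remaining children cannot matter
    return best, visited

# Green node 24: red nodes 16 and 17 (leaf 2's value of 4 is assumed).
node_24 = [[(0, 5), (1, 1)], [(2, 4), (3, 1)]]
# Green node 25: red nodes 18 and 19 (leaf 5's and leaf 7's values are assumed).
node_25 = [[(4, -1), (5, 2)], [(6, -5), (7, 8)]]
node_28 = [node_24, node_25]  # red node 28 is a minimizer

value, visited = alphabeta(node_28, maximizing=False)
print(value)    # -1
print(visited)  # [0, 1, 2, 3, 4, 5, 6]
```

Leaf 7 never appears in `visited`: once red node 19 sees -5, it cannot beat the -1 that green node 25 already has from red node 18, so the loop breaks.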
Check out Russell and Norvig's recursive algorithm for performing alpha-beta pruning. I had to spend some time reviewing how recursive functions work before I could understand these algorithms, but I've been assured that many of you have spent hours playing with recursive functions. If not, think about the stack, recall how parameters are placed on the stack and how these values change when the function returns, etc.
The following questions might help you assess your understanding: