Payoff Matrices

Updated Sept 12, 2006.  Changes are in bold red or in bold greenI used different colors if I thought it was confusing to do otherwise.

Utilities and Payoff Matrices

Now that we have a utility defined for the consequences of our decisions, I can make a choice. Let c(a1,a2) denote the consequence that results when I (agent 1) choose action a1 and you (agent 2) choose action a2. Suppose further that I have determined the utility for receiving this consequence, and denote this utility as u1[c(a1,a2)], which indicates the utility to agent 1 of the consequence c(a1,a2). Since using this consequence-based notation gets messy, we often abbreviate this as u1(a1,a2).  How can I make a choice with such a utility?

Before answering this question, it is useful to briefly introduce the notation for more general types of problems.  When the environment is complicated or depends on an element of chance, we may include a state of nature in our notation and write u1(a1,a2;s).   The complexity of the environment makes it necessary to explicitly keep track of the state or to account for multiple possible states.  We will find this most useful when the environment is non-deterministic or when the true state of the environment is not known to the players.

Returning to the question of how to make a choice given a utility, there has been considerable work done in this area --- most of it beginning in the 20th century.  There is a sound mathematical theory called Game Theory for dealing with multi-agent choice problems when every agent knows its utilities and the utilities of all other agents. Since I believe that any complete treatment of multi-agent systems should include a discussion of this theory, we will study it in this class. The critical assumptions of Game Theory are that (a) all agents have defined their preferences over the set of actions (using the method described above) and (b) each agent knows the other agents' utilities. Using the terminology, not only have u1(a1,a2) and u2(a1,a2) been determined, but we both know u1 and u2.

We can write this set of two utilities as a payoff matrix as follows.  Assign rows of the matrix to the actions of the first agent and the columns of the matrix to the actions of the second agent.  In each cell of the matrix, we can write the utility-pair for each agent as an ordered pair.  Thus, (u1(a1,a2), u2(a1,a2)) represents the ordered pair of utiltiies when agent 1 chooses a1 and agent 2 chooses a2.  Consider the following example.

Example.  Consider a two agent game.  Agent 1 can play one of two cards: Up or Down.  Agent 2 can also play two cards: Left or Right.    If agent 1 plays Up and agent 2 plays Left, then agent 2 must give agent 1 a penny.  Similarly, if agent 1 plays Down and agent 2 plays Right then agent 2 must give agent 1 a penny.  If any other pair is played, then agent 1 must pay agent 2 a penny.  The payoff matrix for this game is shown below.

   
Agent 2's plays
   
Left
Right
Agent 1's plays
Up
(1,-1)
(-1,1)
Down
(-1,1)
(1,-1)
In the matrix, the cell in the bottom right is the utility for player 1 and player 2 when player 1 goes right and player 2 goes down.   Thus (u1(Up,Left),u2(Up,Left))=(1,-1).

For the first part of this class, we will rely heavily on payoff matrices.  As you might expect, these matrices exist if there are two or more agents (though for more than two agents, the matrices have more than two dimensions). You may wonder if all games can be written as payoff matrices.  More specifically, you may wonder if games with turn-taking can be written as payoff matrices.  We address this in the next section.


Some Sleight of Hand: Normal and Extensive Forms

Before we talk more about game theory, we want to be able to make turn-taking games fit into a payoff matrix. When we look at either the framework diagram from Lecture 1 or at the payoff matrix in the example above, we see that consequences (and therefore utilities) depend only on my immediate choice.  In real life, the outcome of a game (or another interesting decision problem) usually depends on a sequence of my choices. For example, consider the following game tree with two agents, a green agent and a red agent.
What the turn taking game looks like in its simple form.
The game, which is a turn-taking game, starts with the red agent choosing either action a1 or a2. The green agent then chooses either b1 or b2, etc. The red agent then repeats its choice, and the green agent its choice. The numbers in the box represent the red agent's utility for the sequence of choices. For example, if the agents choose a1, b1, a1, b1, then the red agent gets five dollars. Note that we only see the utility of one agent in this game tree.  This is a shorthand notation that I can use whenever u1(a1,a2) = -u2(a1,a2). In the tree, we can just show the payoff to player 1, the read agent.  The payoff to player 2 is negative five dollars; in other words, player 2 pays player 1 five dollars. 

The tree structure of this game makes it clear that the outcome depends not on a single action chosen by me (the red agent) and a single action chosen by you (the green agent), but instead on a sequence of such choices. When we represent a game in this form, it is said to be in extensive form. I think that this terminology represents the idea that the game extends across time or turns.

When outcomes depend only on a single choice by me and a single choice by you, then the game is said to be in normal form. I think that this terminology represents the idea that the game is in a normative form, meaning a canonoical form. The payoff matrix expresses games in normal form, but the tree expresses games in extensive form.   It will probably be helpful to associate the phrase extensive form with the notion of a tree, and the phrase normal form with the notion of a payoff matrix.

We would like it if there were only one form to a game because this would allow us to develop only one theory.  To address this issue, game theorists frequently perform some sleight of hand.  Game theorists can reason about strategies, based on contigencies, rather than discrete actions.  In this context, a strategy is not an attitude like play aggressively or defend the goal, but rather a complete expression of what to do in every contigency.  As described by Poundstone,

[A strategy] is a complete description of a particular way to play a game, no matter what the other player(s) does and no matter how long the game lasts .  A strategy must prescribe actions so thoroughly that you never have to make a decision in following it.  W. Poundstone, Prisoner's Dilemma, Anchor Books, page 42.


To choose a strategy, we can start lumping possible actions together.  Instead of defining two actions that take place at different times during game play, what if I define an "action" as a sequence of two choices instead of as a single choice, and allow these choices to depend on the previous choices of the game. This abstract action is precisely a strategy. For the game tree we have been discussing, there are four actions for the red agent and four actions for the green agent.  Clearly, if you are the green agent then your second choice depends on which branch of the tree previous moves have taken us.  For example, if on my first move I played a1, on your first move you played b1, and on my second move I played a1, then you will want to chose action b2 because this means that you will only have to pay me one dollar instead of five dollars.  On the other hand, if on my first move I played a2, on your first move you played b1, and on my second move I played a1, then you will want to chose action b1 because this means that I will have to pay you one dollar instead of nothing.

The main lessons from the previous paragraph are that the payoff that we receive depends on the strategy that we adopt, and that a strategy is defined as a choice made in response to the previous choices made in the game.  Using these lessons, we can turn the game in extensive form into one in normal (or matrix) form by considering what payoffs each player will get as a function of the strategy that he or she adopts.

Let's see if we can turn the extensive form game above into a normal form game.  To simplify the discussion, let's just consider a portion of the game above.  Consider the lower left portion of the game.

Subgame

As before, this is an extensive form (i.e.  turn-taking game) with the red agent playing first followed by the green agent.  (For the sake of the discussion, let's call the red agent RED and the green agent GREEN.)  The payoffs shown in the squares are the payoff to RED and the costs (negative payoffs) to GREEN.  What are RED's possible strategies?  They are simply a1 or a2.  What are GREEN's possible strategies?  They are

Let's call these strategies ALWAYSB1, MATCH, OPPOSITE, and ALWAYSB2, respectively.  The payoff matrix corresponding to this extensive form game becomes



GREEN
ALWAYS1
MATCH
OPPOSITE
ALWAYS2
RED
a1
(5,-5)
(5,-5)
(1,-1)
(-1,1)
a2
(1,-1)
(2,-2)
(1,-1)
(2,-2)
Which strategy would you choose if you were GREEN?  I'd choose OPPOSITE.  More importantly, notice how we changed the turn-taking (extensive form) game into a matrix game.

Let's repeate this exercise for a larger subgame of our original extensive form game.  This subgame is shown below.
Bigger subgame

Let's start by considering the strategies available to RED.
Those are the same strategy ideas that were available to GREEN in the previous subgame.  The choices for GREEN are now bigger.  They are

We can write these in a more interpretible form as follows.
The payoff matrix becomes:

GREEN
(b1,ALWAYSB1)
(b2,ALWAYSB1) (b1,MATCH) (b2,MATCH) (b1,OPPOSITE) (b2,OPPOSITE) (b1,ALWAYSB2) (b2,ALWAYSB2)
RED
ALWAYSA1
(5,-5)
(0,0)
(5,-5)
(0,0)
(1,-1)
(-1,1)
(1,-1)
(-1,1)
MATCH
(5,-5)
(-5,5)
(5,-5)
(0,0)
(1,-1)
(-5,5)
(1,-1)
(0,0)
OPPOSITE
(1,-1)
(0,0)
(2,-2)
(0,0)
(1,-1)
(-1,1)
(2,-2)
(-1,1)
ALWAYSA2
(1,-1)
(-5,5)
(2,-2)
(0,0)
(1,-1)
(-5,5)
(2,-2)
(0,0)


In general, we can peform this sleight of hand on any game if we are patient enough. Clearly, this will grow exponentially fast, which means that we can theoretically perform the transformation but the complexity will be prohibitive except for the most trivial cases.  (How many cells in the normal form version of the original extensive form game?)
  
Nevertheless, this ability to perform the computation give me a lot of freedom.   Since we can transform any game in extensive form into an equivalent game in normal form, we can restrict most of our theoretical development to games in normal form only. From here on out, we will deal primarily with games in normal form, although we will use the extensive form when we discuss alpha-beta pruning.