Payoff Matrices
Updated Sept 12, 2006. Changes are in
bold red or in bold green.
I used different
colors if I thought it was confusing to do otherwise.
Utilities and Payoff Matrices
Now that we have a utility defined for the consequences of our
decisions,
I can make a choice. Let c(a1,a2) denote
the
consequence that results when I (agent 1) choose action a1
and you (agent 2) choose action a2. Suppose further
that
I have determined the utility for receiving this consequence, and
denote
this utility as u1[c(a1,a2)],
which
indicates the utility to agent 1 of the consequence
c(a1,a2).
Since using this consequence-based notation gets messy, we often
abbreviate this as
u1(a1,a2). How can I make
a choice with such a utility?
Before answering this question, it is useful to briefly introduce the
notation for more general types of problems. When
the environment is complicated or depends on an element of chance, we
may include a state of nature
in our notation and write
u1(a1,a2;s). The
complexity of the
environment makes it necessary to explicitly keep track of the state or
to account for multiple possible states. We will find this most
useful when the environment is non-deterministic or when the true state
of the environment is not known to the players.
Returning to the question of how to make a choice given a utility,
there has been considerable work done in this area --- most of it
beginning in the 20th century. There is a sound mathematical
theory called Game Theory for
dealing
with multi-agent choice problems when every agent knows its utilities
and
the utilities of all other agents. Since I believe that any complete
treatment of
multi-agent
systems should include a discussion of this theory, we will study
it in this class.
The critical assumptions of Game Theory are that (a) all agents have
defined
their preferences over the set of actions (using the method described
above)
and (b) each agent knows the other agents' utilities. Using the
terminology,
not only have
u1(a1,a2) and
u2(a1,a2)
been determined, but we both know u1 and u2.
We can write this set of two utilities as a payoff matrix as
follows.
Assign rows of the matrix to the actions of the first agent and the
columns
of the matrix to the actions of the second agent. In each cell of
the matrix, we can write the utility-pair for each agent as an ordered
pair. Thus, (u1(a1,a2), u2(a1,a2))
represents the ordered pair of utiltiies when agent 1 chooses a1
and agent 2 chooses a2. Consider the
following example.
Example.
Consider a two agent game. Agent 1 can play
one of two cards: Up or Down. Agent 2 can
also
play
two cards: Left or Right.
If agent 1
plays
Up
and agent 2 plays Left, then agent 2 must
give agent 1 a
penny.
Similarly, if agent 1 plays Down and agent 2 plays Right then
agent 2 must give agent 1 a penny. If any other pair is
played,
then
agent 1 must pay agent 2 a penny. The payoff matrix for this game
is shown below.
| |
|
Agent 2's plays
|
| |
|
Left
|
Right
|
|
Agent 1's plays
|
Up
|
(1,-1)
|
(-1,1)
|
|
Down
|
(-1,1)
|
(1,-1)
|
In the matrix, the cell
in the bottom right is the utility for player
1 and player 2 when player 1 goes right and player 2 goes
down.
Thus (u1(Up,Left),u2(Up,Left))=(1,-1).
For the first part of this class, we will rely heavily on payoff
matrices.
As you might expect, these matrices exist if there are two or more
agents (though for more than two agents, the matrices have more than
two dimensions). You may wonder if all games can be written as payoff
matrices.
More
specifically, you may wonder if games with turn-taking can be written
as
payoff matrices. We address this in the next section.
Some Sleight of Hand: Normal and Extensive Forms
Before we talk more about game theory, we want to be able to make
turn-taking
games fit into a payoff matrix. When we look at either the framework
diagram from
Lecture 1 or at the
payoff matrix in the example above, we see that consequences (and
therefore utilities) depend only on my immediate choice. In real
life, the outcome of a game (or another interesting decision problem)
usually
depends on a sequence of my choices. For example, consider the
following
game tree with two agents, a green agent and a red agent.

The game, which is a turn-taking game, starts with the red agent
choosing either action
a1
or a2. The green agent then chooses either b1
or b2, etc. The red agent then repeats its choice,
and
the green agent its choice. The numbers in the box represent the red
agent's
utility for the sequence of choices. For example, if the agents choose
a1,
b1,
a1,
b1,
then the red agent gets five dollars. Note that we only see the
utility
of one agent in this game tree. This is a shorthand
notation that I can use whenever u1(a1,a2) = -u2(a1,a2).
In the tree, we can just show the payoff to player 1, the read
agent. The payoff to player 2 is negative five dollars; in other
words, player 2 pays player 1 five dollars.
The tree structure of this game makes it clear that the outcome
depends
not on a single action chosen by me (the red agent) and a single action
chosen by you (the green agent), but instead on a sequence of such
choices.
When we represent a game in this form, it is said to be in extensive
form. I think that this terminology represents the idea that the game
extends
across time or turns.
When outcomes depend only on a single choice by me and a
single
choice by you, then the game is said to be in normal form. I
think
that this terminology represents the idea that the game is in a
normative
form, meaning a canonoical form. The payoff matrix expresses games in
normal
form, but the tree expresses games in extensive form. It
will
probably be helpful to associate the phrase extensive form with
the notion of a tree, and the phrase normal form with the
notion
of a payoff matrix.
We would like it if there were only one form to a game because this
would allow us to develop only one theory. To address this issue,
game theorists frequently perform some sleight of
hand.
Game theorists can reason about strategies, based on
contigencies, rather than discrete
actions. In this context, a strategy is not an attitude like play
aggressively or defend the goal, but rather a complete
expression
of what to do in every contigency. As described by Poundstone,
[A strategy] is a complete description of a
particular
way to play a game, no matter what the other player(s) does and no
matter
how long the game lasts . A strategy must prescribe actions so
thoroughly
that you never have to make a decision in following it. W.
Poundstone, Prisoner's
Dilemma, Anchor Books, page 42.
To choose a strategy, we can start lumping possible actions
together.
Instead of defining two actions that take place at different times
during
game play, what if I define an "action" as a sequence of two choices
instead of as a single choice, and allow these choices to depend on the
previous choices of the game.
This
abstract
action is precisely a strategy. For the game tree
we have been discussing, there are four actions for the red agent and
four
actions for the green agent. Clearly, if you are the green agent
then your second choice depends on which branch of the tree previous
moves have taken us. For example, if on my first move I played a1, on your first move
you played b1, and
on my second move I played a1,
then you will want to chose action b2
because this means that you will only have to pay me one dollar instead
of five dollars. On the other hand, if on my first move I played a2, on your first move
you played b1, and
on my second move I played a1,
then you will want to chose action b1
because this means that I will have to pay you one dollar instead of
nothing.
The main lessons from the previous paragraph are that the payoff
that we receive depends on the strategy that we adopt, and that a
strategy is defined as a choice made in response to the previous
choices made in the game. Using these lessons, we can turn the
game in extensive form into one in normal (or matrix) form by
considering what payoffs each player will get as a function of the
strategy that he or she adopts.
Let's see if we can turn the extensive form game above into a normal
form game. To simplify the discussion, let's just consider a
portion of the game above. Consider the lower left portion of the
game.
As before, this is an extensive form (i.e. turn-taking game)
with the red agent playing first followed by the green agent.
(For the sake of the discussion, let's call the red agent RED and the
green agent GREEN.) The payoffs shown in the squares are the
payoff to RED and the costs (negative payoffs) to GREEN. What are
RED's possible strategies? They are simply a1 or a2. What are
GREEN's possible strategies? They are
- If RED played a1
then I'll play b1
and if RED played a2
then I'll play b1.
- If RED played a1
then I'll play b1
and if RED played a2
then I'll play b2.
- If RED played a1
then I'll play b2
and if RED played a2
then I'll play b1.
- If RED played a1
then I'll play b2
and if RED played a2
then I'll play b2.
Let's call these strategies ALWAYSB1, MATCH, OPPOSITE, and ALWAYSB2,
respectively. The payoff matrix corresponding to this extensive
form game becomes
|
GREEN
|
ALWAYS1
|
MATCH
|
OPPOSITE
|
ALWAYS2
|
RED
|
a1
|
(5,-5)
|
(5,-5)
|
(1,-1)
|
(-1,1)
|
a2
|
(1,-1)
|
(2,-2)
|
(1,-1)
|
(2,-2)
|
Which strategy would you choose if you were GREEN? I'd choose
OPPOSITE. More importantly, notice how we changed the turn-taking
(extensive form) game into a matrix game.
Let's repeate this exercise for a larger subgame of our original
extensive form game. This subgame is shown below.
Let's start by considering the strategies available to RED.
- If GREEN played b1
then I'll play a1
and if GREEN played b2
then I'll play a1.
ALWAYSA1
- If GREEN played b1
then I'll play a1
and if GREEN
played b2
then I'll play a2.
MATCH
- If GREEN played b1
then I'll play a2
and if GREEN
played b2
then I'll play a1.
OPPOSITE
- If GREEN played b1
then I'll play a2
and if GREEN
played b2
then I'll play a2.
ALWAYSA2
Those are the same strategy ideas that were available to GREEN in the
previous subgame. The choices for GREEN are now bigger.
They are
- Play b1 on
my first turn. If RED played a1
then play b1 on my
second turn and if RED played a2
then play b1 on my
second turn.
- Play b2 on
my first turn. If RED played a1
then play b1 on my
second turn and if RED played a2
then play b1 on my
second turn.
- Play b1 on
my first turn. If RED played a1
then play b1 on my
second turn and if RED played a2
then play b2 on my
second turn.
- Play b2 on
my first turn. If RED played a1
then play b1 on my
second turn and if RED played a2
then play b2 on my
second turn.
- Play b1 on
my first turn. If RED played a1
then play b2 on my
second turn and if RED played a2
then play b1 on my
second turn.
- Play b2 on
my first turn. If RED played a1
then play b2 on my
second turn and if RED played a2
then play b1 on my
second turn.
- Play b1 on
my first turn. If RED played a1
then play b2 on my
second turn and if RED played a2
then play b2 on my
second turn.
- Play b2 on
my first turn. If RED played a1
then play b2 on my
second turn and if RED played a2
then play b2 on my
second turn.
We can write these in a more interpretible form as follows.
- Play b1 on
my first turn. Play ALWAYSB1 on the second turn.
- Play b2 on
my first turn. Play ALWAYSB1 on the second turn.
- Play b1 on
my first turn. Play MATCH on the second turn.
- Play b2 on
my first turn. Play MATCH on the second turn.
- Play b1 on
my first turn. Play OPPOSITE on the second turn.
- Play b2 on
my first turn. Play OPPOSITE on the second turn.
- Play b1 on
my first turn. Play ALWAYSB2 on the second turn.
- Play b2 on
my first turn. Play ALWAYSB2 on the second turn.
The payoff matrix becomes:
|
GREEN
|
(b1,ALWAYSB1)
|
(b2,ALWAYSB1) |
(b1,MATCH) |
(b2,MATCH) |
(b1,OPPOSITE) |
(b2,OPPOSITE) |
(b1,ALWAYSB2) |
(b2,ALWAYSB2) |
RED
|
ALWAYSA1
|
(5,-5)
|
(0,0)
|
(5,-5)
|
(0,0)
|
(1,-1)
|
(-1,1)
|
(1,-1)
|
(-1,1)
|
MATCH
|
(5,-5)
|
(-5,5)
|
(5,-5)
|
(0,0)
|
(1,-1)
|
(-5,5)
|
(1,-1)
|
(0,0)
|
OPPOSITE
|
(1,-1)
|
(0,0)
|
(2,-2)
|
(0,0)
|
(1,-1)
|
(-1,1)
|
(2,-2)
|
(-1,1)
|
ALWAYSA2
|
(1,-1)
|
(-5,5)
|
(2,-2)
|
(0,0)
|
(1,-1)
|
(-5,5)
|
(2,-2)
|
(0,0)
|
In general, we can peform
this sleight of hand on any game if we are patient enough. Clearly,
this will grow exponentially fast, which means that we can
theoretically perform the transformation but the complexity will be
prohibitive except for the most trivial cases. (How many cells in
the normal form version of the original extensive form game?)
Nevertheless, this ability to perform the computation give me a lot of
freedom. Since we can transform any game in extensive form into
an equivalent game in
normal
form, we can restrict most of our theoretical development to games
in normal form only. From here on out, we will deal primarily with
games in normal
form,
although we will use the extensive form when we discuss alpha-beta
pruning.