Evolutionary Games
Last update: 10/13/2005.
Previous update: 9/22/2005
Introduction
Recall that one of the central themes
of this class is that there is no one "true" solution concept for
multi-agent decision problems; there are several solution concepts and
you get to choose which one you want to use. The solution
concepts that we have discussed in some detail include minimax
solutions, strategically dominant solutions, equilibrium solutions,
Pareto optimal solutions, and best response solutions. Of the
various types of equilibrium solutions, we've discussed only a few of
the different possibilities (Nash and satisficing) but have not
discussed other types (e.g. subgame perfect equilibria).
We now turn attention to another kind of equilibrium-based
solution. This is a solution that is produced by some form of
learning or adaptation process. It can be argued that any learned
solution is valid simply because, out of all the possible solutions
that could have been learned, only one solution was learned.
However, we want to make a stronger argument about which types of
learned strategies are justifiable.
The first step to justifying a learned solution is to identify a
learning mechanism that we "believe in." The notion of learning
that we will apply in this portion of the class is the learning
that comes as a result of an evolutionary process. More precisely,
we will focus on the kinds of things that can be learned in a
population of learning agents. We'll have to be careful because
evolution has a huge
number of things that can affect it: mating, mutation, environment,
catastrophes, other agents in the population, etc. We'll restrict
attention to just a couple of these factors. We'll select those
factors that are most strongly studied in the field of evolutionary
games.
The second step of justifying a learned solution is to identify when a
learned solution is appropriate for games. The most interesting
game-theoretic notion that applies to learning is the notion of an
equilibrium. We want to identify solutions that are learned
through an evolution process and achieve an equilibrium.
In general, the main requirement for reaching an equilibrium in
learning is that the learning algorithms stop changing. This type
of equilibrium can be very weak, as when, for example, a learning agent
happens to select parameter values that cause another learning agent to
stop adapting, and vice versa. Or both agents may get tired of
adapting and just "freeze" their solutions even though they may not be
good solutions.
This type of equilibrium may also be weak because even the
smallest perturbation to this type of equilibrium can cause the system
to adapt to another solution. A stronger notion of equilibrium is
a learned solution that is not easily changed by perturbing the
system. We call such an equilibrium a stable solution. In this
class, we will be focusing on a type of learning that tends to select
stable equilibria.
Finally, not every learning
process has an equilibrium. Since only
certain types of learning processes and games produce these equilibria,
the notion of a learning-based equilibrium is not as universal as
the notion of a Nash equilibrium.
Let's summarize our discussions so far. We will focus on
algorithms that produce evolutionary stable strategies. This phrase
has three parts.
- evolutionary implies the kind of learning dynamics that we will apply
- stable implies that we will focus on solutions that are robust
equilibria
- strategies implies that we will focus on applying the learning
algorithms to games.
Segue
Our goal for the remainder of these notes is to identify what kinds of
things affect the outcome of evolution in games. In evolutionary
games, the two main factors that contribute to what is
learned are:
- The types of interactions that occur between the agents in a
population.
- The rules that are applied to determine which strategies within
the population are fit and therefore likely to be learned by the
population.
Example
Let's begin by using an example. Suppose that we have two large
and separate groups of agents (males and females) who will be
playing the battle of the sexes game. Suppose that each of these two
groups has a mix of agents that either always play cooperate or always
play defect; the males group has a certain number of agents that always
cooperate and the rest of the agents always defect -- similarly for the
females group. One agent from each group, one male and one female, is
selected at random, they each make their choice, and they get the
reward that results.
After a bunch of these encounters, we will quantify which strategies
for the males are doing well and which strategies for the females are
doing well. Quantifying how well the strategies are doing
computes the relative fitness of the strategies. For the sake of
discussion, suppose that the male population is composed almost
entirely of agents that always cooperate. For this population, the
female agents who use the always defect strategy will get higher
payoffs, on average, than female agents who use the always cooperate
strategy.
Given these measures of relative fitness, we will have the male group
create a new generation of males and the female group create a new
generation of females. (I want you to be careful here. I am
using the male and female names to identify the agent groups.
I've chosen these names because they match up with the stereotypical
gender roles associated with the battle of the sexes game.
However, the males and females do not "mate" to create the next
generation. This game uses a type of asexual reproduction.)
If we have a limited number of agents who will survive (i.e., there is
some selection force acting on the populations) AND if selection favors
those agents who get higher payoffs in the battle of the sexes game,
then it makes sense that the strategies that produce higher payoffs
will have more offspring in the second generation than the strategies
that produce lower payoffs.
I did a series of simulations of these group dynamics, starting with
random numbers of defector and cooperator agents in the two
groups. I'll show you the videos in class, but some images are
below.

In these images, the x-axis represents the
number of rounds that the game was played and the y-axis represents the
percentage of the female group (red circles) and of the male group
(green squares) that play always
cooperate. Note that the two graphs
represent the two most common outcomes --- all the females play always
cooperate while all the males play always defect (top graph), or all
the females play always defect while all the males play always
cooperate (bottom graph).
This should make some intuitive sense. If the two groups play a
lot, then they should learn to settle on one of the two Pareto optimal,
Nash equilibrium solutions, but which solution is chosen
depends on the initial make-up of the group. For these
simulations, the initial population was very close to 50/50, but with a
small random perturbation towards either always defect or always
cooperate for each group.
The dip at the start of the curves may not make as much sense, so we
should discuss it. Think about the payoff matrix for the battle
of the sexes game. Which is better, if both players play defect or if
both players play cooperate? Since simultaneous defection yields a
(2,2) payoff, it is favored over simultaneous cooperation. Thus,
initially, both populations tend to favor always defect.
However, eventually the different number of cooperators and defectors
in the two groups gets reinforced enough that one strategy becomes more
useful for one group, with the complementary strategy becoming more fit
for the other group.
Key Concepts
This example illustrates the two key elements of evolutionary games: an
interaction model and a selection criterion. For the example, the
interaction model was two independent groups interacting exclusively
with the other group. For the example, the selection criterion
tended to favor those strategies that had higher relative
fitness. Note that the term is relative fitness because a
strategy does not have to yield a high payoff to be selected, it only
needs to yield a higher payoff than other existing strategies to be
selected more frequently than those other strategies. Such a
strategy is more fit relative to the set of other strategies in the
population.
The selection dynamics in the example, based on relative fitness, are
quite common. These selection dynamics are known
as replicator dynamics.
Stating that a game uses replicator dynamics means that population
proportions evolve according to relative fitness.
The type of interaction in the example is rarer. It is much
more common to have randomized pairings of individuals selected from
the same group.
In the next section, we will formally define how evolution can
occur under the replicator dynamics when agents are randomly paired
with each other. For simplicity, we will restrict focus to two
player games only.
Computing Replicator Dynamics and Relative Fitness
When we use replicator dynamics and when we use randomized pairings, we
can formalize the way that we decide how many of each type of strategy
are found in each new generation.
Terms. Before proceeding, we will introduce some
terminology to help us in our discussion. We will use the term
strategy for player i to mean the policy used by player i to
determine which action he or she will play. For simplicity, we will
restrict attention to pure strategies. We will use the term strategy,
without reference to a particular player, to mean a player-independent
policy that determines actions according to some rule. Our goal
is to compute the relative fitness of a strategy. Relative
fitness will depend on both the expected utility of a strategy for a
player and the number of players that use this strategy. We begin
by computing the expected utility to a player for choosing an action.
Note that throughout our discussion, and without loss of generality, we
will be using non-negative utilities.
Expected Utility. The first
step in creating this formalization is to figure out how to compute the
expected fitness of an action under randomized pairings. We'll
begin by computing the expected utility of a single action.
(Later, we'll take into consideration the number of agents who play
this action and use this information to compute the fitness of the
action.) Let $U_1(a_i, b_j)$
denote the utility to player 1 when player 1's strategy says to choose
action $a_i$ and when player 2's strategy says to choose $b_j$.
The expected utility for player 1 choosing action $a_i$ is
$$E(a_i) = \sum_j U_1(a_i, b_j)\, p(b_j),$$
where the sum is taken over all possible actions $b_j$ of player 2 and
where $p(b_j)$ is the probability that player 1 will be randomly paired
with a player from the group that plays action $b_j$.
In a group of N agents, we can compute $p(b_j)$ by counting the number
of other agents playing strategy $b_j$ and normalizing by N-1 (the minus
one comes because I can't be paired with myself):
$$p(b_j) = \frac{\sum_k d(b_j, S_k)}{N-1},$$
where
- the sum is taken over all agents other than the one being paired
- $S_k$ denotes the strategy for player k
- $d(b_j, S_k)$ denotes the delta function, which takes a value of 1 if
$b_j = S_k$ and takes a value of 0 otherwise.
This delta function, which appears in signal processing a lot, allows
us to count the number of players that are using the strategy that
chooses action $b_j$.
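The counting described above can be sketched in a few lines of Python. This is only an illustration: the function name `pairing_probabilities`, the action labels "C" and "D", and the five-agent group are made up for the example, and the sketch assumes that the count skips the agent being paired (which is what normalizing by N-1 suggests).

```python
from collections import Counter

def pairing_probabilities(strategies, me):
    """Return p(b) for each action b: the chance that agent `me` is
    randomly paired with an agent whose pure strategy plays b."""
    n = len(strategies)
    # Summing the delta function over every agent k other than `me`
    # is the same as counting how many of them play each action.
    counts = Counter(s for k, s in enumerate(strategies) if k != me)
    return {b: c / (n - 1) for b, c in counts.items()}

# Hypothetical group of five agents; agent 0 is the one being paired.
group = ["C", "C", "D", "D", "D"]
print(pairing_probabilities(group, me=0))  # {'C': 0.25, 'D': 0.75}
```

Because the agent itself is left out of the count, the probabilities sum to one.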
Putting these two equations together gives the expected utility for
playing action $a_i$:
$$E(a_i) = \frac{1}{N-1} \sum_j \sum_k U_1(a_i, b_j)\, d(b_j, S_k).$$
In simulation, this value is computed by playing this action against
every other agent and summing the utility. In words, the expected
utility of playing an action is obtained by pairing the action against
every other agent in the group, computing the utility that results for
the player, summing the utilities, and normalizing by the number of
games that were played. (I am normalizing by N-1 because I am assuming
that action $a_i$ is being played by a player in the group.)
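The simulation recipe in words (play against every other agent, sum, divide by N-1) might look like the following minimal sketch. The payoff table is a made-up set of non-negative numbers, not the payoff matrix of any game in these notes, and `expected_utility` is a hypothetical helper name.

```python
def expected_utility(me, strategies, payoff):
    """Average payoff to agent `me` over one game against every other agent."""
    others = [s for k, s in enumerate(strategies) if k != me]
    total = sum(payoff[(strategies[me], b)] for b in others)
    return total / len(others)  # len(others) is N - 1

# Hypothetical non-negative payoffs to the row player: payoff[(mine, theirs)].
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
group = ["C", "C", "D", "D", "D"]

print(expected_utility(0, group, payoff))  # 0.75 for a "C" player
print(expected_utility(2, group, payoff))  # 3.0 for a "D" player
```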
Relative Fitness. Now that we have the expected utility of action
$a_i$, we can perform the second step by computing the relative fitness
of this action. To compute relative fitness, we calculate the
cumulative utility of every strategy in the group. Recall that a
strategy is the policy by which an agent chooses an action. Several
agents in a population may use the same strategy (which is indeed the
case if the number of actions is small and the number of agents is
large). We want to compute the fitness of a strategy, which may depend
on the number of agents who play the strategy.
Let $S_i$ denote the strategy used by agent i.
Since we are restricting attention to pure strategies, the fitness of a
given strategy is simply the fitness of the given pure strategy.
The absolute utility of a strategy is the sum of the expected utilities
for all players that use that strategy:
$$f(S) = \sum_i E(a_i)\, d(a_i, S),$$
where the sum is over all agents in the group and $a_i$ here denotes
the action played by agent i. The absolute utility of a strategy is
the accumulation of the utilities produced by all agents that play that
strategy. Thus, if a particular strategy is used by a lot of agents in
the group, this strategy will have high absolute utility.
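As a rough sketch of this accumulation, the function below adds each agent's expected utility into a running total for the strategy that agent plays. The payoff numbers and the name `absolute_utility` are illustrative assumptions, and each agent is assumed to play every other agent exactly once.

```python
def absolute_utility(strategies, payoff):
    """Return f(S) for every strategy S present in the group."""
    n = len(strategies)
    f = {}
    for i, s in enumerate(strategies):
        # Expected utility of agent i against the other N - 1 agents.
        total = sum(payoff[(s, strategies[k])] for k in range(n) if k != i)
        # Accumulate it under the strategy that agent i plays.
        f[s] = f.get(s, 0.0) + total / (n - 1)
    return f

# Hypothetical non-negative payoffs to the row player: payoff[(mine, theirs)].
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
group = ["C", "C", "D", "D", "D"]
print(absolute_utility(group, payoff))  # {'C': 1.5, 'D': 9.0}
```

In this made-up group, "D" has the larger total both because each "D" agent earns more and because more agents play it.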
The relative fitness of a strategy depends on both the absolute utility
of this strategy and on the total utility in the group. The total
utility is the accumulation of the absolute utilities of all
strategies. Some of this total utility is attributable to strategy S
and some of it is attributable to other strategies. Some of these
possible strategies contribute a lot to the total utility and others
contribute very little to the total utility. The relative fitness of a
strategy is the fraction that it contributes to the total utility,
$$r(S) = \frac{f(S)}{\sum_R f(R)},$$
where the sum is taken over all possible strategies found in the
population.
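In code, relative fitness is just a normalization of the absolute utilities. The sketch below assumes the absolute utilities are already available as a dictionary; the numbers are invented for illustration, and the sketch relies on the non-negative utilities assumed earlier (with at least one positive value) so that the total is not zero.

```python
def relative_fitness(absolute):
    """Return r(S) = f(S) / sum over R of f(R) for every strategy S."""
    total = sum(absolute.values())
    return {s: f / total for s, f in absolute.items()}

# Hypothetical absolute utilities f(S) for two strategies.
r = relative_fitness({"C": 1.5, "D": 9.0})
print(r)
print(sum(r.values()))  # the relative fitnesses sum to 1 (up to rounding)
```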
Replication of Fit Strategies. We
now need to translate the relative fitness of these strategies into the
relative number of strategies that make it into the next
generation. The gist of replicator dynamics is that the number of
offspring of a strategy is proportional to the relative fitness of that
strategy. Since we won't let the total number of agents in a
population grow, we'll apportion N, the number of agents in the
population, into strategy groups in proportion to the relative fitness
of these groups.
To help
understand how to do this, it is helpful to interpret some of the
values above.
- The expected utility $E(a_i)$ of playing action $a_i$ is the absolute
fitness of this action.
- The absolute utility f(S) of a strategy is the absolute fitness of
this strategy.
- The total utility $\sum_R f(R)$ is the amount of utility accumulated
by every agent in the group.
- The ratio of the expected utility of an action to the total utility
is the relative fitness of the individual that played that action,
$r(a_i) = E(a_i) / \sum_R f(R) = E(a_i) / \sum_b E(b)$, where the dummy
variable R ranges over strategies and the dummy variable b ranges over
the actions played by the agents in the group.
- The ratio of the absolute utility of a strategy to the total utility
is the relative fitness of that strategy within the population,
$r(S) = f(S) / \sum_R f(R) = \sum_i E(a_i)\, d(a_i, S) / \sum_R f(R) = \sum_i E(a_i)\, d(a_i, S) / \sum_b E(b) = \sum_i r(a_i)\, d(a_i, S)$.
- Relative fitness of an action (respectively, strategy) is a
measure of the portion of the next generation produced by that action
(respectively strategy). The relative fitness of an action
depends only on how well that action performed relative to every other
action played in the tournament. By contrast, the relative
fitness of a strategy depends both on how well the individuals who play
that strategy perform relative to every other action in the tournament,
as well as on how many agents play that strategy.
Since the relative fitness of an action is a measure of the portion of
the next generation produced by that action, we can talk about how this
affects the population. Consider a population with N agents. If
every agent does as well as every other agent, then the relative
fitness of each agent is 1/N. A successful agent will have a relative
fitness higher than 1/N, meaning that they will contribute more to the
population in the next generation than average. An unsuccessful agent
will have a relative fitness lower than 1/N, meaning that they will
contribute less to the population in the next generation than average.
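As a quick check of the 1/N baseline, suppose (purely for illustration) that every one of the N agents earns the same expected utility c > 0. The total utility is then N c, so each agent's relative fitness is

$$r(a_i) = \frac{E(a_i)}{\sum_b E(b)} = \frac{c}{N c} = \frac{1}{N}.$$

An agent that earns more than c while the others stay at c ends up above 1/N, and one that earns less ends up below it.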
When we consider how many agents play a particular strategy, we can
decide how many agents that play this strategy will be in the next
generation. The total proportion of agents in the next generation
playing a strategy is simply the sum of the relative fitnesses of all
of the agents in this subpopulation. Thus,
$$o(S) = N\, r(S)$$
represents the portion of the population (a fraction r(S) of the N
agents) that will play strategy S in the next generation.
However, the wrong way to build the next generation is to simply
multiply the size of the population by the relative fitness of the
strategy, as follows:
$$o(S) = N\, r(S).$$
The problem with doing this is that the population sizes immediately
jump to the relative fitness of the strategies within the population.
A better and more correct way of doing this is to interpret the
relative fitness as the rate at which a particular subpopulation will
have offspring. To make this concrete, suppose that we have a
population of N agents and that there are two strategies within this
population. Call these strategies A and B. Using random pairings, we
compute the relative fitness of the subpopulations that play each of
the strategies. Let N(A) and N(B) denote the sizes of the
subpopulations that play strategies A and B, respectively.
Note that N(A) + N(B) = N.
Suppose now that we compute the relative fitness of each strategy to
get r(A) and r(B). Interpreting relative fitness as the birthrate of
the two populations, we can compute the number of offspring of each
subpopulation as N(A)·r(A) and N(B)·r(B).
We then normalize the population based on these relative birth rates
to get the new size of the subpopulations:
$$M = N(A)\, r(A) + N(B)\, r(B)$$
$$N(A) \leftarrow N(A)\, r(A) \cdot \frac{N}{M}$$
$$N(B) \leftarrow N(B)\, r(B) \cdot \frac{N}{M}.$$
The equation for the size of the subpopulation that will play a
particular strategy in the next generation is thus
$$o(S) = \frac{N(S)\, r(S)}{\sum_R N(R)\, r(R) / N},$$
where the sum in the denominator is taken over all strategies R in the
population.
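The update above can be sketched directly from the formula. The subpopulation sizes and relative fitness values below are invented for illustration, and `next_generation_sizes` is a hypothetical name; the code simply follows the equations as written in these notes.

```python
def next_generation_sizes(sizes, fitness):
    """sizes maps S -> N(S); fitness maps S -> r(S).
    Returns o(S) = N(S) r(S) N / M, which may not be an integer."""
    n = sum(sizes.values())                       # total population N
    offspring = {s: sizes[s] * fitness[s] for s in sizes}
    m = sum(offspring.values())                   # M, total offspring
    return {s: offspring[s] * n / m for s in sizes}

# Hypothetical two-strategy population with N = 100.
new_sizes = next_generation_sizes({"A": 60, "B": 40}, {"A": 0.25, "B": 0.75})
print(new_sizes)                 # A shrinks to about 33.3, B grows to about 66.7
print(sum(new_sizes.values()))   # the total stays at N, up to floating point
```

Scaling by N/M is what keeps the total population fixed from one generation to the next.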
Since o(S) will not in general be an integer, we need to
renormalize. One way to do this is to round the number of offspring to
the next integer and then either trim (if there are too many
offspring, more than N) or add (if there are too few offspring, fewer
than N).
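One possible way to carry out the rounding is sketched below. Two choices here are assumptions, since the notes leave them open: the sketch rounds to the nearest integer with Python's built-in `round`, and it trims or adds agents in the largest subpopulation. The name `integer_sizes` is hypothetical.

```python
def integer_sizes(real_sizes, n):
    """Round fractional subpopulation sizes so that they total exactly n."""
    sizes = {s: round(x) for s, x in real_sizes.items()}
    diff = n - sum(sizes.values())   # > 0: too few agents, < 0: too many
    if diff != 0:
        # Assumption: absorb the difference in the largest subpopulation.
        biggest = max(sizes, key=sizes.get)
        sizes[biggest] += diff
    return sizes

print(integer_sizes({"A": 100 / 3, "B": 200 / 3}, 100))  # {'A': 33, 'B': 67}
```

Other tie-breaking rules (for example, adjusting a randomly chosen strategy) would fit the description equally well.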
Imitator Dynamics
Replicator dynamics and random pairings of solutions are not the only
models for evolution. Thus, they are not the only learning models
that have some claim to justification. We will explore a
different technique for selecting the proportion of strategies that
evolve from one generation to another, but first we will need to
explore other models for selecting which agents interact with each
other.
Playing with Neighbors.
In the previous section, agents were randomly paired with other agents
from the group. From an evolutionary perspective, it sometimes
makes more sense to assume that agents are paired with their neighbors
rather than being randomly paired with any other agent. This
pairing with neighbors can be implemented in two ways.
- Agents have some way to recognize another agent. If they
are randomly paired with another agent that they do not like, they can
ask to be reassigned. The reassignment will be random, but at
least they get one chance to reject an undesirable agent and they
therefore get more chances to interact with their friends.
- Agents are physically arranged in a group. For example,
agents may be arranged on a lattice and restricted so that they can
only interact with their immediate neighbors. These immediate
neighbors can be defined as those agents to the N, S, E, or W of the
agent, or to the N, NE, E, SE, S, SW, W, or NW of the agent. For
another example, agents may be arranged on the perimeter of a circle
and only able to interact with an agent to their right or left.
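The physical arrangements described in the second item can be sketched as small neighbor-lookup functions. The names are hypothetical, and the lattice is assumed to have hard edges (agents on the border simply have fewer neighbors); a wrap-around lattice would be an equally reasonable choice.

```python
def lattice_neighbors(row, col, rows, cols, diagonal=False):
    """Neighbors of cell (row, col): N, S, E, W, plus the four
    diagonal cells when diagonal=True."""
    steps = [(-1, 0), (1, 0), (0, 1), (0, -1)]
    if diagonal:
        steps += [(-1, 1), (1, 1), (1, -1), (-1, -1)]
    return [(row + dr, col + dc) for dr, dc in steps
            if 0 <= row + dr < rows and 0 <= col + dc < cols]

def ring_neighbors(i, n):
    """Left and right neighbors of agent i on a circle of n agents."""
    return [(i - 1) % n, (i + 1) % n]

print(len(lattice_neighbors(1, 1, 3, 3)))                 # 4
print(len(lattice_neighbors(1, 1, 3, 3, diagonal=True)))  # 8
print(ring_neighbors(0, 5))                               # [4, 1]
```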
Imitator Dynamics. When
agents can only play with their neighbors, we can introduce a different
way (different from replicator dynamics) of selecting which strategies
propagate to the next generation. One way to do this is for an
agent to imitate its most successful neighbor. The algorithm for
doing this goes something like this:
- Interact with all of my neighbors, and let all my neighbors
interact with their neighbors.
- After the interactions with my neighbors are complete, identify
the interaction strategy from my neighbors that was most successful
unless my current strategy beat all of my neighbors (in which case I'll
stick to my strategy).
- Change my strategy to the most successful strategy of my
neighbors -- imitate them -- on the next round.
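The three steps above can be sketched for agents arranged on a circle. Several details are assumptions made only for this illustration: the payoff numbers are invented, all agents update at the same time, success is measured by the total payoff from the two neighbor games, and an agent keeps its own strategy when a neighbor merely ties it.

```python
def imitation_step(strategies, payoff):
    """One synchronous round of imitate-the-best-neighbor on a ring."""
    n = len(strategies)
    nbrs = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    # Step 1: every agent plays each of its neighbors and totals its payoff.
    score = [sum(payoff[(strategies[i], strategies[j])] for j in nbrs[i])
             for i in range(n)]
    new = []
    for i in range(n):
        # Step 2: find my most successful neighbor.
        best = max(nbrs[i], key=lambda j: score[j])
        # Step 3: imitate that neighbor only if it strictly outscored me.
        new.append(strategies[best] if score[best] > score[i] else strategies[i])
    return new

# Hypothetical non-negative payoffs to the row player: payoff[(mine, theirs)].
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
print(imitation_step(["C", "C", "C", "D", "C", "C"], payoff))
# ['C', 'C', 'D', 'D', 'D', 'C']
```

With these made-up payoffs, the lone "D" agent outscores its neighbors, so both of them copy it on the next round.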
Imitator dynamics can produce vastly different results than
replicator dynamics. You will explore these differences in the
lab.