Evolutionary Games

Last update: 10/13/2005.  Previous update: 9/22/2005


Introduction

Recall that one of the central themes of this class is that there is no one "true" solution concept for multi-agent decision problems; there are several solution concepts and you get to choose which one you want to use.  The solution concepts that we have discussed in some detail include minimax solutions, strategically dominant solutions, equilibrium solutions, Pareto optimal solutions, and best response solutions.  Of the various types of equilibrium solutions, we've discussed only a few of the different possibilities (Nash and satisficing) but have not discussed other types (e.g. subgame perfect equilibria).

We now turn attention to another kind of equilibrium-based solution.  This is a solution that is produced by some form of learning or adaptation process.  It can be argued that any learned solution is valid simply because, out of all the possible solutions that could have been learned, only one solution was learned.  However, we want to make a stronger argument about which types of learned strategies are justifiable.

The first step to justifying a learned solution is to identify a learning mechanism that we "believe in."  The notion of learning that we will apply in this portion of the class is the learning that comes as a result of an evolution process.  More precisely, we will focus on the kinds of things that can be learned in a population of learning agents.  We'll have to be careful because a huge number of things can affect evolution: mating, mutation, environment, catastrophes, other agents in the population, etc.  We'll restrict attention to just a couple of these factors, selecting those that are most strongly studied in the field of evolutionary games.

The second step of justifying a learned solution is to identify when a learned solution is appropriate for games. The most interesting game-theoretic notion that applies to learning is the notion of an equilibrium.  We want to identify solutions that are learned through an evolution process and achieve an equilibrium. 

In general, the main requirement for reaching an equilibrium in learning is that the learning algorithms stop changing.  This type of equilibrium can be very weak, as when, for example, a learning agent happens to select parameter values that cause another learning agent to stop adapting, and vice versa.  Or, both agents get tired of adapting and just "freeze" their solutions even though they may not be good solutions.

This type of equilibrium may also be weak because even the smallest perturbation to this type of equilibrium can cause the system to adapt to another solution.  A stronger notion of equilibrium is a learned solution that is not easily changed by perturbing the system.  We call such an equilibrium a stable solution.  In this class, we will be focusing on a type of learning that tends to select stable equilibria.

Finally, not every learning process has an equilibrium.  Since only certain types of learning processes and games produce these equilibria, the notion of a learning-based equilibrium is not as universal as the notion of a Nash equilibrium.

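Let's summarize our discussion so far.  We will focus on algorithms that produce evolutionarily stable strategies.  This phrase has three parts: evolutionary (the solution is learned through an evolution process in a population of agents), stable (the solution is an equilibrium that is not easily changed by perturbing the system), and strategy (what the population learns is a strategy for playing the game).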


Segue

Our goal for the remainder of these notes is to identify what kinds of things affect the outcome of evolution in games.  In evolutionary games, the two main factors that contribute to what is learned are:
  1. The types of interactions that occur between the agents in a population.
  2. The rules that are applied to determine which strategies within the population are fit and therefore likely to be learned by the population.

Example

Let's begin by using an example.  Suppose that we have two large and separate groups of agents (males and females) who will be playing the battle of the sexes game.  Suppose that each of these two groups has a mix of agents that either always play cooperate or always play defect;  the males group has a certain number of agents that always cooperate and the rest of the agents always defect -- similarly for the females group.  One agent from each group, one male and one female, is selected at random, they each make their choice, and they get the reward that results.

After a bunch of these encounters, we will quantify which strategies for the males are doing well and which strategies for the females are doing well.  Quantifying how well the strategies are doing computes the relative fitness of the strategies.  For the sake of discussion, suppose that the male population is composed almost entirely of agents that always cooperate.  For this population, the female agents who use the always defect strategy will get higher payoffs, on average, than female agents who use the always cooperate strategy.

Given these measures of relative fitness, we will have the male group create a new generation of males and the female group create a new generation of females.  (I want you to be careful here.  I am using the male and female names to identify the agent groups.  I've chosen these names because they match up with the stereotypical gender roles associated with the battle of the sexes game.  However, the males and females do not "mate" to create the next generation.  This game uses a type of asexual reproduction.)  If we have a limited number of agents who will survive (i.e., there is some selection force acting on the populations) AND if selection favors those agents who get higher payoffs in the battle of the sexes game, then it makes sense that the strategies that produce higher payoffs will have more offspring in the second generation than the strategies that produce lower payoffs.

I did a series of simulations of these group dynamics, starting with random numbers of defector and cooperator agents in the two groups.  I'll show you the videos in class, but some images are below.

[Figures: What evolves in battle of the sexes? (two graphs, top and bottom)]

In these images, the x-axis represents the number of rounds that the game was played and the y-axis represents the percentage of the female group (red circles) and of the male group (green squares) that play always cooperate.  Note that the two graphs represent the two most common outcomes: all the females play always cooperate while all the males play always defect (top graph), or all the females play always defect while all the males play always cooperate (bottom graph).

This should make some intuitive sense.  If the two groups play a lot, then they should learn to settle on one of the two Pareto optimal, Nash equilibrium solutions, but which solution is chosen depends on the initial make-up of the group.  For these simulations, the initial population was very close to 50/50, but with a small random perturbation towards either always defect or always cooperate for each group.

The dip at the start of the curves may not make as much sense, so we should discuss it.  Think about the payoff matrix for the battle of the sexes game.  Which is better, if both players play defect or if both players play cooperate?  Since simultaneous defection yields a (2,2) payoff, it is favored over simultaneous cooperation.  Thus, initially, both populations tend to favor always defect.  However, eventually the different number of cooperators and defectors in the two groups gets reinforced enough that one strategy becomes more useful for one group, with the complementary strategy becoming more fit for the other group.
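
To see the dip and the eventual split in a small experiment, here is a minimal sketch of the two-group dynamics.  It tracks the fraction of cooperators in each group rather than individual agents, and it updates each fraction in proportion to share times expected payoff.  The payoff table is an assumption: only the (2,2) payoff for simultaneous defection comes from the discussion above, and the other entries, along with the names PAYOFF, step, and simulate, are hypothetical choices made for illustration.

```python
# Assumed payoffs (hypothetical, except that mutual defection pays 2 to each side).
# PAYOFF[own_action][other_action] -> own utility; the same table is used by both groups.
C, D = 0, 1
PAYOFF = [[1.0, 3.0],   # I cooperate: other cooperates, other defects
          [4.0, 2.0]]   # I defect:    other cooperates, other defects


def step(own_coop, other_coop):
    """Return the next-generation fraction of cooperators in one group."""
    # Expected utility of each pure strategy against the other group's mix.
    e_coop = PAYOFF[C][C] * other_coop + PAYOFF[C][D] * (1.0 - other_coop)
    e_defect = PAYOFF[D][C] * other_coop + PAYOFF[D][D] * (1.0 - other_coop)
    # Offspring are proportional to (current share) x (expected utility).
    coop_mass = own_coop * e_coop
    defect_mass = (1.0 - own_coop) * e_defect
    return coop_mass / (coop_mass + defect_mass)


def simulate(male_coop, female_coop, generations=200):
    """Return the history of (male, female) cooperator fractions."""
    history = [(male_coop, female_coop)]
    for _ in range(generations):
        # Both groups update simultaneously from the previous generation.
        male_coop, female_coop = (step(male_coop, female_coop),
                                  step(female_coop, male_coop))
        history.append((male_coop, female_coop))
    return history


# Start near 50/50 with a small perturbation, as in the simulations above.
history = simulate(male_coop=0.52, female_coop=0.48)
final_male, final_female = history[-1]
```

Under these assumed payoffs, both fractions fall at first and then separate, with the group that started with slightly more cooperators ending up cooperating.  Swapping the two starting values swaps the outcome.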


Key Concepts

This example illustrates the two key elements of evolutionary games: an interaction model and a selection criterion.  In the example, the interaction model was two independent groups interacting exclusively with the other group, and the selection criterion tended to favor those strategies that had higher relative fitness.  Note that the term is relative fitness because a strategy does not have to yield a high payoff to be selected; it only needs to yield a higher payoff than other existing strategies to be selected more frequently than those other strategies.  Such a strategy is more fit relative to the set of other strategies in the population.

The selection dynamics in the example, based on relative fitness, are quite common. These selection dynamics are known as replicator dynamics.  Stating that a game uses replicator dynamics means that population proportions evolve according to relative fitness.

The type of interaction in the example is rarer.  It is much more common to have randomized pairings of individuals selected from the same group.

In the next section,  we will formally define how evolution can occur under the replicator dynamics when agents are randomly paired with each other.  For simplicity, we will restrict focus to two player games only.


Computing Replicator Dynamics and Relative Fitness


When we use replicator dynamics and when we use randomized pairings, we can formalize the way that we decide how many of each type of strategy are found in each new generation. 

Terms. 
Before proceeding, we introduce some terminology to help us in our discussion.  We will use the term strategy for player i to mean the policy that player i uses to determine which action he or she will play.  For simplicity, we will restrict attention to pure strategies.  We will use the term strategy, without reference to a particular player, to mean a player-independent policy that determines actions according to some rule.  Our goal is to compute the relative fitness of a strategy.  Relative fitness will depend on both the expected utility of a strategy for a player and the number of players that use this strategy.  We begin by computing the expected utility to a player for choosing an action.

Note that throughout our discussion, and without loss of generality, we will be using non-negative utilities.

Expected Utility. The first step in creating this formalization is to figure out how to compute the expected fitness of an action under randomized pairings.  We'll begin by computing the expected utility of a single action.  (Later, we'll take into consideration the number of agents who play this action and use this information to compute the fitness of the action.)  Let U1(ai,bj) denote the utility to player 1 when player 1's strategy says to choose action ai and when player 2's strategy says to choose action bj.  The expected utility for player 1 choosing action ai is

E(ai) = Σj U1(ai,bj) p(bj)

where the sum is taken over all possible actions, bj, from player 2 and where p(bj) is the probability that player 1 will be randomly paired with a player from the group that plays action bj.

In a group of N agents, we can compute p(bj) by counting the number of other agents whose strategy chooses action bj and normalizing by N-1 (the minus one comes because I can't be paired with myself):

p(bj) = Σk d(bj, Sk) / (N-1),

where Sk is the strategy used by agent k, the sum is over the other agents in the group, and d(bj, Sk) equals 1 if strategy Sk chooses action bj and 0 otherwise.  This delta function, which appears in signal processing a lot, allows us to count the number of players that are using the strategy that chooses action bj.  Putting these two equations together gives the expected utility for playing action ai.

E(ai) = Σj Σk U1(ai,bj) d(bj, Sk) / (N-1).

In simulation, this value is computed by playing this action against every other agent and averaging the resulting utilities.  In words, the expected utility of playing an action is obtained by pairing the action against every other agent in the group, computing the utility that results for the player, summing these utilities, and normalizing by the number of games that were played.  (I am normalizing by N-1 because I am assuming that action ai is being played by a player in the group.)
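
The simulation recipe just described can be sketched in a few lines.  This is only an illustration: the function name expected_utility, the payoff table UTILITY, and the five-agent population are all hypothetical, and each agent's pure strategy is represented directly by the action it chooses.

```python
def expected_utility(action, me, actions, utility):
    """Average utility of `action` for agent `me` against every other agent.

    actions[k] is the action chosen by agent k's pure strategy, and
    utility[a][b] is the payoff for playing a against b.
    """
    n = len(actions)
    total = 0.0
    for k, other_action in enumerate(actions):
        if k == me:          # an agent is never paired with itself
            continue
        total += utility[action][other_action]
    return total / (n - 1)   # normalize by the number of games played


# Hypothetical payoffs and population, for illustration only.
UTILITY = {"C": {"C": 1.0, "D": 3.0},
           "D": {"C": 4.0, "D": 2.0}}
population = ["C", "C", "D", "D", "D"]

e_c = expected_utility("C", me=0, actions=population, utility=UTILITY)
e_d = expected_utility("D", me=2, actions=population, utility=UTILITY)
```

Looping over agents and skipping the agent itself plays the role of the delta function and the N-1 normalization.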

Relative Fitness.  Now that we have the expected utility of action ai we can perform the second step by computing the relative fitness of this action.  To compute relative fitness, we calculate the cumulative utility of every strategy in the group.  Recall that a strategy is the policy by which an agent chooses an action.  Several agents in a population may use the same strategy (which is indeed the case if the number of actions is small and the number of agents is large).  We want to compute the fitness of a strategy which may depend on the number of agents who play the strategy. 

Let Si denote the strategy used by agent i.  Since we are restricting attention to pure strategies, the fitness of a given strategy is simply the fitness of the action that the strategy selects.  The absolute utility of a strategy is the sum of the expected utilities for all players that use that strategy.

f(S) = Σi E(ai) d(ai,S)

where the sum is over all agents in the group.  The absolute utility of a strategy is the accumulation of the utilities produced by all agents that play that strategy.  Thus, if a particular strategy is used by a lot of agents in the group, this strategy will have high absolute utility.

The relative fitness of a strategy depends on both the absolute utility of this strategy and on the total utility in the group.  The total utility is the accumulation of the absolute utilities of all of the strategies in the population.

Some of this total utility is attributable to strategy S and some of it is attributable to other strategies.  Some of these possible strategies contribute a lot to the total utility and others contribute very little to the total utility.  The relative fitness of a strategy is the fraction that it contributes to the total utility:

r(S) = f(S) / Σ R f(R)

where the sum is taken over all possible strategies found in the population.
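
As a rough sketch of these two steps together, the function below accumulates the absolute utility f(S) of each strategy and then divides by the total.  The names relative_fitness and UTILITY and the example population are hypothetical, and the sketch assumes non-negative utilities with a positive total so that the division is well defined.

```python
from collections import defaultdict


def relative_fitness(actions, utility):
    """Map each strategy present in the group to its relative fitness r(S)."""
    n = len(actions)
    absolute = defaultdict(float)            # f(S), accumulated per strategy
    for i, mine in enumerate(actions):
        # E(a_i): average payoff of agent i against the other n - 1 agents.
        payoff = sum(utility[mine][other]
                     for k, other in enumerate(actions) if k != i)
        absolute[mine] += payoff / (n - 1)
    total = sum(absolute.values())           # total utility in the group
    return {s: f / total for s, f in absolute.items()}


# Hypothetical payoffs and population, for illustration only.
UTILITY = {"C": {"C": 1.0, "D": 3.0},
           "D": {"C": 4.0, "D": 2.0}}
r = relative_fitness(["C", "C", "D", "D", "D"], UTILITY)
```

Here the two cooperators each earn 2.5 on average and the three defectors each earn 3.0, so f(C) = 5, f(D) = 9, and the relative fitnesses are 5/14 and 9/14.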

Replication of Fit Strategies. 
We now need to translate the relative fitness of these strategies into the relative number of strategies that make it into the next generation.  The gist of replicator dynamics is that the number of offspring of a strategy is proportional to the relative fitness of that strategy.  Since we won't let the total number of agents in a population grow, we'll divide N, the number of agents in the population, into strategy groups in proportion to the relative fitness of these groups.

To help understand how to do this, it is helpful to interpret some of the values above.
Since the relative fitness of an action is a measure of the portion of the next generation produced by that action, we can talk about how this affects the population.  Consider a population with N agents.  If every agent does as well as every other agent, then the relative fitness of each agent is 1/N.  A successful agent will have a relative fitness higher than 1/N, meaning that it will contribute more than average to the population in the next generation.  An unsuccessful agent will have a relative fitness lower than 1/N, meaning that it will contribute less than average to the population in the next generation.

Once we know how many agents play a particular strategy, we can decide how many agents will play this strategy in the next generation.  The relative fitness of a strategy is simply the sum of the relative fitnesses of all of the agents in the subpopulation that plays it.  A tempting way to turn this into a subpopulation size is to simply multiply the size of the population by the relative fitness of the strategy:

    o(S) = N r(S),

where o(S) represents the number of agents that will play strategy S in the next generation.  This is the wrong way to do it.  The problem is that the subpopulation sizes immediately jump to the relative fitnesses of the strategies within the population.

A better and more correct way of doing this is to interpret the relative fitness as the rate at which a particular subpopulation will have offspring.  To make this concrete, suppose that we have a population of N agents and that there are two strategies within this population.  Call these strategies A and B.  Using random pairings, we compute the relative fitness of the subpopulations that play each of the strategies.  Let N(A) and N(B) denote the sizes of the subpopulations that play strategies A and B, respectively.  Note that N(A) + N(B) = N.

Suppose now that we compute the relative fitness of each strategy to get r(A) and r(B).  Interpreting relative fitness as the birthrate of the two populations, we can compute the number of offspring of each subpopulation as N(A)·r(A) and N(B)·r(B).  We then normalize the population based on these relative birth rates to get the new size of the subpopulations:

M = N(A)·r(A) + N(B)·r(B)
N(A) <-- N(A)·r(A) · N/M
N(B) <-- N(B)·r(B) · N/M.

The equation for the size of the subpopulation that will play a particular strategy in the next generation is thus

 o(S) = N(S)·r(S) / (Σ R N(R)·r(R) / N),

where the sum is taken over all strategies R found in the population.

Since this will not in general be an integer, we need to renormalize.  One way to do this is to round the number of offspring to the next integer and then either trim (if there are too many offspring, >N) or add (if there are too few offspring, <N).
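
The update and the integer clean-up can be sketched as follows.  This is one possible reading rather than a definitive implementation: the name next_generation is hypothetical, the sketch rounds to the nearest integer with Python's built-in round, and it picks which subpopulation to trim or pad by looking at which rounded count is furthest from its exact value.  The relative fitness values in the example are assumed inputs.

```python
def next_generation(sizes, fitness):
    """Return integer subpopulation sizes for the next generation.

    sizes[s] is N(s) and fitness[s] is r(s); the total N is held fixed.
    """
    n = sum(sizes.values())
    births = {s: sizes[s] * fitness[s] for s in sizes}     # N(s) * r(s)
    m = sum(births.values())                               # normalizer M
    exact = {s: births[s] * n / m for s in sizes}          # o(s), real-valued
    counts = {s: round(exact[s]) for s in sizes}
    # Fix rounding drift one agent at a time, adjusting the strategy whose
    # rounded count is furthest from its exact value in the needed direction.
    while sum(counts.values()) != n:
        sign = 1 if sum(counts.values()) < n else -1
        s = max(counts, key=lambda t: sign * (exact[t] - counts[t]))
        counts[s] += sign
    return counts


# Hypothetical example: 2 agents play C, 3 play D, with assumed r(C) and r(D).
new_sizes = next_generation({"C": 2, "D": 3}, {"C": 5 / 14, "D": 9 / 14})
```

In this example the exact sizes are 50/37 and 135/37, which round to 1 and 4, so no trimming or padding is needed.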


Imitator Dynamics

Replicator dynamics and random pairings of solutions are not the only models for evolution.  Thus, they are not the only learning models that have some claim to justification.  We will explore a different technique for selecting the proportion of strategies that evolve from one generation to another, but first we will need to explore other models for selecting which agents interact with each other.

Playing with Neighbors.  In the previous section, agents were randomly paired with other agents from the group.  From an evolutionary perspective, it sometimes makes more sense to assume that agents are paired with their neighbors rather than being randomly paired with any other agent.  This pairing with neighbors can be implemented in two ways.
  1. Agents have some way to recognize another agent.  If they are randomly paired with another agent that they do not like, they can ask to be reassigned.  The reassignment will be random, but at least they get one chance to reject an undesirable agent and they therefore get more chances to interact with their friends.
  2. Agents are physically arranged in a group.  For example, agents may be arranged on a lattice and restricted so that they can only interact with their immediate neighbors.  These immediate neighbors can be defined as those agents to the N, S, E, or W of the agent, or to the N, NE, E, SE, S, SW, W, or NW of the agent.  For another example, agents may be arranged on the perimeter of a circle and only able to interact with an agent to their right or left.
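
The physical arrangements in the second option can be sketched with a few helper functions.  The four-neighbor and eight-neighbor patterns are commonly called the von Neumann and Moore neighborhoods.  The function names are hypothetical, and the wrap-around at the lattice edges is an assumption made here so that every agent has the same number of neighbors.

```python
VON_NEUMANN = [(-1, 0), (1, 0), (0, 1), (0, -1)]                # N, S, E, W
MOORE = VON_NEUMANN + [(-1, 1), (1, 1), (1, -1), (-1, -1)]      # plus NE, SE, SW, NW


def lattice_neighbors(row, col, rows, cols, offsets):
    """Cells adjacent to (row, col) on a rows x cols lattice with wrap-around."""
    return [((row + dr) % rows, (col + dc) % cols) for dr, dc in offsets]


def ring_neighbors(i, n):
    """Left and right neighbors of agent i among n agents on a circle."""
    return [(i - 1) % n, (i + 1) % n]


corner = lattice_neighbors(0, 0, rows=3, cols=3, offsets=VON_NEUMANN)
```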

Imitator Dynamics.  When agents can only play with their neighbors, we can introduce a different way (different from replicator dynamics) of selecting which strategies propagate to the next generation.  One way to do this is for an agent to imitate its most successful neighbor.  The algorithm for doing this goes something like this:
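
One way to make this concrete is sketched below.  It is a sketch under assumptions, not necessarily the exact algorithm used in the lab: agents sit on a circle, each agent's score is its total payoff against its left and right neighbors, all agents update at the same time, and an agent keeps its own action when no neighbor scores strictly higher.  The name imitate_best_neighbor, the payoff table UTILITY, and the example ring are hypothetical.

```python
def imitate_best_neighbor(actions, utility):
    """One synchronous generation of imitate-your-most-successful-neighbor."""
    n = len(actions)
    # Score each agent by its total payoff against its left and right neighbors.
    scores = []
    for i in range(n):
        left, right = actions[(i - 1) % n], actions[(i + 1) % n]
        scores.append(utility[actions[i]][left] + utility[actions[i]][right])
    # Each agent copies the best scorer among itself and its two neighbors.
    new_actions = []
    for i in range(n):
        candidates = [i, (i - 1) % n, (i + 1) % n]
        # max returns the first maximum, so ties keep the agent's own action.
        best = max(candidates, key=lambda k: scores[k])
        new_actions.append(actions[best])
    return new_actions


# Hypothetical payoffs and starting ring, for illustration only.
UTILITY = {"C": {"C": 1.0, "D": 3.0},
           "D": {"C": 4.0, "D": 2.0}}
ring = ["C", "C", "C", "D", "C", "C"]
after_one = imitate_best_neighbor(ring, UTILITY)
```

With these assumed payoffs the lone defector scores 8 against its two cooperating neighbors, so both neighbors copy it and the defecting cluster grows from one agent to three in a single generation.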

Imitator dynamics can produce vastly different results than replicator dynamics.  You will explore these differences in the lab.