## Relative Entropy in Evolutionary Dynamics

22 January, 2014

guest post by Marc Harper

In John’s information geometry series, he mentioned some of my work in evolutionary dynamics. Today I’m going to tell you about some exciting extensions!

### The replicator equation

First a little refresher. For a population of $n$ replicating types, such as individuals with different eye colors or a gene with $n$ distinct alleles, the ‘replicator equation’ expresses the main idea of natural selection: the relative rate of growth of each type should be proportional to the difference between the fitness of the type and the mean fitness in the population.

To see why this equation should be true, let $P_i$ be the population of individuals of the $i$th type, which we allow to be any nonnegative real number. We can list all these numbers and get a vector:

$P = (P_1, \dots, P_n)$

The Lotka–Volterra equation is a very general rule for how these numbers can change with time:

$\displaystyle{ \frac{d P_i}{d t} = f_i(P) P_i }$

Each population grows at a rate proportional to itself, where the ‘constant of proportionality’, $f_i(P),$ is not necessarily constant: it can be any real-valued function of $P.$ This function is called the fitness of the $i$th type. Taken all together, these functions $f_i$ are called the fitness landscape.

Let $p_i$ be the fraction of individuals who are of the $i$th type:

$\displaystyle{ p_i = \frac{P_i}{\sum_{i =1}^n P_i } }$

These numbers $p_i$ are between 0 and 1, and they add up to 1. So, we can also think of them as probabilities: $p_i$ is the probability that a randomly chosen individual is of the $i$th type. This is how probability theory, and eventually entropy, gets into the game.

Again, we can bundle these numbers into a vector:

$p = (p_1, \dots, p_n)$

which we call the population distribution. It turns out that the Lotka–Volterra equation implies the replicator equation:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(P) - \langle f(P) \rangle \right) \, p_i }$

where

$\displaystyle{ \langle f(P) \rangle = \sum_{i =1}^n f_i(P) p_i }$

is the mean fitness of all the individuals. You can see the proof in Part 9 of the information geometry series.

By the way: if each fitness $f_i(P)$ only depends on the fraction of individuals of each type, not the total numbers, we can write the replicator equation in a simpler way:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }$

Anyway, the take-home message is this: the replicator equation says the fraction of individuals of any type changes at a rate proportional to fitness of that type minus the mean fitness.

Now, it has been known since the late 1970s or early 1980s, thanks to the work of Akin, Bomze, Hofbauer, Shahshahani, and others, that the replicator equation has some very interesting properties. For one thing, it often makes ‘relative entropy’ decrease. For another, it’s often an example of ‘gradient flow’. Let’s look at both of these in turn, and then talk about some new generalizations of these facts.

### Relative entropy as a Lyapunov function

I mentioned that we can think of a population distribution as a probability distribution. This lets us take ideas from probability theory and even information theory and apply them to evolutionary dynamics! For example, given two population distributions $p$ and $q,$ the information of $q$ relative to $p$ is

$I(q,p) = \displaystyle{ \sum_i q_i \ln \left(\frac{q_i}{p_i }\right)}$

This measures how much information you gain if you have a hypothesis about some state of affairs given by the probability distribution $p,$ and then someone tells you “no, the best hypothesis is $q$!”

It may seem weird to treat a population distribution as a hypothesis, but this turns out to be a good idea. Evolution can then be seen as a learning process: a process of improving the hypothesis.

We can make this precise by seeing how the relative information changes with the passage of time. Suppose we have two population distributions $q$ and $p.$ Suppose $q$ is fixed, while $p$ evolves in time according to the replicator equation. Then

$\displaystyle{ \frac{d}{d t} I(q,p) = \sum_i f_i(P) (p_i - q_i) }$

For the proof, see Part 11 of the information geometry series.

So, the information of $q$ relative to $p$ will decrease as $p$ evolves according to the replicator equation if

$\displaystyle{ \sum_i f_i(P) (p_i - q_i) } \le 0$

If $q$ makes this true for all $p,$ we say $q$ is an evolutionarily stable state. For some reasons why, see Part 13.

What matters now is that when $q$ is an evolutionarily stable state, $I(q,p)$ says how much information the population has ‘left to learn’—and we’re seeing that this always decreases. Moreover, it turns out that we always have

$I(q,p) \ge 0$

and $I(q,p) = 0$ precisely when $p = q.$

People summarize all this by saying that relative information is a ‘Lyapunov function’. Very roughly, a Lyapunov function is something that decreases with the passage of time, and is zero only at the unique stable state. To be a bit more precise, suppose we have a differential equation like

$\displaystyle{ \frac{d}{d t} x(t) = v(x(t)) }$

where $x(t) \in \mathbb{R}^n$ and $v$ is some smooth vector field on $\mathbb{R}^n.$ Then a smooth function

$V : \mathbb{R}^n \to \mathbb{R}$

is a Lyapunov function if

$V(x) \ge 0$ for all $x$

$V(x) = 0$ iff $x$ is some particular point $x_0$

and

$\displaystyle{ \frac{d}{d t} V(x(t)) \le 0 }$ for every solution of our differential equation.

In this situation, the point $x_0$ is a stable equilibrium for our differential equation: this is Lyapunov’s theorem.

### The replicator equation as a gradient flow equation

The basic idea of Lyapunov’s theorem is that when a ball likes to roll downhill and the landscape has just one bottom point, that point will be the unique stable equilibrium for the ball.

The idea of gradient flow is similar, but different: sometimes things like to roll downhill as efficiently as possible: they move in the exactly the best direction to make some quantity smaller! Under certain conditions, the replicator equation is an example of this phenomenon.

Let’s fill in some details. For starters, suppose we have some function

$F : \mathbb{R}^n \to \mathbb{R}$

Think of $V$ as ‘height’. Then the gradient flow equation says how a point $x(t) \in \mathbb{R}^n$ will move if it’s always trying its very best to go downhill:

$\displaystyle{ \frac{d}{d t} x(t) = - \nabla V(x(t)) }$

Here $\nabla$ is the usual gradient in Euclidean space:

$\displaystyle{ \nabla V = \left(\partial_1 V, \dots, \partial_n V \right) }$

where $\partial_i$ is short for the partial derivative with respect to the $i$th coordinate.

The interesting thing is that under certain conditions, the replicator equation is an example of a gradient flow equation… but typically not one where $\nabla$ is the usual gradient in Euclidean space. Instead, it’s the gradient on some other space, the space of all population distributions, which has a non-Euclidean geometry!

The space of all population distributions is a simplex:

$\{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i = 1}^n p_i = 1 \} .$

For example, it’s an equilateral triangle when $n = 3.$ The equilateral triangle looks flat, but if we measure distances another way it becomes round, exactly like a portion of a sphere, and that’s the non-Euclidean geometry we need!

In fact this trick works in any dimension. The idea is to give the simplex a special Riemannian metric, the ‘Fisher information metric’. The usual metric on Euclidean space is

$\delta_{i j} = \left\{\begin{array}{ccl} 1 & \mathrm{ if } & i = j \\ 0 &\mathrm{ if } & i \ne j \end{array} \right.$

This simply says that two standard basis vectors like $(0,1,0,0)$ and $(0,0,1,0)$ have dot product zero if the 1′s are in different places, and one if they’re in the same place. The Fisher information metric is a bit more complicated:

$\displaystyle{ g_{i j} = \frac{\delta_{i j}}{p_i} }$

As before, $g_{i j}$ is a formula for the dot product of the $i$th and $j$th standard basis vectors, but now it depends on where you are in the simplex of population distributions.

We saw how this formula arises from information theory back in Part 7. I won’t repeat the calculation, but the idea is this. Fix a population distribution $p$ and consider the information of another one, say $q,$ relative to this. We get $I(q,p).$ If $q = p$ this is zero:

$\displaystyle{ \left. I(q,p)\right|_{q = p} = 0 }$

and this point is a local minimum for the relative information. So, the first derivative of $I(q,p)$ as we change $q$ must be zero:

$\displaystyle{ \left. \frac{\partial}{\partial q_i} I(q,p) \right|_{q = p} = 0 }$

But the second derivatives are not zero. In fact, since we’re at a local minimum, it should not be surprising that we get a positive definite matrix of second derivatives:

$\displaystyle{ g_{i j} = \left. \frac{\partial^2}{\partial q_i \partial q_j} I(q,p) \right|_{q = p} = 0 }$

And, this is the Fisher information metric! So, the Fisher information metric is a way of taking dot products between vectors in the simplex of population distribution that’s based on the concept of relative information.

This is not the place to explain Riemannian geometry, but any metric gives a way to measure angles and distances, and thus a way to define the gradient of a function. After all, the gradient of a function should point at right angles to the level sets of that function, and its length should equal the slope of that function:

So, if we change our way of measuring angles and distances, we get a new concept of gradient! The $i$th component of this new gradient vector field turns out to b

$(\nabla_g V)^i = g^{i j} \partial_j V$

where $g^{i j}$ is the inverse of the matrix $g_{i j},$ and we sum over the repeated index $j.$ As a sanity check, make sure you see why this is the usual Euclidean gradient when $g_{i j} = \delta_{i j}.$

Now suppose the fitness landscape is the good old Euclidean gradient of some function. Then it turns out that the replicator equation is a special case of gradient flow on the space of population distributions… but where we use the Fisher information metric to define our concept of gradient!

To get a feel for this, it’s good to start with the Lotka–Volterra equation, which describes how the total number of individuals of each type changes. Suppose the fitness landscape is the Euclidean gradient of some function $V$:

$\displaystyle{ f_i(P) = \frac{\partial V}{\partial P_i} }$

Then the Lotka–Volterra equation becomes this:

$\displaystyle{ \frac{d P_i}{d t} = \frac{\partial V}{\partial P_i} \, P_i }$

This doesn’t look like the gradient flow equation, thanks to that annoying $P_i$ on the right-hand side! It certainly ain’t the gradient flow coming from the function $V$ and the usual Euclidean gradient. However, it is gradient flow coming from $V$ and some other metric on the space

$\{ P \in \mathbb{R}^n : \; P_i \ge 0 \}$

For a proof, and the formula for this other metric, see Section 3.7 in this survey:

• Marc Harper, Information geometry and evolutionary game theory.

Now let’s turn to the replicator equation:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }$

Again, if the fitness landscape is a Euclidean gradient, we can rewrite the replicator equation as a gradient flow equation… but again, not with respect to the Euclidean metric. This time we need to use the Fisher information metric! I sketch a proof in my paper above.

In fact, both these results were first worked out by Shahshahani:

• Siavash Shahshahani, A New Mathematical Framework for the Study of Linkage and Selection, Memoirs of the AMS 17, 1979.

### New directions

All this is just the beginning! The ideas I just explained are unified in information geometry, where distance-like quantities such as the relative entropy and the Fisher information metric are studied. From here it’s a short walk to a very nice version of Fisher’s fundamental theorem of natural selection, which is familiar to researchers both in evolutionary dynamics and in information geometry.

You can see some very nice versions of this story for maximum likelihood estimators and linear programming here:

• Akio Fujiwara and Shun-ichi Amari, Gradient systems in view of information geometry, Physica D: Nonlinear Phenomena 80 (1995), 317–327.

Indeed, this seems to be the first paper discussing the similarities between evolutionary game theory and information geometry.

Dash Fryer (at Pomona College) and I have generalized this story in several interesting ways.

First, there are two famous ways to generalize the usual formula for entropy: Tsallis entropy and Rényi entropy, both of which involve a parameter $q.$ There are Tsallis and Rényi versions of relative entropy and the Fisher information metric as well. Everything I just explained about:

• conditions under which relative entropy is a Lyapunov function for the replicator equation, and

• conditions under which the replicator equation is a special case of gradient flow

generalize to these cases! However, these generalized entropies give modified versions of the replicator equation. When we set $q=1$ we get back the usual story. See

• Marc Harper, Escort evolutionary game theory.

My initial interest in these alternate entropies was mostly mathematical—what is so special about the corresponding geometries?—but now researchers are starting to find populations that evolve according to these kinds of modified population dynamics! For example:

• A. Hernando et al, The workings of the Maximum Entropy Principle in collective human behavior.

There’s an interesting special case worth some attention. Lots of people fret about the relative entropy not being a distance function obeying the axioms that mathematicians like: for example, it doesn’t obey the triangle inequality. Many describe the relative entropy as a distance-like function, and this is often a valid interpretation contextually. On the other hand, the $q=0$ relative entropy is one-half the Euclidean distance squared! In this case the modified version of the replicator equation looks like this:

$\displaystyle{ \frac{d p_i}{d t} = f_i(p) - \frac{1}{n} \sum_{j = 1}^n f_j(p) }$

This equation is called the projection dynamic.

Later, I showed that there is a reasonable definition of relative entropy for a much larger family of geometries that satisfies a similar distance minimization property.

In a different direction, Dash showed that you can change the way that selection acts by using a variety of alternative ‘incentives’, extending the story to some other well-known equations describing evolutionary dynamics. By replacing the terms $x_i f_i(x)$ in the replicator equation with a variety of other functions, called incentives, we can generate many commonly studied models of evolutionary dynamics. For instance if we exponentiate the fitness landscape (to make it always positive), we get what is commonly known as the logit dynamic. This amounts to changing the fitness landscape as follows:

$\displaystyle{ f_i \mapsto \frac{x_i e^{\beta f_i}}{\sum_j{x_j e^{\beta f_j}}} }$

where $\beta$ is known as an inverse temperature in statistical thermodynamics and as an intensity of selection in evolutionary dynamics. There are lots of modified versions of the replicator equation, like the best-reply and projection dynamics, more common in economic applications of evolutionary game theory, and they can all be captured in this family. (There are also other ways to simultaneously capture such families, such as Bill Sandholm’s revision protocols, which were introduced earlier in his exploration of the foundations of game dynamics.)

Dash showed that there is a natural generalization of evolutionarily stable states to ‘incentive stable states’, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. For the logit and projection dynamics, incentive stable states are simply evolutionarily stable states, and this happens frequently, but not always.

The third generalization is to look at different ‘time-scales’—that is, different ways of describing time! We can make up the symbol $\mathbb{T}$ for a general choice of ‘time-scale’. So far I’ve been treating time as a real number, so

$\mathbb{T} = \mathbb{R}$

But we can also treat time as coming in discrete evenly spaced steps, which amounts to treating time as an integer:

$\mathbb{T} = \mathbb{Z}$

More generally, we can make the steps have duration $h,$ where $h$ is any positive real number:

$\mathbb{T} = h\mathbb{Z}$

There is a nice way to simultaneously describe the cases $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = h\mathbb{Z}$ using the time-scale calculus and time-scale derivatives. For the time-scale $\mathbb{T} = \mathbb{R}$ the time-scale derivative is just the ordinary derivative. For the time-scale $\mathbb{T} = h\mathbb{Z},$ the time-scale derivative is given by the difference quotient from first year calculus:

$\displaystyle{ f^{\Delta}(z) = \frac{f(z+h) - f(z)}{h} }$

and using this as a substitute for the derivative gives difference equations like a discrete-time version of the replicator equation. There are many other choices of time-scale, such as the quantum time-scale given by $\mathbb{T} = q^{\mathbb{Z}},$ in which case the time-scale derivative is called the q-derivative, but that’s a tale for another time. In any case, the fact that the successive relative entropies are decreasing can be simply state by saying they have negative $\mathbb{T} = h\mathbb{Z}$ time-scale derivative. The continuous case we started with corresponds to $\mathbb{T} = \mathbb{R}.$

Remarkably, Dash and I were able to show that you can combine all three of these generalizations into one theorem, and even allow for multiple interacting populations! This produces some really neat population trajectories, such as the following two populations with three types, with fitness functions corresponding to the rock-paper-scissors game. On top we have the replicator equation, which goes along with the Fisher information metric; on the bottom we have the logit dynamic, which goes along with the Euclidean metric on the simplex:

From our theorem, it follows that the relative entropy (ordinary relative entropy on top, the $q = 0$ entropy on bottom) converges to zero along the population trajectories.

The final form of the theorem is loosely as follows. Pick a Riemannian geometry given by a metric $g$ (obeying some mild conditions) and an incentive for each population, as well as a time scale ($\mathbb{R}$ or $h \mathbb{Z}$) for every population. This gives an evolutionary dynamic with a natural generalization of evolutionarily stable states, and a suitable version of the relative entropy. Then, if there is an evolutionarily stable state in the interior of the simplex, the time-scale derivative of sum of the relative entropies for each population will decrease as the trajectories converge to the stable state!

When there isn’t such a stable state, we still get some interesting population dynamics, like the following:

See this paper for details:

• Marc Harper and Dashiell E. A. Fryer, Stability of evolutionary dynamics on time scales.

Next time we’ll see how to make the main idea work in finite populations, without derivatives or deterministic trajectories!

## Game Theory (Part 20)

11 March, 2013

Last time we tackled von Neumann’s minimax theorem:

Theorem. For every zero-sum 2-player normal form game,

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$

where $p'$ ranges over player A’s mixed strategies and $q'$ ranges over player B’s mixed strategies.

We reduced the proof to two geometrical lemmas. Now let’s prove those… and finish up the course!

But first, let me chat a bit about this theorem. Von Neumann first proved it in 1928. He later wrote:

As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved.

Von Neumann’s gave several proofs of this result:

• Tinne Hoff Kjeldesen, John von Neumann’s conception of the minimax theorem: a journey through different mathematical contexts, Arch. Hist. Exact Sci. 56 (2001) 39–68.

In 1937 he gave a proof which became quite famous, based on an important result in topology: Brouwer’s fixed point theorem. This says that if you have a ball

$B = \{ x \in \mathbb{R}^n : \|x\| \le 1 \}$

and a continuous function

$f: B \to B$

then this function has a fixed point, meaning a point $x \in B$ with

$f(x) = x$

You’ll often seen Brouwer’s fixed point theorem in a first course on algebraic topology, though John Milnor came up with a proof using just multivariable calculus and a bit more.

After von Neumann proved his minimax theorem using Brouwer’s fixed point theorem, the mathematician Shizuo Kakutani proved another fixed-point theorem in 1941, which let him get the minimax theorem in a different way. This is now called Kakutani fixed-point theorem.

In 1949, John Nash generalized von Neumann’s result to nonzero-sum games with any number of players: they all have Nash equilibria if we let ourselves use mixed strategies! His proof is just one page long, and it won him the Nobel prize!

Nash’s proof used the Kakutani fixed-point theorem. There is also a proof of Nash’s theorem using Brouwer’s fixed-point theorem; see here for the 2-player case and here for the n-player case.

Apparently when Nash explained his result to von Neumann, the latter said:

That’s trivial, you know. That’s just a fixed point theorem.

Maybe von Neumann was a bit jealous?

I don’t know a proof of Nash’s theorem that doesn’t use a fixed-point theorem. But von Neumann’s original minimax theorem seems to be easier. The proof I showed you last time comes from Andrew Colman’s book Game Theory and its Applications in the Social and Biological Sciences. In it, he writes:

In common with many people, I first encountered game theory in non-mathematical books, and I soon became intrigued by the minimax theorem but frustrated by the way the books tiptoed around it without proving it. It seems reasonable to suppose that I am not the only person who has encountered this problem, but I have not found any source to which mathematically unsophisticated readers can turn for a proper understanding of the theorem, so I have attempted in the pages that follow to provide a simple, self-contained proof with each step spelt out as clearly as possible both in symbols and words.

There are other proofs that avoid fixed-point theorems: for example, there’s one in Ken Binmore’s book Playing for Real. But this one uses transfinite induction, which seems a bit scary and distracting! So far, Colman’s proof seems simplest, but I’ll keep trying to do better.

### The lemmas

Now let’s prove the two lemmas from last time. A lemma is an unglamorous result which we use to prove a theorem we’re interested in. The mathematician Paul Taylor has written:

Lemmas do the work in mathematics: theorems, like management, just take the credit.

Let’s remember what we were doing. We had a zero-sum 2-player normal-form game with an $m \times n$ payoff matrix $A$. The entry $A_{ij}$ of this matrix says A’s payoff when player A makes choice $i$ and player B makes choice $j$. We defined this set:

$C = \{ A q' : \; q' \textrm{ is a mixed strategy for B} \} \subseteq \mathbb{R}^m$

For example, if

$\displaystyle{ A = \left( \begin{array}{rrr} 2 & 10 & 4 \\-2 & 1 & 6 \end{array} \right) }$

then $C$ looks like this:

We assumed that

$\displaystyle{ \min_{q'} \max_{p'} \; p' \cdot A q' > 0}$

This means there exists $p'$ with

$\displaystyle{ p' \cdot A q' > 0}$

and this implies that at least one of the numbers $(Aq')_i$ must be positive. So, if we define a set $N$ by

$\displaystyle{ N = \{(x_1, \dots, x_m) : x_i \le 0 \textrm{ for all } i\} \subseteq \mathbb{R}^m }$

then $Aq'$ can’t be in this set:

$\displaystyle{ Aq' \notin N }$

In other words, the set $C \cap N$ is empty.

Here’s what $C$ and $N$ look like in our example:

Next, we choose a point in $N$ and a point in $C$:

• let $r$ be a point in $N$ that’s as close as possible to $C,$

and

• let $s$ be a point in $C$ that’s as close as possible to $r,$

These points $r$ and $s$ need to be different, since $C \cap N$ is empty. Here’s what these points and the vector $s - r$ look like in our example:

To finish the job, we need to prove two lemmas:

Lemma 1. $r \cdot (s-r) = 0,$ $s_i - r_i \ge 0$ for all $i,$ and $s_i - r_i > 0$ for at least one $i.$

Proof. Suppose $r'$ is any point in $N$ whose coordinates are all the same those of $r,$ except perhaps one, namely the $i$th coordinate for one particular choice of $i.$ By the way we’ve defined $s$ and $r$, this point $r'$ can’t be closer to $s$ than $r$ is:

$\| r' - s \| \ge \| r - s \|$

This means that

$\displaystyle{ \sum_{j = 1}^m (r_j' - s_j)^2 \ge \sum_{j = 1}^m (r_j - s_j)^2 }$

But since $r_j' = r_j$ except when $j = i,$ this implies

$(r_i' - s_i)^2 \ge (r_i - s_i)^2$

Now, if $s_i \le 0$ we can take $r'_i = s_i.$ In this case we get

$0 \ge (r_i - s_i)^2$

so $r_i = s_i.$ On the other hand, if $s_i > 0$ we can take $r'_i = 0$ and get

$s_i^2 \ge (r_i - s_i)^2$

which simplifies to

$2 r_i s_i \ge r_i^2$

But $r_i \le 0$ and $s_i > 0,$ so this can only be true if $r_i = 0.$

In short, we know that either

$r_i = s_i$

or

$s_i > 0$ and $r_i = 0.$

So, either way we get

$(s_i - r_i) r_i = 0$

Since $i$ was arbitrary, this implies

$\displaystyle{ (s - r) \cdot r = \sum_{i = 1}^m (s_i - r_i) r_i = 0 }$

which is the first thing we wanted to show. Also, either way we get

$s_i - r_i \ge 0$

which is the second thing we wanted. Finally, $s_i - r_i \ge 0$ but we know $s \ne r,$ so

$s_i - r_i > 0$

for at least one choice of $i.$ And this is the third thing we wanted!   █

Lemma 2. If $Aq'$ is any point in $C$, then

$(s-r) \cdot Aq' \ge 0$

Proof. Let’s write

$Aq' = a$

for short. For any number $t$ between $0$ and $1$, the point

$ta + (1-t)s$

is on the line segment connecting the points $a$ and $s.$ Since both these points are in $C,$, so is this point $ta + (1-t)s,$ because the set $C$ is convex. So, by the way we’ve defined $s$ and $r$, this point can’t be closer to $r$ than $s$ is:

$\| r - (ta + (1-t)s) \| \ge \| r - s \|$

This means that

$\displaystyle{ (r + (1-t)s - ta) \cdot (r + (1-t)s - ta) \ge (r - s) \cdot (r - s) }$

With some algebra, this gives

$\displaystyle{ 2 (a - s)\cdot (s - r) \ge -t (a - s) \cdot (a - s) }$

Since we can make $t$ as small as we want, this implies that

$\displaystyle{ (a - s)\cdot (s - r) \ge 0 }$

or

$\displaystyle{ a \cdot (s - r) \ge s \cdot (s - r)}$

or

$\displaystyle{ a \cdot (s - r) \ge (s - r) \cdot (s - r) + r \cdot (s - r)}$

By Lemma 1 we have $r \cdot (s - r) \ge 0,$ and the dot product of any vector with itself is nonnegative, so it follows that

$\displaystyle{ a \cdot (s - r) \ge 0}$

And this is what we wanted to show!   █

### Conclusion

Proving lemmas is hard work, and unglamorous. But if you remember the big picture, you’ll see how great this stuff is.

We started with a very general concept of two-person game. Then we introduced probability theory and the concept of ‘mixed strategy’. Then we realized that the expected payoff of each player could be computed using a dot product! This brings geometry into the subject. Using geometry, we’ve seen that every zero-sum game has at least one ‘Nash equilibrium’, where neither player is motivated to change what they do—at least if they’re rational agents.

And this is how math works: by taking a simple concept and thinking about it very hard, over a long time, we can figure out things that are not at all obvious.

For game theory, the story goes much further than we went in this course. For starters, we should look at nonzero-sum games, and games with more than two players. John Nash showed these more general games still have Nash equilibria!

Then we should think about how to actually find these equilibria. Merely knowing that they exist is not good enough! For zero-sum games, finding the equilibria uses a subject called linear programming. This is a way to maximize a linear function given a bunch of linear constraints. It’s used all over the place—in planning, routing, scheduling, and so on.

Game theory is used a lot by economists, for example in studying competition between firms, and in setting up antitrust regulations. For that, try this book:

• Lynne Pepall, Dan Richards and George Norman, Industrial Organization: Contemporary Theory and Empirical Applications, Blackwell, 2008.

For these applications, we need to think about how people actually play games and make economic decisions. We aren’t always rational agents! So, psychologists, sociologists and economists do experiments to study what people actually do. The book above has a lot of case studies, and you can learn more here:

• Andrew Colman, Game Theory and its Applications in the Social and Biological Sciences, Routledge, London, 1982.

As this book title hints, we should also think about how game theory enters into biology. Evolution can be seen as a game where the winning genes reproduce and the losers don’t. But it’s not all about competition: there’s a lot of cooperation involved. Life is not a zero-sum game! Here’s a good introduction to some of the math:

• William H. Sandholm, Evolutionary game theory, 12 November 2007.

For more on the biology, get ahold of this classic text:

• John Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.

And so on. We’ve just scratched the surface!

## Game Theory (Part 19)

7 March, 2013

Okay, we’re almost done! We’ve been studying Nash equilibria for zero-sum 2-player normal form games. We proved a lot of things about them, but now we’ll wrap up the story by proving this:

Grand Theorem. For every zero-sum 2-player normal-form game, a Nash equilibrium exists. Moreover, a pair of mixed strategies $(p,q)$ for the two players is a Nash equilibrium if and only if each strategy is a maximin strategy.

### Review

Let’s remember what we’ve proved in Part 16 and Part 18:

Theorem 1. For any zero-sum 2-player normal form game,

$\displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' \ge \max_{p'} \min_{q'} \; p' \cdot A q'}$

Theorem 2. Given a zero-sum 2-player normal form game for which a Nash equilibrium exists, we have

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$     ★

Theorem 3. If $(p,q)$ is a Nash equilibrium for a zero-sum 2-player normal-form game, then $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B.

Theorem 4. Suppose we have a zero-sum 2-player normal form game for which ★ holds. If $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B, then $(p,q)$ is a Nash equilibrium.

### The plan

Today we’ll prove two more results. The first one is easy if you know some topology. The second one is the real heart of the whole subject:

Theorem 5. For every zero-sum 2-player normal-form game, a maximin strategy exists for each player.

Theorem 6. For every zero-sum 2-player normal-form game, ★ holds.

Putting all these results together, it’s easy to get our final result:

Grand Theorem. For every zero-sum 2-player normal-form game, a Nash equilibrium exists. Moreover, a pair of mixed strategies $(p,q)$ for the two players is a Nash equilibrium if and only if each strategy is a maximin strategy.

Proof. By Theorem 6 we know that ★ holds. By Theorem 5 we know that there exist maximin strategies for each player, say $p$ and $q.$. Theorem 4 says that if $p$ and $q$ are maximin strategies and ★ holds, then $(p,q)$ is a Nash equilibrium. So, a Nash equilibrium exists.

Moreover, if $(p,q)$ is any Nash equilibrium, Theorem 3 says $p$ and $q$ are maximin strategies. Conversely, since ★ holds, Theorem 4 says that if $p$ and $q$ are maximin strategies, $(p,q)$ is a Nash equilibrium.   █

### Minimax strategies exist

Okay, let’s dive in and get to work:

Theorem 5. For every zero-sum 2-player normal-form game, a maximin strategy exists for each player.

Proof. We’ll prove this only for player A, since the proof for player B is similar. Remember that a maximin strategy for player A is a mixed strategy that maximizes A’s security level, which is a function

$\displaystyle{ f(p') = \min_{q'} p' \cdot A q' }$

So, we just need to show that this function $f$ really has a maximum. To do this, we note that

$f : \{ \textrm{A's mixed strategies} \} \to \mathbb{R}$

is a continuous function defined on a compact set. As mentioned at the start of Part 17, this guarantees that $f$ has a maximum.   █

I apologize if this proof is hard to understand. All this stuff is standard if you know some topology, and a huge digression if you don’t, so I won’t go through the details. This is a nice example of how topology can be useful in other subjects!

### The key theorem

Now we finally reach the heart of the whole subject: von Neumann’s minimax theorem. Our proof will be a condensed version of the one in Andrew Colman’s 1982 book Game Theory and its Applications in the Social and Biological Sciences.

Theorem 6. For every zero-sum 2-player normal-form game,

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$     ★

holds.

Proof. Let’s write

$\displaystyle{ \max_{p'} \min_{q'} \; p' \cdot A q' = V}$

and

$\displaystyle{ \min_{q'} \max_{p'} \; p' \cdot A q' = W}$

Our goal is to prove ★, which says $V = W.$ By Theorem 1 we know

$V \le W$

So, we just need to prove

$V \ge W$

Here’s how we will do this. We will prove

$\textrm{if } W > 0 \textrm{ then } V \ge 0$

Since we’ll prove this for any game of the sort we’re studying, it’ll be true even if we add some real number $c$ to each entry of the payoff matrix $A_{ij}.$ Doing this adds $c$ to the expected payoff $p' \cdot A q',$ so it adds $c$ to $V$ and $W.$ So, it will follow that

$\textrm{if } V + c > 0 \textrm{ then } W + c\ge 0$

for any real number $c,$ and this implies

$V \ge W$

So, let’s get going.

Assume $W > 0.$ To prove that $V \ge 0$, remember that

$\displaystyle{ V = \max_{p'} \min_{q'} \; p' \cdot A q'}$

To show this is greater than or equal to zero, we just need to find some mixed strategy $p$ for player A such that

$\displaystyle{ \min_{q'} \; p \cdot A q' \ge 0}$

In other words, we need to find $p$ such that

$\displaystyle{ p \cdot A q' \ge 0}$     ★★

for all mixed strategies $q'$ for player B.

How can we find $p$ for which this ★★ is true? The key is to consider the set

$C = \{ A q' : \; q' \textrm{ is a mixed strategy for B} \} \subseteq \mathbb{R}^m$

For example, if

$\displaystyle{ A = \left( \begin{array}{rrr} 2 & 10 & 4 \\-2 & 1 & 6 \end{array} \right) }$

then $C$ looks like this:

Since $W > 0$, for any $Aq' \in C$ we have

$\displaystyle{ \max_{p'} \; p' \cdot A q' \ge \min_{q'} \max_{p'} \; p' \cdot A q' = W > 0}$

so there must exist $p'$ with

$\displaystyle{ p' \cdot A q' \ge W > 0}$

Since $p' = (p'_1, \dots, p'_m)$ is a mixed strategy, we have $p'_i \ge 0$ for all $1 \le i \le m.$ But since we’ve just seen

$\displaystyle{ \sum_{i=1}^m p'_i (Aq')_i = p' \cdot A q' \ge W > 0}$

at least one of the numbers $(Aq')_i$ must be positive. In other words, if we define a set $N$ by

$\displaystyle{ N = \{(x_1, \dots, x_m) : x_i \le 0 \textrm{ for all } i\} \subseteq \mathbb{R}^m }$

then $Aq'$ can’t be in this set:

$\displaystyle{ Aq' \notin N }$

So, we’ve seen that no point in $C$ can be in $N$:

$C \cap N = \emptyset$

Here’s what it looks like in our example:

Now the trick is to:

• let $r$ be a point in $N$ that’s as close as possible to $C,$

and

• let $s$ be a point in $C$ that’s as close as possible to $r,$

We need to use a bit of topology to be sure these points exist, since it means finding the minima of certain functions (namely, distances). But let’s not worry about that now! We’ll complete the proof with two lemmas:

Lemma 1. $r \cdot (s-r) = 0,$ $s_i - r_i \ge 0$ for all $i,$ and $s_i - r_i > 0$ for at least one $i.$

Lemma 2. If $Aq'$ is any point in $C$, then

$(s-r) \cdot Aq' \ge 0$

Here’s what the points $s$ and $r$ and the vector $s - r$ look like in our example:

Check to see that Lemmas 1 and 2 are true in this example! We’ll prove the lemmas later; right now let’s see how they get the job done.

First, by Lemma 1, the numbers $s_i - r_i$ are nonnegative and at least one is positive. So, we can define a mixed strategy $p$ for player A by defining

$\displaystyle{ p_i = \frac{1}{c} (s_i - r_i) }$

where $c > 0$ is a number chosen to make sure $\sum_i p_i = 1.$ (Remember, the probabilities $p_i$ must be $\ge 0$ and must sum to 1.) In other words,

$\displaystyle{ p = \frac{1}{c} (s - r) }$

Now, for any mixed strategy $q'$ for player B, we have $Aq' \in C$ and thus by Lemma 1

$(s-r) \cdot Aq' \ge 0$

Dividing by $c$, we get

$p \cdot Aq' \ge 0$

for all $q'.$ But this is ★★, which is what we wanted to prove! So we are done!   █

I will give the proofs of Lemmas 1 and 2 in the next part.

## Game Theory (Part 18)

5 March, 2013

We’re talking about zero-sum 2-player normal form games. Last time we saw that in a Nash equilibrium for a game like this, both players must use a maximin strategy. Now let’s try to prove the converse!

In other words: let’s try to prove that if both players use a maximin strategy, the result is a Nash equilibrium.

Today we’ll only prove this is true if a certain equation holds. It’s the cool-looking equation we saw last time:

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$

Last time we showed this cool-looking equation is true whenever our game has a Nash equilibrium. In fact, this equation is always true. In other words: it’s true for any zero-sum two-player normal form game. The reason is that any such game has a Nash equilibrium. But we haven’t showed that yet.

So, let’s do what we can easily do.

### Maximin strategies give Nash equilibria… sometimes

Let’s start by remembering some facts we saw in Part 16 and Part 17.

We’re studying a zero-sum 2-player normal form game. Player A’s payoff matrix is $A$, and player B’s payoff matrix is $-A.$

We saw that a pair of mixed strategies $(p,q),$ one for player A and one for player B, is a Nash equilibrium if and only if

1) $p \cdot A q \ge p' \cdot A q$ for all $p'$

and

2) $p \cdot A q \ge p \cdot A q'$ for all $q'.$

We saw that $p$ is a maximin strategy for player A if and only if:

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q' }$

We also saw that $q$ is a maximin strategy for player B if and only if:

$\displaystyle{ \max_{p'} \; p' \cdot A q = \min_{q'} \max_{p'} \; p' \cdot A q' }$

With these in hand, we can easily prove our big result for the day. We’ll call it Theorem 4, continuing with the theorem numbers we started last time:

Theorem 4. Suppose we have a zero-sum 2-player normal form game for which

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$     ★

holds. If $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B, then $(p,q)$ is a Nash equilibrium.

Proof. Suppose that $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B. Thus:

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q' }$

and

$\displaystyle{ \max_{p'} \; p' \cdot A q = \min_{q'} \max_{p'} \; p' \cdot A q' }$

But since ★ holds, the right sides of these two equations are equal. So, the left sides are equal too:

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \; p' \cdot A q }$     ★★

Now, since a function is always less than or equal to its maximum value, and greater than or equal to its minimum value, we have

$\displaystyle{ \min_{q'} \; p \cdot A q' \le p \cdot A q \le \max_{p'} \; p' \cdot A q }$

But ★★ says the quantity at far left here equals the quantity at far right! So, the quantity in the middle must equal both of them:

$\displaystyle{ \min_{q'} \; p \cdot A q' = p \cdot A q = \max_{p'} \; p' \cdot A q }$     ★★★

By the definition of minimum value, the first equation in ★★★:

$\displaystyle{ p \cdot A q = \min_{q'} \; p \cdot A q' }$

says that

$\displaystyle{ p \cdot A q \le p \cdot A q' }$

for all $q'.$ This is condition 2) in the definition of Nash equilibrium. Similarly, by the definition of maximum value, the second equation in ★★★:

$\displaystyle{ p \cdot A q = \max_{p'} \; p' \cdot A q }$

says that

$\displaystyle{ p \cdot A q \ge p' \cdot A q }$

for all $p'.$ This is condition 1) in the definition of Nash equilibrium. So, the pair $(p,q)$ is a Nash equilibrium. █

## Game Theory (Part 17)

27 February, 2013

Last time we saw the official definition of maximin strategy. Now we’ll prove something really important. In a Nash equilibrium for a zero-sum game, both players must be using a maximin strategy!

To prove this we will need to look at a lot of maxima and minima. We will always assume these maxima and minima exist. For what we’re doing, this is true. This can be proved using an important result from topology: given a continuous real valued function $f: X \to \mathbb{R}$ on a compact set $X,$ it has a minimum and a maximum. If you haven’t learned this yet… well, I hope you do by the time you get a degree in mathematics.

### Nash equilibria give maximin strategies

Theorem 1. For any zero-sum 2-player normal form game,

$\displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' \ge \max_{p'} \min_{q'} \; p' \cdot A q'}$

Proof. Since a function is always greater than or equal to its minimum value, for any mixed strategies $p$ and $q$ we have

$\displaystyle{ p \cdot A q \ge \min_{q'} \; p \cdot A q'}$

If one function is $\ge$ another, its maximum value is $\ge$ the other function’s maximum value. So, the above inequality gives

$\displaystyle{ \max_{p'} p' \cdot A q \ge \max_{p'} \min_{q'} \; p' \cdot A q'}$

The right side here is just a number; the left side is a function of $q$. Since this function is always greater than or equal to the right side, so is its minimum:

$\displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' \ge \max_{p'} \min_{q'} \; p' \cdot A q'}$   █

Next, we’ll show this cool-looking inequality becomes an equation when a Nash equilibrium exists. In fact a Nash equilibrium always exists, but we haven’t shown this yet. So:

Theorem 2. Given a zero-sum 2-player normal form game for which a Nash equilibrium exists,

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$

Proof. Suppose a Nash equilibrium $(p,q)$ exists. Then for any mixed strategy $p'$ for player A, we have

$\displaystyle{ p \cdot A q \ge p' \cdot A q}$

since A can’t improve their payoff by switching their mixed strategy. Similarly, for any mixed strategy $q'$ for player B, $p \cdot B q \ge p \cdot B q',$ since B can’t improve their payoff by switching their mixed strategy. But $B = -A,$ so this says

$\displaystyle{ p \cdot A q' \ge p \cdot A q}$

Combining these two inequalities, we get

$\displaystyle{ p \cdot A q' \ge p' \cdot A q}$

for all $p', q'.$ The minimum of the left side over all $q'$ must be greater than or equal to the right side, which doesn’t depend on $q':$

$\displaystyle{ \min_{q'} p \cdot A q' \ge p' \cdot A q}$

Now the maximum of the right side over all $p'$ must be less than or equal to the left side, which doesn’t depend on $p':$

$\displaystyle{ \min_{q'} p \cdot A q' \ge \max_{p'} p' \cdot A q}$

It follows that

$\begin{array}{ccl} \displaystyle{ \max_{p'} \min_{q'} p' \cdot A q'} &\ge& \displaystyle{ \min_{q'} p \cdot A q'} \\ \\ &\ge& \displaystyle{ \max_{p'} p' \cdot A q } \\ \\ &\ge& \displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' } \end{array}$

The middle inequality here is the one we saw a moment ago. The first inequality comes from the fact that the maximum value of a function is greater than or equal to any of its values:

$\textrm{for all } x, \; \displaystyle{ \max_{x'} f(x') \ge f(x) }$

so

$\displaystyle{ \max_{p'} \min_{q'} p' \cdot A q' \ge \min_{q'} p \cdot A q' }$

And the last inequality comes from the fact that the values of a function are always greater than or equal to its minimum value:

$\textrm{for all } x, \; \displaystyle{ f(x) \ge \max_{x'} f(x') }$

so

$\displaystyle{ \max_{p'} p' \cdot A q \ge \min_{q'} \max_{p'} p' \cdot A q' }$

Putting all these inequalities together, we get

$\displaystyle{ \max_{p'} \min_{q'} p \cdot A q' \ge \min_{q'} \max_{p'} p' \cdot A q}$

On the other hand, Theorem 1 gives an inequality pointing the other way:

$\displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' \ge \max_{p'} \min_{q'} \; p' \cdot A q'}$

So, the two sides must be equal:

$\displaystyle{ \max_{p'} \min_{q'} \; p' \cdot A q' = \min_{q'} \max_{p'} \; p' \cdot A q'}$

which is what we were trying to show!   █

What’s the point of this cool-looking equation? One point is it connects the terms ‘minimax’ and ‘maximin’. There’s a lot more to say about it. But right now, we need it for one big thing: it lets us prove that in a Nash equilibrium for a zero-sum game, both players must be using a maximin strategy!

Theorem 3. If $(p,q)$ is a Nash equilibrium for a zero-sum 2-player normal-form game, then $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B.

Proof. Let’s assume that $(p,q)$ is a Nash equilibrium. We need to show that $p$ is a maximin strategy for player A and $q$ is a maximin strategy for player B.

First let’s remember some things we saw in the proof of Theorem 2. We assumed that $(p,q)$ is a Nash equilibrium, and we showed

$\begin{array}{ccl} \displaystyle{ \max_{p'} \min_{q'} p' \cdot A q'} &\ge& \displaystyle{ \min_{q'} p \cdot A q'} \\ \\ &\ge& \displaystyle{ \max_{p'} p' \cdot A q } \\ \\ &\ge& \displaystyle{ \min_{q'} \max_{p'} p' \cdot A q' } \end{array}$

If this looks confusing, go back to the proof of Theorem 2. But now look at the beginning and the end of this chain of inequalities. We saw in Theorem 2 that they’re equal! So all the stuff in the middle has to be equal, too. In particular,

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q' }$

Last time we saw this means that $p$ is a maximin strategy for player A. Also,

$\displaystyle{ \max_{p'} \; p' \cdot A q = \min_{q'} \max_{p'} \; p' \cdot A q' }$

Last time we saw this means that that $q$ is a maximin strategy for player B.   █

Whew! That was quite a workout!

But we’re on a mission here, and we’re not done. We’ve shown that if $(p,q)$ is a Nash equilibrium, $p$ and $q$ are maximin strategies. Next time we’ll try to show the converse: if $p$ and $q$ are maximin strategies, then $(p,q)$ is a Nash equilibrium.

## Game Theory (Part 16)

26 February, 2013

Last time we looked at a zero-sum game and noticed that when both players use their maximin strategy, we get a Nash equilibrium. This isn’t a coincidence—it always works this way for zero-sum games! This fact is not obvious. It will take some work to prove it. This will be our first really big theorem.

But first, we need to define a maximin strategy.

I already tried to give you the rough idea: it’s where you pick a mixed strategy that maximizes your expected payoff while assuming that no matter what mixed strategy you pick, the other player will pick the mixed strategy that minimizes your expected payoff. But to prove things about this concept, we need a more precise definition.

### The setup

First, remember the setup. We’re talking about a zero-sum 2-player normal form game. So as usual, we assume:

• Player A has some set of choices $i = 1, \dots, m.$

• Player B has some set of choices $j = 1, \dots, n.$

• If player A makes choice $i$ and player B makes choice $j,$ the payoff to player A is $A_{ij}$ and the payoff to player B is $B_{ij}.$

But, because it’s zero-sum game, we also assume

$A_{ij} + B_{ij} = 0$

for all choices $i = 1, \dots, m$ and $j = 1, \dots, n.$ In other words, the payoff matrices $A$ and $B$, whose entries are these numbers $A_{ij}$ and $B_{ij},$ sum to zero:

$A + B = 0$

We’ll let $p$ to stand for A’s mixed strategy, and let $q$ stand for B’s mixed strategy. These are probability distributions. So, $p = (p_1, \dots, p_m)$ is a vector in $\mathbb{R}^m$ obeying

$0 \le p_i \le 1 , \quad \displaystyle{ \sum_{i = 1}^m p_i = 1 }$

while $q = (q_1 , \dots, q_m)$ is a vector in $\mathbb{R}^n$ obeying

$0 \le q_j \le 1 , \quad \displaystyle{ \sum_{j=1}^n q_j = 1 }$

Given these mixed strategies, A’s expected payoff is

$p \cdot A q$

while B’s expected payoff is

$p \cdot B q = - p \cdot A q$

Since $B = -A,$ we will never mention the matrix $B$ again!

### Minima and maxima

As you might guess, we’re going to talk a lot about minima and maxima. So, we should be really clear about what they are!

Definition 1. Suppose $f : X \to \mathbb{R}$ is any real-valued function defined on any set $X.$ We say $f$ has a maximum at $x \in X$ if

$f(x) \ge f(x')$

for all $x' \in X.$ In this case we call the number $f(x)$ the maximum value of $f,$ and we write

$\displaystyle{ f(x) = \max_{x' \in X} f(x') }$

Note that mathematicians use ‘maximum’ to mean an element $x$ where the function $f$ gets as big as possible… and use ‘maximum value’ to mean how big $f$ gets there. This is a bit different than ordinary English!

Also note that a maximum may not exist! And if it exists, it may not be unique. For example, the function $x^2$ on the real line has no maximum, while the sine function has lots. So unless we know for sure a function has exactly one maximum, we should talk about a maximum instead of the maximum.

Similar stuff is true for minima, too:

Definition 1. Suppose $f : X \to \mathbb{R}$ is any real-valued function defined on any set $X.$ We say $f$ has a minimum at $x \in X$ if

$f(x) \le f(x')$

for all $x' \in X.$ In this case we call the number $f(x)$ the minimum value of $f,$ and we write

$\displaystyle{ f(x) = \min_{x' \in X} f(x') }$

### Security levels

Pretend you’re player A. Since it’s a zero-sum game, we know B will try to maximize their payoff… which means minimizing your payoff. So, no matter what your mixed strategy $p$ is, you should expect that B will find a mixed strategy $q'$ that’s a minimum of

$p \cdot A q$

So, you should only expect to get a payoff of

$\displaystyle{ \min_{q' \in \{ \textrm{B's mixed strategies\}}} \; p \cdot A q' }$

We call this player A’s security level. For short, let’s write it as

$\displaystyle{ \min_{q'} \; p \cdot A q' }$

A’s security level is a function of $p.$ We graphed this function in the example we looked at last time. It’s the dark curve here:

The different lines show $p \cdot A q$ for different choices of $q.$ The minimum of all these gives the dark curve.

### Maximin strategies

Last time we found A’s maximin strategy by finding the maximum of A’s security level—the place where that dark curve is highest!

Suppose $p$ is a maximin strategy for player A. Since it maximizes A’s security level,

$\displaystyle{ \min_{q'} \; p \cdot A q' \ge \min_{q'} \; p' \cdot A q' }$

for all mixed strategies $p'$ for player A. In other words, we have

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q' }$

If you look at this formula, you can really see why we use the word ‘maximin’!

It’s a little-known false fact that the concept of maximin strategy was named after the Roman emperor Maximin. Such an emperor really does exist! But in fact, there were two Roman emperors named Maximin. So he exists, but he’s not unique.

### Definitions

Now we’re ready for some definitions that summarize what we’ve learned.

Definition 3. Given a zero-sum 2-player normal form game and a mixed strategy $p$ for player A, we define A’s security level to be

$\displaystyle{ \min_{q'} \; p \cdot A q' }$

Definition 4. Given a zero-sum 2-player normal form game, we say a mixed strategy $p$ for player A is a maximin strategy if

$\displaystyle{ \min_{q'} \; p \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q' }$

We can also make up definitions that apply to player B. We just need to remember that there’s a minus sign in B’s expected payoff:

Definition 5. Given a zero-sum 2-player normal form game and a mixed strategy $q$ for player B, we define B’s security level to be

$\displaystyle{ \min_{p'} \; - p' \cdot A q }$

Definition 6. Given a zero-sum 2-player normal form game, we say a mixed strategy $q'$ for B is a minimax strategy for B if

$\displaystyle{ \min_{p'} \; - p' \cdot A q = \max_{q'} \min_{p'} \; - p' \cdot A q' }$

But there’s an easy fact about maxima and minima that lets us simplify this last definition. To make a function $-f$ as big as possible is the same as making $f$ as small as possible and then sticking a minus sign in front:

$\displaystyle{ \max_{x \in X} -f(x) = - \min_{x \in X} f(x)}$

Similarly, to minimize $-f$ is the same as maximizing $f$ and then taking the negative of that:

$\displaystyle{ \min_{x \in X} -f(x) = - \max_{x \in X} f(x)}$

Using these rules, we get this little theorem:

Theorem 1. Given a zero-sum 2-player normal form game, $q$ is a minimax strategy for B if and only if:

$\displaystyle{ \max_{p'} \; p' \cdot A q = \min_{q'} \max_{p'} \; p' \cdot A q' }$

Proof. Suppose that $q$ is a minimax strategy for B. By Definition 6,

$\displaystyle{ \min_{p'} \; - p' \cdot A q = \max_{q'} \min_{p'} \; - p' \cdot A q' }$

Repeatedly using the rules for pushing minus signs through max and min, we see:

$\displaystyle{ - \max_{p'} p \cdot A q = \max_{q'} \left( - \max_{p'} \; p \cdot A q \right) }$

and then

$\displaystyle{ - \max_{p'} p \cdot A q = - \min_{q'} \max_{p'} \; p' \cdot A q' }$

Taking the negative of both sides, we get the equation we want:

$\displaystyle{ \max_{p'} \; p' \cdot A q = \min_{q'} \max_{p'} \; p' \cdot A q' }$

We can also reverse our calculation and show that this equation implies $q$ is a maximin strategy. So, this equation is true if and only if $q$ is a maximin strategy for B.   █

This little theorem talks about a minimum of maxima instead of a maximum of minima. This is one reason people talk about ‘minimax strategies’. In fact what we’re calling a maximin strategy, people often call a minimax strategy!

Next time we’ll start proving some really important things, which were first shown by the great mathematician John von Neumann:

• If both players in a zero-sum game use a maximin strategy, we get a Nash equilibrium.

• In any Nash equilibrium for a zero-sum game, both players must be using a maximin strategy!

• For any zero-sum game, there exists a Nash equilibrium.

Now we’re talking about mixed strategies, of course. We already saw a while back that if we only consider pure strategies, there are games like rock-paper-scissors and matching pennies that don’t have a Nash equilibrium.

Before I quit, one more false fact. Just as the maximin strategy was named after the emperor Maximin, the minimax strategy was named after the emperor Minimax! I mentioned that Emperor Maximin really exists, but is not unique. The case of Emperor Minimax is even more interesting. He’s really unique… but he does not exist!

## Game Theory (Part 15)

20 February, 2013

In Part 13 we saw an example of a Nash equilibrium where both players use a mixed strategy: that is, make their choice randomly, using a certain probability distribution on their set of mixed strategies.

We found this Nash equilibrium using the oldest method known to humanity: we guessed it. Guessing is what mathematicians do when we don’t know anything better to do. When you’re trying to solve a new kind of problem, it’s often best to start with a very easy example, where you can guess the answer. While you’re doing this, you should try to notice patterns! These can give you tricks that are useful for harder problems, where pure guessing won’t usually work so well.

So now let’s try a slightly harder example. In Part 13 we looked at the game ‘matching pennies’, which was a zero-sum game. Now we’ll look at a modified version. It will still be a zero-sum game. We’ll find a Nash equilibrium by finding each player’s ‘maximin strategy’. This technique works for zero-sum two-player normal-form games, but not most other games.

Why the funny word ‘maximin’?

In this sort of strategy, you assume that no matter what you do, the other player will do whatever minimizes your expected payoff. This is a sensible assumption in a zero-sum game! They are trying to maximize their expected payoff, but in a zero-sum game whatever they win, you lose… so they’ll minimize your expected payoff.

So, in a maximin strategy you try to maximize your expected payoff while assuming that given whatever strategy you use, your opponent will use a strategy that minimizes your expected payoff.

Think about that sentence: it’s tricky! But it explains the funny word ‘maximin’. In brief, you’re trying to maximize the minimum of your expected payoff.

We’ll make this more precise in a while… but let’s dive in and look at a new game!

### Matching pennies—modified version

In this new game, each player has a penny and must secretly turn the penny to either heads or tails. They then reveal their choices simultaneously. If the pennies do not match—one heads and one tails—B wins 10 cents from A. If the pennies are both heads, player A wins 10 cents from player B. And if the pennies are both tails, player A wins 20 cents from player B

Let’s write this game in normal form! If we say

• choice 2 is tails

then the normal form looks like this:

 1 2 1 (10,-10) (-10,10) 2 (-10,10) (20,-20)

We can also break this into two matrices in the usual way, one showing the payoff for A:

$A = \left( \begin{array}{rr} 10 & -10 \\ -10 & 20 \end{array} \right)$

and one showing the payoff for B:

$B = \left( \begin{array}{rr} -10 & 10 \\ 10 & -20 \end{array} \right)$

Before we begin our real work, here are some warmup puzzles:

Puzzle 1. Is this a zero-sum game?

Puzzle 2. Is this a symmetric game?

Puzzle 3. Does this game have a Nash equilibrium if we consider only pure strategies?

Puzzle 4. Does this game have a dominant strategy for either player A or player B?

### A’s expected payoff

Remember that we write A’s mixed strategy as

$p = (p_1, p_2)$

Here $p_1$ is the probability that A makes choice 1: that is, chooses heads. $p_2$ is the probability that A makes choice 2: that is, chooses tails. Similarly, we write B’s mixed strategy as

$q = (q_1, q_2)$

Here $q_1$ is the probability that B makes choice 1, and $q_2$ is the probability that B makes choice 2.

Given all this, the expected value of A’s payoff is

$p \cdot A q$

We saw this back in Part 12.

Now let’s actually calculate the expected value of A’s payoff. It helps to remember that probabilities add up to 1, so

$p_1 + p_2 = 1, \qquad q_1 + q_2 = 1$

It’s good to have fewer variables to worry about! So, we’ll solve for $p_2$ and $q_2$:

$p_2 = 1 - p_1, \qquad q_2 = 1 - q_1$

and write

$p = (p_1, 1 - p_1)$
$q = (q_1, 1 - q_1)$

Then we can calculate A’s expected payoff. We start by computing $A q$:

$\begin{array}{ccl} A q &=& \left( \begin{array}{rr} 10 & -10 \\ -10 & 20 \end{array} \right) \left( \begin{array}{c} q_1 \\ 1 - q_1 \end{array} \right) \\ \\ &=& \left( \begin{array}{c} 10 q_1 - 10(1-q_1) \\ -10 q_1 + 20(1 - q_1) \end{array} \right) \\ \\ &=& \left( \begin{array}{c} 20 q_1 - 10 \\ 20 - 30 q_1 \end{array} \right) \end{array}$

This gives

$\begin{array}{ccl} p \cdot A q &=& (p_1 , 1 - p_1) \cdot (20 q_1 - 10 , 20 - 30 q_1) \\ \\ &=& p_1(20 q_1 - 10) + (1-p_1)(20 - 30 q_1) \\ \\ &=& 20 - 30 p_1 - 30 q_1 + 50 p_1 q_1 \end{array}$

This is a bit complicated, so let’s graph A’s expected payoff in the extreme cases where B either makes choice 1 with probability 100%, or choice 2 with probability 100%.

If B makes choice 1 with probability 100%, then $q_1 = 1.$ Put this in the formula above and we see A’s expected payoff is

$p \cdot A q = 20 p_1 - 10$

If we graph this as a function of $p_1$ we get a straight line:

On the other hand, if B makes choice 2 with probability 100%, then $q_2 = 1$ so $q_1 = 0.$ Now A’s expected payoff is

$p \cdot A q = 20 - 30 p_1$

Graphing this, we get:

The point of these graphs becomes clearer if we draw them both together:

Some interesting things happen where the lines cross! We’ll get A’s maximin strategy by picking the value of $p_1$ where these lines cross! And in fact, there’s a Nash equilibrium where player A chooses this value of $p_1,$ and B uses the same trick to choose his value of $q_1.$

But we can already see something simpler. We’ve drawn graphs of A’s expected payoff for two extreme cases of player B’s mixed strategy: $q_1 = 1$ and $q_1 = 0.$ Suppose B uses some other mixed strategy, say $q_1 = 2/5$ for example. Now A’s expected payoff will be something between the two lines we’ve drawn. Let’s see what it is:

$\begin{array}{ccl} p \cdot A q &=& 20 - 30 p_1 - 30 q_1 + 50 p_1 q_1 \\ \\ &=& 20 - 30 p_1 - 30 \cdot \frac{2}{5} + 50 p_1 \cdot \frac{2}{5} \\ \\ &=& 8 - 10p_1 \end{array}$

If we graph this along with our other two lines, we get this:

We get a line between the other two, as we expect. But more importantly, all three lines cross at the same point!

That’s not a coincidence, that’s how it always works. If we draw lines for more different choices of B’s mixed strategy, they look like this:

The point where they all intersect looks important! It is. Soon we’ll see why.

### A’s maximin strategy

Now we’ll find A’s maximin strategy. Let me explain the idea a bit more. The idea is that A is cautious, so he wants to choose $p_1$ that maximizes his expected payoff in the worst-case scenario. In other words, A wants to maximize his expected payoff assuming that B is trying to minimize A’s expected payoff.

Read that last sentence a few times, because it’s confusing at first.

Why would B try to minimize A’s expected payoff? B isn’t nasty: he’s just a rational agent trying to maximize his own expected payoff! But if you solved Puzzle 1, you know this game is a zero-sum game. So A’s payoff is minus B’s payoff. Thus, if B is trying to maximize his own expected payoff, he’s also minimizing A’s expected payoff.

Given this, let’s figure out what A should do. A should look at different mixed strategies B can use, and graph A’s payoff as a function of $p_1.$ We did this:

Then, he should choose $p_1$ that maximizes his expected payoff in the worst-case scenario. To do this, he should focus attention on the lines that give him the smallest expected payoff:

This is the dark curve. It’s not very ‘curvy’, but mathematicians consider a broken line to be an example of a curve!

To find this dark curve, all that matters are extreme cases of B’s strategy: the case $q_1 = 1$ and the case $q_1 = 0.$ So we can ignore the intermediate cases, and simplify our picture:

The dark curve is highest where the two lines cross! This happens when

$-20 + 10 p_1 = 20 - 30 p_1$

or in other words

$p_1 = 3/5$

So, A’s maximin strategy is

$p = (3/5, 2/5)$

It’s the mixed strategy that maximizes A’s expected payoff given that B is trying to minimize A’s expected payoff.

### B’s expected payoff

Next let’s give the other guy a chance. Let’s work out B’s expected payoff and maximin strategy. The expected value of B’s payoff is

$p \cdot B q$

We could calculate this just like we calculated $p \cdot A q$, but it’s quicker to remember that this is a zero-sum game:

$B = - A$

so that

$p \cdot B q = - p \cdot A q$

and since we already saw that

$p \cdot A q = 20 - 30 p_1 - 30 q_1 + 50 p_1 q_1$

now we have

$p \cdot B q = -20 + 30 p_1 + 30 q_1 - 50 p_1 q_1$

### B’s maximin strategy

To figure out B’s maximin strategy, we’ll copy what worked for player A. First we’ll work out B’s expected payoff in two extreme cases:

• The case where A always makes choice 1: $p_1 = 1.$

• The case where A always makes choice 2: $p_1 = 0.$

We’ll get two functions of $q_1$ whose graphs are lines. Then we’ll find where these lines intersect!

Let’s get to work. When $p_1 = 1$ we get

$\begin{array}{ccl} p \cdot B q &=& -20 + 30 p_1 + 30 q_1 - 50 p_1 q_1 \\ \\ &=& 10 - 20 p_1 \end{array}$

When $p_1 = 0$ we get

$p \cdot B q = -20 + 30 q_1$

I won’t graph these two lines, but they intersect when

$10 - 20 q_1 = -20 + 30 q_1$

or

$q_1 = 3/5$

Hmm, it’s sort of a coincidence that we’re getting the same number that we got for $p_1;$ it won’t always work this way! But anyway, we see that B’s maximin strategy is

$q = (3/5, 2/5)$

### A Nash equilibrium

Now let’s put the pieces together. What happens in a zero-sum game when player A and player B both choose a maximin strategy? It’s not too surprising: we get a Nash equilibrium!

Let’s see why in this example. (We can do a general proof later.) We have

$p = (3/5, 2/5)$

and

$q = (3/5, 2/5)$

To show that $p$ and $q$ form a Nash equilibrium, we need to check that neither player could improve their expected payoff by changing to a different mixed strategy. Remember:

Definition. Given a 2-player normal form game, a pair of mixed strategies $(p,q),$ one for player A and one for player B, is a Nash equilibrium if:

1) For all mixed strategies $p'$ for player A, $p' \cdot A q \le p \cdot A q.$

2) For all mixed strategies $q'$ for player B, $p \cdot B q' \le p \cdot B q.$

I’ll check clause 1) and I’ll let you check clause 2), which is similar. For clause 1) we need to check

$p' \cdot A q \le p \cdot A q$

for all mixed strategies $p'.$ Just like we did with $p,$ we can write

$p' = (p'_1, 1 - p'_1)$

We can reuse one of our old calculations to see that

$\begin{array}{ccl} p' \cdot A q &=& 20 - 30 p'_1 - 30 q_1 + 50 p'_1 q_1 \\ \\ &=& 20 - 30 p'_1 - 30 \cdot \frac{3}{5} + 50 p'_1 \cdot \frac{3}{5} \\ \\ &=& 2 \end{array}$

Hmm, A’s expected payoff is always 2, no matter what mixed strategy he uses, as long as B uses his maximin strategy $q!$ So of course we have

$p' \cdot A q = p \cdot A q = 2$

which implies that

$p' \cdot A q \le p \cdot A q$

If you don’t believe me, we can calculate $p \cdot A q$ and see it equals $p' \cdot A q:$

$\begin{array}{ccl} p \cdot A q &=& 20 - 30 p_1 - 30 q_1 + 50 p_1 q_1 \\ \\ &=& 20 - 30 \cdot \frac{3}{5} - 30 \cdot \frac{3}{5} + 50 \cdot \frac{3}{5} \cdot \frac{3}{5} \\ \\ &=& 2 \end{array}$

Yup, it’s 2.

Puzzle 5. Finish showing that $(p,q)$ is a Nash equilibrium by showing that for all mixed strategies $q'$ for player B, $p \cdot B q' \le p \cdot B q.$

Puzzle 6. How high does this dark curve get?

You have the equations for the two lines, so you can figure this out. Explain the meaning of your answer!

## Game Theory (Part 14)

18 February, 2013

Here is the first test in our game theory course. If you’re following along on this blog, you can do it and then look at the answers below.

### Definitions

1. Define a Nash equilibrium for a 2-player normal form
game.

2. Define the expected value of some function with respect to some probability distribution.

### Proof

3. Suppose $A$ and $B$ are the payoff matrices for 2-player normal form game. Prove that if $(i,j)$ is a Nash equilibrium, there cannot exist a choice $i'$ for player A that strictly dominates choice $i.$

### 2-Player normal form games

Consider this 2-player normal form game:

$\begin{array}{rrr} (-2,1) & (4,2) & (2,-5) \\ (2,3) & (-6,2) & (5,3) \\ (1,0) & (2,-4) & (4,-3) \end{array}$

4. Find all the Nash equilibria. Draw a box around each Nash
equilibrium.

For problems 5-8 do not simplify your answers by working out the binomial coefficients, etc.

### Probabilities

5. If you draw 3 cards from a well-shuffled standard deck,
what is the probability that at least 2 are hearts?

6. If you flip 4 fair and independent coins, what is the probability that exactly 2 land heads up?

### Expected values

7. Suppose you pay $2 to enter a lottery. Suppose you have a 1% chance of winning$100, and otherwise you win nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

8. Suppose you draw two cards from a well-shuffled standard deck. Suppose you win $100 if you get two aces,$10 if you get one ace, and nothing if you get no aces. What is your expected payoff?

### Extra credit

About how many ways are there to choose 3 atoms from all the atoms in the observable universe? Since this question is for extra credit, I’ll make it hard: I’ll only accept answers written in scientific notation, for example $2 \times 10^{50}.$

And here are the answers to the first test.

### Definitions

1. Given a 2-player normal form game where A’s
payoff is $A_{ij}$ and B’s payoff is $B_{ij}$, a pair of choices $(i,j)$ is a Nash equilibrium if:

1) For all $1 \le i' \le m,$ $A_{i'j} \le A_{ij}.$

2) For all $1 \le j' \le n,$ $B_{ij'} \le B_{ij}.$

2. The expected value of a function $f : X \to \mathbb{R}$ with respect to a probability distribution $p$ on the finite set $X$ is

$\sum_{i \in X} f(i) p_i$

Note that a good definition makes it clear what term is being defined, by writing it in boldface or underlining it. Also, it’s best if all variables used in the definition are explained: here they are $f, X$ and $p.$

### Proof

3. Theorem. Suppose $A$ and $B$ are the payoff matrices for 2-player normal form game. If $(i,j)$ is a Nash equilibrium, there cannot exist a choice $i'$ for player A that strictly dominates choice $i$.

Proof. Suppose that $(i,j)$ is a Nash equilibrium. Then

$A_{ij} \ge A_{i' j}$

for any choice $i'$ for player A. On the other hand, if choice $i'$ strictly dominates choice $i$, then

$A_{i'j} > A_{i j}$

This contradicts the previous inequality, so there cannot exist
a choice $i'$ for player A that strictly dominates choice $i$. █

Note that the really good way to write a proof involves:

• First writing “Theorem” and stating the theorem.
• Saying “Proof” at the start of the proof.
• Giving an argument that starts with the hypotheses and leads to the conclusion.
• Marking the end of the proof with “Q.E.D.” or “█” or something similar.

### 2-Player normal form games

4. In this 2-player normal form game, the three Nash equilibria are marked with boxes:

$\begin{array}{rrr} (-2,1) & \boxed{(4,2)} & (2,-5) \\ \boxed{(2,3)} & (-6,2) & \boxed{(5,3)} \\ (1,0) & (2,-4) & (4,-3) \end{array}$

### Probabilities

5. If you draw 3 cards from a well-shuffled standard deck,
what is the probability that at least 2 are hearts?

$\displaystyle{ \frac{\binom{13}{2} \binom{39}{1}}{\binom{52}{3}} + \frac{\binom{13}{3} \binom{39}{0}}{\binom{52}{3}} }$

since there are:

$\binom{52}{2}$ ways to choose 3 cards, all equally likely,

$\binom{13}{2}$ ways to choose 2 hearts and $\binom{39}{1}$ ways to chose 1 non-heart, and

$\binom{13}{3}$ ways to choose 2 hearts and $\binom{39}{0}$ ways to chose 0 non-hearts.

$\displaystyle{ \frac{\binom{13}{2} \cdot 39 }{\binom{52}{2}} + \frac{\binom{13}{3}} {\binom{52}{2}}}$

This is equal since $\binom{39}{0} = 1$ and $\binom{39}{1} = 39.$

6. If you flip 4 fair and independent coins, what is the probability that exactly 2 land heads up?

$\displaystyle{ \frac{\binom{4}{2}}{2^4} }$

since there are $2^4$ possible ways the coins can land, all
equally likely, and $\binom{4}{2}$ ways to choose 2 of the 4 coins to land heads up.

### Expected values

7. Suppose you pay $2 to enter a lottery. Suppose you have a 1% chance of winning$100, and otherwise you win nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

$0.01 \cdot \98 \quad + \quad 0.99 \cdot (-\ 2)$

$0.01 \cdot \100 \quad + \quad 0.99 \cdot \0 \quad - \quad \2$

Of course these are equal.

8. Suppose you draw two cards from a well-shuffled standard deck. Suppose you win $100 if you get two aces,$10 if you get one ace, and nothing if you get no aces. What is your expected payoff?

$\displaystyle{ \frac{\binom{4}{2} \binom{48}{0}}{\binom{52}{2}} \cdot \100 \quad + \quad \frac{\binom{4}{1} \binom{48}{1}}{\binom{52}{2}} \cdot \10 \quad + \quad \frac{\binom{4}{0} \binom{48}{2}}{\binom{52}{2}} \cdot \ 0 }$

since $\binom{4}{n} \binom{48}{3-n}$ is the number of ways to pick $n$ aces and $3-n$ non-aces. Of course we can also leave off the last term, which is zero:

$\displaystyle{ \frac{\binom{4}{2} \binom{48}{0}}{\binom{52}{2}} \cdot \100 \quad + \quad \frac{\binom{4}{1} \binom{48}{1}}{\binom{52}{2}} \cdot \10 }$

Since $\binom{48}{0} = 1$ we can also write this as

$\displaystyle{ \frac{\binom{4}{2}}{\binom{52}{2}} \cdot \100 \quad + \quad \frac{\binom{4}{1} \binom{48}{1}}{\binom{52}{2}} \cdot \10 }$

Or, since $\binom{4}{1} = 4$ and $\binom{48}{1} = 48$ we can write this as

$\displaystyle{ \frac{\binom{4}{2}}{\binom{52}{2}} \cdot \100 + \frac{4 \cdot 48}{\binom{52}{2}} \cdot \10 }$

But I said not to bother simplifying the binomial coefficents.

### Extra credit

About how many ways are there to choose 3 atoms from all the atoms in the observable universe? Since this question is for extra credit, I’ll make it hard: I’ll only accept answers written in scientific notation, for example $2 \times 10^{50}.$

Answer. In class I said the number of atoms in the
observable universe is about $10^{80},$ and I said I might put this on the test. So, the answer is

$\begin{array}{ccl} \displaystyle{ \binom{10^{80}}{3}} &=& \displaystyle{ \frac{10^{80}(10^{80} - 1)(10^{80} - 2)}{3 \cdot 2 \cdot 1} } \\ \\ &\approx& \displaystyle{ \frac{10^{80} \cdot 10^{80} \cdot 10^{80}}{6} } \\ \\ &=& \displaystyle{ \frac{10^{240}}{6} } \\ \\ &\approx & 1.667 \times 10^{239} \end{array}$

Note that the question asked for an approximate answer, since we don’t know exactly how many atoms there are in the observable universe. The right answer to a question like this gives no more decimal places than we have in our data, so $1.667 \times 10^{239}$ is actually too precise! You only have one digit in the data I gave you, so a better answer is

$\mathbf{ 2 \times 10^{239} }$

Since the figure $10^{80}$ is very approximate, another correct answer is

$\mathbf{ 10^{239} }$

## Nash Equilibria

12 February, 2013

As you know if you’ve been reading this blog lately, I’m teaching game theory. I could use some help.

Is there a nice elementary proof of the existence of Nash equilibria for 2-person games?

Here’s the theorem I have in mind. Suppose $A$ and $B$ are $m \times n$ matrices of real numbers. Say a mixed strategy for player A is a vector $p \in \mathbb{R}^m$ with

$\displaystyle{ p_i \ge 0 , \quad \sum_i p_i = 1 }$

and a mixed strategy for player B is a vector $q \in \mathbb{R}^n$ with

$\displaystyle{ q_i \ge 0 , \quad \sum_j q_j = 1 }$

A Nash equilibrium is a pair consisting of a mixed strategy $p$ for A and a mixed strategy $q$ for B such that:

1) For every mixed strategy $p'$ for A, $p' \cdot A q \le p \cdot A q.$

2) For every mixed strategy $q'$ for B, $p \cdot B q' \le p \cdot B q.$

(The idea is that $p \cdot A q$ is the expected payoff to player A when A chooses mixed strategy $p$ and B chooses $q.$ Condition 1 says A can’t improve their payoff by unilaterally switching to some mixed strategy $p'.$ Similarly, condition 2 says B can’t improve their expected payoff by unilaterally switching to some mixed strategy $q'.$)

### Some history

The history behind my question is sort of amusing, so let me tell you about that—though I probably don’t know it all.

Nash won the Nobel Prize for a one-page proof of a more general theorem for n-person games, but his proof uses Kakutani’s fixed-point theorem, which seems like overkill, at least for the 2-person case. There is also a proof using Brouwer’s fixed-point theorem; see here for the n-person case and here for the 2-person case. But again, this seems like overkill.

Earlier, von Neumann had proved a result which implies the one I’m interested in, but only in the special case where $B = -A:$ the so-called minimax theorem. Von Neumann wrote:

As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved.

I believe von Neumann used Brouwer’s fixed point theorem, and I get the impression Kakutani proved his fixed point theorem in order to give a different proof of this result! Apparently when Nash explained his generalization to von Neumann, the latter said:

That’s trivial, you know. That’s just a fixed point theorem.

But you don’t need a fixed point theorem to prove von Neumann’s minimax theorem! There’s a more elementary proof in an appendix to Andrew Colman’s 1982 book Game Theory and its Applications in the Social and Biological Sciences. He writes:

In common with many people, I first encountered game theory in non-mathematical books, and I soon became intrigued by the minimax theorem but frustrated by the way the books tiptoed around it without proving it. It seems reasonable to suppose that I am not the only person who has encountered this problem, but I have not found any source to which mathematically unsophisticated readers can turn for a proper understanding of the theorem, so I have attempted in the pages that follow to provide a simple, self-contained proof with each step spelt out as clearly as possible both in symbols and words.

This proof is indeed very elementary. The deepest fact used is merely that a continuous function assumes a maximum on a compact set—and actually just a very special case of this. So, this is very nice.

Unfortunately, the proof is spelt out in such enormous elementary detail that I keep falling asleep halfway through! And worse, it only covers the case $B = -A.$

Is there a good references to an elementary but terse proof of the existence of Nash equilibria for 2-person games? If I don’t find one, I’ll have to work through Colman’s proof and then generalize it. But I feel sure someone must have done this already.

## Game Theory (Part 13)

11 February, 2013

Last time we introduced mixed strategies, and generalized the idea of Nash equilibrium to mixed strategies. Now let’s look at an example.

### Matching pennies

In the game of matching pennies, each player has a penny and must secretly turn the penny to either heads or tails. They then reveal their choices simultaneously. If the pennies match—both heads or both tails—player A keeps both pennies, so he wins one from player B. If the pennies do not match—one heads and one tails—B keeps both pennies, so she wins one from A.

Let’s write this game in normal form! If the pennies match, A wins one penny and B loses one, so their payoffs are (1,-1). If they pennies don’t match, A loses one and B wins one, so their payoffs are (-1,1). If we say

• choice 2 is tails

then the normal form looks like this:

 1 2 1 (1,-1) (-1,1) 2 (-1,1) (1,-1)

We can also break this into two matrices in the usual way, one showing the payoff for A:

$A = \left( \begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array} \right)$

and one showing the payoff for B:

$B = \left( \begin{array}{rr} -1 & 1 \\ 1 & -1 \end{array} \right)$

If you examine this game, it’s easy to see no pure strategy is a Nash equilibrium. If A chooses heads, B will want to choose tails. But if B chooses tails, A will want to choose tails. And if A chooses tails, B will want to choose heads. And if B chooses heads, A will want to choose heads! There’s always someone who would do better by changing their choice. It’s a lot like rock, paper, scissors.

### Mixed strategies

However, there is a Nash equilibrium if we allow mixed strategies, where the players can randomly make their choice according to a certain probability distribution.

Let’s see if we can guess where this Nash equilibrium is. If you were player A, can you guess what you should do? Should you choose heads 90% of the time? 80% of the time? 70% of the time?

A student says: “50% of the time.”

Yes, that seems right. After all, if player A chooses heads most of the time, player B can win most of the time by always choosing tails! And if player A chooses tails most of the time, B can win most of the time by always choosing heads! The only way out is for player A to choose heads with probability 1/2.

But what about player B? What should player B do?

A student says: “She should also choose heads 50% of the time.”

Yes, that makes sense too—for the same sort of reason.

So, let’s see if these mixed strategies give a Nash equilibrium. Remember that we write

$p = (p_1, p_2)$

for A’s mixed strategy. $p_1$ is the probability that A makes choice 1: that is, chooses heads. $p_2$ is the probability that A makes choice 2: that is, chooses tails. So, what $p$ are we guessing here?

The students say:

$p = (1/2, 1/2)$

Okay, that’s A’s mixed strategy. And what about B’s mixed strategy?

The students say:

$q = (1/2, 1/2)$

Okay, now lets see if these are really a Nash equilibrium! Remember the definition from last time:

Definition. Given a 2-player normal form game, a pair of mixed strategies $(p,q),$ one for player A and one for player B, is a Nash equilibrium if:

1) For all mixed strategies $p'$ for player A, $p' \cdot A q \le p \cdot A q.$

2) For all mixed strategies $q'$ for player B, $p \cdot B q' \le p \cdot B q.$

We want to check this for our $p$ and $q.$ This looks hard, because we have to check part 1) for all $p'$ and part 2) for all $q'.$ But sometimes it’s easy, and I’ve picked a game where it’s easy.

Why is it easy? Well, first consider part 1). For this we need to show

$p' \cdot A q \le p \cdot A q$

To show this, let’s start by computing $A q:$

$A q = \left( \begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array} \right) \left( \begin{array}{r} 1/2 \\ 1/2 \end{array} \right) = \left( \begin{array}{r} 0 \\ 0 \end{array} \right)$

Hey! It’s the zero vector, a list of all zeros! We write this simply as 0. So, we’re trying to show

$p' \cdot 0 \le p \cdot 0$

but of course this is true, because both sides equal the number zero!

What’s going on here? The point is that if B uses mixed strategy $q,$ A’s expected payoff is zero no matter what mixed strategy A uses! If A uses any mixed strategy $p',$ the expected value of his payoff is always $p' \cdot A q = 0,$ since $Aq = 0.$

Now let’s check part 2). This should be similar. Now we need to show that for any mixed strategy $q',$

$p \cdot B q' \le p \cdot B q$

Hey, this is actually a bit different! Before we could use $Aq = 0$ to get the job done. But now we see $B q'$ showing up, and $q'$ is any mixed strategy, so we can’t calculate this! What do we do?

We need to use the power of math!

We need to use a fact about matrices and dot products! For any vectors $v \in \mathbb{R}^m$ and $w \in \mathbb{R}^n$ and any $m \times n$ matrix $A,$ we have

$v \cdot A w = A^T v \cdot w$

Here $A^T$ is the transpose of the matrix $A;$ that is, the matrix we get by switching rows and columns:

Here’s the formula for the transpose:

$A^T_{ji} = A_{ij}$

Someday when I’ll in a bad mood I’ll make you prove $v \cdot A w = A^T v \cdot w$ in your homework.

Anyway, applying this equation to the matrix $B,$ we can take part 2):

$p \cdot B q' \le p \cdot B q$

and rewrite it like this:

$B^T p \cdot q' \le B^T p \cdot q$

Notice that $B^T p$ shows up on both sides now. So, we should start by computing this! We have

$B = \left( \begin{array}{rr} -1 & 1 \\ 1 & -1 \end{array} \right)$

so taking the transpose doesn’t actually do anything in this particular case:

$B^T= \left( \begin{array}{rr} -1 & 1 \\ 1 & -1 \end{array} \right)$

and we get:

$B^T p = \left( \begin{array}{rr} -1 & 1 \\ 1 & -1 \end{array} \right) \left( \begin{array}{r} 1/2 \\ 1/2 \end{array} \right) = \left( \begin{array}{r} 0 \\ 0 \end{array} \right)$

Hey! It’s the zero vector again! You should have guessed that was coming. So, the thing we’re trying to show:

$B^T p \cdot q' \le B^T p \cdot q$

simply boils down to

$0 \cdot q' \le 0 \cdot q$

which is true because both sides are zero.

Yay, we’re done! We found a Nash equilibrium! And the point of our last little calculation is that if A uses mixed strategy $p,$ B’s expected payoff is zero no matter what mixed strategy B uses. This is just the mirror image of what we saw before: if B uses mixed strategy $q,$ A’s expected payoff is zero no matter what mixed strategy A uses.

So, when both A and B are choosing heads or tails with 50% probabilities, independently, either one can unilaterally change their mixed strategy without improving their expected payoff. That’s a Nash equilibrium for you!

### The coin-matching game

Just for fun, let me warn you about a scam that involves matching pennies. I’ll quote this article:

Coin-matching game, Wikipedia.

A coin-matching game (also a coin smack or smack game) is a confidence trick in which two con artists set up one victim.

The first con artist strikes up a conversation with the victim, usually while waiting somewhere. The con artist suggests matching pennies (or other coins) to pass the time. The second con artist arrives and joins in, but soon leaves for a moment. The first con artist then suggests cheating. The victim, thinking they are going to scam the second con artist, agrees to match coins each time.

When the second con artist returns and begins losing, he accuses the two of cheating and threatens to call the police. The first con artist offers a sizable sum of hush money, and the victim pitches in. After the victim leaves, the two con artists split up the money extorted from the victim.

So, watch out if someone suggests playing ‘matching pennies’ with you! It could be legit, but if a second person joins in and starts placing bets on the two of you, it’s probably a coin smack!

For more con games, see:

I’m not trying to teach you how to con people! I’m trying to help you defend yourself. Of course the best way to avoid scams is a generally cautious, skeptical attitude, especially in any situation where someone seems to be helping you get money or get around the law. And don’t forget: successful con artists don’t usually look suspicious, like this:

They look and act trustworthy. As the saying goes:

The most important thing is sincerity. If you can fake that, you’ve got it made.