Relative Entropy in Evolutionary Dynamics

guest post by Marc Harper

In John’s information geometry series, he mentioned some of my work in evolutionary dynamics. Today I’m going to tell you about some exciting extensions!

The replicator equation

First a little refresher. For a population of n replicating types, such as individuals with different eye colors or a gene with n distinct alleles, the ‘replicator equation’ expresses the main idea of natural selection: the relative rate of growth of each type should be proportional to the difference between the fitness of the type and the mean fitness in the population.

To see why this equation should be true, let P_i be the population of individuals of the ith type, which we allow to be any nonnegative real number. We can list all these numbers and get a vector:

P = (P_1, \dots, P_n)

The Lotka–Volterra equation is a very general rule for how these numbers can change with time:

\displaystyle{ \frac{d P_i}{d t} = f_i(P) P_i }

Each population grows at a rate proportional to itself, where the ‘constant of proportionality’, f_i(P), is not necessarily constant: it can be any real-valued function of P. This function is called the fitness of the ith type. Taken all together, these functions f_i are called the fitness landscape.

Let p_i be the fraction of individuals who are of the ith type:

\displaystyle{ p_i = \frac{P_i}{\sum_{j =1}^n P_j } }

These numbers p_i are between 0 and 1, and they add up to 1. So, we can also think of them as probabilities: p_i is the probability that a randomly chosen individual is of the ith type. This is how probability theory, and eventually entropy, gets into the game.

Again, we can bundle these numbers into a vector:

p = (p_1, \dots, p_n)

which we call the population distribution. It turns out that the Lotka–Volterra equation implies the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(P) - \langle f(P) \rangle \right) \, p_i }

where

\displaystyle{ \langle f(P) \rangle = \sum_{i =1}^n  f_i(P)  p_i  }

is the mean fitness of all the individuals. You can see the proof in Part 9 of the information geometry series.

By the way: if each fitness f_i(P) only depends on the fraction of individuals of each type, not the total numbers, we can write the replicator equation in a simpler way:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }

From now on, when talking about this equation, that’s what I’ll do.

Anyway, the take-home message is this: the replicator equation says the fraction of individuals of any type changes at a rate proportional to fitness of that type minus the mean fitness.
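As a quick numerical sanity check of the step from the Lotka–Volterra equation to the replicator equation, here is a minimal Python sketch. The function names, the populations and the fitness values are made up for illustration, and the fitness values are just fixed numbers at one instant. It differentiates p_i = P_i / \sum_j P_j with the quotient rule and compares the result with the right-hand side of the replicator equation.

```python
def replicator_rhs(f, p):
    """Right-hand side of the replicator equation: (f_i - <f>) p_i."""
    mean_fitness = sum(fi * pi for fi, pi in zip(f, p))
    return [(fi - mean_fitness) * pi for fi, pi in zip(f, p)]


def fraction_rates_from_lotka_volterra(f, P):
    """d/dt (P_i / N) computed with the quotient rule, using dP_i/dt = f_i P_i."""
    N = sum(P)
    dP = [fi * Pi for fi, Pi in zip(f, P)]
    dN = sum(dP)
    return [dPi / N - Pi * dN / N ** 2 for dPi, Pi in zip(dP, P)]


# Made-up populations and fitness values at one instant of time.
P = [30.0, 50.0, 20.0]
f = [1.0, 0.5, 2.0]
p = [Pi / sum(P) for Pi in P]

from_lotka_volterra = fraction_rates_from_lotka_volterra(f, P)
from_replicator = replicator_rhs(f, p)
```

The two lists agree up to rounding error, and the components of the replicator right-hand side sum to zero, which is why the fractions keep adding up to 1.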

Now, it has been known since the late 1970s or early 1980s, thanks to the work of Akin, Bomze, Hofbauer, Shahshahani, and others, that the replicator equation has some very interesting properties. For one thing, it often makes ‘relative entropy’ decrease. For another, it’s often an example of ‘gradient flow’. Let’s look at both of these in turn, and then talk about some new generalizations of these facts.

Relative entropy as a Lyapunov function

I mentioned that we can think of a population distribution as a probability distribution. This lets us take ideas from probability theory and even information theory and apply them to evolutionary dynamics! For example, given two population distributions p and q, the information of q relative to p is

I(q,p) = \displaystyle{ \sum_i q_i \ln \left(\frac{q_i}{p_i }\right)}

This measures how much information you gain if you have a hypothesis about some state of affairs given by the probability distribution p, and then someone tells you “no, the best hypothesis is q!”

It may seem weird to treat a population distribution as a hypothesis, but this turns out to be a good idea. Evolution can then be seen as a learning process: a process of improving the hypothesis.

We can make this precise by seeing how the relative information changes with the passage of time. Suppose we have two population distributions q and p. Suppose q is fixed, while p evolves in time according to the replicator equation. Then

\displaystyle{  \frac{d}{d t} I(q,p)  =  \sum_i f_i(P) (p_i - q_i) }

For the proof, see Part 11 of the information geometry series.

So, the information of q relative to p will decrease as p evolves according to the replicator equation if

\displaystyle{  \sum_i f_i(P) (p_i - q_i) \le 0 }

If q makes this true for all p, we say q is an evolutionarily stable state. For some reasons why, see Part 13.

What matters now is that when q is an evolutionarily stable state, I(q,p) says how much information the population has ‘left to learn’—and we’re seeing that this always decreases. Moreover, it turns out that we always have

I(q,p) \ge 0

and I(q,p) = 0 precisely when p = q.
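To watch this happen, here is a small Python sketch under some loud assumptions. The fitness landscape f_i(p) = -p_i is a hypothetical choice, picked because the uniform distribution q is then evolutionarily stable in the sense above: \sum_i f_i(p)(p_i - q_i) = 1/n - \sum_i p_i^2 \le 0. Also, the replicator equation is integrated with a crude forward Euler step rather than solved exactly.

```python
import math


def relative_information(q, p):
    """I(q,p) = sum_i q_i ln(q_i / p_i), with the convention 0 ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def fitness(p):
    """Hypothetical landscape f_i(p) = -p_i: rarer types are fitter."""
    return [-pi for pi in p]


def replicator_euler_step(p, h):
    """One forward Euler step of size h for the replicator equation."""
    f = fitness(p)
    mean_fitness = sum(fi * pi for fi, pi in zip(f, p))
    return [pi + h * (fi - mean_fitness) * pi for fi, pi in zip(f, p)]


q = [1.0 / 3, 1.0 / 3, 1.0 / 3]   # evolutionarily stable state for this landscape
p = [0.7, 0.2, 0.1]               # made-up initial population distribution

history = [relative_information(q, p)]
for _ in range(1000):
    p = replicator_euler_step(p, 0.01)
    history.append(relative_information(q, p))
```

In this run the values in `history` shrink at every step while staying nonnegative, which is the Lyapunov behavior in miniature. A different landscape or a much larger step size need not behave this well.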

People summarize all this by saying that relative information is a ‘Lyapunov function’. Very roughly, a Lyapunov function is something that decreases with the passage of time, and is zero only at the unique stable state. To be a bit more precise, suppose we have a differential equation like

\displaystyle{  \frac{d}{d t} x(t) = v(x(t)) }

where x(t) \in \mathbb{R}^n and v is some smooth vector field on \mathbb{R}^n. Then a smooth function

V : \mathbb{R}^n \to \mathbb{R}

is a Lyapunov function if

V(x) \ge 0 for all x

V(x) = 0 iff x is some particular point x_0


\displaystyle{ \frac{d}{d t} V(x(t)) \le 0 } for every solution of our differential equation.

In this situation, the point x_0 is a stable equilibrium for our differential equation: this is Lyapunov’s theorem.

The replicator equation as a gradient flow equation

The basic idea of Lyapunov’s theorem is that when a ball likes to roll downhill and the landscape has just one bottom point, that point will be the unique stable equilibrium for the ball.

The idea of gradient flow is similar, but different: sometimes things like to roll downhill as efficiently as possible: they move in exactly the best direction to make some quantity smaller! Under certain conditions, the replicator equation is an example of this phenomenon.

Let’s fill in some details. For starters, suppose we have some function

V : \mathbb{R}^n \to \mathbb{R}

Think of V as ‘height’. Then the gradient flow equation says how a point x(t) \in \mathbb{R}^n will move if it’s always trying its very best to go downhill:

\displaystyle{ \frac{d}{d t} x(t) = - \nabla V(x(t)) }

Here \nabla is the usual gradient in Euclidean space:

\displaystyle{ \nabla V = \left(\partial_1 V, \dots, \partial_n V \right)  }

where \partial_i is short for the partial derivative with respect to the ith coordinate.

The interesting thing is that under certain conditions, the replicator equation is an example of a gradient flow equation… but typically not one where \nabla is the usual gradient in Euclidean space. Instead, it’s the gradient on some other space, the space of all population distributions, which has a non-Euclidean geometry!

The space of all population distributions is a simplex:

\{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i = 1}^n p_i = 1 \} .

For example, it’s an equilateral triangle when n = 3. The equilateral triangle looks flat, but if we measure distances another way it becomes round, exactly like a portion of a sphere, and that’s the non-Euclidean geometry we need!

In fact this trick works in any dimension. The idea is to give the simplex a special Riemannian metric, the ‘Fisher information metric’. The usual metric on Euclidean space is

\delta_{i j} = \left\{\begin{array}{ccl} 1 & \mathrm{ if } & i = j \\                                       0 &\mathrm{ if } & i \ne j \end{array} \right.

This simply says that two standard basis vectors like (0,1,0,0) and (0,0,1,0) have dot product zero if the 1’s are in different places, and one if they’re in the same place. The Fisher information metric is a bit more complicated:

\displaystyle{ g_{i j} = \frac{\delta_{i j}}{p_i} }

As before, g_{i j} is a formula for the dot product of the ith and jth standard basis vectors, but now it depends on where you are in the simplex of population distributions.

We saw how this formula arises from information theory back in Part 7. I won’t repeat the calculation, but the idea is this. Fix a population distribution p and consider the information of another one, say q, relative to this. We get I(q,p). If q = p this is zero:

\displaystyle{ \left. I(q,p)\right|_{q = p} = 0 }

and this point is a local minimum for the relative information. So, the first derivative of I(q,p) as we change q must be zero:

\displaystyle{ \left. \frac{\partial}{\partial q_i} I(q,p) \right|_{q = p} = 0 }

But the second derivatives are not zero. In fact, since we’re at a local minimum, it should not be surprising that we get a positive definite matrix of second derivatives:

\displaystyle{  g_{i j} = \left. \frac{\partial^2}{\partial q_i \partial q_j} I(q,p) \right|_{q = p} }

And, this is the Fisher information metric! So, the Fisher information metric is a way of taking dot products between vectors in the simplex of population distributions that’s based on the concept of relative information.
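For readers who like to check such things numerically, here is a hedged Python sketch. It treats q as an unconstrained vector of positive numbers (an assumption made only to keep the finite differences simple), approximates the matrix of second derivatives of I(q,p) in q at q = p by central differences, and compares it with \delta_{i j}/p_i. The step size and the distribution p are arbitrary choices.

```python
import math


def relative_information(q, p):
    """I(q,p) = sum_i q_i ln(q_i / p_i) for positive q_i, p_i."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))


def hessian_in_q(p, h=1e-4):
    """Central-difference Hessian of q -> I(q,p), evaluated at q = p."""
    n = len(p)

    def value(shifts):
        # shifts maps a coordinate index to a displacement of q away from p
        q = [p[k] + shifts.get(k, 0.0) for k in range(n)]
        return relative_information(q, p)

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                hess[i][i] = (value({i: h}) - 2.0 * value({}) + value({i: -h})) / h ** 2
            else:
                hess[i][j] = (value({i: h, j: h}) - value({i: h, j: -h})
                              - value({i: -h, j: h}) + value({i: -h, j: -h})) / (4.0 * h ** 2)
    return hess


p = [0.5, 0.3, 0.2]   # made-up population distribution
numerical = hessian_in_q(p)
fisher = [[1.0 / p[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
```

The two matrices match to several decimal places: the diagonal entries are close to 1/p_i and the off-diagonal entries are close to zero.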

This is not the place to explain Riemannian geometry, but any metric gives a way to measure angles and distances, and thus a way to define the gradient of a function. After all, the gradient of a function should point at right angles to the level sets of that function, and its length should equal the slope of that function.

So, if we change our way of measuring angles and distances, we get a new concept of gradient! The ith component of this new gradient vector field turns out to be

(\nabla_g V)^i = g^{i j} \partial_j V

where g^{i j} is the inverse of the matrix g_{i j}, and we sum over the repeated index j. As a sanity check, make sure you see why this is the usual Euclidean gradient when g_{i j} = \delta_{i j}.
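The sanity check can also be done in a few lines of Python. This is only a sketch: the helper names and the values of the partial derivatives are invented, and the Fisher case is computed on the whole positive orthant, ignoring the constraint that the p_i sum to 1.

```python
def metric_gradient(inverse_metric, partials):
    """Components g^{ij} d_j V, summing over the repeated index j."""
    n = len(partials)
    return [sum(inverse_metric[i][j] * partials[j] for j in range(n))
            for i in range(n)]


def identity(n):
    """Inverse of the Euclidean metric delta_ij, which is delta_ij again."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def fisher_inverse(p):
    """Inverse of g_ij = delta_ij / p_i: the diagonal matrix with entries p_i."""
    n = len(p)
    return [[p[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


partials = [2.0, -1.0, 4.0]   # made-up values of the partial derivatives of V
p = [0.5, 0.3, 0.2]           # made-up point in the simplex

euclidean = metric_gradient(identity(3), partials)
fisher = metric_gradient(fisher_inverse(p), partials)
```

With the identity matrix the partial derivatives come back unchanged. With the inverse Fisher metric each component is multiplied by p_i, which is where the extra factor of p_i in the replicator equation will come from.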

Now suppose the fitness landscape is the good old Euclidean gradient of some function. Then it turns out that the replicator equation is a special case of gradient flow on the space of population distributions… but where we use the Fisher information metric to define our concept of gradient!

To get a feel for this, it’s good to start with the Lotka–Volterra equation, which describes how the total number of individuals of each type changes. Suppose the fitness landscape is the Euclidean gradient of some function V:

\displaystyle{ f_i(P) = \frac{\partial V}{\partial P_i} }

Then the Lotka–Volterra equation becomes this:

\displaystyle{ \frac{d P_i}{d t} = \frac{\partial V}{\partial P_i} \, P_i }

This doesn’t look like the gradient flow equation, thanks to that annoying P_i on the right-hand side! It certainly ain’t the gradient flow coming from the function V and the usual Euclidean gradient. However, it is gradient flow coming from V and some other metric on the space

\{ P \in \mathbb{R}^n : \; P_i \ge 0 \}

For a proof, and the formula for this other metric, see Section 3.7 in this survey:

• Marc Harper, Information geometry and evolutionary game theory.

Now let’s turn to the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p)  - \langle f(p) \rangle \right) \, p_i }

Again, if the fitness landscape is a Euclidean gradient, we can rewrite the replicator equation as a gradient flow equation… but again, not with respect to the Euclidean metric. This time we need to use the Fisher information metric! I sketch a proof in my paper above.
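Here is one hedged way to test this claim numerically in Python. The defining property of a gradient is that its inner product with any direction w equals the directional derivative of V along w. On the simplex the allowed directions are the vectors whose components sum to zero. The function V below is a made-up example, and the sign convention assumed here is that the population climbs V, so the replicator vector field is compared with the gradient itself rather than with its negative.

```python
def replicator_rhs(f, p):
    """Right-hand side of the replicator equation: (f_i - <f>) p_i."""
    mean_fitness = sum(fi * pi for fi, pi in zip(f, p))
    return [(fi - mean_fitness) * pi for fi, pi in zip(f, p)]


def fisher_inner(p, v, w):
    """Fisher information inner product of v and w at the point p."""
    return sum(vi * wi / pi for vi, wi, pi in zip(v, w, p))


p = [0.5, 0.3, 0.2]
a = [1.0, 2.0, 3.0]

# Hypothetical potential V(p) = (1/2) sum_i a_i p_i^2, so f_i = dV/dp_i = a_i p_i.
f = [ai * pi for ai, pi in zip(a, p)]
v = replicator_rhs(f, p)

# Directions tangent to the simplex: their components sum to zero.
tangents = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, -1.0]]

# For each direction w, pair <v, w>_Fisher with the directional derivative of V.
pairs = [(fisher_inner(p, v, w), sum(fi * wi for fi, wi in zip(f, w)))
         for w in tangents]
```

Each pair agrees up to rounding error, so at this point the replicator vector field behaves exactly like the Fisher gradient of V restricted to the simplex. This is a spot check at one point for one V, not a proof.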

In fact, both these results were first worked out by Shahshahani:

• Siavash Shahshahani, A New Mathematical Framework for the Study of Linkage and Selection, Memoirs of the AMS 17, 1979.

New directions

All this is just the beginning! The ideas I just explained are unified in information geometry, where distance-like quantities such as the relative entropy and the Fisher information metric are studied. From here it’s a short walk to a very nice version of Fisher’s fundamental theorem of natural selection, which is familiar to researchers both in evolutionary dynamics and in information geometry.

You can see some very nice versions of this story for maximum likelihood estimators and linear programming here:

• Akio Fujiwara and Shun-ichi Amari, Gradient systems in view of information geometry, Physica D: Nonlinear Phenomena 80 (1995), 317–327.

Indeed, this seems to be the first paper discussing the similarities between evolutionary game theory and information geometry.

Dash Fryer (at Pomona College) and I have generalized this story in several interesting ways.

First, there are two famous ways to generalize the usual formula for entropy: Tsallis entropy and Rényi entropy, both of which involve a parameter q. There are Tsallis and Rényi versions of relative entropy and the Fisher information metric as well. Everything I just explained about:

• conditions under which relative entropy is a Lyapunov function for the replicator equation, and

• conditions under which the replicator equation is a special case of gradient flow

generalize to these cases! However, these generalized entropies give modified versions of the replicator equation. When we set q=1 we get back the usual story. See

• Marc Harper, Escort evolutionary game theory.

My initial interest in these alternate entropies was mostly mathematical—what is so special about the corresponding geometries?—but now researchers are starting to find populations that evolve according to these kinds of modified population dynamics! For example:

• A. Hernando et al, The workings of the Maximum Entropy Principle in collective human behavior.

There’s an interesting special case worth some attention. Lots of people fret about the relative entropy not being a distance function obeying the axioms that mathematicians like: for example, it doesn’t obey the triangle inequality. Many describe the relative entropy as a distance-like function, and this is often a valid interpretation contextually. On the other hand, the q=0 relative entropy is one-half the Euclidean distance squared! In this case the modified version of the replicator equation looks like this:

\displaystyle{ \frac{d p_i}{d t} = f_i(p) - \frac{1}{n} \sum_{j = 1}^n f_j(p) }

This equation is called the projection dynamic.
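A minimal Python sketch of the right-hand side, with made-up fitness values, shows how simple this equation is: each type's rate is its fitness minus the plain average of all the fitnesses. The helper for one-half the squared Euclidean distance is included only to make the q=0 statement concrete.

```python
def projection_rhs(f):
    """Right-hand side of the projection dynamic: f_i minus the unweighted mean."""
    mean = sum(f) / len(f)
    return [fi - mean for fi in f]


def half_squared_distance(q, p):
    """One-half the squared Euclidean distance between q and p."""
    return 0.5 * sum((qi - pi) ** 2 for qi, pi in zip(q, p))


f = [1.0, 2.0, 6.0]          # made-up fitness values at some point p
rates = projection_rhs(f)    # [-2.0, -1.0, 3.0]
```

The rates sum to zero, so the fractions still add up to 1. Unlike the replicator equation there is no factor of p_i, so the rate of a type does not shrink as that type becomes rare.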

Later, I showed that there is a reasonable definition of relative entropy for a much larger family of geometries that satisfies a similar distance minimization property.

In a different direction, Dash showed that you can change the way that selection acts by using a variety of alternative ‘incentives’, extending the story to some other well-known equations describing evolutionary dynamics. By replacing the terms x_i f_i(x) in the replicator equation with a variety of other functions, called incentives, we can generate many commonly studied models of evolutionary dynamics. For instance if we exponentiate the fitness landscape (to make it always positive), we get what is commonly known as the logit dynamic. This amounts to changing the fitness landscape as follows:

\displaystyle{ f_i \mapsto \frac{x_i e^{\beta f_i}}{\sum_j{x_j e^{\beta f_j}}} }

where \beta is known as an inverse temperature in statistical thermodynamics and as an intensity of selection in evolutionary dynamics. There are lots of modified versions of the replicator equation, like the best-reply and projection dynamics, more common in economic applications of evolutionary game theory, and they can all be captured in this family. (There are also other ways to simultaneously capture such families, such as Bill Sandholm’s revision protocols, which were introduced earlier in his exploration of the foundations of game dynamics.)
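To make the recipe concrete, here is a hedged Python sketch. It reads "replacing the terms x_i f_i(x) with an incentive" literally: the replicator equation dx_i/dt = x_i f_i - x_i \sum_j x_j f_j becomes dx_i/dt = \varphi_i - x_i \sum_j \varphi_j for an incentive \varphi. The function names and numbers are invented for illustration, and the precise form used in Dash's work may differ in details.

```python
import math


def replicator_incentive(x, f):
    """The incentive x_i f_i that recovers the replicator equation."""
    return [xi * fi for xi, fi in zip(x, f)]


def logit_incentive(x, f, beta):
    """The logit expression above: x_i e^{beta f_i} / sum_j x_j e^{beta f_j}."""
    weights = [xi * math.exp(beta * fi) for xi, fi in zip(x, f)]
    total = sum(weights)
    return [w / total for w in weights]


def incentive_rhs(x, incentive):
    """Assumed incentive dynamic: phi_i - x_i sum_j phi_j."""
    total = sum(incentive)
    return [phi_i - xi * total for phi_i, xi in zip(incentive, x)]


x = [0.5, 0.3, 0.2]   # made-up population distribution
f = [1.0, 0.5, 2.0]   # made-up fitness values

replicator_rates = incentive_rhs(x, replicator_incentive(x, f))
logit_rates = incentive_rhs(x, logit_incentive(x, f, beta=2.0))
no_selection = logit_incentive(x, f, beta=0.0)
```

With \beta = 0 the logit expression returns x itself, so nothing moves. Larger \beta shifts weight toward the fittest type. In both cases the rates sum to zero.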

Dash showed that there is a natural generalization of evolutionarily stable states to ‘incentive stable states’, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. For the logit and projection dynamics, incentive stable states are simply evolutionarily stable states, and this happens frequently, but not always.

The third generalization is to look at different ‘time-scales’—that is, different ways of describing time! We can make up the symbol \mathbb{T} for a general choice of ‘time-scale’. So far I’ve been treating time as a real number, so

\mathbb{T} = \mathbb{R}

But we can also treat time as coming in discrete evenly spaced steps, which amounts to treating time as an integer:

\mathbb{T} = \mathbb{Z}

More generally, we can make the steps have duration h, where h is any positive real number:

\mathbb{T} = h\mathbb{Z}

There is a nice way to simultaneously describe the cases \mathbb{T} = \mathbb{R} and \mathbb{T} = h\mathbb{Z} using the time-scale calculus and time-scale derivatives. For the time-scale \mathbb{T} = \mathbb{R} the time-scale derivative is just the ordinary derivative. For the time-scale \mathbb{T} = h\mathbb{Z}, the time-scale derivative is given by the difference quotient from first year calculus:

\displaystyle{ f^{\Delta}(z) = \frac{f(z+h) - f(z)}{h} }

and using this as a substitute for the derivative gives difference equations like a discrete-time version of the replicator equation. There are many other choices of time-scale, such as the quantum time-scale given by \mathbb{T} = q^{\mathbb{Z}}, in which case the time-scale derivative is called the q-derivative, but that’s a tale for another time. In any case, the fact that the successive relative entropies are decreasing can be simply stated by saying they have negative \mathbb{T} = h\mathbb{Z} time-scale derivative. The continuous case we started with corresponds to \mathbb{T} = \mathbb{R}.
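In Python the time-scale derivative for \mathbb{T} = h\mathbb{Z} is a one-liner. The sketch below also writes out one step of a discrete-time replicator equation, obtained by the most literal substitution: set the time-scale derivative of p_i equal to (f_i - \langle f \rangle) p_i and solve for p_i(t+h). That update rule is an assumption made for illustration, and the fitness values are invented.

```python
def delta_derivative(f, z, h):
    """Time-scale derivative on hZ: the difference quotient with step h."""
    return (f(z + h) - f(z)) / h


def discrete_replicator_step(p, f, h):
    """Solve (p_i(t+h) - p_i(t)) / h = (f_i - <f>) p_i(t) for p_i(t+h)."""
    mean_fitness = sum(fi * pi for fi, pi in zip(f, p))
    return [pi + h * (fi - mean_fitness) * pi for fi, pi in zip(f, p)]


# For f(z) = z^2 the difference quotient is 2z + h, which tends to 2z as h -> 0.
slope = delta_derivative(lambda z: z * z, 3.0, 0.5)

p = [0.5, 0.3, 0.2]   # made-up population distribution
f = [1.0, 0.5, 2.0]   # made-up fitness values at p
p_next = discrete_replicator_step(p, f, 0.5)
```

The updated fractions still sum to 1, since the right-hand side of the replicator equation sums to zero. For large h or very negative fitness values a step like this can push a fraction below zero, so the step size matters in a way it does not for \mathbb{T} = \mathbb{R}.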

Remarkably, Dash and I were able to show that you can combine all three of these generalizations into one theorem, and even allow for multiple interacting populations! This produces some really neat population trajectories, such as the following two populations with three types, with fitness functions corresponding to the rock-paper-scissors game. On top we have the replicator equation, which goes along with the Fisher information metric; on the bottom we have the logit dynamic, which goes along with the Euclidean metric on the simplex:

From our theorem, it follows that the relative entropy (ordinary relative entropy on top, the q = 0 entropy on bottom) converges to zero along the population trajectories.

The final form of the theorem is loosely as follows. Pick a Riemannian geometry given by a metric g (obeying some mild conditions) and an incentive for each population, as well as a time scale (\mathbb{R} or h \mathbb{Z}) for every population. This gives an evolutionary dynamic with a natural generalization of evolutionarily stable states, and a suitable version of the relative entropy. Then, if there is an evolutionarily stable state in the interior of the simplex, the time-scale derivative of the sum of the relative entropies for each population will decrease as the trajectories converge to the stable state!

When there isn’t such a stable state, we still get some interesting population dynamics, like the following:

See this paper for details:

• Marc Harper and Dashiell E. A. Fryer, Stability of evolutionary dynamics on time scales.

Next time we’ll see how to make the main idea work in finite populations, without derivatives or deterministic trajectories!

30 Responses to Relative Entropy in Evolutionary Dynamics

  1. Bruce Smith says:

    The image URL is not working for me. (Nor can I find an obvious intended one, at first glance, in the long list of images seen at

    • Bruce Smith says:

      On second glance, I bet it’s this one:

    • John Baez says:

      Yes, that’s it. Thanks, fixed! In case anyone missed it the first time, it’s an illustration of how the simplex of population distributions gets a round geometry when we use the Fisher information metric:

      • Bruce Smith says:

        Is it meant to be exactly accurate (meaning it gets a geometry equal to part of a sphere, if I’m seeing it properly), or just illustrative of the basic concept?

        • Bruce Smith says:

          And if it is an exact sphere, is there a nice formula to map a probability distribution (a point in the simplex) to a point on a sphere embedded in space? if so, is there any interesting way of understanding that formula as being natural (i.e. as giving some other representation of the probability distribution)? I might guess the mapping p_i to p_i squared (since it does map to a sphere)… but I’m more used to seeing probabilities be amplitudes squared, than to squaring probabilities.

        • John Baez says:

          Yes, it’s an exact sphere.

          The simplex of probability distributions

          \displaystyle{ \{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_i p_i = 1 \} }

          equipped with its Fisher information metric has exactly the geometry of a portion of a perfectly round sphere of radius 2. To see this, we can use the formula for the Fisher information metric together with the map

          p_i \to 2 \sqrt{p_i}

          and do some calculations.
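The calculations can be spot-checked numerically. The Python sketch below uses invented names and a made-up point and tangent vectors. It sends p to x_i = 2 \sqrt{p_i}, confirms that x lies on the sphere of radius 2, and checks that the ordinary dot product of the pushed-forward tangent vectors equals their Fisher information inner product. The pushforward uses the derivative of 2 \sqrt{p_i}, which is 1/\sqrt{p_i}.

```python
import math


def to_sphere(p):
    """The map p_i -> 2 sqrt(p_i)."""
    return [2.0 * math.sqrt(pi) for pi in p]


def pushforward(p, v):
    """Image of a tangent vector v at p: d(2 sqrt(p_i)) = v_i / sqrt(p_i)."""
    return [vi / math.sqrt(pi) for vi, pi in zip(v, p)]


def fisher_inner(p, v, w):
    """Fisher information inner product sum_i v_i w_i / p_i."""
    return sum(vi * wi / pi for vi, wi, pi in zip(v, w, p))


def dot(a, b):
    """Ordinary Euclidean dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))


p = [0.5, 0.3, 0.2]      # made-up interior point of the simplex
v = [0.1, -0.04, -0.06]  # made-up tangent vectors: components sum to zero
w = [-0.02, 0.05, -0.03]

x = to_sphere(p)
radius_squared = dot(x, x)
euclidean_on_sphere = dot(pushforward(p, v), pushforward(p, w))
fisher_on_simplex = fisher_inner(p, v, w)
```

The squared radius comes out as 4 and the two inner products agree up to rounding, which is what it means for the map to be an isometry onto a piece of the sphere of radius 2. This only checks one interior point, not the boundary of the simplex.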

          But the fact that it’s a sphere of radius 2 is not a huge big deal. If we stick the right constant in front of the Fisher information metric, the simplex gets to be the same shape as

          \displaystyle{ \{ x \in \mathbb{R}^n : \; x_i \ge 0, \; \sum_i x_i^2 = 1 \} }

          In short: we are indeed writing probabilities as squares of ‘amplitudes’ here, though any potential connection to quantum mechanics remains quite mysterious, at least to me. For one thing, these ‘amplitudes’ are real, but that’s the least of it: there is such a thing as real quantum mechanics, so real amplitudes don’t faze me. (Pun intended.) The bigger question would be why the information metric on the space of probabilities should make them look more like quantum states.

        • Marc Harper says:

          John’s answer is spot on. The radius doesn’t really matter because it just changes the velocity of the trajectories but not the trajectories themselves. There might be some issues on the boundary when you compute the Jacobian, but certainly for the interior of the simplex the mapping is exact, and the replicator equation is forward-invariant on the interior (if it starts in the interior, it stays in the interior for any finite time period).

          Another implication of the mapping is that you essentially know what the geodesics of the Fisher metric are since they are great circle arcs on the sphere and can be pulled back to the simplex.

          There are versions of the Fisher-Information metric in quantum mechanical contexts (e.g. the Fubini-Study metric and the Bures metric), and lots of people study quantum information geometry. (I’m by no means an expert.) At the last big information geometry conference (IGAIA 2010) about half of the talks were about quantum information topics.

        • Bruce Smith says:

          Thanks for that correction — the mapping I gave was “backwards”. As for the bigger question — I don’t know, but I suspect it’s not a coincidence.

  2. domenico says:

    It is an interesting article.
    I am thinking (I don’t know if it is useful) that it is not necessary to use the normalization.
    If the fitness landscape obeys a simple constraint:

    \sum_i f_i(P) P_i = 0

    then the dynamic is on the simplex. The replicator equation is simpler, and the constraints are in the fitness: if the fitness is a Taylor series, then the parameters have a constraint, so that only n-1 fitnesses have a free dynamic.
    It is interesting the application on the alleles: if it is possible to infer some property for the population dynamic with some genetic diseases, then it can be possible to infer the number of defective alleles.

    • John Baez says:

      Given the Lotka–Volterra equation

      \displaystyle{ \frac{d}{dt} P_i = f_i(P) P_i }

      the constraint you mention is equivalent to saying the total population is constant:

      \displaystyle{ \frac{d}{dt} \sum_i P_i = 0 }

      That seems to be a rather special situation, not something we usually see in populations of organisms!

      However, this constraint does hold in a situation like this: we have a population of game players with different strategies. They randomly meet in pairs and play a 2-player game. One or the other player wins, with some probability depending on both players’ strategies. Then the loser changes their strategy to that of the winner!

      In the large-number limit, where random fluctuations become small, we can write down a differential equation for the time evolution of the population P_i of players having the ith strategy. This gives a special case of the Lotka–Volterra equation. And in this special case we indeed have

      \displaystyle{ \frac{d}{dt} \sum_i P_i = 0 }

      since the total number of players never changes: they just change strategies!

      There are lots of generalizations (e.g. to multiplayer games). As long as the total number of players never changes, your constraint holds and the dynamics of the populations P_i stays on a simplex.

      • domenico says:

        I thought that the equation for the probability of the population, which is obtained from the number of individuals in the population, has a complex form because of the normalization to project on the simplex.
        If the fitness in the probability equation is the true law, that can be evaluated from experimental data, then all seems easier.

  3. John Baez says:

    My answer to Domenico’s question above hints at one of my obsessions: figuring out the relation between reaction networks and evolutionary game theory!

    On this blog we’ve seen how reaction networks describe interacting collections of individuals of various types. This sounds related to evolutionary game theory… and indeed it is!

    Say we have a reaction network. When we have small populations we describe their evolution stochastically using a ‘master equation’. In the limit of large populations, we can often ignore random fluctuations and use a ‘rate equation’ to describe the time evolution of the expected number of individuals of each type. This rate equation is always of the form

    \displaystyle{ \frac{d}{dt} P_i = g_i(P)  }

    for some functions g_i.

    But sometimes ‘it takes one to make one’: every process that produces an individual of type i at output must involve an individual of type i as input. In this case, the rate equation has the special form

    \displaystyle{ \frac{d}{dt} P_i = f_i(P) P_i  }

    In other words, it reduces to the Lotka–Volterra equation!

    Now look at what we’ve seen:

    Manoj Gopalkrishnan described a general theorem for reaction networks, giving conditions that ensure a certain ‘free energy’ function always decreases. Marc Harper has described a general theorem for the Lotka–Volterra equation (or its offspring, the replicator equation), giving conditions that ensure a certain ‘relative entropy’ function always decreases.

    The second theorem has got to be a special case of the first, or… well, or both are a special case of some third, even better theorem!

    So, what’s up? The theorem Manoj stated requires the existence of a ‘complex balanced equilibrium’. The theorem Marc stated requires the existence of an ‘evolutionarily stable state’. Is the second condition a special case of the first?

    It’s not obvious to me. Stay tuned, or help out!

    • Marc Harper says:

      Well… I think the next article should help! For finite population models we typically assume that the population size doesn’t change, and Dash and I found a “Lyapunov Theorem” for that context. This finite population model is a Markov chain called the Moran process; Arne Traulsen and collaborators have shown that the transition probabilities of the Markov chain can be used to define a Master equation SDE which has the replicator equation as its Langevin equation.

      Anyway, maybe I’m giving too much away, but Dash and I found that the stable states (local maxima of the stationary distribution) for the Markov chain are those that have an inflow-outflow balance (via sums of incoming and outgoing transition probabilities). We then show that these states are evolutionarily stable states! So I think we’re getting close to putting it all together!

      • John Baez says:

        Great! I don’t think “giving too much away” is a problem: generally people need to hear something twice to understand it once. So, think of it as “whetting our appetite”.

        For those eager to see these results, Marc and Dashiell Fryer get their Lyapunov functions here:

        • Marc Harper and Dashiell Fryer, Stationary stability for evolutionary dynamics in finite populations.

        I find it a bit limiting to assume the total population size doesn’t change, but that assumption reminds me hugely of this paper:

        • Katalin M. Hangos, Engineering model reduction and entropy-based Lyapunov functions in chemical reaction kinetics, Entropy 12 (2010), 772–797.

        See “The entropy-based Lyapunov function” on page 11. She considers chemical reaction networks obeying the law of mass action, assuming they have a complex balanced equilibrium… and the total number of individuals of all kinds is constant! Using this she gives a super-short proof that her ‘entropy-based Lyapunov function’ decreases with time.

        Just for spectators who are having trouble keeping score:

        Hangos’ entropy-based Lyapunov function is essentially the ‘free energy’ described in Manoj’s post:

        g_\alpha(x) = \sum_{s\in S} x_s \log x_s - x_s - x_s \log \alpha_s

        where x_s is the number of individuals of type s, as a function of time, and \alpha_s is that number at the complex balanced equilibrium point. Hangos simply differentiates this with respect to time and uses some properties of the logarithm function to show the answer is \le 0.

        Now we’ve seen David Anderson present a similar ‘brutally direct’ proof without the assumption that the total number of individuals of all kinds

        \sum_{s \in S} x_s

        is constant! He says this proof can be found here:

        • Martin Feinberg, Lectures on chemical reaction networks, 1979.

        I’m not sure, but I suspect that this proof reduces to Hangos’ argument if we assume

        \sum_{s \in S} x_s

        is constant.

        I’m really glad you’ve begun to forge a connection between evolutionarily stable states and complex balanced equilibrium states; I now bet we can show any of the latter is one of the former!

      • John Baez says:

        Marc wrote:

        Arne Traulsen and collaborators have shown that the transition probabilities of the Markov chain can be used to define a Master equation SDE which has the replicator equation as its Langevin equation.

        Arne Traulsen has a lot of papers listed here. Could you point me to the right one?

        • Marc Harper says:

          These two papers, near the top: “Coevolutionary dynamics: From finite to infinite populations” and “Coevolutionary dynamics in large, but finite populations” by Traulsen et al. The second paper has more details, and the Fokker-Planck equation is the master equation in this context.

          Dash and I have added a Lyapunov stability layer to finite populations and have shown that local extrema of the stationary distributions are (if I understand the terminology correctly) complex-balanced for sufficiently large populations (N=30 is typically large enough). These states then satisfy an evolutionary stability criterion that incorporates mutation, and all the classical results (e.g. the replicator stability that this article starts with) are recovered for small mutation rates and large populations.

          The reason that the population needs to be sufficiently large is simply that using a finite population is essentially taking a partition of the simplex, and our approach requires a fine enough partition to get the local maxima of the stationary distribution to stabilize on the evolutionarily stable states. But usually the population doesn’t need to be that large.

          Also, unless the population size is very small, having a fixed population size isn’t really that limiting in my experience, though admittedly it seems artificial…

  4. Thanks for a really interesting post.

    Here is something that surprised me, so I just want to check that I got it right. Suppose that we have a species that plays two different “games”, say surviving predators and performing courtship rituals. In the first game it has strategies 1,\dots, n and the starting distribution is given by r, and in the second game it has strategies 1,\dots ,k and distribution given by s. Furthermore, assume that the strategy each individual has in one game is independent of its strategy in the second game, and that its total utility is just the sum of the utilities for each of the two games. We now have nk types in total, and the starting probability of having type (i,j) is r_i s_j, and we can use the above formula to compute how the system evolves. Unless I missed something we get that:

    1) The strategies in the two games will continue to be independent
    2) The distributions on strategies in game 1 will evolve exactly as if they were only playing that game
    3) Similarly for game 2

    Figuratively speaking, a species can take classes in martial arts and dancing (and many other skills) at the same time, and it will still improve as much in martial arts as if it was only taking martial arts classes, improve as much at dancing as if it was only taking dancing classes and so on. This really surprised me, I would have thought that being in an arms race would slow down the evolution of other traits.

    • John Baez says:

      Hi! I’ll let Marc answer, but I think the situation you describe is equivalent to one where we have two completely different species, one playing one game and one playing another, and then we define a new rather abstract kind of individual to be an ordered pair consisting of one individual of the first type and one of the second type. So, there’s no real interaction between what’s happening in the two games.

      Maybe in real life various interactions complicate the situation? If you have to spend a certain amount of time per day playing each game, you’ll have less time for each game the more games you play, so your rate of improvement on each game will be less.

      • lee bloomquist says:

        John wrote:

        “…I think the situation you describe is equivalent to one where we have two completely different species, one playing one game and one playing another, and then we define a new rather abstract kind of individual to be an ordered pair consisting of one individual of the first type and one of the second type.”

        Here’s a diagram of this for an imaginary laboratory animal experiment:

        It’s a set-up that would test the existence of probability learning for nested situations: First, would lab animals use the strategy of probability learning to select between playing two games that each will detect another instance of probability learning– one being a game of foraging for food, the other a game where the reinforcement is sex. (From the abstracts I googled, sex as well as food can be used to reinforce behavior of lab animals.)

        I think the lesson for a solitary player in the probability learning game is “Be conscious of missing a reinforcement, don’t ignore information about those occasions, and learn something from the information.” In other words, “Learn from your mistakes.”

        But when involved in a group (like fish schooling around different sources of prey, see the link below), the lesson that re-writes over this one seems to be: “Ignore information about missing a reinforcement given to a different group if others in your group are ignoring that information as well.”

        I think the former describes laboratory experiments with individual players while the latter describes a Nash equilibrium among multiple probability learners playing on the same game board. Here’s the link–

        There is a question about ESSs– in the same googles, I saw book sections about animals who hide the behavior of having sex with one who is not a mate. That would be hiding some of the information required to produce probability learning. For example, if the animal is foraging and therefore misses an opportunity for sex, but that information is hidden from the animal, then it would not have the information available to associate some amount of regret from foraging. (Please see the math in the above link, where there’s a model of regret.) Is behavior that hides information like this an ESS?

    • Marc Harper says:

      The fitness landscape could depend on both the r and s strategy, in which case I think that the evolution of each strategy wouldn’t be independent as suggested. That’s the case in the plots at the end of the article.

      As John suggested, the scenario described could be transformed to a single game where the types are all the (r_i, s_j) pairs with a modified fitness landscape. Then probably the solution to the combined system is the joint probability distribution of the solutions to the subsystems.

      I’d be careful about overgeneralizing from this theoretical setup, since biologically speaking, a species in an arms race could still be evolving in other dimensions and there is dependence on the amount of mutation (we’ve assumed that there is none) and other factors.
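    The independence claim discussed in this thread can be checked numerically. The sketch below is an illustration under stated assumptions, not the code behind the plots in the article. It assumes linear (matrix-game) fitness in each game, additive total fitness f_i(r) + g_j(s) for type (i,j), and a simple Euler integrator. The payoff matrices A and B, the step size and all function names are hypothetical choices.

```python
def matrix_fitness(M):
    """Linear fitness landscape x -> M x for a (hypothetical) payoff matrix M."""
    return lambda x: [sum(m * xj for m, xj in zip(row, x)) for row in M]

def replicator_step(x, fitness, h):
    """One Euler step of dx_i/dt = x_i (f_i(x) - <f(x)>)."""
    f = fitness(x)
    mean = sum(xi * fi for xi, fi in zip(x, f))
    return [xi + h * xi * (fi - mean) for xi, fi in zip(x, f)]

def simulate(r0, s0, A, B, h=1e-3, steps=5000):
    """Evolve the two games separately and as one joint system of n*k types."""
    f, g = matrix_fitness(A), matrix_fitness(B)
    n, k = len(r0), len(s0)
    r, s = list(r0), list(s0)
    p = [[ri * sj for sj in s0] for ri in r0]  # joint distribution starts as a product
    for _ in range(steps):
        r = replicator_step(r, f, h)
        s = replicator_step(s, g, h)
        # Joint system: fitness of type (i, j) is the sum of the two game payoffs,
        # each evaluated on the corresponding marginal of the joint distribution.
        pr = [sum(row) for row in p]
        ps = [sum(p[i][j] for i in range(n)) for j in range(k)]
        fr, gs = f(pr), g(ps)
        mean = sum(p[i][j] * (fr[i] + gs[j]) for i in range(n) for j in range(k))
        p = [[p[i][j] + h * p[i][j] * (fr[i] + gs[j] - mean) for j in range(k)]
             for i in range(n)]
    return r, s, p

def max_deviation(r, s, p):
    """Largest gap between the joint distribution and the product of the marginals."""
    return max(abs(p[i][j] - r[i] * s[j])
               for i in range(len(r)) for j in range(len(s)))

A = [[0.0, 3.0], [1.0, 2.0]]  # assumed payoffs for game 1
B = [[1.0, 0.0], [0.0, 2.0]]  # assumed payoffs for game 2
r, s, p = simulate([0.2, 0.8], [0.5, 0.5], A, B)
print(max_deviation(r, s, p))
```

    In exact arithmetic the joint solution stays equal to the product r_i s_j, because the mean fitness of the joint system splits into the two separate mean fitnesses. The Euler scheme only preserves this up to discretization error, so the printed deviation should be small rather than exactly zero. If the fitness of type (i,j) did not split as a sum, as Marc notes, the product structure would generally be lost.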

  5. calculus says:

    Wow, a bit too much mathematical complexity in my opinion, but that’s not my criticism here. To me, ‘Evolution’ has nothing to do with ‘Population Dynamic’ and ‘fitness landscape’, so a better title here would be:
    ‘Relative Entropy in Population Dynamics’

    I’ve got to define the premises, the ‘priors’. I concede to Darwinism that the ‘fittest’ will become the most numerous and successful and that elaborate mathematical descriptions of fitness landscape can describe Population Dynamic, but I don’t agree that this model fits ‘Evolution’.
    So here I go: Evolution represents a rise in Complexity (as in Kolmogorov algorithmic complexity) and/or algorithmic depth (as defined by Charles Bennett) and is better explained within the conceptual frame of Information theory.
    Species with a similar level of complexity could be viewed as a horizontal differentiation, but since ‘Complexity’ is not changed, they do not represent a case of ‘Evolution’, but a case of ‘Speciation’.
    Both ‘Population Dynamic’ and ‘Speciation’ are interdependent and are probably a function of fitness landscape among other things.
    But Evolution, as a rise in Complexity, is of different nature.
    An increase in complexity during an Evolutionary event might even cause a fitness damage to their carrier, making them transiently mal-adapted or misfit, leading to small numbers in the population, in perfect agreement with Darwinism that the fittest must survive more.
    I’ll go even further, what proof do we have that this ‘Complexity related Evolution’ is driven by the fittest?
    It’s easy to understand that, in a given environment, Sub-optimal individuals have actually more reasons to try new strategies and explore more ecological niches than the Alphas.
    So somehow, true Evolution may have more to thank the misfits than the fittest: not exactly what Darwin said.
    If it’s not the fittest, what drives Evolution? Well, some have said it’s a Maxwell’s demon. Evolution can be seen as a series of measurements of the environment, and Natural Selection acts as a filter of the useful measurements. Whatever it is, this process produces an ‘increased algorithmic complexity/depth’ but is not directly related to a successful population dynamic or speciation.
    That comes only after, once mutants with increased complexity appear, they have to adapt to survive and speciation has to take place rapidly, with the fittest individuals surviving as usual, but at this point, it’s not Evolution anymore, just survival.

    • davidwlocke says:

      Speciation is a punctuated equilibrium event that happens in a local fitness rather than a global fitness. Poisson games describe a population growing from n to n+m.

  6. Now Tobias Fritz and I have finally finished our paper on this subject:

    A Bayesian characterization of relative entropy.

  7. Jon Awbrey says:

    That first link to “Information Geometry” is broken. Here’s the last working WayBack link.

  8. Marc Harper spoke about information in evolutionary game theory, and we have a nice video of that. I’ve been excited about his work for quite a while, because it shows that the analogy between ‘evolution’ and ‘learning’ can be made mathematically precise. I summarized some of his ideas in my information geometry series, and I’ve also gotten him to write two articles for this blog:

    • Marc Harper, Relative entropy in evolutionary dynamics, Azimuth, 22 January 2014.

    • Marc Harper, Stationary stability in finite populations, Azimuth, 24 March 2015.

    Here are the slides and video of his talk:

    • Marc Harper, Information transport and evolutionary dynamics.

  9. Omar Ghattas says:

    Great post, is there any chance you’d provide the code used to generate the population trajectory plots??

  10. Manoj Gopalkrishnan, who has written a couple of great posts on chemical reaction networks here on Azimuth, is talking about a way to do statistical inference with chemistry! His talk is called ‘Statistical inference with a chemical soup’.

    I like how this seems to exploit existing analogies between the approach to equilibrium in chemistry, the approach to equilibrium in evolutionary game theory, and statistical inference. You may have read Marc Harper’s post about that.

  11. While this kind of hard-coded inference dynamics and expectations might be fixed for individual agents of a class (i.e. species) within their lifetimes, these mappings can be optimised on a longer timescale over populations of agents by evolution. In fact, evolution might be seen as a very similar learning process, only on different spatial and temporal scales (c.f. the post of Marc Harper on John Baez’s Blog, and this talk by John Baez).
