*guest post by Marc Harper*

A while back, in the article Relative entropy minimization in evolutionary dynamics, we looked at extensions of the information geometry / evolutionary game theory story to more general time-scales, incentives, and geometries. Today we’ll see how to make this all work in finite populations!

Let’s recall the basic idea from last time, which John also described in his information geometry series. The main theorem is this: when there’s an evolutionarily stable state for a given fitness landscape, the relative entropy between the stable state and the population distribution decreases along the population trajectories as they converge to the stable state. In short, relative entropy is a Lyapunov function. This is a nice way to look at the action of a population under natural selection, and it has interesting analogies to Bayesian inference.

The replicator equation is a nice model from an intuitive viewpoint, and it’s mathematically elegant. But it has some drawbacks when it comes to modeling real populations. One major issue is that the replicator equation implicitly assumes that the population proportions of each type are differentiable functions of time, obeying a differential equation. This only makes sense in the limit of large populations. Other closely related models, such as the Lotka-Volterra model, focus on the number of individuals of each type (e.g. predators and prey) instead of the proportion. But they often assume that the number of individuals is a differentiable function of time, and a population of 3.5 isn’t very realistic either.

Real populations of replicating entities are not infinitely large; in fact they are often relatively small and of course have whole numbers of each type, at least for large biological replicators (like animals). They take up space and only so many can interact meaningfully. There are quite a few models of evolution that handle finite populations and some predate the replicator equation. Models with more realistic assumptions typically have to leave the realm of derivatives and differential equations behind, which means that the analysis of such models is more difficult, but the behaviors of the models are often much more interesting. Hopefully by the end of this post, you’ll see how all of these diagrams fit together:

One of the best-known finite population models is the Moran process, which is a Markov chain on a finite population. This is the quintessential birth-death process. For a moment consider a population of just two types and The state of the population is given by a pair of nonnegative integers with the total number of replicators in the population, and and the number of individuals of type and respectively. Though it may artificial to fix the population size this often turns out not to be that big of a deal, and you can assume the population is at its carrying capacity to make the assumption realistic. (Lots of people study populations that can change size and that have replicators spatially distributed say on a graph, but we’ll assume they can all interact with each whenever they want for now).

A Markov model works by transitioning from state to state in each round of the process, so we need to define the transitions probabilities to complete the model. Let’s put a fitness landscape on the population, given by two functions and of the population state Now we choose an individual to reproduce proportionally to fitness, e.g. we choose an individual to reproduce with probability

since there are individuals of type and they each have fitness This is analogous to the ratio of fitness to mean fitness from the discrete replicator equation, since

and the discrete replicator equation is typically similar to the continuous replicator equation (this can be made precise), so the Moran process captures the idea of natural selection in a similar way. Actually there is a way to recover the replicator equation from the Moran process in large populations—details at the end!

We’ll assume that the fitnesses are nonnegative and that the total fitness (the denominator) is never zero; if that seems artificial, some people prefer to transform the fitness landscape by which gives a ratio reminiscent of the Boltzmann or Fermi distribution from statistical physics, with the parameter playing the role of **intensity of selection** rather than inverse temperature. This is sometimes called **Fermi selection**.

That takes care of the birth part. The death part is easier: we just choose an individual at random (uniformly) to be replaced. Now we can form the transition probabilities of moving between population states. For instance the probability of moving from state to is given by the product of the birth and death probabilities, since they are independent:

since we have to chose a replicator of type to reproduce and one of type to be replaced. Similarly for to (switch all the a’s and b’s), and we can write the probability of staying in the state as

Since we only replace one individual at a time, this covers all the possible transitions, and keeps the population constant.

We’d like to analyze this model and many people have come up with clever ways to do so, computing quantities like **fixation probabilities** (also known as **absorption probabilities**), indicating the chance that the population will end up with one type completely dominating, i.e. in state or If we assume that the fitness of type is constant and simply equal to 1, and the fitness of type is we can calculate the probability that a single mutant of type will take over a population of type using standard Markov chain methods:

For neutral relative fitness (), which is the probability a neutral mutant invades by drift alone since selection is neutral. Since the two boundary states or are absorbing (no transitions out), in the long run *every* population ends up in one of these two states, i.e. the population is homogeneous. (This is the formulation referred to by Matteo Smerlak in The mathematical origins of irreversibility.)

That’s a bit different flavor of result than what we discussed previously, since we had stable states where both types were present, and now that’s impossible, and a bit disappointing. We need to make the population model a bit more complex to have more interesting behaviors, and we can do this in a very nice way by adding the effects of mutation. At the time of reproduction, we’ll allow either type to mutate into the other with probability This changes the transition probabilities to something like

Now the process never stops wiggling around, but it does have something known as a **stationary distribution**, which gives the probability that the population is in any given state *in the long run*.

For populations with more than two types the basic ideas are the same, but there are more neighboring states that the population could move to, and many more states in the Markov process. One can also use more complicated mutation matrices, but this setup is good enough to typically guarantee that no one species completely takes over. For interesting behaviors, typically is a good choice (there’s some biological evidence that mutation rates are typically inversely proportional to genome size).

Without mutation, once the population reached or it stayed there. Now the population bounces between states, either because of drift, selection, or mutation. Based on our stability theorems for evolutionarily stable states, it’s reasonable to hope that for small mutation rates and larger populations (less drift), the population should spend most of its time near the evolutionarily stable state. This can be measured by the stationary distribution which gives the long run probabilities of a process being in a given state.

Previous work by Claussen and Traulsen:

• Jens Christian Claussen and Arne Traulsen, Non-Gaussian fluctuations arising from finite populations: exact results for the evolutionary Moran process, *Physical Review E* **71** (2005), 025101.

suggested that the stationary distribution is at least sometimes maximal around evolutionarily stable states. Specifically, they showed that for a very similar model with fitness landscape given by

the stationary state is essentially a binomial distribution centered at

Unfortunately, the stationary distribution can be very difficult to compute for an arbitrary Markov chain. While it can be computed for the Markov process described above without mutation, and in the case studied by Claussen and Traulsen, there’s no general analytic formula for the process with mutation, nor for more than two types, because the processes are not *reversible*. Since we can’t compute the stationary distribution analytically, we’ll have to find another way to show that the local maxima of the stationary distribution are “evolutionarily stable”. We can approximate the stationary distribution fairly easily with a computer, so it’s easy to plot the results for just about any landscape and reasonable population size (e.g. ).

It turns out that we can use a relative entropy minimization approach, just like for the continuous replicator equation! But how? We lack some essential ingredients such as deterministic and differentiable trajectories. Here’s what we do:

• We show that the local maxima and minima of the stationary distribution satisfy a *complex balance* criterion.

• We then show that these states minimize an *expected* relative entropy.

• This will mean that the current state and the *expected next state* are ‘close’.

• Lastly, we show that these states satisfy an analogous definition of evolutionary stability (now incorporating mutation).

The relative entropy allows us to measure how close the current state is to the expected next state, which captures the idea of stability in another way. This ports the relative minimization Lyapunov result to some more realistic Markov chain models. The only downside is that we’ll assume the populations are “sufficiently large”, but in practice for populations of three types, is typically enough for common fitness landscapes (there are lots of examples here for which are prettier than the smaller populations). The reason for this is that the population state needs enough “resolution” to get sufficiently close to the stable state, which is not necessarily a ratio of integers. If you allow some wiggle room, smaller populations are still typically pretty close.

Evolutionarily stable states are closely related to Nash equilibria, which have a nice intuitive description in traditional game theory as “states that no player has an incentive to deviate from”. But in evolutionary game theory, we don’t use a game matrix to compute e.g. maximum payoff strategies, rather the game matrix defines a fitness landscape which then determines how natural selection unfolds.

We’re going to see this idea again in a moment, and to help get there let’s introduce an function called an **incentive** that encodes how a fitness landscape is used for selection. One way is to simply replace the quantities and in the fitness-proportionate selection ratio above, which now becomes (for two population types):

Here and are the incentive function components that determine how the fitness landscape is used for natural selection (if at all). We have seen two examples above:

for the Moran process and fitness-proportionate selection, and

for an alternative that incorporates a strength of selection term preventing division by zero for fitness landscapes defined by zero-sum game matrices, such as a rock-paper-scissors game. Using an incentive function also simplifies the transition probabilities and results as we move to populations of more than two types. Introducing mutation, we can describe the ratio for incentive-proportion selection with mutation for the th population type when the population is in state as

for some matrix of mutation probabilities This is just the probability that we get a new individual of the th type (by birth and/or mutation). A common choice for the mutation matrix is to use a single mutation probability and spread it out over all the types, such as letting

and

Now we are ready to define the **expected next state** for the population and see how it captures a notion of stability. For a given state population in a multitype population, using to indicate the normalized population state consider all the neighboring states that the population could move to in one step of the process (one birth-death cycle). These neighboring states are the result of increasing a population type by one (birth) and decreasing another by one (death, possibly the same type), of course excluding cases on the boundary where the number of individuals of any type drops below zero or rises above Now we can define the expected next state as the sum of neighboring states weighted by the transition probabilities

with transition probabilities given by

for states that differ in at the th coordinate and at th coordinate from Here is just the probability of the random death of an individual of the th type, so the transition probabilities are still just birth (with mutation) and death as for the Moran process we started with.

Skipping some straightforward algebraic manipulations, we can show that

Then it’s easy to see that if and only if and that if and only if So we have a nice description of ‘stability’ in terms of fixed points of the expected next state function and the incentive function

and we’ve gotten back to “no one has an incentive to deviate”. More precisely, for the Moran process

and we get back for every type. So we take as our analogous condition to an evolutionarily stable state, though it’s just the ‘no motion’ part and not also the ‘stable’ part. That’s what we need the stationary distribution for!

To turn this into a useful number that measures stability, we use the relative entropy of the expected next state and the current state, in analogy with the Lyapunov theorem for the replicator equation. The relative entropy

has the really nice property that if and only if so we can use the relative entropy as a measure of *how close to stable* any particular state is! Here the expected next state takes the place of the ‘evolutionarily stable state’ in the result described last time for the replicator equation.

Finally, we need to show that the maxima (and minima) of of the stationary distribution are these fixed points by showing that these states minimize the expected relative entropy.

Seeing that local maxima and minima of the stationary distribution minimize the expected relative entropy is a more involved, so let’s just sketch the details. In general, these Markov processes are not reversible, so they don’t satisfy the detailed-balance condition, but the stationary probabilities do satisfy something called the global balance condition, which says that for the stationary distribution we have that

When the stationary distribution is at a local maximum (or minimum), we can show essentially that this implies (up to an for a large enough population) that

a sort of probability inflow-outflow equation, which is very similar to the condition of complex balanced equilibrium described by Manoj Gopalkrishnan in this Azimuth post. With some algebraic manipulation, we can show that these states have

Now let’s look again at the figures from the start. The first shows the vector field of the replicator equation:

You can see rest points at the center, on the center of each boundary edge, and on the corner points. The center point is evolutionarily stable, the center points of the boundary are semi-stable (but stable when the population is restricted to a boundary simplex), and the corner points are unstable.

This one shows the stationary distribution for a finite population model with a Fermi incentive on the same landscape, for a population of size 80:

A fixed population size gives a partitioning of the simplex, and each triangle of the partition is colored by the value of the stationary distribution. So you can see that there are local maxima in the center and on the centers of the triangle boundary edges. In this case, the size of the mutation probability determines how much of the stationary distribution is concentrated on the center of the simplex.

This shows one-half of the Euclidean distance squared between the current state and the expected next state:

And finally, this shows the same thing but with the relative entropy as the ‘distance function’:

As you can see, the Euclidean distance is locally minimal at each of the local maxima and minima of the stationary distribution (including the corners); the relative entropy is only guaranteed so on the interior states (because the relative entropy doesn’t play nicely with the boundary, and unlike the replicator equation, the Markov process can jump on and off the boundary). It turns out that the relative Rényi entropies for between 0 and 1 also work just fine, but for the large population limit (the replicator dynamic), the relative entropy is the somehow the right choice for the replicator equation (has the derivative that easily gives Lyapunov stability), which is due to the connections between relative entropy and Fisher information in the information geometry of the simplex. The Euclidean distance is the case and the ordinary relative entropy is

As it turns out, something very similar holds for another popular finite population model, the Wright–Fisher process! This model is more complicated, so if you are interested in the details, check out our paper, which has many nice examples and figures. We also define a process that bridges the gap between the atomic nature of the Moran process and the generational nature of the Wright–Fisher process, and prove the general result for that model.

Finally, let’s see how the Moran process relates back to the replicator equation (see also the appendix in this paper), and how we recover the stability theory of the replicator equation. We can use the transition probabilities of the Moran process to define a stochastic differential equation (called a Langevin equation) with drift and diffusion terms that are essentially (for populations with two types:

As the population size gets larger, the diffusion term drops out, and the stochastic differential equation becomes essentially the replicator equation. For the stationary distribution, the variance (e.g. for the binomial example above) also has an inverse dependence on so the distribution limits to a delta-function that is zero except for at the evolutionarily stable state!

What about the relative entropy? Loosely speaking, as the population size gets larger, the iteration of the expected next state also becomes deterministic. Then the evolutionarily stable states is a fixed point of the expected next state function, and the expected relative entropy is essentially the same as the ordinary relative entropy, at least in a neighborhood of the evolutionarily stable state. This is good enough to establish local stability.

Earlier I said both the local maxima and minima minimize the expected relative entropy. Dash and I haven’t proven that the local maxima always correspond to evolutionarily stable states (and the minima to unstable states). That’s because the generalization of evolutionarily stable state we use is really just a ‘no motion’ condition, and isn’t strong enough to imply stability in a neighborhood for the deterministic replicator equation. So for now we are calling the local maxima **stationary stable** states.

We’ve also tried a similar approach to populations evolving on networks, which is a popular topic in evolutionary graph theory, and the results are encouraging! But there are many more ‘states’ in such a process, since the configuration of the network has to be taken into account, and whether the population is clustered together or not. See the end of our paper for an interesting example of a population on a cycle.

Nice post!

In connection with this simple case “For a moment consider a population of just two types A and B.” This is very similar to the following:

Suppose you have a stochastic Petri net with two species A and B and the following transitions

A+B -> 2A at rate u

A+B -> 2B at rate u

A -> B at rate v

B -> A at rate v

The total number of A’a and B’s is a constant N.This system is simple enough to find the equilibrium distribution, which is a beta-binomial distribution (http://en.wikipedia.org/wiki/Beta-binomial_distribution) with α=β=v/u. If v>u this is unimodal, with a maximum at N/2, and if u>v, this is U-shaped.

It uses continuous time and rates not transistion probabilities, and I made it symmetric, but I think its nearly the same thing. And I guess it extends to more than two types with a Dirichlet-multinomial distribution equilibrium distribution.

A couple of typos.

After “that for a very similar model with fitness landscape given by” f_b not f_B.

After “selection ratio above, which now becomes (for two population types):” raw latex.

I’ll fix the typos—thanks!

It is interesting the real evolution equation.

I have a simple idea, but I don’t know if it is possible the complete applicability.

There is a simple class of equation with only integer solution (I am thinking the Fibonacci series, for example), and the dynamic of these system can be solved with the finite difference equation.

So that the Moran process is not applicable in this case, but it is possible to write equations with integer solution using real population, and if the continuous equation have solution, then it is probable to obtain integer solutions (if the classical solution is too complex, then the integer solution is too complex).

Then it is possible an extension of gradient in the discrete equation (for a potential with discrete values, with an arbitrary assignment of integer values in a grid of i-j, and with an analogy with the classical field, but without scale invariance).

Perhaps it might be possible even an Hamiltonian, and Lagrangian, description with integer solution, and the asymptotic behavior could be a simple potential (quadratic, or linear integer potential).

If you information fans haven’t seen this, you might be interested:

http://www.american.edu/cas/economics/info-metrics/prize.cfm

Marc Harper spoke about information in evolutionary game theory, and we have a nice video of that. I’ve been excited about his work for quite a while, because it shows that the analogy between ‘evolution’ and ‘learning’ can be made mathematically precise. I summarized some of his ideas in my information geometry series, and I’ve also gotten him to write two articles for this blog:

• Marc Harper, Relative entropy in evolutionary dynamics,

Azimuth, 22 January 2014.• Marc Harper, Stationary stability in finite populations,

Azimuth, 24 March 2015.Here are the slides and video of his talk:

• Marc Harper, Information transport and evolutionary dynamics.