The Mathematics of Biodiversity (Part 4)

2 July, 2012


Today the conference part of this program is starting:

Research Program on the Mathematics of Biodiversity, June-July 2012, Centre de Recerca Matemàtica, Barcelona, Spain. Organized by Ben Allen, Silvia Cuadrado, Tom Leinster, Richard Reeve and John Woolliams.

Lou Jost kicked off the proceedings with an impassioned call to think harder about fundamental concepts:

• Lou Jost, Why biologists should care about the mathematics of biodiversity.

Then Tom Leinster gave an introduction to some of these concepts, and Lou explained how they show up in ecology, genetics, economics and physics.

Suppose we have n different species on an island. Suppose a fraction p_i of the organisms belong to the ith species. So,

\displaystyle{ \sum_{i=1}^n p_i = 1}

and mathematically we can treat these numbers as probabilities.

People have many ways to compute the ‘biodiversity’ from these numbers. Some of these can be wildly misleading when applied incorrectly, and this has led to shocking errors. For example, in genetics, a commonly used formula for determining when plants or animals on a bunch of islands will split into separate species is completely wrong.

In fact, if we’re not careful, some measures of biodiversity can fool us into thinking we’re saving most of the biodiversity when we’re actually losing almost all of it!

One good example involves measures of similarity between tropical butterflies in the canopy (the top of the forest) and the understory (the bottom). According to Lou Jost, some published studies say the similarity is about 95%. That sounds like the two communities are almost the same. However, almost no butterflies living in the canopy live in the understory, and vice versa! The problem is that mathematics is being used inappropriately.

Here are four famous measures of biodiversity:

• Species richness. This is just the number of species:

n

• Shannon entropy. This is the expected amount of information you gain when someone tells you which species an organism belongs to:

\displaystyle{ - \sum_{i=1}^n p_i \ln(p_i) }

• The inverse Simpson index. This is the reciprocal of the probability that two randomly chosen organisms belong to the same species:

\displaystyle{ 1 \big/ \sum_{i=1}^n p_i^2 }

The probability that two randomly chosen organisms belong to the same species is called the Simpson index:

\displaystyle{ \sum_{i=1}^n p_i^2 }

This is used in economics as a measure of the concentration of wealth, where p_i is the fraction of wealth owned by the ith individual. Be careful: there’s a lot of different jargon in different fields, so it’s easy to get confused at first! For example, the probability that two organisms belong to different species is often called the Gini–Simpson index:

\displaystyle{ 1 - \sum_{i=1}^n p_i^2 }

It was introduced by the statistician Corrado Gini a century ago, in 1912, and by the ecologist Edward H. Simpson in 1949. It’s also called the heterozygosity in genetics.

• The Berger–Parker index. This is the fraction of organisms that belong to the most common species:

\mathrm{max} \, p_i

So, unlike the other measures I’ve listed, this quantity tends to go down when biodiversity goes up. To fix this we could take its reciprocal, as we did with the Simpson index.

What a mess, eh? But here’s some good news: all these quantities are functions of a single quantity, the Rényi entropy:

\displaystyle{ H_q(p) = \frac{1}{1 -q} \ln \sum_{i=1}^n p_i^q  }

for various values of the parameter q.

I’ve written about the Rényi entropies and their role in thermodynamics before on this blog. I’ll also talk about it later in this conference, and I’ll show you my slides. So, I won’t repeat that story here. Suffice it to say that Rényi entropies are fascinating but still a bit mysterious to me.

But one of Lou Jost’s main points is that we can make bad mistakes if we work with Rényi entropies when we should be working with their exponentials, which are called Hill numbers and denoted by a D, for ‘diversity’:

\displaystyle{ {}^qD(p) = e^{H_q(p)} =   \left(\sum_{i=1}^n p_i^q \right)^{\frac{1}{1-q}}  }

These were introduced by M. O. Hill in 1973. One reason they’re good is that they are effective numbers. This means that if all the species are equally common, the Hill number equals the number of species, regardless of q:

p_i = \frac{1}{n} \; \Longrightarrow \; {}^qD(p) = n

So, they’re a way of measuring an ‘effective’ number of species in situations where species are not all equally common.

A closely related fact is that the Hill numbers obey the replication principle. This means that if we have probability distributions on two disjoint finite sets, each with the same Hill number X for some fixed value of q, and we combine them with equal weights to get a probability distribution on the disjoint union of those sets, the resulting distribution has Hill number 2X.

Another good fact is that the Hill numbers are as large as possible when all the probabilities p_i are equal. They’re as small as possible, namely 1, when one of the p_i equals 1 and the rest are zero.

Let’s see how all the measures of biodiversity I listed are either Hill numbers or can easily be converted to Hill numbers. We’ll also see that at q = 0, the Hill number treats all species that are present in an equal way, regardless of their abundance. As q increases, it counts more abundant species more heavily, since we’re raising the probabilities p_i to a bigger power. And when q = \infty, we only care about the most abundant species: none of the others matter at all!

Here goes:

• The species richness is the limit of the Hill numbers as q \to 0 from above:

\displaystyle{ \lim_{q \to 0^+} {}^qD (p) = n }

So, we can just call this {}^0D(p).

• The exponential of the Shannon entropy is the limit of the Hill numbers as q \to 1:

\displaystyle{ \lim_{q \to 1} {}^qD(p) = \exp\left(- \sum_{i=1}^n p_i \ln(p_i)\right) }

So, we can just call this {}^1D(p).

• The inverse Simpson index is the Hill number at q = 2:

\displaystyle{  {}^2D(p) =  1 \big/ \sum_{i=1}^n p_i^2 }

• The reciprocal of the Berger–Parker index is the limit of Hill numbers as q \to +\infty:

\displaystyle{ \lim_{q \to +\infty} {}^qD(p) = 1 \big/ \mathrm{max} \, p_i }

so we can call this quantity {}^\infty D(p).

These facts mean that understanding Hill numbers will help us understand lots of measures of biodiversity! And the good properties of Hill numbers will help us avoid dangerous mistakes.
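
If you like to compute, here’s a minimal numerical sketch of all this, using NumPy (my choice of tool, not anything from the talks). The helper name hill_number is mine; the q = 1 and q = \infty cases are handled as the limits described above, and the last line checks the ‘effective number’ property:

```python
import numpy as np

def hill_number(p, q):
    """Hill number (effective number of species) of order q.
    p: species frequencies summing to 1; zero entries are ignored."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):               # limit q -> 1: exp(Shannon entropy)
        return np.exp(-np.sum(p * np.log(p)))
    if np.isinf(q):                      # limit q -> infinity: 1 / Berger-Parker index
        return 1.0 / np.max(p)
    return np.sum(p ** q) ** (1.0 / (1.0 - q))

p = np.array([0.5, 0.3, 0.1, 0.1])
print(hill_number(p, 0))         # species richness: 4
print(hill_number(p, 1))         # exp(Shannon entropy), about 3.22
print(hill_number(p, 2))         # inverse Simpson index: 1/0.36, about 2.78
print(hill_number(p, np.inf))    # reciprocal Berger-Parker index: 1/0.5 = 2

# If all n species are equally common, every Hill number equals n:
print(hill_number(np.full(8, 1/8), 0.7))    # 8.0
```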

For mathematicians, a good challenge is to find theorems uniquely characterizing the Hill numbers, preferably with assumptions that biologists will accept as plausible facts about ‘diversity’. Some theorems like this already exist for specific choices of q, but it would be better to characterize the function {}^q D for all values of q in one blow. Tom Leinster is working on such a theorem now.

Another important task is to generalize Hill numbers to take into account things like:

• ‘distances’ between species, measured either genetically, phylogenetically or functionally,

• ‘values’ for species, measured economically or in any other way.

There’s a lot of work on this, and many of the talks at this conference will discuss these generalizations.


The Mathematics of Biodiversity (Part 3)

27 June, 2012

We tend to think of biodiversity as a good thing, but sometimes it’s deadly. Yesterday Andrei Korobeinikov gave a talk on ‘Viral evolution within a host’, which was mainly about AIDS.

The virus that causes this disease, HIV, can reproduce very fast. In an untreated patient near death there are between 10^10 and 10^12 new virions per day! Remember, a virion is an individual virus particle. The virus also has a high mutation rate: about 3 × 10^-5 mutations per generation for each base—that is, each molecule of A, U, C, or G in the RNA of the virus. That may not seem like a lot, but if you multiply it by 10^12 you’ll see that a huge number of new variations of each base arise within the body of a single patient.

So, evolution is at work within you as you die.

And in fact, many scientists believe that the diversity of the virus eventually overwhelms your immune system! Although it’s apparently not quite certain, it seems that while the body generates B cells and T cells to attack different variants of HIV as they arise, they eventually can’t keep up with the sheer number of variants.

Of course, the fact that the HIV virus attacks the immune system makes the disease even worse. Here in blue you see the number of T cells per cubic millimeter of blood, and in red you see the number of virions per cubic centimeter of blood for a typical untreated patient:

Mathematicians and physicists have looked at some very simple models to get a qualitative understanding of these issues. One famous paper that started this off is:

• Lev S. Tsimring, Herbert Levine and David A. Kessler, RNA virus evolution via a fitness-space model, Phys. Rev. Lett. 76 (1996), 4440–4443.

The idea here is to say that at any time t the viruses have a probability density p(r,t) of having fitness r. In fact the different genotypes of the virus form a cloud in a higher-dimensional space, but these authors are treating that space as 1-dimensional, with fitness as its one coordinate, just to keep things simple. They then write down an equation for how the population density changes with time:

\displaystyle{\frac{\partial }{\partial t}p(r,t) = (r - \langle r \rangle)\, p(r,t) + D \frac{\partial^2 }{\partial r^2}p(r,t) - \frac{\partial}{\partial r}(v_{\mathrm{drift}}\, p(r,t)) }

This is a replication-mutation-drift equation. If we just had

\displaystyle{\frac{\partial }{\partial t}p(r,t) = (r - \langle r \rangle)\, p(r,t) }

this would be a version of the replicator equation, which I explained recently in Information Geometry (Part 9). Here

\displaystyle{ \langle r \rangle = \int_0^\infty r p(r,t) dr }

is the mean fitness, and the replicator equation says that the fraction of organisms of a given type grows at a rate proportional to how much their fitness exceeds the mean fitness: that’s where the (r - \langle r \rangle) comes from.

If we just had

\displaystyle{\frac{\partial }{\partial t}p(r,t) =  D \frac{\partial^2 }{\partial r^2}p(r,t) }

this would be the heat equation, which describes diffusion occurring at a rate D. This models the mutation of the virus, though not in a very realistic way.

If we just had

\displaystyle{\frac{\partial}{\partial t} p(r,t) =  - \frac{\partial}{\partial r}(v_{\mathrm{drift}} \, p(r,t)) }

the fitness of the virus would increase at rate equal to the drift velocity v_{\mathrm{drift}}.

If we include both the diffusion and drift terms:

\displaystyle{\frac{\partial }{\partial t} p(r,t) =  D \frac{\partial^2 }{\partial r^2}p(r,t) - \frac{\partial}{\partial r}(v_{\mathrm{drift}} \, p(r,t)) }

we get the Fokker–Planck equation. This is a famous model of something that’s spreading while also drifting along at a constant velocity: for example, a drop of ink in moving water. Its solutions look like this:

Here we start with stuff concentrated at one point, and it spreads out into a Gaussian while drifting along.
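
Here’s a tiny sketch of that picture, my own illustration assuming a constant diffusion constant D and drift velocity v (the values below are made up). Starting from a point mass at r_0, the Fokker–Planck equation is solved by a Gaussian with mean r_0 + v t and variance 2 D t, and the code just checks those moments numerically:

```python
import numpy as np

def fokker_planck_solution(r, t, r0=0.0, D=0.1, v=1.0):
    """Solution of dp/dt = D d^2p/dr^2 - d/dr(v p) starting from a point mass at r0:
    a Gaussian with mean r0 + v*t and variance 2*D*t."""
    var = 2.0 * D * t
    return np.exp(-(r - r0 - v * t) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

r = np.linspace(-5.0, 15.0, 4001)
dr = r[1] - r[0]
for t in (0.5, 2.0, 5.0):
    p = fokker_planck_solution(r, t)
    mean = np.sum(r * p) * dr
    var = np.sum((r - mean) ** 2 * p) * dr
    print(f"t = {t}: mean = {mean:.3f} (v*t = {t}), variance = {var:.3f} (2*D*t = {0.2 * t})")
```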

By the way, watch out: what biologists call ‘genetic drift’ is actually a form of diffusion, not what physicists call ‘drift’.

More recently, people have looked at another very simple model. You can read about it here:

• Martin A. Nowak and Robert M. May, Virus Dynamics, Oxford University Press, Oxford, 2000.

In this model the variables are:

• the number of healthy human cells of some type, \mathrm{H}(t)

• the number of infected human cells of that type, \mathrm{I}(t)

• the number of virions, \mathrm{V}(t)

These are my names for variables, not theirs. It’s just a sick joke that these letters spell out ‘HIV’.

Chemists like to describe how molecules react and turn into other molecules using ‘chemical reaction networks’. You’ve seen these if you’ve taken chemistry, but I’ve been explaining more about the math of these starting in Network Theory (Part 17). We can also use them here! Though May and Nowak probably didn’t put it this way, we can consider a chemical reaction network with the following 6 reactions:

• the production of a healthy cell:

\longrightarrow \mathrm{H}

• the infection of a healthy cell by a virion:

\mathrm{H} + \mathrm{V} \longrightarrow \mathrm{I}

• the production of a virion by an infected cell:

\mathrm{I} \longrightarrow \mathrm{I} + \mathrm{V}

• the death of a healthy cell:

\mathrm{H} \longrightarrow

• the death of an infected cell:

\mathrm{I} \longrightarrow

• the death of a virion:

\mathrm{V} \longrightarrow

Using a standard recipe, which I explained in that series, we can get from this chemical reaction network to some ‘rate equations’ saying how the numbers of healthy cells, infected cells and virions change with time:

\displaystyle{ \frac{d\mathrm{H}}{dt}  =   \alpha - \beta \mathrm{H}\mathrm{V} - \gamma \mathrm{H} }

\displaystyle{ \frac{d\mathrm{I}}{dt}  = \beta \mathrm{H}\mathrm{V} - \delta \mathrm{I} }

\displaystyle{ \frac{d\mathrm{V}}{dt}  = - \beta \mathrm{H}\mathrm{V} + \epsilon \mathrm{I} - \zeta \mathrm{V} }

The Greek letters are constants called ‘rate constants’, and there’s one for each of the 6 reactions. The equations we get this way are exactly those described by Nowak and May!
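
If you want to play with these equations, here’s a rough sketch that integrates them numerically with SciPy. The rate constants and initial conditions are made-up illustrative values, not anything fitted to real data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# rate constants for the 6 reactions (purely illustrative values)
alpha, beta, gamma, delta, epsilon, zeta = 10.0, 0.01, 0.1, 0.5, 5.0, 3.0

def rate_equations(t, y):
    H, I, V = y
    dH = alpha - beta * H * V - gamma * H
    dI = beta * H * V - delta * I
    dV = -beta * H * V + epsilon * I - zeta * V
    return [dH, dI, dV]

# start with healthy cells at their uninfected equilibrium and a single virion
sol = solve_ivp(rate_equations, (0.0, 100.0), [alpha / gamma, 0.0, 1.0])
H, I, V = sol.y[:, -1]
print(f"at t = 100: H = {H:.1f}, I = {I:.1f}, V = {V:.1f}")
```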

What Andrei Korobeinikov is doing is unifying the ideas behind the two models I’ve described here. Alas, I don’t have the energy to explain how. Indeed, I don’t even have the energy to explain what the models I’ve described actually predict. Sad, but true.

I don’t see anything online about Korobeinikov’s new work, but you can read some of his earlier work here:

• Andrei Korobeinikov, Global properties of basic virus dynamics models.

• Suzanne M. O’Regan, Thomas C. Kelly, Andrei Korobeinikov, Michael J. A. O’Callaghan and Alexei V. Pokrovskii, Lyapunov functions for SIR and SIRS epidemic models, Appl. Math. Lett. 23 (2010), 446–448.

The SIR and SIRS models are models of disease that also arise from chemical reaction networks. I explained them back in Network Theory (Part 3). That was before I introduced the terminology of chemical reaction networks… back then I was talking about ‘stochastic Petri nets’, which are an entirely equivalent formalism. Here’s the stochastic Petri net for the SIRS model:

Puzzle: Draw the stochastic Petri net for the HIV model discussed above. It should have 3 yellow circles and 6 aqua squares.


Information Geometry (Part 13)

26 June, 2012

Last time I gave a sketchy overview of evolutionary game theory. Now let’s get serious.

I’ll start by explaining ‘Nash equilibria’ for 2-person games. These are situations where neither player can profit by changing what they’re doing. Then I’ll introduce ‘mixed strategies’, where the players can choose among several strategies with different probabilities. Then I’ll introduce evolutionary game theory, where we think of each strategy as a species, and its probability as the fraction of organisms that belong to that species.

Back in Part 9, I told you about the ‘replicator equation’, which says how these fractions change with time thanks to natural selection. Now we’ll see how this leads to the idea of an ‘evolutionarily stable strategy’. And finally, we’ll see that when evolution takes us toward such a stable strategy, the amount of information the organisms have ‘left to learn’ keeps decreasing!

Nash equilibria

We can describe a certain kind of two-person game using a payoff matrix, which is an n \times n matrix A_{ij} of real numbers. We think of A_{ij} as the payoff that either player gets if they choose strategy i and their opponent chooses strategy j.

Note that in this kind of game, there’s no significant difference between the ‘first player’ and the ‘second player’: either player wins an amount A_{ij} if they choose strategy i and their opponent chooses strategy j. So, this kind of game is called symmetric even though the matrix A_{ij} may not be symmetric. Indeed, it’s common for this matrix to be antisymmetric, meaning A_{ij} = - A_{ji}, since in this case what one player wins, the other loses. Games with this extra property are called zero-sum games. But we won’t limit ourselves to those!

We say a strategy i is a symmetric Nash equilibrium if

A_{ii} \ge A_{ji}

for all j. This means that if both players use strategy i, neither gains anything by switching to another strategy.

For example, suppose our matrix is

\left( \begin{array}{rr} -1 & -12 \\  0 & -3 \end{array} \right)

Then we’ve got the Prisoner’s Dilemma exactly as described last time! Here strategy 1 is cooperate and strategy 2 is defect. If a player cooperates and so does his opponent, he wins

A_{11} = -1

meaning he gets one month in jail. We include a minus sign because ‘winning a month in jail’ is not a good thing. If the player cooperates but his opponent defects, he gets a whole year in jail:

A_{12} = -12

If he defects but his opponent cooperates, he doesn’t go to jail at all:

A_{21} = 0

And if they both defect, they both get three months in jail:

A_{22} = -3

You can see that defecting is a Nash equilibrium, since

A_{22} \ge A_{12}

So, oddly, if our prisoners know game theory and believe Nash equilibria are best, they’ll both be worse off than if they cooperate and don’t betray each other.

Nash equilibria for mixed strategies

So far we’ve been assuming that with 100% certainty, each player chooses one strategy i = 1,2,3,\dots, n. Since we’ll be considering more general strategies in a minute, let’s call these pure strategies.

Now let’s throw some probability theory into the stew! Let’s allow the players to pick different pure strategies with different probabilities. So, we define a mixed strategy to be a probability distribution on the set of pure strategies. In other words, it’s a list of n nonnegative numbers

p_i \ge 0

that sum to one:

\displaystyle{ \sum_{i=1}^n p_i = 1 }

Say I choose the mixed strategy p while you, my opponent, choose the mixed strategy q. Say our choices are made independently. Then the probability that I choose the pure strategy i while you choose j is

p_i q_j

so the expected value of my winnings is

\displaystyle{ \sum_{i,j = 1}^n p_i A_{ij} q_j }

or using vector notation

p \cdot A q

where the dot is the usual dot product on \mathbb{R}^n.

We can easily adapt the concept of Nash equilibrium to mixed strategies. A mixed strategy q is a symmetric Nash equilibrium if for any other mixed strategy p,

q \cdot A q \ge  p \cdot A q

This means that if both you and I are playing the mixed strategy q, I can’t improve my expected winnings by unilaterally switching to the mixed strategy p. And neither can you, because the game is symmetric!

If this were a course on game theory, I would now do some examples. But it’s not, so I’ll just send you to page 6 of Sandholm’s paper: he looks at some famous games like ‘hawks and doves’ and ‘rock paper scissors’.
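
Still, if you like to experiment, here’s a small numerical check you can run on examples like these. Since p \cdot A q is linear in p, a mixed strategy q is a symmetric Nash equilibrium exactly when no pure strategy does better against q than q does against itself, so it suffices to compare against the standard basis vectors. The function name and the matrices below are just my illustrative choices:

```python
import numpy as np

def is_symmetric_nash(A, q, tol=1e-9):
    """Check q . A q >= p . A q for every mixed strategy p;
    by linearity it suffices to check every pure strategy."""
    A, q = np.asarray(A, float), np.asarray(q, float)
    payoff_against_q = A @ q              # entry i is e_i . A q
    return float(q @ payoff_against_q) >= np.max(payoff_against_q) - tol

# Prisoner's Dilemma from above: defecting is a Nash equilibrium, cooperating isn't.
A_pd = np.array([[-1, -12], [0, -3]])
print(is_symmetric_nash(A_pd, [0, 1]))              # True
print(is_symmetric_nash(A_pd, [1, 0]))              # False

# Rock-paper-scissors: the uniform mixed strategy is a symmetric Nash equilibrium.
A_rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(is_symmetric_nash(A_rps, [1/3, 1/3, 1/3]))    # True
```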

Evolutionarily stable strategies

We’re finally ready to discuss evolutionarily stable strategies. To do this, let’s reinterpret the ‘pure strategies’ i = 1,2,3, \dots n as species. Here I don’t necessarily mean species in the classic biological sense: I just mean different kinds of self-replicating entities, or replicators. For example, they could be different alleles of the same gene.

Similarly, we’ll reinterpret the ‘mixed strategy’ p as describing a mixed population of replicators, where the fraction of replicators belonging to the ith species is p_i. These numbers are still probabilities: p_i is the probability that a randomly chosen replicator will belong to the ith species.

We’ll reinterpret the payoff matrix A_{ij} as a fitness matrix. In our earlier discussion of the replicator equation, we assumed that the population P_i of the ith species grew according to the replicator equation

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) P_i }

where the fitness function f_i is any smooth function of the populations of each kind of replicator.

But in evolutionary game theory it’s common to start by looking at a simple special case where

\displaystyle{f_i(P_1, \dots, P_n)   = \sum_{j=1}^n A_{ij} p_j }

where

\displaystyle{ p_j = \frac{P_j}{\sum_k P_k} }

is the fraction of replicators who belong to the jth species.

What does this mean? The idea is that we have a well-mixed population of game players—or replicators. Each one has its own pure strategy—or species. Each one randomly roams around and ‘plays games’ with each other replicator it meets. It gets to reproduce at a rate proportional to its expected winnings.

This is unrealistic in all sorts of ways, but it’s mathematically cute, and it’s been studied a lot, so it’s good to know about. Today I’ll explain evolutionarily stable strategies only in this special case. Later I’ll go back to the general case.

Suppose that we select a sample of replicators from the overall population. What is the mean fitness of the replicators in this sample? For this, we need to know the probability that a replicator from this sample belongs to the ith species. Say it’s q_i. Then the mean fitness of our sample is

\displaystyle{ \sum_{i,j=1}^n q_i A_{ij} p_j }

This is just a weighted average of the fitnesses in our earlier formula. But using the magic of vectors, we can write this sum as

q \cdot A p

We already saw this type of expression in the last section! It’s my expected winnings if I play the mixed strategy q and you play the mixed strategy p.

John Maynard Smith defined q to be an evolutionarily stable strategy if when we add a small population of ‘invaders’ distributed according to any other probability distribution p, the original population is more fit than the invaders.

In simple terms: a small ‘invading’ population will do worse than the population as a whole.

Mathematically, this means:

q \cdot A ((1-\epsilon)q + \epsilon p) >  p \cdot A ((1-\epsilon)q + \epsilon p)

for all mixed strategies p \ne q and all sufficiently small \epsilon > 0. Here

(1-\epsilon)q + \epsilon p

is the population we get by replacing an \epsilon-sized portion of our original population by invaders.

Puzzle: Show that q is an evolutionarily stable strategy if and only if these two conditions hold for all mixed strategies p:

q \cdot A q \ge p \cdot A q

and also, for all q \ne p,

q \cdot A q = p \cdot A q \; \implies \; q \cdot A p > p \cdot A p

The first condition says that q is a symmetric Nash equilibrium. In other words, the invaders can’t on average be better playing against the original population than members of the original population are. The second says that if the invaders are just as good at playing against the original population, they must be worse at playing against each other! The combination of these conditions means the invaders won’t take over.

Again, I should do some examples… but instead I’ll refer you to page 9 of Sandholm’s paper, and also these course notes:

• Samuel Alizon and Daniel Cownden, Evolutionary games and evolutionarily stable strategies.

• Samuel Alizon and Daniel Cownden, Replicator dynamics.
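
And for those who prefer to experiment, here’s a rough numerical spot-check of Maynard Smith’s definition, applied to the hawk–dove game with benefit 2 and fighting cost 4. The payoff matrix, the candidate strategies and the function name below are my own illustrative choices, and the test just samples random invaders at one small \epsilon, so it’s a sanity check rather than a proof:

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_like_ess(A, q, eps=1e-3, trials=10000):
    """Test q . A ((1-eps)q + eps p) > p . A ((1-eps)q + eps p) for many random p."""
    A, q = np.asarray(A, float), np.asarray(q, float)
    for _ in range(trials):
        p = rng.dirichlet(np.ones(len(q)))       # a random mixed strategy
        if np.max(np.abs(p - q)) < 1e-4:         # skip invaders essentially equal to q
            continue
        mix = (1.0 - eps) * q + eps * p
        if q @ A @ mix <= p @ A @ mix:
            return False
    return True

# Hawk-Dove game: the resource is worth 2, losing a fight costs 4.
A_hd = np.array([[-1.0, 2.0], [0.0, 1.0]])
print(looks_like_ess(A_hd, [0.5, 0.5]))   # True: the mixed strategy is stable
print(looks_like_ess(A_hd, [1.0, 0.0]))   # False: a pure hawk population is invadable
```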

The decrease of relative information

Now comes the punchline… but with a slight surprise twist at the end. Last time we let

P = (P_1, \dots , P_n)

be a population that evolves with time according to the replicator equation, and we let p be the corresponding probability distribution. We supposed q was some fixed probability distribution. We saw that the relative information

I(q,p) = \displaystyle{ \sum_i \ln \left(\frac{q_i}{ p_i }\right) q_i }

obeys

\displaystyle{ \frac{d}{dt} I(q,p) =  (p - q) \cdot f(P) }

where f(P) is the vector of fitness functions. So, this relative information can never increase if

(p - q) \cdot f(P) \le 0

for all P.

We can adapt this to the special case we’re looking at now. Remember, right now we’re assuming

\displaystyle{f_i(P_1, \dots, P_n)   = \sum_{j=1}^n A_{ij} p_j }

so

f(P) = A p

Thus, the relative information will never increase if

(p - q) \cdot A p \le 0

or in other words,

q \cdot A p \ge p \cdot A p  \qquad \qquad \qquad \qquad \qquad \qquad  (1)

Now, this looks very similar to the conditions for an evolutionary stable strategy as stated in the Puzzle above. But it’s not the same! That’s the surprise twist.

Remember, the Puzzle says that q is an evolutionarily stable state if for all mixed strategies p we have

q \cdot A q \ge p \cdot A q  \qquad \qquad \qquad \qquad \qquad \qquad (2)

and also

q \cdot A q = p \cdot A q \; \implies \; q \cdot A p > p \cdot A p  \qquad \; (3)

Note that condition (1), the one we want, is neither condition (2) nor condition (3)! This drove me crazy for almost a day.

I kept thinking I’d made a mistake, like mixing up p and q somewhere. You’ve got to mind your p’s and q’s in this game!

But the solution turned out to be this. After Maynard Smith came up with his definition of ‘evolutionarily stable state’, another guy came up with a different definition:

• Bernhard Thomas, On evolutionarily stable sets, J. Math. Biology 22 (1985), 105–115.

For him, an evolutionarily stable strategy obeys

q \cdot A q \ge p \cdot A q  \qquad \qquad \qquad \qquad \qquad \qquad (2)

and also

q \cdot A p \ge p \cdot A p  \qquad \qquad \qquad \qquad \qquad \qquad  (1)

Condition (1) is stronger than condition (3), so he renamed Maynard Smith’s evolutionarily stable strategies weakly evolutionarily stable strategies. And condition (1) guarantees that the relative information I(q,p) can never increase. So, now we’re happy.
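
Here’s a rough numerical illustration of that last claim, reusing the hawk–dove matrix from the earlier sketch (again my own choice of example). For that game q = (1/2, 1/2) satisfies condition (1), so integrating the replicator equation from a hawk-heavy start should show the relative information dropping toward zero:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[-1.0, 2.0], [0.0, 1.0]])   # hawk-dove fitness matrix
q = np.array([0.5, 0.5])                  # satisfies q.Ap >= p.Ap for all p

def replicator(t, p):
    fitness = A @ p                        # f_i = sum_j A_ij p_j
    return (fitness - p @ fitness) * p     # dp_i/dt = (f_i - <f>) p_i

def rel_info(q, p):
    return float(np.sum(q * np.log(q / p)))

sol = solve_ivp(replicator, (0.0, 30.0), [0.95, 0.05],
                t_eval=np.linspace(0.0, 30.0, 7), rtol=1e-8)
print([round(rel_info(q, p), 5) for p in sol.y.T])   # a non-increasing list
```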

Except for one thing: why should we switch from Maynard Smith’s perfectly sensible concept of evolutionarily stable state to this new stronger one? I don’t really know, except that

• it’s not much stronger

and

• it lets us prove the theorem we want!

So, it’s a small mystery for me to mull over. If you have any good ideas, let me know.


Information Geometry (Part 12)

24 June, 2012

Last time we saw that if a population evolves toward an ‘evolutionarily stable state’, then the amount of information our population has ‘left to learn’ can never increase! It must always decrease or stay the same.

This result sounds wonderful: it’s a lot like the second law of thermodynamics, which says entropy must always increase. Of course there are some conditions for this wonderful result to hold. The main condition is that the population evolves according to the replicator equation. But the other is the existence of an evolutionarily stable state. Last time I wrote down the rather odd-looking definition of ‘evolutionarily stable state’ without justifying it. I need to do that soon. But if you’ve never thought about evolutionary game theory, I think giving you a little background will help. So today let me try that.

Evolutionary game theory

We’ve been thinking of evolution as similar to inference or learning. In this analogy, organisms are like ‘hypotheses’, and the population ‘does experiments’ to see if these hypotheses make ‘correct predictions’ (i.e., can reproduce) or not. The successful ones are reinforced while the unsuccessful ones are weeded out. As a result, the population ‘learns’. And under the conditions of the theorem we discussed last time, the relative information—the amount ‘left to learn’—goes down!

While you might object to various points of this analogy, it’s useful—and that’s really all you can ask of an analogy. It’s useful because it lets us steal chunks of math from the subjects of Bayesian inference and machine learning and apply them to the study of biodiversity and evolution! This is what Marc Harper has been doing:

• Marc Harper, Information geometry and evolutionary game theory.

• Marc Harper, The replicator equation as an inference dynamic.

But now let’s bring in another analogy, also contained in Harper’s work. We can also think of evolution as similar to a game. In this analogy, organisms are like ‘strategies’—or if you prefer, they have strategies. The winners get to reproduce, while the losers don’t. John Maynard Smith started developing this analogy in 1973, and eventually wrote a whole book on it:

• John Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.

As far as I can tell, evolutionary game theory has brought almost as many chunks of math to game theory as it has taken from it. Maybe it’s just my ignorance showing, but it seems that game theory becomes considerably deeper when we think about games that many players play again and again, with the winners getting to reproduce, while the losers are eliminated.

According to William Sandholm:

The birth of evolutionary game theory is marked by the publication of a series of papers by mathematical biologist John Maynard Smith. Maynard Smith adapted the methods of traditional game theory, which were created to model the behavior of rational economic agents, to the context of biological natural selection. He proposed his notion of an evolutionarily stable strategy (ESS) as a way of explaining the existence of ritualized animal conflict.

Maynard Smith’s equilibrium concept was provided with an explicit dynamic foundation through a differential equation model introduced by Taylor and Jonker. Schuster and Sigmund, following Dawkins, dubbed this model the replicator dynamic, and recognized the close links between this game-theoretic dynamic and dynamics studied much earlier in population ecology and population genetics. By the 1980s, evolutionary game theory was a well-developed and firmly established modeling framework in biology.

Towards the end of this period, economists realized the value of the evolutionary approach to game theory in social science contexts, both as a method of providing foundations for the equilibrium concepts of traditional game theory, and as a tool for selecting among equilibria in games that admit more than one. Especially in its early stages, work by economists in evolutionary game theory hewed closely to the interpretation set out by biologists, with the notion of ESS and the replicator dynamic understood as modeling natural selection in populations of agents genetically programmed to behave in specific ways. But it soon became clear that models of essentially the same form could be used to study the behavior of populations of active decision makers. Indeed, the two approaches sometimes lead to identical models: the replicator dynamic itself can be understood not only as a model of natural selection, but also as one of imitation of successful opponents.

While the majority of work in evolutionary game theory has been undertaken by biologists and economists, closely related models have been applied to questions in a variety of fields, including transportation science, computer science, and sociology. Some paradigms from evolutionary game theory are close relatives of certain models from physics, and so have attracted the attention of workers in this field. All told, evolutionary game theory provides a common ground for workers from a wide range of disciplines.

The Prisoner’s Dilemma

In game theory, the most famous example is the Prisoner’s Dilemma. In its original form, this ‘game’ is played just once:

Two men are arrested, but the police don’t have enough information to convict them. So they separate the two men, and offer both the same deal: if one testifies against his partner (or defects), and the other remains silent (and thus cooperates with his partner), the defector goes free and the cooperator goes to jail for 12 months. If both remain silent, both are sentenced to only 1 month in jail for a minor charge. If they both defect, they both receive a 3-month sentence. Each prisoner must choose either to defect or cooperate with his partner in crime; neither gets to hear what the other decides. What will they do?

Traditional game theory emphasizes the so-called ‘Nash equilibrium’ for this game, in which both prisoners defect. Why don’t they both cooperate? They’d both be better off if they both cooperated. However, for them to both cooperate is ‘unstable’: either one could shorten their sentence by defecting! By definition, a Nash equilibrium has the property that neither player can improve his situation by unilaterally changing his strategy.

In the Prisoner’s Dilemma, the Nash equilibrium is not very nice: both parties would be happier if they’d only cooperate. That’s why it’s called a ‘dilemma’. Perhaps the most tragic example today is global warming. All players would be better off if everyone cooperated to reduce carbon emissions, but any single player is better off if everyone else cooperates while they alone keep emitting carbon.

For this and many other reasons, people have been interested in ‘solving’ the Prisoner’s Dilemma: that is, finding reasons why cooperation might be favored over defection.

This book got people really excited about what evolutionary game theory has to say about the Prisoner’s Dilemma:

• Robert Axelrod, The Evolution of Cooperation, Basic Books, New York, 1984. (A related article with the same title is available online.)

The idea is that under certain circumstances, strategies that are ‘nicer’ than defection will gradually take over. The most famous of these strategies is ‘tit for tat’, meaning that you cooperate the first time and after that do whatever your opponent just did. I won’t go into this further, because it’s a big digression and I’m already digressing too far. I’ll just mention that from the outlook of evolutionary game theory, the Prisoner’s Dilemma is still full of surprises. Just this week, some fascinating new work has been causing a stir:

• William Press and Freeman Dyson, Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent, Edge, 18 June 2012.

I hope I’ve succeeded in giving you a vague superficial sense of the history of evolutionary game theory and why it’s interesting. Next time I’ll get serious about the task at hand, which is to understand ‘evolutionarily stable strategies’. If you want to peek ahead, try this nice paper:

• William H. Sandholm, Evolutionary game theory, 12 November 2007.

This is where I got the long quote by Sandholm on the history of evolutionary game theory. The original quote contained lots of references; if you’re interested in those, go to page 3 of this paper.


Information Geometry (Part 11)

7 June, 2012

Last time we saw that given a bunch of different species of self-replicating entities, the entropy of their population distribution can go either up or down as time passes. This is true even in the pathetically simple case where all the replicators have constant fitness—so they don’t interact with each other, and don’t run into any ‘limits to growth’.

This is a bit of a bummer, since it would be nice to use entropy to explain how replicators are always extracting information from their environment, thanks to natural selection.

Luckily, a slight variant of entropy, called ‘relative entropy’, behaves better. When our replicators have an ‘evolutionarily stable state’, the relative entropy is guaranteed to always change in the same direction as time passes!

Thanks to Einstein, we’ve all heard that times and distances are relative. But how is entropy relative?

It’s easy to understand if you think of entropy as lack of information. Say I have a coin hidden under my hand. I tell you it’s heads-up. How much information did I just give you? Maybe 1 bit? That’s true if you know it’s a fair coin and I flipped it fairly before covering it up with my hand. But what if you put the coin down there yourself a minute ago, heads up, and I just put my hand over it? Then I’ve given you no information at all. The difference is the choice of ‘prior’: that is, what probability distribution you attributed to the coin before I gave you my message.

My love affair with relative entropy began in college when my friend Bruce Smith and I read Hugh Everett’s thesis, The Relative State Formulation of Quantum Mechanics. This was the origin of what’s now often called the ‘many-worlds interpretation’ of quantum mechanics. But it also has a great introduction to relative entropy. Instead of talking about ‘many worlds’, I wish people would say that Everett explained some of the mysteries of quantum mechanics using the fact that entropy is relative.

Anyway, it’s nice to see relative entropy showing up in biology.

Relative Entropy

Inscribe an equilateral triangle in a circle. Randomly choose a line segment joining two points of this circle. What is the probability that this segment is longer than a side of the triangle?

This puzzle is called Bertrand’s paradox, because different ways of solving it give different answers. To crack the paradox, you need to realize that it’s meaningless to say you’ll “randomly” choose something until you say more about how you’re going to do it.

In other words, you can’t compute the probability of an event until you pick a recipe for computing probabilities. Such a recipe is called a probability measure.

This applies to computing entropy, too! The formula for entropy clearly involves a probability distribution, even when our set of events is finite:

S = - \sum_i p_i \ln(p_i)

But this formula conceals a fact that becomes obvious when our set of events is infinite. Now the sum becomes an integral:

S = - \int_X p(x) \ln(p(x)) \, d x

And now it’s clear that this formula makes no sense until we choose the measure d x. On a finite set we have a god-given choice of measure, called counting measure. Integrals with respect to this are just sums. But in general we don’t have such a god-given choice. And even for finite sets, working with counting measure is a choice: we are choosing to believe that in the absence of further evidence, all options are equally likely.

Taking this fact into account, it seems like we need two things to compute entropy: a probability distribution p(x), and a measure d x. That’s on the right track. But an even better way to think of it is this:

\displaystyle{ S = - \int_X  \frac{p(x) dx}{dx} \ln \left(\frac{p(x) dx}{dx}\right) \, dx }

Now we see the entropy depends on two measures: the probability measure p(x)  dx we care about, but also the measure d x. Their ratio is important, but that’s not enough: we also need one of these measures to do the integral. Above I used the measure dx to do the integral, but we can also use p(x) dx if we write

\displaystyle{ S = - \int_X \ln \left(\frac{p(x) dx}{dx}\right) p(x) dx }

Either way, we are computing the entropy of one measure relative to another. So we might as well admit it, and talk about relative entropy.

The entropy of the measure d \mu relative to the measure d \nu is defined by:

\begin{array}{ccl} S(d \mu, d \nu) &=& \displaystyle{ - \int_X \frac{d \mu(x) }{d \nu(x)} \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right)  d\nu(x) } \\   \\  &=& \displaystyle{ - \int_X  \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right) d\mu(x) } \end{array}

The second formula is simpler, but the first looks more like summing -p \ln(p), so they’re both useful.

Since we’re taking entropy to be lack of information, we can also get rid of the minus sign and define relative information by

\begin{array}{ccl} I(d \mu, d \nu) &=& \displaystyle{ \int_X \frac{d \mu(x) }{d \nu(x)} \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right)  d\nu(x) } \\   \\  &=& \displaystyle{  \int_X  \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right) d\mu(x) } \end{array}

If you thought something was randomly distributed according to the probability measure d \nu, but then you discover it’s randomly distributed according to the probability measure d \mu, how much information have you gained? The answer is I(d\mu,d\nu).

For more on relative entropy, read Part 6 of this series. I gave some examples illustrating how it works. Those should convince you that it’s a useful concept.

Okay: now let’s switch back to a more lowbrow approach. In the case of a finite set, we can revert to thinking of our two measures as probability distributions, and write the information gain as

I(q,p) = \displaystyle{  \sum_i  \ln \left(\frac{q_i}{p_i }\right) q_i}

If you want to sound like a Bayesian, call p the prior probability distribution and q the posterior probability distribution. Whatever you call them, I(q,p) is the amount of information you get if you thought p and someone tells you “no, q!”
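
In code this is a one-liner (it’s the same thing as the Kullback–Leibler divergence of q from p). A minimal sketch, with a function name of my own choosing:

```python
import numpy as np

def relative_information(q, p):
    """I(q, p) = sum_i q_i ln(q_i / p_i): what you learn on updating from p to q.
    Assumes p_i > 0 wherever q_i > 0; terms with q_i = 0 contribute nothing."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(relative_information([1, 0], [0.5, 0.5]))       # ln 2 nats, i.e. one bit
print(relative_information([0.5, 0.5], [0.5, 0.5]))   # 0: nothing left to learn
```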

We’ll use this idea to think about how a population gains information about its environment as time goes by, thanks to natural selection. The rest of this post will be an exposition of Theorem 1 in this paper:

• Marc Harper, The replicator equation as an inference dynamic.

Harper says versions of this theorem have previously appeared in work by Ethan Akin, and independently in work by Josef Hofbauer and Karl Sigmund. He also credits others here. An idea this good is rarely noticed by just one person.

The change in relative information

So: consider n different species of replicators. Let P_i be the population of the ith species, and assume these populations change according to the replicator equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each function f_i depends smoothly on all the populations. And as usual, we let

\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }

be the fraction of replicators in the ith species.

Let’s study the relative information I(q,p) where q is some fixed probability distribution. We’ll see something great happens when q is a stable equilibrium solution of the replicator equation. In this case, the relative information can never increase! It can only decrease or stay constant.

We’ll think about what all this means later. First, let’s see that it’s true! Remember,

\begin{array}{ccl} I(q,p) &=& \displaystyle{ \sum_i  \ln \left(\frac{q_i}{ p_i }\right) q_i }  \\ \\ &=&  \displaystyle{ \sum_i  \Big(\ln(q_i) - \ln(p_i) \Big) q_i } \end{array}

and only p_i depends on time, not q_i, so

\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(q,p)}  &=& \displaystyle{ - \frac{d}{dt} \sum_i \ln(p_i)  q_i }\\   \\  &=& \displaystyle{ - \sum_i \frac{\dot{p}_i}{p_i} \, q_i } \end{array}

where \dot{p}_i is the rate of change of the probability p_i. We saw a nice formula for this in Part 9:

\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle  \Big) \, p_i }

where

f_i(P) = f_i(P_1, \dots, P_n)

and

\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i  }

is the mean fitness of the species. So, we get

\displaystyle{ \frac{d}{dt} I(q,p) } = \displaystyle{ - \sum_i \Big( f_i(P) - \langle f(P) \rangle  \Big) \, q_i }

Nice, but we can fiddle with this expression to get something more enlightening. Remember, the numbers q_i sum to one. So:

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt} I(q,p) } &=&  \displaystyle{  \langle f(P) \rangle - \sum_i f_i(P) q_i  } \\  \\ &=& \displaystyle{  \sum_i f_i(P) (p_i - q_i)  }  \end{array}

where in the last step I used the definition of the mean fitness. This result looks even cuter if we treat the numbers f_i(P) as the components of a vector f(P), and similarly for the numbers p_i and q_i. Then we can use the dot product of vectors to say

\displaystyle{ \frac{d}{dt} I(q,p) = f(P) \cdot (p - q) }

So, the relative information I(q,p) will always decrease if

f(P) \cdot (p - q) \le 0

for all choices of the population P.

And now something really nice happens: this is also the condition for q to be an evolutionarily stable state. This concept goes back to John Maynard Smith, the founder of evolutionary game theory. In 1982 he wrote:

A population is said to be in an evolutionarily stable state if its genetic composition is restored by selection after a disturbance, provided the disturbance is not too large.

I will explain the math next time—I need to straighten out some things in my mind first. But the basic idea is compelling: an evolutionarily stable state is like a situation where our replicators ‘know all there is to know’ about the environment and each other. In any other state, the population has ‘something left to learn’—and the amount left to learn is the relative information we’ve been talking about! But as time goes on, the information still left to learn decreases!
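
Here’s a bare-bones numerical check of this in the constant-fitness case, with toy numbers of my own. If q puts all its weight on the fittest species, then f \cdot (p - q) = \langle f(P) \rangle - \max_i f_i \le 0 for every population, so the relative information should come out non-increasing along a solution of the replicator equation:

```python
import numpy as np
from scipy.integrate import solve_ivp

f = np.array([1.0, 1.5, 2.0])     # constant fitnesses (toy values)
q = np.array([0.0, 0.0, 1.0])     # all weight on the fittest species

def replicator(t, p):
    return (f - p @ f) * p         # dp_i/dt = (f_i - <f>) p_i

def rel_info(q, p):
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

sol = solve_ivp(replicator, (0.0, 10.0), [0.80, 0.15, 0.05],
                t_eval=np.linspace(0.0, 10.0, 6), rtol=1e-8)
print([round(rel_info(q, p), 4) for p in sol.y.T])   # a non-increasing list
```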

Note: in the real world, nature has never found an evolutionarily stable state… except sometimes approximately, on sufficiently short time scales, in sufficiently small regions. So we are still talking about an idealization of reality! But that’s okay, as long as we know it.


Information Geometry (Part 10)

4 June, 2012

Last time I began explaining the tight relation between three concepts:

• entropy,

• information—or more precisely, lack of information,

and

• biodiversity.

The idea is to consider n different species of ‘replicators’. A replicator is any entity that can reproduce itself, like an organism, a gene, or a meme. A replicator can come in different kinds, and a ‘species’ is just our name for one of these kinds. If P_i is the population of the ith species, we can interpret the fraction

\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }

as a probability: the probability that a randomly chosen replicator belongs to the ith species. This suggests that we define entropy just as we do in statistical mechanics:

\displaystyle{ S = - \sum_i p_i \ln(p_i) }

In the study of statistical inference, entropy is a measure of uncertainty, or lack of information. But now we can interpret it as a measure of biodiversity: it’s zero when just one species is present, and small when a few species have much larger populations than all the rest, but gets big otherwise.

Our goal here is to play these viewpoints off against each other. In short, we want to think of natural selection, and even biological evolution, as a process of statistical inference—or in simple terms, learning.

To do this, let’s think about how entropy changes with time. Last time we introduced a simple model called the replicator equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each population grows at a rate proportional to some ‘fitness functions’ f_i. We can get some intuition by looking at the pathetically simple case where these functions are actually constants, so

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

The equation then becomes trivial to solve:

\displaystyle{ P_i(t) = e^{t f_i } P_i(0)}

Last time I showed that in this case, the entropy will eventually decrease. It will go to zero as t \to +\infty whenever one species is fitter than all the rest and starts out with a nonzero population—since then this species will eventually take over.

But remember, the entropy of a probability distribution is its lack of information. So the decrease in entropy signals an increase in information. And last time I argued that this makes perfect sense. As the fittest species takes over and biodiversity drops, the population is acquiring information about its environment.

However, I never said the entropy is always decreasing, because that’s false! Even in this pathetically simple case, entropy can increase.

Suppose we start with many replicators belonging to one very unfit species, and a few belonging to various more fit species. The probability distribution p_i will start out sharply peaked, so the entropy will start out low:

Now think about what happens when time passes. At first the unfit species will rapidly die off, while the population of the other species slowly grows:

So the probability distribution will, for a while, become less sharply peaked. Thus, for a while, the entropy will increase!
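
You can watch this happen in a toy calculation (the fitnesses and starting populations below are invented for illustration): one very unfit but initially dominant species, plus a few rare fitter ones. The entropy climbs for a while and then falls as the fittest species takes over:

```python
import numpy as np

f = np.array([0.0, 1.0, 1.2, 1.4])          # constant fitnesses
P0 = np.array([1000.0, 1.0, 1.0, 1.0])      # one very unfit species dominates at first

def entropy_at(t):
    P = P0 * np.exp(f * t)                   # P_i(t) = exp(t f_i) P_i(0)
    p = P / P.sum()
    return float(-np.sum(p * np.log(p)))

for t in np.arange(0.0, 21.0, 2.5):
    print(f"t = {t:5.1f}   S = {entropy_at(t):.3f}")
```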

This seems to conflict with our idea that the population’s entropy should decrease as it acquires information about its environment. But in fact this phenomenon is familiar in the study of statistical inference. If you start out with strongly held false beliefs about a situation, the first effect of learning more is to become less certain about what’s going on!

Get it? Say you start out by assigning a high probability to some wrong guess about a situation. The entropy of your probability distribution is low: you’re quite certain about what’s going on. But you’re wrong. When you first start suspecting you’re wrong, you become more uncertain about what’s going on. Your probability distribution flattens out, and the entropy goes up.

So, sometimes learning involves a decrease in information—false information. There’s nothing about the mathematical concept of information that says this information is true.

Given this, it’s good to work out a formula for the rate of change of entropy, which will let us see more clearly when it goes down and when it goes up. To do this, first let’s derive a completely general formula for the time derivative of the entropy of a probability distribution. Following Sir Isaac Newton, we’ll use a dot to stand for a time derivative:

\begin{array}{ccl} \displaystyle{  \dot{S}} &=& \displaystyle{ -  \frac{d}{dt} \sum_i p_i \ln (p_i)} \\   \\  &=& \displaystyle{ - \sum_i \Big( \dot{p}_i \ln (p_i) + \dot{p}_i \Big) }  \end{array}

In the last term we took the derivative of the logarithm and got a factor of 1/p_i which cancelled the factor of p_i. But since

\displaystyle{  \sum_i p_i = 1 }

we know

\displaystyle{ \sum_i \dot{p}_i = 0 }

so this last term vanishes:

\displaystyle{ \dot{S}= -\sum_i \dot{p}_i \ln (p_i) }

Nice! To go further, we need a formula for \dot{p}_i. For this we might as well return to the general replicator equation, dropping the pathetically special assumption that the fitness functions are actually constants. Then we saw last time that

\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle  \Big) \, p_i }

where we used the abbreviation

f_i(P) = f_i(P_1, \dots, P_n)

for the fitness of the ith species, and defined the mean fitness to be

\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i  }

Using this cute formula for \dot{p}_i, we get the final result:

\displaystyle{ \dot{S} = - \sum_i \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i \ln (p_i) }

This is strikingly similar to the formula for entropy itself. But now each term in the sum includes a factor saying how much more fit than average, or less fit, that species is. The quantity - p_i \ln(p_i) is always nonnegative, since the graph of -x \ln(x) looks like this:

So, the ith term contributes positively to the change in entropy if the ith species is fitter than average, but negatively if it’s less fit than average.

This may seem counterintuitive!

Puzzle 1. How can we reconcile this fact with our earlier observations about the case when the fitness of each species is population-independent? Namely: a) if initially most of the replicators belong to one very unfit species, the entropy will rise at first, but b) in the long run, when the fittest species present take over, the entropy drops?

If this seems too tricky, look at some examples! The first illustrates observation a); the second illustrates observation b):

Puzzle 2. Suppose we have two species, one with fitness equal to 1 initially constituting 90% of the population, the other with fitness equal to 10 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 1, & &  p_1(0) = 0.9 \\ \\                            f_2 = 10 , & & p_2(0) = 0.1   \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?

Puzzle 3. Suppose we have two species, one with fitness equal to 10 initially constituting 90% of the population, and the other with fitness equal to 1 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 10, & &  p_1(0) = 0.9 \\ \\                            f_2 = 1 , & & p_2(0) = 0.1   \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?
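
If you want to check your answers numerically, here’s a little sketch that just plugs the puzzle data into the formula \dot{S} = - \sum_i \big( f_i - \langle f \rangle \big) \, p_i \ln (p_i) from above, and also reports each species’ separate contribution:

```python
import numpy as np

def entropy_rate(f, p):
    """dS/dt = - sum_i (f_i - <f>) p_i ln(p_i), plus the individual terms."""
    f, p = np.asarray(f, float), np.asarray(p, float)
    terms = -(f - np.sum(f * p)) * p * np.log(p)
    return float(terms.sum()), terms

total, terms = entropy_rate([1, 10], [0.9, 0.1])     # Puzzle 2
print(total, terms)
total, terms = entropy_rate([10, 1], [0.9, 0.1])     # Puzzle 3
print(total, terms)
```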

I had to work through these examples to understand what’s going on. Now I do, and it all makes sense.

Next time

Still, it would be nice if there were some quantity that always goes down with the passage of time, reflecting our naive idea that the population gains information from its environment, and thus loses entropy, as time goes by.

Often there is such a quantity. But it’s not the naive entropy: it’s the relative entropy. I’ll talk about that next time. In the meantime, if you want to prepare, please reread Part 6 of this series, where I explained this concept. Back then, I argued that whenever you’re tempted to talk about entropy, you should talk about relative entropy. So, we should try that here.

There’s a big idea lurking here: information is relative. How much information a signal gives you depends on your prior assumptions about what that signal is likely to be. If this is true, perhaps biodiversity is relative too.


Information Geometry (Part 9)

1 June, 2012


It’s time to continue this information geometry series, because I’ve promised to give the following talk at a conference on the mathematics of biodiversity in early July… and I still need to do some of the research!

Diversity, information geometry and learning

As is well known, some measures of biodiversity are formally identical to measures of information developed by Shannon and others. Furthermore, Marc Harper has shown that the replicator equation in evolutionary game theory is formally identical to a process of Bayesian inference, which is studied in the field of machine learning using ideas from information geometry. Thus, in this simple model, a population of organisms can be thought of as a ‘hypothesis’ about how to survive, and natural selection acts to update this hypothesis according to Bayes’ rule. The question thus arises to what extent natural changes in biodiversity can be usefully seen as analogous to a form of learning. However, some of the same mathematical structures arise in the study of chemical reaction networks, where the increase of entropy, or more precisely decrease of free energy, is not usually considered a form of ‘learning’. We report on some preliminary work on these issues.

So, let’s dive in! To some extent I’ll be explaining these two papers:

• Marc Harper, Information geometry and evolutionary game theory.

• Marc Harper, The replicator equation as an inference dynamic.

However, I hope to bring in some more ideas from physics, the study of biodiversity, and the theory of stochastic Petri nets, also known as chemical reaction networks. So, this series may start to overlap with my network theory posts. We’ll see. We won’t get far today: for now, I just want to review and expand on what we did last time.

The replicator equation

The replicator equation is a simplified model of how populations change. Suppose we have n types of self-replicating entity. I’ll call these entities replicators. I’ll call the types of replicators species, but they don’t need to be species in the biological sense. For example, the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains.

Let P_i(t), or just P_i for short, be the population of the ith species at time t. Then the replicator equation says

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

So, the population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it can be any smooth function f_i of the populations of all the species. We call f_i(P_1, \dots, P_n) the fitness of the ith species.

Of course this model is absurdly general, while still leaving out lots of important effects, like the spatial variation of populations, or the ability for the population of some species to start at zero and become nonzero—which happens thanks to mutation. Nonetheless this model is worth taking a good look at.

Using the magic of vectors we can write

P = (P_1, \dots , P_n)

and

f(P) = (f_1(P), \dots, f_n(P))

This lets us write the replicator equation a wee bit more tersely as

\displaystyle{ \frac{d P}{d t} = f(P) P}

where on the right I’m multiplying vectors componentwise, the way your teachers tried to brainwash you into never doing:

f(P) P = (f(P)_1 P_1, \dots, f(P)_n P_n)

In other words, I’m thinking of P and f(P) as functions on the set \{1, \dots, n\} and multiplying them pointwise. This will be a nice way of thinking if we want to replace this finite set by some more general space.

Why would we want to do that? Well, we might be studying lizards with different length tails, and we might find it convenient to think of the set of possible tail lengths as the half-line [0,\infty) instead of a finite set.

Or, just to get started, we might want to study the pathetically simple case where f(P) doesn’t depend on P. Then we just have a fixed function f and a time-dependent function P obeying

\displaystyle{ \frac{d P}{d t} = f P}

If we’re physicists, we might write P more suggestively as \psi and write the operator multiplying by f as - H. Then our equation becomes

\displaystyle{ \frac{d \psi}{d t} = - H \psi }

This looks a lot like Schrödinger’s equation, but since there’s no factor of \sqrt{-1}, and \psi is real-valued, it’s more like the heat equation or the ‘master equation’, the basic equation of stochastic mechanics.

For an explanation of Schrödinger’s equation and the master equation, try Part 12 of the network theory series. In that post I didn’t include a minus sign in front of the H. That’s no big deal: it’s just a different convention than the one I want today. A more serious issue is that in stochastic mechanics, \psi stands for a probability distribution. This suggests that we should get probabilities into the game somehow.

The replicator equation in terms of probabilities

Luckily, that’s exactly what people usually do! Instead of talking about the population P_i of the ith species, they talk about the probability p_i that one of our organisms will belong to the ith species. This amounts to normalizing our populations:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Don’t you love it when notations work out well? Our big Population P_i has gotten normalized to give little probability p_i.

How do these probabilities p_i change with time? Now is the moment for that least loved rule of elementary calculus to come out and take a bow: the quotient rule for derivatives!

\displaystyle{ \frac{d p_i}{d t} = \left(\frac{d P_i}{d t} \sum_j P_j \quad - \quad P_i \sum_j \frac{d P_j}{d t}\right) \big{/} \left(  \sum_j P_j \right)^2 }

Using our earlier version of the replicator equation, this gives:

\displaystyle{ \frac{d p_i}{d t} =  \left(f_i(P) P_i \sum_j P_j \quad - \quad P_i \sum_j f_j(P) P_j \right) \big{/} \left(  \sum_j P_j \right)^2 }

Using the definition of p_i, this simplifies to:

\displaystyle{ \frac{d p_i}{d t} =  f_i(P) p_i \quad - \quad \left( \sum_j f_j(P) p_j \right) p_i }

The stuff in parentheses actually has a nice meaning: it’s just the mean fitness. In other words, it’s the average, or expected, fitness of an organism chosen at random from the whole population. Let’s write it like this:

\displaystyle{ \langle f(P) \rangle = \sum_j f_j(P) p_j  }

So, we get the replicator equation in its classic form:

\displaystyle{ \frac{d p_i}{d t} = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i }

This has a nice meaning: for the fraction of organisms of the ith type to increase, their fitness must exceed the mean fitness. If you’re trying to increase market share, what matters is not how good you are, but how much better than average you are. If everyone else is lousy, you’re in luck.
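Here’s a numerical sketch of the replicator equation in this classic form. To keep it self-contained I’ve specialized to fitnesses that depend only on the probabilities, via a payoff matrix as in evolutionary game theory; the payoff matrix and the initial distribution are invented purely for illustration.

```python
import numpy as np

# The classic replicator equation dp_i/dt = (f_i - <f>) p_i, integrated by
# explicit Euler steps, with a frequency-dependent fitness f(p) = A p.
A = np.array([[0.0, 1.0, 2.0],
              [2.0, 0.0, 1.0],
              [1.0, 2.0, 0.0]])   # a made-up, rock-paper-scissors-like payoff matrix

p = np.array([0.5, 0.3, 0.2])
dt = 0.01
for _ in range(2000):
    f = A @ p                     # fitness of each species
    mean_f = f @ p                # mean fitness <f>
    p += dt * (f - mean_f) * p

print(p, p.sum())   # the sum stays 1, since the corrections add up to zero
```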

Entropy

Now for something a bit new. Once we’ve gotten a probability distribution into the game, its entropy is sure to follow:

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

This says how ‘smeared-out’ the overall population is among the various different species. Alternatively, it says how much information it takes, on average, to say which species a randomly chosen organism belongs to. For example, if there are 2^N species, all with equal populations, the entropy S works out to N \ln 2. So in this case, it takes N bits of information to say which species a randomly chosen organism belongs to.
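Here’s a quick check of that count, taking N = 5 for concreteness:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring species with zero probability."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

N = 5
p = np.full(2**N, 1.0 / 2**N)          # 2^N equally common species
print(entropy(p), N * np.log(2))       # both print N ln 2, about 3.466
```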

In biology, entropy is one of many ways people measure biodiversity. For a quick intro to some of the issues involved, try:

• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.

• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.

But we don’t need to understand this stuff to see how entropy is connected to the replicator equation. Marc Harper’s paper explains this in detail:

• Marc Harper, The replicator equation as an inference dynamic.

and I hope to go through quite a bit of it here. But not today! Today I just want to look at a pathetically simple, yet still interesting, example.

Exponential growth

Suppose the fitness of each species is independent of the populations of all the species. In other words, suppose each fitness f_i(P) is actually a constant, say f_i. Then the replicator equation reduces to

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

so it’s easy to solve:

P_i(t) = e^{t f_i} P_i(0)
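If you want to double-check this, here’s a comparison of the closed-form solution with a brute-force Euler integration, using some arbitrary constant fitnesses and initial populations:

```python
import numpy as np

# Check the closed-form solution P_i(t) = exp(f_i t) P_i(0) against a
# brute-force Euler integration of dP_i/dt = f_i P_i.
rng = np.random.default_rng(0)
f = rng.uniform(0.0, 1.0, size=4)    # constant fitnesses, picked at random
P0 = rng.uniform(1.0, 2.0, size=4)   # initial populations

dt, T = 1e-4, 3.0
P = P0.copy()
for _ in range(int(T / dt)):
    P = P + dt * f * P

print(P)
print(P0 * np.exp(f * T))   # agrees closely with the Euler result
```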

You don’t need a detailed calculation to see what’s going to happen to the probabilities

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)}}

The most fit species present will eventually take over! If one species, say the ith one, has a fitness greater than the rest, then the population of this species will eventually grow faster than all the rest, at least if its population starts out greater than zero. So as t \to +\infty, we’ll have

p_i(t) \to 1

and

p_j(t) \to 0 \quad \mathrm{for} \quad j \ne i

Thus the probability distribution p will become more sharply peaked, and its entropy will eventually approach zero.

With a bit more thought you can see that even if more than one species shares the maximum possible fitness, the entropy will still eventually decrease, though it won’t approach zero: instead it approaches the entropy of the limiting distribution, which is spread over the species that share the maximum fitness, in proportion to their initial populations.
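Here’s a quick numerical check of this, using the closed-form solution with constant fitnesses. The particular fitnesses and initial populations are made up for illustration; two species are tied for the maximum fitness.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Constant fitnesses, so we can use the closed-form solution
# P_i(t) = exp(f_i t) P_i(0) directly. Two species tie for the maximum.
f = np.array([1.0, 2.0, 2.0, 0.5])
P0 = np.ones(4)

for t in [0.0, 2.0, 5.0, 20.0]:
    P = P0 * np.exp(f * t)
    p = P / P.sum()
    print(t, entropy(p))
# The entropy drops from ln 4 (about 1.386) toward ln 2 (about 0.693),
# not toward zero, because two equally common species share the maximum fitness.
```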

In other words, the biodiversity will eventually drop as all but the most fit species are overwhelmed. Of course, this is only true in our simple idealization. In reality, biodiversity behaves in more complex ways—in part because species interact, and in part because mutation tends to smear out the probability distribution p_i. We’re not looking at these effects yet. They’re extremely important… in ways we can only fully understand if we start by looking at what happens when they’re not present.

In still other words, the population will absorb information from its environment. This should make intuitive sense: the process of natural selection resembles ‘learning’. As fitter organisms become more common and less fit ones die out, the environment puts its stamp on the probability distribution p. So, this probability distribution should gain information.

While intuitively clear, this last claim also follows more rigorously from thinking of entropy as negative information. Admittedly, it’s always easy to get confused by minus signs when relating entropy and information. A while back I said the entropy

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

was the average information required to say which species a randomly chosen organism belongs to. If this entropy is going down, isn’t the population losing information?

No, this is a classic sign error. It’s like the concept of ‘work’ in physics. We can talk about the work some system does on its environment, or the work done by the environment on the system, and these are almost the same… except one is minus the other!

When you are very ignorant about some system—say, some rolled dice—your estimated probabilities p_i for its various possible states are very smeared-out, so the entropy S(p) is large. As you gain information, you revise your probabilities and they typically become more sharply peaked, so S(p) goes down. When you know as much as you possibly can, S(p) equals zero.

So, the entropy S(p) is the amount of information you have left to learn: the amount of information you lack, not the amount you have. As you gain information, this goes down. There’s no paradox here.
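To put numbers on the dice example: if all I know is that a die is fair, my distribution is uniform and S(p) = ln 6. If someone then tells me the roll was even, the distribution sharpens, and the entropy drops by ln 2, exactly the one bit of information I just gained. A quick check:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

before = np.full(6, 1/6)        # total ignorance about a fair die
after = np.zeros(6)
after[[1, 3, 5]] = 1/3          # now we're told the roll was even (2, 4 or 6)

print(entropy(before))                    # ln 6, about 1.79 nats
print(entropy(after))                     # ln 3, about 1.10 nats
print(entropy(before) - entropy(after))   # ln 2: we gained exactly one bit
```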

It works the same way with our population of replicators—at least in the special case where the fitness of each species is independent of its population. The probability distribution p is like a ‘hypothesis’ assigning to each species i the probability p_i that it’s the best at self-replicating. As some replicators die off while others prosper, they gather information from their environment, and this hypothesis gets refined. So, the entropy S(p) drops.

Next time

Of course, to make closer contact to reality, we need to go beyond the special case where the fitness of each species is a constant! Marc Harper does this, and I want to talk about his work someday, but first I have a few more remarks to make about the pathetically simple special case I’ve been focusing on. I’ll save these for next time, since I’ve probably strained your patience already.

