It’s time to continue this information geometry series, because I’ve promised to give the following talk at a conference on the mathematics of biodiversity in early July… and I still need to do some of the research! 
Diversity, information geometry and learning
As is well known, some measures of biodiversity are formally identical to measures of information developed by Shannon and others. Furthermore, Marc Harper has shown that the replicator equation in evolutionary game theory is formally identical to a process of Bayesian inference, which is studied in the field of machine learning using ideas from information geometry. Thus, in this simple model, a population of organisms can be thought of as a ‘hypothesis’ about how to survive, and natural selection acts to update this hypothesis according to Bayes’ rule. The question thus arises as to what extent natural changes in biodiversity can be usefully seen as analogous to a form of learning. However, some of the same mathematical structures arise in the study of chemical reaction networks, where the increase of entropy, or more precisely the decrease of free energy, is not usually considered a form of ‘learning’. We report on some preliminary work on these issues.
So, let’s dive in! To some extent I’ll be explaining these two papers:
• Marc Harper, Information geometry and evolutionary game theory.
• Marc Harper, The replicator equation as an inference dynamic.
However, I hope to bring in some more ideas from physics, the study of biodiversity, and the theory of stochastic Petri nets, also known as chemical reaction networks. So, this series may start to overlap with my network theory posts. We’ll see. We won’t get far today: for now, I just want to review and expand on what we did last time.
The replicator equation
The replicator equation is a simplified model of how populations change. Suppose we have $n$ types of self-replicating entity. I’ll call these entities replicators. I’ll call the types of replicators species, but they don’t need to be species in the biological sense. For example, the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains.

Let $P_i(t)$, or just $P_i$ for short, be the population of the $i$th species at time $t$. Then the replicator equation says

$$\frac{dP_i}{dt} = f_i(P_1, \dots, P_n) \, P_i$$

So, the population $P_i$ changes at a rate proportional to $P_i$, but the ‘constant of proportionality’ need not be constant: it can be any smooth function $f_i$ of the populations of all the species. We call $f_i(P_1, \dots, P_n)$ the fitness of the $i$th species.
Of course this model is absurdly general, while still leaving out lots of important effects, like the spatial variation of populations, or the ability for the population of some species to start at zero and become nonzero—which happens thanks to mutation. Nonetheless this model is worth taking a good look at.
Using the magic of vectors we can write

$$P = (P_1, \dots, P_n)$$

and

$$f(P) = (f_1(P), \dots, f_n(P))$$

This lets us write the replicator equation a wee bit more tersely as

$$\frac{dP}{dt} = f(P) P$$

where on the right I’m multiplying vectors componentwise, the way your teachers tried to brainwash you into never doing:

$$(f(P) P)_i = f_i(P) \, P_i$$

In other words, I’m thinking of $P$ and $f(P)$ as functions on the set $\{1, \dots, n\}$ and multiplying them pointwise. This will be a nice way of thinking if we want to replace this finite set by some more general space.
Why would we want to do that? Well, we might be studying lizards with different length tails, and we might find it convenient to think of the set of possible tail lengths as the half-line $[0, \infty)$ instead of a finite set.
Or, just to get started, we might want to study the pathetically simple case where $f_i$ doesn’t depend on $P$. Then we just have a fixed function $f$ and a time-dependent function $P$ obeying

$$\frac{dP}{dt} = f P$$

If we’re physicists, we might write $P$ more suggestively as $\psi$ and write the operator of multiplying by $f$ as $H$. Then our equation becomes

$$\frac{d\psi}{dt} = H \psi$$
This looks a lot like Schrödinger’s equation, but since there’s no factor of $i = \sqrt{-1}$, and $\psi$ is real-valued, it’s more like the heat equation or the ‘master equation’, the basic equation of stochastic mechanics.

For an explanation of Schrödinger’s equation and the master equation, try Part 12 of the network theory series. In that post I didn’t include a minus sign in front of the $H$. That’s no big deal: it’s just a different convention than the one I want today. A more serious issue is that in stochastic mechanics, $\psi$ stands for a probability distribution. This suggests that we should get probabilities into the game somehow.
The replicator equation in terms of probabilities
Luckily, that’s exactly what people usually do! Instead of talking about the population $P_i$ of the $i$th species, they talk about the probability $p_i$ that one of our organisms will belong to the $i$th species. This amounts to normalizing our populations:

$$p_i = \frac{P_i}{\sum_j P_j}$$

Don’t you love it when notations work out well? Our big population $P_i$ has gotten normalized to give the little probability $p_i$.

How do these probabilities $p_i$ change with time? Now is the moment for that least loved rule of elementary calculus to come out and take a bow: the quotient rule for derivatives!

$$\frac{dp_i}{dt} = \frac{\dot{P}_i \left( \sum_j P_j \right) - P_i \left( \sum_j \dot{P}_j \right)}{\left( \sum_j P_j \right)^2}$$

Using our earlier version of the replicator equation, this gives:

$$\frac{dp_i}{dt} = \frac{f_i(P) P_i \left( \sum_j P_j \right) - P_i \left( \sum_j f_j(P) P_j \right)}{\left( \sum_j P_j \right)^2}$$

Using the definition of $p_i$, this simplifies to:

$$\frac{dp_i}{dt} = f_i(P) \, p_i - \left( \sum_j f_j(P) \, p_j \right) p_i$$

The stuff in parentheses actually has a nice meaning: it’s just the mean fitness. In other words, it’s the average, or expected, fitness of an organism chosen at random from the whole population. Let’s write it like this:

$$\langle f(P) \rangle = \sum_j f_j(P) \, p_j$$

So, we get the replicator equation in its classic form:

$$\frac{dp_i}{dt} = \left( f_i(P) - \langle f(P) \rangle \right) p_i$$

This has a nice meaning: for the fraction of organisms of the $i$th type to increase, their fitness must exceed the mean fitness. If you’re trying to increase your market share, what matters is not how good you are, but how much better than average you are. If everyone else is lousy, you’re in luck.
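By the way, if you like seeing equations in action, here’s a minimal numerical sketch of this classic form. Everything in it is a made-up illustration—a forward-Euler time step and a toy fitness function where rare species do better—not anything from Harper’s papers:

```python
import numpy as np

def replicator_step(p, f, dt=0.01):
    """One forward-Euler step of dp_i/dt = (f_i(P) - <f(P)>) p_i."""
    fitness = f(p)
    mean_fitness = np.dot(fitness, p)   # <f(P)> = sum_j f_j(P) p_j
    return p + dt * (fitness - mean_fitness) * p

# Toy fitness: each species is fitter when it's rare.
f = lambda p: 1.0 - p

p = np.array([0.1, 0.3, 0.6])
for _ in range(2000):
    p = replicator_step(p, f)
print(p)  # drifts toward the uniform distribution [1/3, 1/3, 1/3]
```

Note that the right-hand sides sum to zero, since $\sum_i \left( f_i(P) - \langle f(P) \rangle \right) p_i = \langle f(P) \rangle - \langle f(P) \rangle = 0$, so total probability is conserved and each Euler step stays on the simplex, at least for small step sizes.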
Entropy
Now for something a bit new. Once we’ve gotten a probability distribution into the game, its entropy is sure to follow:

$$S(p) = - \sum_i p_i \ln p_i$$

This says how ‘smeared-out’ the overall population is among the various different species. Alternatively, it says how much information it takes, on average, to say which species a randomly chosen organism belongs to. For example, if there are $2^N$ species, all with equal populations, the entropy $S(p)$ works out to $N \ln 2$. So in this case, it takes $N$ bits of information to say which species a randomly chosen organism belongs to.
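If you want to check that little calculation numerically, here’s a quick sketch—the helper function is mine, just for illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy S(p) = -sum_i p_i ln p_i, in nats."""
    p = p[p > 0]  # use the convention 0 ln 0 = 0
    return -np.sum(p * np.log(p))

N = 5
p = np.full(2**N, 1.0 / 2**N)  # 2^N species with equal populations
print(entropy(p))              # N ln 2, about 3.4657 nats...
print(N * np.log(2))           # ...in other words, N bits of information
```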
In biology, entropy is one of many ways people measure biodiversity. For a quick intro to some of the issues involved, try:
• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.
• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.
But we don’t need to understand this stuff to see how entropy is connected to the replicator equation. Marc Harper’s paper explains this in detail:
• Marc Harper, The replicator equation as an inference dynamic.
and I hope to go through quite a bit of it here. But not today! Today I just want to look at a pathetically simple, yet still interesting, example.
Exponential growth
Suppose the fitness of each species is independent of the populations of all the species. In other words, suppose each fitness $f_i(P_1, \dots, P_n)$ is actually a constant, say $f_i$. Then the replicator equation reduces to

$$\frac{dP_i}{dt} = f_i P_i$$

so it’s easy to solve:

$$P_i(t) = e^{t f_i} P_i(0)$$

You don’t need a detailed calculation to see what’s going to happen to the probabilities

$$p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)}$$

The most fit species present will eventually take over! If one species, say the $i$th one, has a fitness greater than the rest, then the population of this species will eventually grow faster than all the rest, at least if its population starts out greater than zero. So as $t \to +\infty$ we’ll have

$$p_i(t) \to 1$$

and

$$p_j(t) \to 0 \quad \text{for all } j \ne i$$

Thus the probability distribution $p(t)$ will become more sharply peaked, and its entropy will eventually approach zero.
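Here’s a quick numerical illustration of that, using the exact solution above; the three fitness values are just made up for the example:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Made-up constant fitnesses: the last species is the fittest.
f = np.array([1.0, 1.2, 2.0])
P0 = np.array([1.0, 1.0, 1.0])  # equal starting populations

for t in [0.0, 2.0, 5.0, 10.0]:
    P = P0 * np.exp(t * f)      # exact solution: P_i(t) = e^{t f_i} P_i(0)
    p = P / P.sum()
    print(t, p.round(4), round(entropy(p), 4))
```

The distribution piles up on the fittest species and the entropy decays toward zero, just as claimed.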
With a bit more thought you can see that even if more than one species shares the maximum possible fitness, the entropy will eventually decrease, though it won’t approach zero.
In other words, the biodiversity will eventually drop as all but the most fit species are overwhelmed. Of course, this is only true in our simple idealization. In reality, biodiversity behaves in more complex ways—in part because species interact, and in part because mutation tends to smear out the probability distribution $p(t)$.
We’re not looking at these effects yet. They’re extremely important… in ways we can only fully understand if we start by looking at what happens when they’re not present.
In still other words, the population will absorb information from its environment. This should make intuitive sense: the process of natural selection resembles ‘learning’. As fitter organisms become more common and less fit ones die out, the environment puts its stamp on the probability distribution $p(t)$. So, this probability distribution should gain information.
While intuitively clear, this last claim also follows more rigorously from thinking of entropy as negative information. Admittedly, it’s always easy to get confused by minus signs when relating entropy and information. A while back I said the entropy

$$S(p) = - \sum_i p_i \ln p_i$$

was the average information required to say which species a randomly chosen organism belongs to. If this entropy is going down, isn’t the population losing information?
No, this is a classic sign error. It’s like the concept of ‘work’ in physics. We can talk about the work some system does on its environment, or the work done by the environment on the system, and these are almost the same… except one is minus the other!
When you are very ignorant about some system—say, some rolled dice—your estimated probabilities $p_i$ for its various possible states are very smeared-out, so the entropy $S(p)$ is large. As you gain information, you revise your probabilities and they typically become more sharply peaked, so $S(p)$ goes down. When you know as much as you possibly can, $S(p)$ equals zero.
So, the entropy $S(p)$ is the amount of information you have left to learn: the amount of information you lack, not the amount you have. As you gain information, this goes down. There’s no paradox here.
It works the same way with our population of replicators—at least in the special case where the fitness of each species is independent of the populations. The probability distribution $p(t)$ is like a ‘hypothesis’ assigning to each species $i$ the probability $p_i$ that it’s the best at self-replicating. As some replicators die off while others prosper, they gather information about their environment, and this hypothesis gets refined. So, the entropy $S(p)$ drops.
Next time
Of course, to make closer contact to reality, we need to go beyond the special case where the fitness of each species is a constant! Marc Harper does this, and I want to talk about his work someday, but first I have a few more remarks to make about the pathetically simple special case I’ve been focusing on. I’ll save these for next time, since I’ve probably strained your patience already.