The Mathematics of Biodiversity (Part 2)

24 June, 2012

How likely is it that the next thing we see is one of a brand new kind? That sounds like a hard question. Last time I told you about the Good–Turing rule for answering this question.

The discussion that blog entry triggered has been very helpful! Among other things, it got Lou Jost more interested in this subject. Two days ago, he showed me the following simple argument for the Good–Turing estimate.

Suppose there are finitely many species of orchid. Suppose the fraction of orchids belonging to the ith species is p_i.

Suppose we start collecting orchids. Suppose each time we find one, the chance that it’s an orchid of the ith species is p_i. Of course this is not true in reality! For example, it’s harder to find a tiny orchid, like this:

than a big one. But never mind.

Say we collect a total of N orchids. What is the probability that we find no orchids of the ith species? It is

(1 - p_i)^N

Similarly, the probability that we find exactly one orchid of the ith species is

N p_i (1 - p_i)^{N-1}

And so on: these are the first two terms in a binomial series.

Let n_1 be the expected number of singletons: species for which we find exactly one orchid of that species. Then

\displaystyle{ n_1 = \sum_i N p_i (1 - p_i)^{N-1} }

Let D be the coverage deficit: the expected fraction of the total population consisting of species that remain undiscovered. Given our assumptions, this is the same as the chance that the next orchid we find will be of a brand new species.

Then

\displaystyle{ D = \sum_i p_i (1-p_i)^N }

since p_i is the fraction of orchids belonging to the ith species and (1-p_i)^N is the chance that this species remains undiscovered.

Lou Jost pointed out that the formulas for n_1 and D are very similar! In particular,

\displaystyle{ \frac{n_1}{N} = \sum_i p_i (1 - p_i)^{N-1} }

should be very close to

\displaystyle{ D = \sum_i p_i (1 - p_i)^N }

when N is large. So, we should have

\displaystyle{ D \approx \frac{n_1}{N} }

In other words: the chance that the next orchid we find is of a brand new species should be close to the fraction of orchids that are singletons now.
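Here's a quick Monte Carlo check of this approximation, in Python. The species-abundance distribution below is made up for illustration; we compare the average singleton fraction n_1/N over many samples with the exact coverage deficit D:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up geometric-style species-abundance distribution, for illustration.
p = 2.0 ** -np.arange(1, 21)
p /= p.sum()

N = 1000       # orchids collected per expedition
trials = 2000  # number of independent collecting expeditions

singleton_frac = np.empty(trials)
for t in range(trials):
    counts = rng.multinomial(N, p)
    singleton_frac[t] = np.sum(counts == 1) / N   # n_1 / N for this sample

# Exact coverage deficit: D = sum_i p_i (1 - p_i)^N
D = np.sum(p * (1 - p) ** N)

print(singleton_frac.mean(), D)   # these two numbers should nearly agree
```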

Of course it would be nice to turn these ‘shoulds’ into precise theorems! Theorem 1 in this paper does that:

• David McAllester and Robert E. Schapire, On the convergence rate of Good–Turing estimators, February 17, 2000.

By the way: the only difference between the formulas for n_1/N and D is that the first contains the exponent N-1, while the second contains the exponent N. So, Lou Jost’s argument is a version of Boris Borcic’s ‘time-reversal’ idea:

Good’s estimate is what you immediately obtain if you time-reverse your sampling procedure, e.g., if you ask for the probability that there is a change in the number of species in your sample when you randomly remove a specimen from it.


The Mathematics of Biodiversity (Part 1)

21 June, 2012

I’m in Barcelona now, and I want to blog about this:

Research Program on the Mathematics of Biodiversity, June-July 2012, Centre de Recerca Matemàtica, Barcelona, Spain. Organized by Ben Allen, Silvia Cuadrado, Tom Leinster, Richard Reeve and John Woolliams.

We’re having daily informal talks and there’s no way I can blog about all of them, talk to people here, and still get enough work done. So, I’ll just mention a few things that strike me! For example, this morning Lou Jost told me about an interesting paper by I. J. Good.

I’d known of I. J. Good as one of the guys who came up with the concept of a ‘technological singularity’. In 1963 he wrote:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

He was a British mathematician who worked as a cryptologist at Bletchley Park with Alan Turing. After World War II, he continued to work with Turing on the design of computers and Bayesian statistics at the University of Manchester. Later he moved to the US. In 1968, thanks to his interest in artificial intelligence, he served as consultant for Stanley Kubrick’s film 2001: A Space Odyssey. He died in 2009.

Good was also a big chess enthusiast, and worked on writing programs to play chess. He’s the guy in front here:

If you want to learn more about his work on chess, click on this photo!

But the paper Lou Jost mentioned is on a rather different subject:

• Irving John Good, The population frequency of species and the estimation of population parameters, Biometrika 40 (1953), 237–264.

Let me just state one result, sloppily, without any details or precise hypotheses!

Puzzle: Suppose you go into the jungles of Ecuador and start collecting orchids. You count the number of orchids of each different species that you find. You get a list of numbers, something like this:

14, 10, 8, 6, 2, 1, 1, 1

What is the chance that the next orchid you find will belong to a new species?

Good gives a rule of thumb for solving problems of this type:

\displaystyle{ \frac{n_1}{N} }

Here N is the total number of orchids you collected, and n_i is the number of species for which you found exactly i orchids. In our example,

n_1 = 3

since we found just one orchid of three different species: those are the three 1’s at the end of our list. Furthermore,

N = 14 + 10 + 8 + 6 + 2 + 1 + 1 + 1 = 43

So here is Good’s estimate of the chance that the next orchid you collect will be of a new species:

\displaystyle{ \frac{n_1}{N} = \frac{3}{43} }
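In code, Good's rule is a one-liner. Here it is in Python, applied to the counts from the puzzle:

```python
counts = [14, 10, 8, 6, 2, 1, 1, 1]   # orchids found per species

N = sum(counts)              # total number of orchids collected
n1 = counts.count(1)         # number of singleton species

estimate = n1 / N            # Good's estimate of P(next orchid is a new species)
print(N, n1, estimate)
```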

Good’s argument is nontrivial—and of course it depends on some assumptions on the nature of the distribution of populations of different species! Since he doesn’t state these assumptions succinctly and I haven’t read the paper carefully yet, I’m afraid you’ll have to read the paper to find out what they are.

Of course the math works for samples of anything that comes in distinct types, not just species of organisms! Good considers four examples:

• moths captured in a light-trap at Rothamsted, England,

• words in American newspapers,

• nouns in Macaulay’s essay on Bacon,

• chess openings in games published by the British Chess Magazine in 1951.

By comparing a small sample to a bigger one, he studies how well his rule works in practice, and apparently it does okay.

In his paper, I. J. Good thanks Alan Turing for coming up with the basic idea. In fact he says Turing gave an ‘intuitive demonstration’ of it—but he doesn’t give this intuitive demonstration, and according to Lou Jost he actually admits somewhere that he forgot it.

You can read more about the idea here:

Good–Turing frequency estimation.

By the way, Lou Jost is not only an expert on biodiversity and its relation to entropy! He lives in the jungles of Ecuador and has discovered over 60 new species of orchids, including the world’s smallest:

He found it in Ecuador, and the petals are just a few cells thick! (Typically, the news reports say he found it in Bolivia and the petals are just one cell thick.)

He said:

I found it among the roots of another plant that I had collected, another small orchid which I took back to grow in my greenhouse to get it to flower. A few months later I saw that down among the roots was a tiny little plant that I realised was more interesting than the bigger orchid. Looking at the flower is often the best way to be able to identify which species of orchid you’ve got hold of – and can tell you whether you’re looking at an unknown species or not.


Information Geometry (Part 11)

7 June, 2012

Last time we saw that given a bunch of different species of self-replicating entities, the entropy of their population distribution can go either up or down as time passes. This is true even in the pathetically simple case where all the replicators have constant fitness—so they don’t interact with each other, and don’t run into any ‘limits to growth’.

This is a bit of a bummer, since it would be nice to use entropy to explain how replicators are always extracting information from their environment, thanks to natural selection.

Luckily, a slight variant of entropy, called ‘relative entropy’, behaves better. When our replicators have an ‘evolutionarily stable state’, the relative entropy is guaranteed to always change in the same direction as time passes!

Thanks to Einstein, we’ve all heard that times and distances are relative. But how is entropy relative?

It’s easy to understand if you think of entropy as lack of information. Say I have a coin hidden under my hand. I tell you it’s heads-up. How much information did I just give you? Maybe 1 bit? That’s true if you know it’s a fair coin and I flipped it fairly before covering it up with my hand. But what if you put the coin down there yourself a minute ago, heads up, and I just put my hand over it? Then I’ve given you no information at all. The difference is the choice of ‘prior’: that is, what probability distribution you attributed to the coin before I gave you my message.

My love affair with relative entropy began in college when my friend Bruce Smith and I read Hugh Everett’s thesis, The Relative State Formulation of Quantum Mechanics. This was the origin of what’s now often called the ‘many-worlds interpretation’ of quantum mechanics. But it also has a great introduction to relative entropy. Instead of talking about ‘many worlds’, I wish people would say that Everett explained some of the mysteries of quantum mechanics using the fact that entropy is relative.

Anyway, it’s nice to see relative entropy showing up in biology.

Relative Entropy

Inscribe an equilateral triangle in a circle. Randomly choose a line segment joining two points of this circle. What is the probability that this segment is longer than a side of the triangle?

This puzzle is called Bertrand’s paradox, because different ways of solving it give different answers. To crack the paradox, you need to realize that it’s meaningless to say you’ll “randomly” choose something until you say more about how you’re going to do it.

In other words, you can’t compute the probability of an event until you pick a recipe for computing probabilities. Such a recipe is called a probability measure.

This applies to computing entropy, too! The formula for entropy clearly involves a probability distribution, even when our set of events is finite:

S = - \sum_i p_i \ln(p_i)

But this formula conceals a fact that becomes obvious when our set of events is infinite. Now the sum becomes an integral:

S = - \int_X p(x) \ln(p(x)) \, d x

And now it’s clear that this formula makes no sense until we choose the measure d x. On a finite set we have a god-given choice of measure, called counting measure. Integrals with respect to this are just sums. But in general we don’t have such a god-given choice. And even for finite sets, working with counting measure is a choice: we are choosing to believe that in the absence of further evidence, all options are equally likely.

Taking this fact into account, it seems like we need two things to compute entropy: a probability distribution p(x), and a measure d x. That’s on the right track. But an even better way to think of it is this:

\displaystyle{ S = - \int_X  \frac{p(x) dx}{dx} \ln \left(\frac{p(x) dx}{dx}\right) \, dx }

Now we see the entropy depends on two measures: the probability measure p(x)  dx we care about, but also the measure d x. Their ratio is important, but that’s not enough: we also need one of these measures to do the integral. Above I used the measure dx to do the integral, but we can also use p(x) dx if we write

\displaystyle{ S = - \int_X \ln \left(\frac{p(x) dx}{dx}\right) p(x) dx }

Either way, we are computing the entropy of one measure relative to another. So we might as well admit it, and talk about relative entropy.

The entropy of the measure d \mu relative to the measure d \nu is defined by:

\begin{array}{ccl} S(d \mu, d \nu) &=& \displaystyle{ - \int_X \frac{d \mu(x) }{d \nu(x)} \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right)  d\nu(x) } \\   \\  &=& \displaystyle{ - \int_X  \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right) d\mu(x) } \end{array}

The second formula is simpler, but the first looks more like summing -p \ln(p), so they’re both useful.

Since we’re taking entropy to be lack of information, we can also get rid of the minus sign and define relative information by

\begin{array}{ccl} I(d \mu, d \nu) &=& \displaystyle{ \int_X \frac{d \mu(x) }{d \nu(x)} \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right)  d\nu(x) } \\   \\  &=& \displaystyle{  \int_X  \ln \left(\frac{d \mu(x)}{ d\nu(x) }\right) d\mu(x) } \end{array}

If you thought something was randomly distributed according to the probability measure d \nu, but then you discover it’s randomly distributed according to the probability measure d \mu, how much information have you gained? The answer is I(d\mu,d\nu).

For more on relative entropy, read Part 6 of this series, where I gave some examples illustrating how it works. Those should convince you that it’s a useful concept.

Okay: now let’s switch back to a more lowbrow approach. In the case of a finite set, we can revert to thinking of our two measures as probability distributions, and write the information gain as

I(q,p) = \displaystyle{  \sum_i  \ln \left(\frac{q_i}{p_i }\right) q_i}

If you want to sound like a Bayesian, call p the prior probability distribution and q the posterior probability distribution. Whatever you call them, I(q,p) is the amount of information you get if you thought p and someone tells you “no, q!”
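Here's a tiny Python illustration of that dialogue (the coin probabilities are made up):

```python
import math

def info_gain(q, p):
    """Relative information I(q, p) = sum_i q_i ln(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

p = [0.5, 0.5]   # prior: you thought the coin was fair
q = [0.9, 0.1]   # posterior: "no, it's heavily biased!"

print(info_gain(q, p) / math.log(2), "bits gained")

# Relative information is zero exactly when nothing changes:
assert info_gain(p, p) == 0.0
```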

We’ll use this idea to think about how a population gains information about its environment as time goes by, thanks to natural selection. The rest of this post will be an exposition of Theorem 1 in this paper:

• Marc Harper, The replicator equation as an inference dynamic.

Harper says versions of this theorem have previously appeared in work by Ethan Akin, and independently in work by Josef Hofbauer and Karl Sigmund. He also credits others here. An idea this good is rarely noticed by just one person.

The change in relative information

So: consider n different species of replicators. Let P_i be the population of the ith species, and assume these populations change according to the replicator equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each function f_i depends smoothly on all the populations. And as usual, we let

\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }

be the fraction of replicators in the ith species.

Let’s study the relative information I(q,p) where q is some fixed probability distribution. We’ll see that something great happens when q is a stable equilibrium solution of the replicator equation. In this case, the relative information can never increase! It can only decrease or stay constant.

We’ll think about what all this means later. First, let’s see that it’s true! Remember,

\begin{array}{ccl} I(q,p) &=& \displaystyle{ \sum_i  \ln \left(\frac{q_i}{ p_i }\right) q_i }  \\ \\ &=&  \displaystyle{ \sum_i  \Big(\ln(q_i) - \ln(p_i) \Big) q_i } \end{array}

and only p_i depends on time, not q_i, so

\begin{array}{ccl} \displaystyle{ \frac{d}{dt} I(q,p)}  &=& \displaystyle{ - \frac{d}{dt} \sum_i \ln(p_i)  q_i }\\   \\  &=& \displaystyle{ - \sum_i \frac{\dot{p}_i}{p_i} \, q_i } \end{array}

where \dot{p}_i is the rate of change of the probability p_i. We saw a nice formula for this in Part 9:

\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle  \Big) \, p_i }

where

f_i(P) = f_i(P_1, \dots, P_n)

and

\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i  }

is the mean fitness of the species. So, we get

\displaystyle{ \frac{d}{dt} I(q,p) } = \displaystyle{ - \sum_i \Big( f_i(P) - \langle f(P) \rangle  \Big) \, q_i }

Nice, but we can fiddle with this expression to get something more enlightening. Remember, the numbers q_i sum to one. So:

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt} I(q,p) } &=&  \displaystyle{  \langle f(P) \rangle - \sum_i f_i(P) q_i  } \\  \\ &=& \displaystyle{  \sum_i f_i(P) (p_i - q_i)  }  \end{array}

where in the last step I used the definition of the mean fitness. This result looks even cuter if we treat the numbers f_i(P) as the components of a vector f(P), and similarly for the numbers p_i and q_i. Then we can use the dot product of vectors to say

\displaystyle{ \frac{d}{dt} I(q,p) = f(P) \cdot (p - q) }

So, the relative information I(q,p) will always decrease if

f(P) \cdot (p - q) \le 0

for all choices of the population P.
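Here's a numerical sanity check of this, in Python. The fitness function comes from a Hawk–Dove payoff matrix (my choice for illustration, not part of the argument above), whose mixed equilibrium q = (1/2, 1/2) satisfies the condition f(P) · (p − q) ≤ 0:

```python
import numpy as np

# Hawk-Dove payoff matrix with V = 2, C = 4 (made up for illustration).
# Its mixed equilibrium q = (1/2, 1/2) is evolutionarily stable.
A = np.array([[-1.0, 2.0],
              [ 0.0, 1.0]])
q = np.array([0.5, 0.5])

def rel_info(q, p):
    """Relative information I(q, p) = sum_i q_i ln(q_i / p_i)."""
    return float(np.sum(q * np.log(q / p)))

p = np.array([0.9, 0.1])    # start far from the stable state
dt = 0.001
history = [rel_info(q, p)]
for _ in range(20000):
    f = A @ p                        # fitnesses (frequency-dependent)
    mean_f = f @ p                   # mean fitness
    p = p + dt * (f - mean_f) * p    # Euler step of the replicator equation
    history.append(rel_info(q, p))

# I(q, p) never increases along the trajectory
assert all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print(history[0], history[-1])
```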

And now something really nice happens: this is also the condition for q to be an evolutionarily stable state. This concept goes back to John Maynard Smith, the founder of evolutionary game theory. In 1982 he wrote:

A population is said to be in an evolutionarily stable state if its genetic composition is restored by selection after a disturbance, provided the disturbance is not too large.

I will explain the math next time—I need to straighten out some things in my mind first. But the basic idea is compelling: an evolutionarily stable state is like a situation where our replicators ‘know all there is to know’ about the environment and each other. In any other state, the population has ‘something left to learn’—and the amount left to learn is the relative information we’ve been talking about! But as time goes on, the information still left to learn decreases!

Note: in the real world, nature has never found an evolutionarily stable state… except sometimes approximately, on sufficiently short time scales, in sufficiently small regions. So we are still talking about an idealization of reality! But that’s okay, as long as we know it.


Information Geometry (Part 10)

4 June, 2012

Last time I began explaining the tight relation between three concepts:

• entropy,

• information—or more precisely, lack of information,

and

• biodiversity.

The idea is to consider n different species of ‘replicators’. A replicator is any entity that can reproduce itself, like an organism, a gene, or a meme. A replicator can come in different kinds, and a ‘species’ is just our name for one of these kinds. If P_i is the population of the ith species, we can interpret the fraction

\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }

as a probability: the probability that a randomly chosen replicator belongs to the ith species. This suggests that we define entropy just as we do in statistical mechanics:

\displaystyle{ S = - \sum_i p_i \ln(p_i) }

In the study of statistical inference, entropy is a measure of uncertainty, or lack of information. But now we can interpret it as a measure of biodiversity: it’s zero when just one species is present, and small when a few species have much larger populations than all the rest, but gets big otherwise.

Our goal here is to play these viewpoints off against each other. In short, we want to think of natural selection, and even biological evolution, as a process of statistical inference—or in simple terms, learning.

To do this, let’s think about how entropy changes with time. Last time we introduced a simple model called the replicator equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each population grows at a rate proportional to some ‘fitness functions’ f_i. We can get some intuition by looking at the pathetically simple case where these functions are actually constants, so

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

The equation then becomes trivial to solve:

\displaystyle{ P_i(t) = e^{t f_i } P_i(0)}

Last time I showed that in this case, the entropy will eventually decrease. It will go to zero as t \to +\infty whenever one species is fitter than all the rest and starts out with a nonzero population—since then this species will eventually take over.

But remember, the entropy of a probability distribution is its lack of information. So the decrease in entropy signals an increase in information. And last time I argued that this makes perfect sense. As the fittest species takes over and biodiversity drops, the population is acquiring information about its environment.

However, I never said the entropy is always decreasing, because that’s false! Even in this pathetically simple case, entropy can increase.

Suppose we start with many replicators belonging to one very unfit species, and a few belonging to various more fit species. The probability distribution p_i will start out sharply peaked, so the entropy will start out low:

Now think about what happens when time passes. At first the unfit species will rapidly die off, while the population of the other species slowly grows:

 

So the probability distribution will, for a while, become less sharply peaked. Thus, for a while, the entropy will increase!
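We can watch this happen numerically. Here's a Python sketch using the exact solution of the constant-fitness case; the fitness values and initial populations are made up for illustration:

```python
import numpy as np

# Constant fitnesses: one very unfit but initially common species,
# plus a few fitter rare ones (numbers made up for illustration).
f = np.array([0.0, 1.0, 1.2, 1.4])
P0 = np.array([0.97, 0.01, 0.01, 0.01])

def entropy(p):
    return -np.sum(p * np.log(p))

ts = np.linspace(0, 40, 4001)
S = np.array([entropy((np.exp(t * f) * P0) / np.sum(np.exp(t * f) * P0))
              for t in ts])          # exact solution P_i(t) = e^{t f_i} P_i(0)

print(S[0], S.max(), S[-1])
# entropy first rises above its initial value, then falls toward zero
assert S.max() > S[0]
assert 0 < S.argmax() < len(S) - 1
assert S[-1] < 0.05
```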

This seems to conflict with our idea that the population’s entropy should decrease as it acquires information about its environment. But in fact this phenomenon is familiar in the study of statistical inference. If you start out with strongly held false beliefs about a situation, the first effect of learning more is to become less certain about what’s going on!

Get it? Say you start out by assigning a high probability to some wrong guess about a situation. The entropy of your probability distribution is low: you’re quite certain about what’s going on. But you’re wrong. When you first start suspecting you’re wrong, you become more uncertain about what’s going on. Your probability distribution flattens out, and the entropy goes up.

So, sometimes learning involves a decrease in information—false information. There’s nothing about the mathematical concept of information that says this information is true.

Given this, it’s good to work out a formula for the rate of change of entropy, which will let us see more clearly when it goes down and when it goes up. To do this, first let’s derive a completely general formula for the time derivative of the entropy of a probability distribution. Following Sir Isaac Newton, we’ll use a dot to stand for a time derivative:

\begin{array}{ccl} \displaystyle{  \dot{S}} &=& \displaystyle{ -  \frac{d}{dt} \sum_i p_i \ln (p_i)} \\   \\  &=& \displaystyle{ - \sum_i \Big( \dot{p}_i \ln (p_i) + \dot{p}_i \Big) }  \end{array}

In the last term we took the derivative of the logarithm and got a factor of 1/p_i which cancelled the factor of p_i. But since

\displaystyle{  \sum_i p_i = 1 }

we know

\displaystyle{ \sum_i \dot{p}_i = 0 }

so this last term vanishes:

\displaystyle{ \dot{S}= -\sum_i \dot{p}_i \ln (p_i) }

Nice! To go further, we need a formula for \dot{p}_i. For this we might as well return to the general replicator equation, dropping the pathetically special assumption that the fitness functions are actually constants. Then we saw last time that

\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle  \Big) \, p_i }

where we used the abbreviation

f_i(P) = f_i(P_1, \dots, P_n)

for the fitness of the ith species, and defined the mean fitness to be

\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i  }

Using this cute formula for \dot{p}_i, we get the final result:

\displaystyle{ \dot{S} = - \sum_i \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i \ln (p_i) }

This is strikingly similar to the formula for entropy itself. But now each term in the sum includes a factor saying how much more fit than average, or less fit, that species is. The quantity - p_i \ln(p_i) is always nonnegative, since the graph of -x \ln(x) looks like this:

So, the ith term contributes positively to the change in entropy if the ith species is fitter than average, but negatively if it’s less fit than average.

This may seem counterintuitive!

Puzzle 1. How can we reconcile this fact with our earlier observations about the case when the fitness of each species is population-independent? Namely: a) if initially most of the replicators belong to one very unfit species, the entropy will rise at first, but b) in the long run, when the fittest species present take over, the entropy drops?

If this seems too tricky, look at some examples! The first illustrates observation a); the second illustrates observation b):

Puzzle 2. Suppose we have two species, one with fitness equal to 1 initially constituting 90% of the population, the other with fitness equal to 10 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 1, & &  p_1(0) = 0.9 \\ \\                            f_2 = 10 , & & p_2(0) = 0.1   \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?

Puzzle 3. Suppose we have two species, one with fitness equal to 10 initially constituting 90% of the population, and the other with fitness equal to 1 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 10, & &  p_1(0) = 0.9 \\ \\                            f_2 = 1 , & & p_2(0) = 0.1   \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?

I had to work through these examples to understand what’s going on. Now I do, and it all makes sense.
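If you work through them yourself, you can check your answers with a few lines of Python; this just plugs the given numbers into the formula for \dot{S}:

```python
import math

def entropy_rate_terms(f, p):
    """Per-species contributions to dS/dt = -sum_i (f_i - <f>) p_i ln(p_i)."""
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return [-(fi - mean_f) * pi * math.log(pi) for fi, pi in zip(f, p)]

# Puzzle 2: unfit common species, fit rare species
terms2 = entropy_rate_terms([1.0, 10.0], [0.9, 0.1])
# Puzzle 3: fit common species, unfit rare species
terms3 = entropy_rate_terms([10.0, 1.0], [0.9, 0.1])

print(sum(terms2), terms2)   # positive: entropy is rising at t = 0
print(sum(terms3), terms3)   # negative: entropy is falling at t = 0
```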

Next time

Still, it would be nice if there were some quantity that always goes down with the passage of time, reflecting our naive idea that the population gains information from its environment, and thus loses entropy, as time goes by.

Often there is such a quantity. But it’s not the naive entropy: it’s the relative entropy. I’ll talk about that next time. In the meantime, if you want to prepare, please reread Part 6 of this series, where I explained this concept. Back then, I argued that whenever you’re tempted to talk about entropy, you should talk about relative entropy. So, we should try that here.

There’s a big idea lurking here: information is relative. How much information a signal gives you depends on your prior assumptions about what that signal is likely to be. If this is true, perhaps biodiversity is relative too.


Information Geometry (Part 9)

1 June, 2012


It’s time to continue this information geometry series, because I’ve promised to give the following talk at a conference on the mathematics of biodiversity in early July… and I still need to do some of the research!

Diversity, information geometry and learning

As is well known, some measures of biodiversity are formally identical to measures of information developed by Shannon and others. Furthermore, Marc Harper has shown that the replicator equation in evolutionary game theory is formally identical to a process of Bayesian inference, which is studied in the field of machine learning using ideas from information geometry. Thus, in this simple model, a population of organisms can be thought of as a ‘hypothesis’ about how to survive, and natural selection acts to update this hypothesis according to Bayes’ rule. The question thus arises to what extent natural changes in biodiversity can be usefully seen as analogous to a form of learning. However, some of the same mathematical structures arise in the study of chemical reaction networks, where the increase of entropy, or more precisely decrease of free energy, is not usually considered a form of ‘learning’. We report on some preliminary work on these issues.

So, let’s dive in! To some extent I’ll be explaining these two papers:

• Marc Harper, Information geometry and evolutionary game theory.

• Marc Harper, The replicator equation as an inference dynamic.

However, I hope to bring in some more ideas from physics, the study of biodiversity, and the theory of stochastic Petri nets, also known as chemical reaction networks. So, this series may start to overlap with my network theory posts. We’ll see. We won’t get far today: for now, I just want to review and expand on what we did last time.

The replicator equation

The replicator equation is a simplified model of how populations change. Suppose we have n types of self-replicating entity. I’ll call these entities replicators. I’ll call the types of replicators species, but they don’t need to be species in the biological sense. For example, the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains.

Let P_i(t), or just P_i for short, be the population of the ith species at time t. Then the replicator equation says

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

So, the population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it can be any smooth function f_i of the populations of all the species. We call f_i(P_1, \dots, P_n) the fitness of the ith species.

Of course this model is absurdly general, while still leaving out lots of important effects, like the spatial variation of populations, or the ability for the population of some species to start at zero and become nonzero—which happens thanks to mutation. Nonetheless this model is worth taking a good look at.

Using the magic of vectors we can write

P = (P_1, \dots , P_n)

and

f(P) = (f_1(P), \dots, f_n(P))

This lets us write the replicator equation a wee bit more tersely as

\displaystyle{ \frac{d P}{d t} = f(P) P}

where on the right I’m multiplying vectors componentwise, the way your teachers tried to brainwash you into never doing:

f(P) P = (f(P)_1 P_1, \dots, f(P)_n P_n)

In other words, I’m thinking of P and f(P) as functions on the set \{1, \dots, n\} and multiplying them pointwise. This will be a nice way of thinking if we want to replace this finite set by some more general space.

Why would we want to do that? Well, we might be studying lizards with different length tails, and we might find it convenient to think of the set of possible tail lengths as the half-line [0,\infty) instead of a finite set.

Or, just to get started, we might want to study the pathetically simple case where f(P) doesn’t depend on P. Then we just have a fixed function f and a time-dependent function P obeying

\displaystyle{ \frac{d P}{d t} = f P}

If we’re physicists, we might write P more suggestively as \psi and write the operator multiplying by f as - H. Then our equation becomes

\displaystyle{ \frac{d \psi}{d t} = - H \psi }

This looks a lot like Schrödinger’s equation, but since there’s no factor of \sqrt{-1}, and \psi is real-valued, it’s more like the heat equation or the ‘master equation’, the basic equation of stochastic mechanics.

For an explanation of Schrödinger’s equation and the master equation, try Part 12 of the network theory series. In that post I didn’t include a minus sign in front of the H. That’s no big deal: it’s just a different convention than the one I want today. A more serious issue is that in stochastic mechanics, \psi stands for a probability distribution. This suggests that we should get probabilities into the game somehow.

The replicator equation in terms of probabilities

Luckily, that’s exactly what people usually do! Instead of talking about the population P_i of the ith species, they talk about the probability p_i that one of our organisms will belong to the ith species. This amounts to normalizing our populations:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Don’t you love it when notations work out well? Our big Population P_i has gotten normalized to give little probability p_i.

How do these probabilities p_i change with time? Now is the moment for that least loved rule of elementary calculus to come out and take a bow: the quotient rule for derivatives!

\displaystyle{ \frac{d p_i}{d t} = \left(\frac{d P_i}{d t} \sum_j P_j \quad - \quad P_i \sum_j \frac{d P_j}{d t}\right) \big{/} \left(  \sum_j P_j \right)^2 }

Using our earlier version of the replicator equation, this gives:

\displaystyle{ \frac{d p_i}{d t} =  \left(f_i(P) P_i \sum_j P_j \quad - \quad P_i \sum_j f_j(P) P_j \right) \big{/} \left(  \sum_j P_j \right)^2 }

Using the definition of p_i, this simplifies to:

\displaystyle{ \frac{d p_i}{d t} =  f_i(P) p_i \quad - \quad \left( \sum_j f_j(P) p_j \right) p_i }

The stuff in parentheses actually has a nice meaning: it’s just the mean fitness. In other words, it’s the average, or expected, fitness of an organism chosen at random from the whole population. Let’s write it like this:

\displaystyle{ \langle f(P) \rangle = \sum_j f_j(P) p_j  }

So, we get the replicator equation in its classic form:

\displaystyle{ \frac{d p_i}{d t} = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i }

This has a nice meaning: for the fraction of organisms of the ith type to increase, their fitness must exceed the mean fitness. If you’re trying to increase market share, what matters is not how good you are, but how much better than average you are. If everyone else is lousy, you’re in luck.
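Here's a minimal numerical sketch of the replicator equation in its classic form, using crude Euler steps and made-up constant fitnesses (constant fitness is exactly the special case treated below):

```python
import numpy as np

# Euler integration of the replicator equation
#   dp_i/dt = (f_i - <f>) p_i
# with constant, made-up fitnesses, purely for illustration.
f = np.array([1.0, 2.0, 3.0])   # fitness of each species
p = np.array([0.5, 0.3, 0.2])   # initial probability distribution

dt = 0.01
for _ in range(1000):
    mean_fitness = f @ p              # <f> = sum_j f_j p_j
    p = p + dt * (f - mean_fitness) * p

print(p.sum())   # stays ~1: the dynamics preserves normalization
print(p)         # probability has shifted toward the fittest species
```

Note that each Euler step preserves the total probability exactly: the correction terms sum to dt(⟨f⟩ − ⟨f⟩) = 0.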

Entropy

Now for something a bit new. Once we’ve gotten a probability distribution into the game, its entropy is sure to follow:

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

This says how ‘smeared-out’ the overall population is among the various different species. Alternatively, it says how much information it takes, on average, to say which species a randomly chosen organism belongs to. For example, if there are 2^N species, all with equal populations, the entropy S works out to N \ln 2. So in this case, it takes N bits of information to say which species a randomly chosen organism belongs to.
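We can check the 2^N example numerically:

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability species."""
    return -sum(q * math.log(q) for q in p if q > 0)

# 2^N equally common species: the entropy should be N ln 2
N = 5
p = [1 / 2**N] * 2**N
print(entropy(p))        # ≈ 3.4657
print(N * math.log(2))   # ≈ 3.4657
```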

In biology, entropy is one of many ways people measure biodiversity. For a quick intro to some of the issues involved, try:

• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.

• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.

But we don’t need to understand this stuff to see how entropy is connected to the replicator equation. Marc Harper’s paper explains this in detail:

• Marc Harper, The replicator equation as an inference dynamic.

and I hope to go through quite a bit of it here. But not today! Today I just want to look at a pathetically simple, yet still interesting, example.

Exponential growth

Suppose the fitness of each species is independent of the populations of all the species. In other words, suppose each fitness f_i(P) is actually a constant, say f_i. Then the replicator equation reduces to

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

so it’s easy to solve:

P_i(t) = e^{t f_i} P_i(0)

You don’t need a detailed calculation to see what’s going to happen to the probabilities

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)}}

The most fit species present will eventually take over! If one species, say the ith one, has a fitness greater than the rest, then the population of this species will eventually grow faster than all the rest, at least if its population starts out greater than zero. So as t \to +\infty, we’ll have

p_i(t) \to 1

and

p_j(t) \to 0 \quad \mathrm{for} \quad j \ne i

Thus the probability distribution p will become more sharply peaked, and its entropy will eventually approach zero.
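Using the exact solution P_i(t) = e^{t f_i} P_i(0), a few lines of code show the entropy draining away (the fitnesses here are made-up numbers):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability species."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Exact solution P_i(t) = exp(f_i t) P_i(0), with made-up constant fitnesses
f = np.array([1.0, 1.5, 2.0])
P0 = np.array([1.0, 1.0, 1.0])

for t in [0.0, 2.0, 5.0, 20.0]:
    P = np.exp(f * t) * P0
    p = P / P.sum()
    print(f"t = {t:5.1f}   S(p) = {entropy(p):.4f}")
# S(p) starts at ln 3 and falls toward 0 as the fittest species takes over
```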

With a bit more thought you can see that even if more than one species shares the maximum possible fitness, the entropy will eventually decrease, though it won’t approach zero.

In other words, the biodiversity will eventually drop as all but the most fit species are overwhelmed. Of course, this is only true in our simple idealization. In reality, biodiversity behaves in more complex ways—in part because species interact, and in part because mutation tends to smear out the probability distribution p_i. We’re not looking at these effects yet. They’re extremely important… in ways we can only fully understand if we start by looking at what happens when they’re not present.

In still other words, the population will absorb information from its environment. This should make intuitive sense: the process of natural selection resembles ‘learning’. As fitter organisms become more common and less fit ones die out, the environment puts its stamp on the probability distribution p. So, this probability distribution should gain information.

While intuitively clear, this last claim also follows more rigorously from thinking of entropy as negative information. Admittedly, it’s always easy to get confused by minus signs when relating entropy and information. A while back I said the entropy

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

was the average information required to say which species a randomly chosen organism belongs to. If this entropy is going down, isn’t the population losing information?

No, this is a classic sign error. It’s like the concept of ‘work’ in physics. We can talk about the work some system does on its environment, or the work done by the environment on the system, and these are almost the same… except one is minus the other!

When you are very ignorant about some system—say, some rolled dice—your estimated probabilities p_i for its various possible states are very smeared-out, so the entropy S(p) is large. As you gain information, you revise your probabilities and they typically become more sharply peaked, so S(p) goes down. When you know as much as you possibly can, S(p) equals zero.

So, the entropy S(p) is the amount of information you have left to learn: the amount of information you lack, not the amount you have. As you gain information, this goes down. There’s no paradox here.

It works the same way with our population of replicators—at least in the special case where the fitness of each species is independent of its population. The probability distribution p is like a ‘hypothesis’ assigning to each species i the probability p_i that it’s the best at self-replicating. As some replicators die off while others prosper, they gather information from their environment, and this hypothesis gets refined. So, the entropy S(p) drops.


Next time

Of course, to make closer contact to reality, we need to go beyond the special case where the fitness of each species is a constant! Marc Harper does this, and I want to talk about his work someday, but first I have a few more remarks to make about the pathetically simple special case I’ve been focusing on. I’ll save these for next time, since I’ve probably strained your patience already.


Dolphins and Manatees of Amazonia

11 March, 2012

No, these aren’t mermaids. They’re sirenians!

Sirenians or ‘sea cows’ are aquatic mammals found in four places in the world. The three places shown here are home to three species called ‘manatees’:

For example, the sirenians shown above are West Indian manatees, Trichechus manatus, which live in the Caribbean. There’s also a big region stretching from the western Pacific Ocean to the eastern coast of Africa that’s home to the ‘dugong’.

Right now there’s one different species of sirenian in each place. But once there were many more species, and it’s just been discovered that there often used to be several species living in the same place:

Multiple species of sea cows once coexisted, Science Daily, 8 March 2012.

The closest living relatives of the sirenians are elephants! They kind of look similar, no? More importantly, they share some unusual features. They keep growing new teeth throughout their life, molars that slowly move to the front of the mouth as the teeth in front wear out. And quite unlike cows, say, the females have two teats—located between their front limbs.

Here’s an evolutionary tree of sirenians:

You’ll see they got their start about 50 million years ago and blossomed in the late Oligocene, about 25 million years ago. Later the Earth got colder, and they gradually retreated to their present ranges.

You’ll also notice that three branches of the tree seem to reach the present day:

Trichechus, which includes all the manatees,

Dugong, which (surprise!) is the dugong… and

Hydrodamalis, which is the genus of Steller’s sea cow.

Steller’s sea cow was discovered in the North Pacific in 1741, and hunted to extinction shortly thereafter. Ouch! It took 24 million years of evolution to refine and polish the information in that species, and it was wiped out without trace in just 27 years.

The Amazonian manatee, Trichechus inunguis, is of special interest to me today because it lives in many branches of the Amazon river:

How did it get there? Why does it live in rivers? Its nearest living relative, the West Indian manatee, likes coastal waters but can also go up rivers. Another clue might be the wonderful Amazon river dolphin, Inia geoffrensis.

It’s also called a pink dolphin. Here’s why:

There are some interesting myths about it… one of which connects it with the manatee!

“In traditional Amazon River folklore, at night, an Amazon river dolphin becomes a handsome young man who seduces girls, impregnates them, and then returns to the river in the morning to become a dolphin again. This dolphin shapeshifter is called an encantado. It has been suggested that the myth arose partly because dolphin genitalia bear a resemblance to those of humans. Others believe the myth served (and still serves) as a way of hiding the incestuous relations which are quite common in some small, isolated communities along the river. In the area, there are tales that it is bad luck to kill a dolphin. Legend also states that if a person makes eye contact with an Amazon river dolphin, he or she will have lifelong nightmares. Local legends also state that the dolphin is the guardian of the Amazonian manatee, and that, should one wish to find a manatee, one must first make peace with the dolphin.”

Indeed, the range of the Amazon river dolphin, shown here, is similar to that of the Amazonian manatee:


Dolphins and other cetaceans are not closely related to sirenians. Dolphins are carnivores, but sirenians only eat plants. But they both started as land-dwelling mammals, and both took to the seas at roughly the same time. And it seems the Amazon river dolphin became a river dweller around 15 million years ago. Why? As sea levels dropped, what once was an inland ocean in South America gradually turned into what’s now the Amazon! According to the Wikipedia article:

It seems this species separated from its oceanic relatives during the Miocene epoch. Sea levels were higher at that time, says biologist Healy Hamilton of the California Academy of Sciences in San Francisco, and large parts of South America, including the Amazon Basin, may have been flooded by shallow, more or less brackish water. When this inland sea retreated, Hamilton hypothesizes, the Amazon dolphins remained in the river basin…

So maybe the manatees did the same thing. I don’t know. But I find the idea of an inland sea gradually turning into a river-filled jungle, and life adapting to this change, very intriguing and romantic!

This shows what South America may have looked like during the early-middle Miocene, when the Amazon river dolphin was just getting its start. The upper Amazon Basin drained into the Orinoco Basin at left, while the lower Amazon Basin drained directly to the Atlantic Ocean at right. This is from a paper on megafans, which are huge regions covered with river sediment:

• M. Justin Wilkinson, Larry G. Marshall, and John G. Lundberg, River behavior on megafans and potential influences on diversification and distribution of aquatic organisms, Journal of South American Earth Sciences 21 (2006), 151–172.

Almost needless to say, we’ll need to work a bit to protect the dolphins and manatees of Amazonia if we want them to survive. Check out this Amazon river dolphin in action:

This guy is swimming in the Rio Negro, a large tributary of the Amazon. But there are also Amazon river dolphins in the Orinoco, another huge river in South America, not connected to the Amazon! You can see it just north of the Rio Negro:

Was it ever connected to the Amazon? If not, what’s the story about how the same species of dolphins live in both river basins?

By the way, my joke about mermaids comes from the etymology of the word ‘sirenian’. There’s a legend that lonely sailors—very lonely, it seems—mistook sea cows for mermaids, also known as ‘sirens’.


Azimuth on Google Plus (Part 5)

1 January, 2012

Happy New Year! I’m back from Laos. Here are seven items, mostly from the Azimuth Circle on Google Plus:

1) Phil Libin is the boss of a Silicon Valley startup. When he’s off travelling, he uses a telepresence robot to keep an eye on things. It looks like a stick figure on wheels. Its bulbous head has two eyes, which are actually a camera and a laser. On its forehead is a screen, where you can see Libin’s face. It’s made by a company called Anybots, and it costs just $15,000.


I predict that within my lifetime we’ll be using things like this to radically cut travel costs and carbon emissions for business and for conferences. It seems weird now, but so did telephones. Future models will be nicer to look at. But let’s try it soon!

• Laura Sydell, No excuses: robots put you in two places at once, Weekend Edition Saturday, 31 December 2011.

Bruce Bartlett and I are already planning for me to use telepresence to give a lecture on mathematics and the environment at Stellenbosch University in South Africa. But we’d been planning to use old-fashioned videoconferencing technology.

Anybots is located in Mountain View, California. That’s near Google’s main campus. Can anyone help me set up a talk on energy and the environment at Google, where I use an Anybot?

(Or, for that matter, anywhere else around there?)

2) A study claims to have found a correlation between weather and the day of the week! The claim is that there are more tornados and hailstorms in the eastern USA during weekdays. One possible mechanism is that aerosols from car exhaust help seed clouds.


I make no claims that this study is correct. But at the very least, it would be interesting to examine their use of statistics and see if it’s convincing or flawed:

• Thomas Bell and Daniel Rosenfeld, Why do tornados and hailstorms rest on weekends?, Journal of Geophysical Research 116 (2011), D20211.

Abstract. This study shows for the first time statistical evidence that when anthropogenic aerosols over the eastern United States during summertime are at their weekly mid-week peak, tornado and hailstorm activity there is also near its weekly maximum. The weekly cycle in summertime storm activity for 1995–2009 was found to be statistically significant and unlikely to be due to natural variability. It correlates well with previously observed weekly cycles of other measures of storm activity. The pattern of variability supports the hypothesis that air pollution aerosols invigorate deep convective clouds in a moist, unstable atmosphere, to the extent of inducing production of large hailstones and tornados. This is caused by the effect of aerosols on cloud drop nucleation, making cloud drops smaller and hydrometeors larger. According to simulations, the larger ice hydrometeors contribute to more hail. The reduced evaporation from the larger hydrometeors produces weaker cold pools. Simulations have shown that too cold and fast-expanding pools inhibit the formation of tornados. The statistical observations suggest that this might be the mechanism by which the weekly modulation in pollution aerosols is causing the weekly cycle in severe convective storms during summer over the eastern United States. Although we focus here on the role of aerosols, they are not a primary atmospheric driver of tornados and hailstorms but rather modulate them in certain conditions.

Here’s a discussion of it:

• Bob Yirka, New research may explain why serious thunderstorms and tornados are less prevalent on the weekends, PhysOrg, 22 December 2011.

3) And if you like to check how people use statistics, here’s a paper that would be incredibly important if its findings were correct:

• Joseph J. Mangano and Janette D. Sherman, An unexpected mortality increase in the United States follows arrival of the radioactive plume from Fukushima: is there a correlation?, International Journal of Health Services 42 (2012), 47–64.

The title has a question mark in it, but it’s been cited in very dramatic terms in many places, for example this video entitled “Peer reviewed study shows 14,000 U.S. deaths from Fukushima”:

Starting at 1:31 you’ll see an interview with one of the paper’s authors, Janette Sherman.

14,000 deaths in the US due to Fukushima? Wow! How did they get that figure? This quote from the paper explains how:

During weeks 12 to 25 [after the Fukushima disaster began], total deaths in 119 U.S. cities increased from 148,395 (2010) to 155,015 (2011), or 4.46 percent. This was nearly double the 2.34 percent rise in total deaths (142,006 to 145,324) in 104 cities for the prior 14 weeks, significant at p < 0.000001 (Table 2). This difference between actual and expected changes of +2.12 percentage points (+4.46% – 2.34%) translates to 3,286 “excess” deaths (155,015 × 0.0212) nationwide. Assuming a total of 2,450,000 U.S. deaths will occur in 2011 (47,115 per week), then 23.5 percent of deaths are reported (155,015/14 = 11,073, or 23.5% of 47,115). Dividing 3,286 by 23.5 percent yields a projected 13,983 excess U.S. deaths in weeks 12 to 25 of 2011.

Hmm. Can you think of some potential problems with this analysis?
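To make the scrutiny easier, here is the paper's arithmetic reproduced step by step, with the numbers taken straight from the quote above. The code only repeats their calculation; it does not validate the reasoning behind it, which is exactly what's in question:

```python
# Numbers as given in the quoted passage
deaths_2011, deaths_2010 = 155015, 148395
rise = (deaths_2011 - deaths_2010) / deaths_2010        # ~4.46%

prior_2011, prior_2010 = 145324, 142006
baseline_rise = (prior_2011 - prior_2010) / prior_2010  # ~2.34%

# "Excess" deaths in the 119-city sample
excess = deaths_2011 * (rise - baseline_rise)           # ~3,290

# Fraction of all US deaths the city sample captures, per the paper
weekly_us_deaths = 2_450_000 / 52                       # ~47,115 per week
coverage = (deaths_2011 / 14) / weekly_us_deaths        # ~23.5%

# Scale up to a nationwide projection
print(round(excess / coverage))   # ~14,000, close to the paper's 13,983
```

(The small difference from the paper's 13,983 comes from their rounding 2.12 percentage points before multiplying.)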

In the interview, Janette Sherman also mentions increased death rates of children in British Columbia. Here’s the evidence the paper presents for that:

Shortly after the report [another paper by the authors] was issued, officials from British Columbia, Canada, proximate to the northwestern United States, announced that 21 residents had died of sudden infant death syndrome (SIDS) in the first half of 2011, compared with 16 SIDS deaths in all of the prior year. Moreover, the number of deaths from SIDS rose from 1 to 10 in the months of March, April, May, and June 2011, after Fukushima fallout arrived, compared with the same period in 2010. While officials could not offer any explanation for the abrupt increase, it coincides with our findings in the Pacific Northwest.

4) For the first time in 87 years, a wild gray wolf was spotted in California:

• Stephen Messenger, First gray wolf in 80 years enters California, Treehugger, 29 December 2011.

Researchers have been tracking this juvenile male using a GPS-enabled collar since it departed northern Oregon. In just a few weeks, it walked some 730 miles to California. It was last seen surfing off Malibu. Here is a photograph:

5) George Musser left the Centre for Quantum Technologies and returned to New Jersey, but not before writing a nice blog article explaining how the GRACE satellite uses the Earth’s gravitational field to measure the melting of glaciers:

• George Musser, Melting glaciers muck up Earth’s gravitational field, Scientific American, 22 December 2011.

6) The American Physical Society has started a new group: a Topical Group on the Physics of Climate! If you’re a member of the APS, and care about climate issues, you should join this.

7) Finally, here’s a cool picture taken in the Gulf of Alaska by Kent Smith:

He believes this was caused by fresher water meeting more salty water, but it doesn’t sound like he’s sure. Can anyone figure out what’s going on? The foam where the waters meet is especially intriguing.

