## Information Geometry (Part 10)

4 June, 2012

Last time I began explaining the tight relation between three concepts:

• entropy,

• information—or more precisely, lack of information,

and

• biodiversity.

The idea is to consider $n$ different species of ‘replicators’. A replicator is any entity that can reproduce itself, like an organism, a gene, or a meme. A replicator can come in different kinds, and a ‘species’ is just our name for one of these kinds. If $P_i$ is the population of the $i$th species, we can interpret the fraction

$\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }$

as a probability: the probability that a randomly chosen replicator belongs to the $i$th species. This suggests that we define entropy just as we do in statistical mechanics:

$\displaystyle{ S = - \sum_i p_i \ln(p_i) }$

In the study of statistical inference, entropy is a measure of uncertainty, or lack of information. But now we can interpret it as a measure of biodiversity: it’s zero when just one species is present, and small when a few species have much larger populations than all the rest, but gets big otherwise.

Our goal here is to play these viewpoints off against each other. In short, we want to think of natural selection, and even biological evolution, as a process of statistical inference—or in simple terms, learning.

To do this, let’s think about how entropy changes with time. Last time we introduced a simple model called the replicator equation:

$\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }$

where each population grows at a rate equal to its ‘fitness’ $f_i$ times the population itself. We can get some intuition by looking at the pathetically simple case where these fitness functions are actually constants, so

$\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }$

The equation then becomes trivial to solve:

$\displaystyle{ P_i(t) = e^{t f_i } P_i(0)}$

Last time I showed that in this case, the entropy will eventually decrease. It will go to zero as $t \to +\infty$ whenever one species is fitter than all the rest and starts out with a nonzero population—since then this species will eventually take over.

But remember, the entropy of a probability distribution is its lack of information. So the decrease in entropy signals an increase in information. And last time I argued that this makes perfect sense. As the fittest species takes over and biodiversity drops, the population is acquiring information about its environment.

However, I never said the entropy is always decreasing, because that’s false! Even in this pathetically simple case, entropy can increase.

Suppose we start with many replicators belonging to one very unfit species, and a few belonging to various more fit species. The probability distribution $p_i$ will start out sharply peaked, so the entropy will start out low.

Now think about what happens when time passes. At first the unfit species will rapidly die off, while the population of the other species slowly grows.

So the probability distribution will, for a while, become less sharply peaked. Thus, for a while, the entropy will increase!
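To see this rise and fall numerically, here is a minimal Python sketch of the constant-fitness case, using the exact solution $P_i(t) = e^{t f_i} P_i(0)$. The fitnesses and initial populations are made-up illustrative values: one very unfit species starts with 97% of the replicators, and three fitter species share the rest. The helper names `probabilities` and `entropy` are hypothetical. The printed entropy should start low, climb while the unfit species dies off, and then fall toward zero as the fittest species takes over.

```python
import math

def probabilities(f, P0, t):
    """p_i(t) for constant fitness, using the exact solution P_i(t) = exp(t f_i) P_i(0).

    Works with log-populations and subtracts the largest one before
    exponentiating, so large t does not overflow. Assumes every P0[i] > 0.
    """
    logs = [t * fi + math.log(Pi) for fi, Pi in zip(f, P0)]
    m = max(logs)
    weights = [math.exp(x - m) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    """S = -sum_i p_i ln p_i in nats, treating 0 ln 0 as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Made-up example: one very unfit species holds 97% of the replicators,
# while three fitter species share the remaining 3%.
f = [-5.0, 1.0, 1.5, 2.0]
P0 = [0.97, 0.01, 0.01, 0.01]

for t in [0, 0.5, 1, 2, 5, 10, 50]:
    print(f"t = {t:>4}: S = {entropy(probabilities(f, P0, t)):.4f}")
```

Working with logarithms of the populations is only a design choice to avoid overflowing $e^{t f_i}$ at large $t$; it assumes every initial population is positive.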

This seems to conflict with our idea that the population’s entropy should decrease as it acquires information about its environment. But in fact this phenomenon is familiar in the study of statistical inference. If you start out with strongly held false beliefs about a situation, the first effect of learning more is to become less certain about what’s going on!

Get it? Say you start out by assigning a high probability to some wrong guess about a situation. The entropy of your probability distribution is low: you’re quite certain about what’s going on. But you’re wrong. When you first start suspecting you’re wrong, you become more uncertain about what’s going on. Your probability distribution flattens out, and the entropy goes up.

So, sometimes learning involves a decrease in information—false information. There’s nothing about the mathematical concept of information that says this information is true.

Given this, it’s good to work out a formula for the rate of change of entropy, which will let us see more clearly when it goes down and when it goes up. To do this, first let’s derive a completely general formula for the time derivative of the entropy of a probability distribution. Following Sir Isaac Newton, we’ll use a dot to stand for a time derivative:

$\begin{array}{ccl} \displaystyle{ \dot{S}} &=& \displaystyle{ - \frac{d}{dt} \sum_i p_i \ln (p_i)} \\ \\ &=& - \displaystyle{ \sum_i \Big( \dot{p}_i \ln (p_i) + \dot{p}_i \Big) } \end{array}$

In the last term we took the derivative of the logarithm and got a factor of $1/p_i$ which cancelled the factor of $p_i$. But since

$\displaystyle{ \sum_i p_i = 1 }$

we know

$\displaystyle{ \sum_i \dot{p}_i = 0 }$

so this last term vanishes:

$\displaystyle{ \dot{S}= -\sum_i \dot{p}_i \ln (p_i) }$
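As a quick numerical sanity check of this identity, the sketch below uses a hypothetical two-species path $p(t) = (\cos^2 t, \sin^2 t)$, chosen only because its derivative is easy to write down, and compares $-\sum_i \dot{p}_i \ln(p_i)$ with a central finite difference of $S$. Nothing here depends on the replicator equation; it only assumes a smooth path with every $p_i > 0$, which holds for $0 < t < \pi/2$.

```python
import math

# A hypothetical smooth path on the 2-point probability simplex:
# p(t) = (cos^2 t, sin^2 t), which sums to 1 for every t.
def p(t):
    return [math.cos(t) ** 2, math.sin(t) ** 2]

def p_dot(t):
    # d/dt cos^2 t = -sin 2t and d/dt sin^2 t = sin 2t, so the components sum to 0.
    return [-math.sin(2 * t), math.sin(2 * t)]

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def entropy_rate_formula(t):
    """dS/dt = -sum_i pdot_i ln p_i (valid while every p_i > 0)."""
    return -sum(di * math.log(pi) for di, pi in zip(p_dot(t), p(t)))

def entropy_rate_numeric(t, h=1e-5):
    """Central finite difference of S along the path."""
    return (entropy(p(t + h)) - entropy(p(t - h))) / (2 * h)

for t in [0.2, 0.5, 1.0, 1.3]:
    print(t, entropy_rate_formula(t), entropy_rate_numeric(t))
```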

Nice! To go further, we need a formula for $\dot{p}_i$. For this we might as well return to the general replicator equation, dropping the pathetically special assumption that the fitness functions are actually constants. Then we saw last time that

$\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i }$

where we used the abbreviation

$f_i(P) = f_i(P_1, \dots, P_n)$

for the fitness of the $i$th species, and defined the mean fitness to be

$\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i }$

Using this cute formula for $\dot{p}_i$, we get the final result:

$\displaystyle{ \dot{S} = - \sum_i \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i \ln (p_i) }$

This is strikingly similar to the formula for entropy itself. But now each term in the sum includes a factor saying how much more fit than average, or less fit, that species is. The quantity $- p_i \ln(p_i)$ is always nonnegative: for $0 < x \le 1$ we have $\ln(x) \le 0$, so $-x \ln(x) \ge 0$, and $-x \ln(x) \to 0$ as $x \to 0$.

So, the $i$th term contributes positively to the change in entropy if the $i$th species is fitter than average, but negatively if it’s less fit than average.

This may seem counterintuitive!

Puzzle 1. How can we reconcile this fact with our earlier observations about the case when the fitness of each species is population-independent? Namely: a) if initially most of the replicators belong to one very unfit species, the entropy will rise at first, but b) in the long run, when the fittest species present take over, the entropy drops?

If this seems too tricky, look at some examples! The first illustrates observation a); the second illustrates observation b):

Puzzle 2. Suppose we have two species, one with fitness equal to 1 initially constituting 90% of the population, the other with fitness equal to 10 initially constituting just 10% of the population:

$\begin{array}{ccc} f_1 = 1, & & p_1(0) = 0.9 \\ \\ f_2 = 10 , & & p_2(0) = 0.1 \end{array}$

At what rate does the entropy change at $t = 0$? Which species is responsible for most of this change?

Puzzle 3. Suppose we have two species, one with fitness equal to 10 initially constituting 90% of the population, and the other with fitness equal to 1 initially constituting just 10% of the population:

$\begin{array}{ccc} f_1 = 10, & & p_1(0) = 0.9 \\ \\ f_2 = 1 , & & p_2(0) = 0.1 \end{array}$

At what rate does the entropy change at $t = 0$? Which species is responsible for most of this change?
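If you want to check your answers after working these puzzles by hand, here is a small Python sketch that evaluates each term $-\big(f_i - \langle f \rangle\big) p_i \ln(p_i)$ of the formula for $\dot{S}$ at $t = 0$. The function name `entropy_rate_terms` is hypothetical. The sum of the terms is $\dot{S}$, and comparing their sizes shows which species is responsible for most of the change.

```python
import math

def entropy_rate_terms(f, p):
    """Terms -(f_i - <f>) p_i ln p_i of dS/dt for constant fitnesses f_i.

    Their sum is dS/dt at the moment the probabilities equal p (all p_i > 0).
    """
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return [-(fi - mean_f) * pi * math.log(pi) for fi, pi in zip(f, p)]

puzzles = {
    "Puzzle 2": ([1.0, 10.0], [0.9, 0.1]),
    "Puzzle 3": ([10.0, 1.0], [0.9, 0.1]),
}

for name, (f, p) in puzzles.items():
    terms = entropy_rate_terms(f, p)
    print(name, "dS/dt =", round(sum(terms), 4),
          "terms by species:", [round(x, 4) for x in terms])
```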

I had to work through these examples to understand what’s going on. Now I do, and it all makes sense.
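The formula for $\dot{S}$ was derived for population-dependent fitness, not just constants. Here is a minimal sketch checking it numerically, assuming a made-up fitness $f_i(P) = r_i - P_i$ (a simple self-crowding effect chosen purely for illustration; `R` and the helper names are hypothetical). It compares the formula with a central difference of $S$ taken along the direction $dP/dt = f(P) P$; since $S$ depends on $P$ only through $p$, the two should agree up to small numerical error.

```python
import math

R = [1.0, 2.0, 0.5]  # hypothetical base growth rates r_i

def fitness(P):
    """Made-up population-dependent fitness f_i(P) = r_i - P_i (self-crowding)."""
    return [ri - Pi for ri, Pi in zip(R, P)]

def normalize(P):
    total = sum(P)
    return [Pi / total for Pi in P]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_rate(P):
    """dS/dt = -sum_i (f_i(P) - <f(P)>) p_i ln p_i."""
    f, p = fitness(P), normalize(P)
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return -sum((fi - mean_f) * pi * math.log(pi) for fi, pi in zip(f, p))

def entropy_rate_numeric(P, h=1e-5):
    """Central difference of S(p(P)) along the replicator direction dP/dt = f(P) P."""
    v = [fi * Pi for fi, Pi in zip(fitness(P), P)]
    ahead = normalize([Pi + h * vi for Pi, vi in zip(P, v)])
    behind = normalize([Pi - h * vi for Pi, vi in zip(P, v)])
    return (entropy(ahead) - entropy(behind)) / (2 * h)

P = [3.0, 0.4, 1.2]
print(entropy_rate(P), entropy_rate_numeric(P))
```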

### Next time

Still, it would be nice if there were some quantity that always goes down with the passage of time, reflecting our naive idea that the population gains information from its environment, and thus loses entropy, as time goes by.

Often there is such a quantity. But it’s not the naive entropy: it’s the relative entropy. I’ll talk about that next time. In the meantime, if you want to prepare, please reread Part 6 of this series, where I explained this concept. Back then, I argued that whenever you’re tempted to talk about entropy, you should talk about relative entropy. So, we should try that here.

There’s a big idea lurking here: information is relative. How much information a signal gives you depends on your prior assumptions about what that signal is likely to be. If this is true, perhaps biodiversity is relative too.

## Information Geometry (Part 9)

1 June, 2012

It’s time to continue this information geometry series, because I’ve promised to give the following talk at a conference on the mathematics of biodiversity in early July… and I still need to do some of the research!

#### Diversity, information geometry and learning

As is well known, some measures of biodiversity are formally identical to measures of information developed by Shannon and others. Furthermore, Marc Harper has shown that the replicator equation in evolutionary game theory is formally identical to a process of Bayesian inference, which is studied in the field of machine learning using ideas from information geometry. Thus, in this simple model, a population of organisms can be thought of as a ‘hypothesis’ about how to survive, and natural selection acts to update this hypothesis according to Bayes’ rule. The question thus arises to what extent natural changes in biodiversity can be usefully seen as analogous to a form of learning. However, some of the same mathematical structures arise in the study of chemical reaction networks, where the increase of entropy, or more precisely decrease of free energy, is not usually considered a form of ‘learning’. We report on some preliminary work on these issues.

So, let’s dive in! To some extent I’ll be explaining these two papers:

• Marc Harper, Information geometry and evolutionary game theory.

• Marc Harper, The replicator equation as an inference dynamic.

However, I hope to bring in some more ideas from physics, the study of biodiversity, and the theory of stochastic Petri nets, also known as chemical reaction networks. So, this series may start to overlap with my network theory posts. We’ll see. We won’t get far today: for now, I just want to review and expand on what we did last time.

### The replicator equation

The replicator equation is a simplified model of how populations change. Suppose we have $n$ types of self-replicating entity. I’ll call these entities replicators. I’ll call the types of replicators species, but they don’t need to be species in the biological sense. For example, the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains.

Let $P_i(t),$ or just $P_i$ for short, be the population of the $i$th species at time $t.$ Then the replicator equation says

$\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }$

So, the population $P_i$ changes at a rate proportional to $P_i,$ but the ‘constant of proportionality’ need not be constant: it can be any smooth function $f_i$ of the populations of all the species. We call $f_i(P_1, \dots, P_n)$ the fitness of the $i$th species.

Of course this model is absurdly general, while still leaving out lots of important effects, like the spatial variation of populations, or the ability for the population of some species to start at zero and become nonzero—which happens thanks to mutation. Nonetheless this model is worth taking a good look at.

Using the magic of vectors we can write

$P = (P_1, \dots , P_n)$

and

$f(P) = (f_1(P), \dots, f_n(P))$

This lets us write the replicator equation a wee bit more tersely as

$\displaystyle{ \frac{d P}{d t} = f(P) P}$

where on the right I’m multiplying vectors componentwise, the way your teachers tried to brainwash you into never doing:

$f(P) P = (f_1(P) P_1, \dots, f_n(P) P_n)$

In other words, I’m thinking of $P$ and $f(P)$ as functions on the set $\{1, \dots, n\}$ and multiplying them pointwise. This will be a nice way of thinking if we want to replace this finite set by some more general space.
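Here is a minimal Python sketch of the replicator equation written this way, with $f(P) P$ computed componentwise and integrated by the classical fourth-order Runge–Kutta method. The integrator and the helper names (`replicator_rhs`, `rk4_step`, `integrate`) are illustrative assumptions, not anything taken from Harper's papers. As a sanity check it uses constant fitness, where the exact answer $P_i(t) = e^{t f_i} P_i(0)$ is known.

```python
import math

def replicator_rhs(fitness, P):
    """Right-hand side f(P) P, multiplying the two vectors componentwise."""
    return [fi * Pi for fi, Pi in zip(fitness(P), P)]

def rk4_step(fitness, P, dt):
    """One classical Runge-Kutta step for dP/dt = f(P) P."""
    def shifted(k, scale):
        return [Pi + scale * ki for Pi, ki in zip(P, k)]
    k1 = replicator_rhs(fitness, P)
    k2 = replicator_rhs(fitness, shifted(k1, dt / 2))
    k3 = replicator_rhs(fitness, shifted(k2, dt / 2))
    k4 = replicator_rhs(fitness, shifted(k3, dt))
    return [Pi + dt / 6 * (a + 2 * b + 2 * c + d)
            for Pi, a, b, c, d in zip(P, k1, k2, k3, k4)]

def integrate(fitness, P0, t_end, steps=1000):
    P = list(P0)
    dt = t_end / steps
    for _ in range(steps):
        P = rk4_step(fitness, P, dt)
    return P

# Sanity check with constant fitness, where P_i(t) = exp(t f_i) P_i(0) exactly.
f_const = [0.5, -1.0, 2.0]
P0 = [1.0, 4.0, 0.2]
numeric = integrate(lambda P: f_const, P0, t_end=2.0)
exact = [math.exp(2.0 * fi) * Pi for fi, Pi in zip(f_const, P0)]
print(numeric)
print(exact)
```

Because `fitness` is passed in as a function of the whole vector $P$, the same code handles population-dependent fitness without changes.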

Why would we want to do that? Well, we might be studying lizards with different length tails, and we might find it convenient to think of the set of possible tail lengths as the half-line $[0,\infty)$ instead of a finite set.

Or, just to get started, we might want to study the pathetically simple case where $f(P)$ doesn’t depend on $P.$ Then we just have a fixed function $f$ and a time-dependent function $P$ obeying

$\displaystyle{ \frac{d P}{d t} = f P}$

If we’re physicists, we might write $P$ more suggestively as $\psi$ and write the operator multiplying by $f$ as $- H.$ Then our equation becomes

$\displaystyle{ \frac{d \psi}{d t} = - H \psi }$

This looks a lot like Schrödinger’s equation, but since there’s no factor of $\sqrt{-1},$ and $\psi$ is real-valued, it’s more like the heat equation or the ‘master equation’, the basic equation of stochastic mechanics.

For an explanation of Schrödinger’s equation and the master equation, try Part 12 of the network theory series. In that post I didn’t include a minus sign in front of the $H.$ That’s no big deal: it’s just a different convention than the one I want today. A more serious issue is that in stochastic mechanics, $\psi$ stands for a probability distribution. This suggests that we should get probabilities into the game somehow.

### The replicator equation in terms of probabilities

Luckily, that’s exactly what people usually do! Instead of talking about the population $P_i$ of the $i$th species, they talk about the probability $p_i$ that one of our organisms will belong to the $i$th species. This amounts to normalizing our populations:

$\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }$

Don’t you love it when notations work out well? Our big Population $P_i$ has gotten normalized to give little probability $p_i.$

How do these probabilities $p_i$ change with time? Now is the moment for that least loved rule of elementary calculus to come out and take a bow: the quotient rule for derivatives!

$\displaystyle{ \frac{d p_i}{d t} = \left(\frac{d P_i}{d t} \sum_j P_j \quad - \quad P_i \sum_j \frac{d P_j}{d t}\right) \big{/} \left( \sum_j P_j \right)^2 }$

Using our earlier version of the replicator equation, this gives:

$\displaystyle{ \frac{d p_i}{d t} = \left(f_i(P) P_i \sum_j P_j \quad - \quad P_i \sum_j f_j(P) P_j \right) \big{/} \left( \sum_j P_j \right)^2 }$

Using the definition of $p_i,$ this simplifies to:

$\displaystyle{ \frac{d p_i}{d t} = f_i(P) p_i \quad - \quad \left( \sum_j f_j(P) p_j \right) p_i }$

The stuff in parentheses actually has a nice meaning: it’s just the mean fitness. In other words, it’s the average, or expected, fitness of an organism chosen at random from the whole population. Let’s write it like this:

$\displaystyle{ \langle f(P) \rangle = \sum_j f_j(P) p_j }$

So, we get the replicator equation in its classic form:

$\displaystyle{ \frac{d p_i}{d t} = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i }$
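A quick numerical way to double-check the quotient-rule computation: the sketch below evaluates $dp_i/dt$ both directly from the quotient rule and from the classic form $\big(f_i(P) - \langle f(P) \rangle\big) p_i$, for a hypothetical fitness $f_i(P) = r_i - 0.2 P_i$ invented purely for illustration. The two should agree to rounding error, and the components of $dp/dt$ should sum to zero because $\sum_i p_i = 1$.

```python
R = [1.0, 0.3, 2.5]  # hypothetical base growth rates r_i

def fitness(P):
    """Made-up fitness f_i(P) = r_i - 0.2 P_i, used only to exercise the formulas."""
    return [ri - 0.2 * Pi for ri, Pi in zip(R, P)]

def dp_dt_quotient_rule(fitness_fn, P):
    """dp_i/dt from the quotient rule, with dP_j/dt = f_j(P) P_j."""
    dP = [fi * Pi for fi, Pi in zip(fitness_fn(P), P)]
    total, dtotal = sum(P), sum(dP)
    return [(dPi * total - Pi * dtotal) / total ** 2 for Pi, dPi in zip(P, dP)]

def dp_dt_classic(fitness_fn, P):
    """dp_i/dt = (f_i(P) - <f(P)>) p_i, the classic form."""
    f = fitness_fn(P)
    total = sum(P)
    p = [Pi / total for Pi in P]
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return [(fi - mean_f) * pi for fi, pi in zip(f, p)]

P = [2.0, 5.0, 0.5]
print(dp_dt_quotient_rule(fitness, P))
print(dp_dt_classic(fitness, P))
```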

This has a nice meaning: for the fraction of organisms of the $i$th type to increase, their fitness must exceed the mean fitness. If you’re trying to increase market share, what matters is not how good you are, but how much better than average you are. If everyone else is lousy, you’re in luck.

### Entropy

Now for something a bit new. Once we’ve gotten a probability distribution into the game, its entropy is sure to follow:

$\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }$

This says how ‘smeared-out’ the overall population is among the various different species. Alternatively, it says how much information it takes, on average, to say which species a randomly chosen organism belongs to. For example, if there are $2^N$ species, all with equal populations, the entropy $S$ works out to $N \ln 2.$ So in this case, it takes $N$ bits of information to say which species a randomly chosen organism belongs to.
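The $2^N$-species example is easy to check numerically. In this sketch, `entropy` is a hypothetical helper that takes the logarithm base as a parameter: base $e$ gives the entropy in nats, as in the formula above, and base 2 gives it in bits.

```python
import math

def entropy(p, base=math.e):
    """S(p) = -sum_i p_i log p_i; base e gives nats, base 2 gives bits."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

for N in range(1, 6):
    uniform = [1 / 2 ** N] * 2 ** N   # 2^N species with equal populations
    print(N, entropy(uniform), N * math.log(2), entropy(uniform, base=2))
```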

In biology, entropy is one of many ways people measure biodiversity. For a quick intro to some of the issues involved, try:

• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.

• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.

But we don’t need to understand this stuff to see how entropy is connected to the replicator equation. Marc Harper’s paper explains this in detail:

• Marc Harper, The replicator equation as an inference dynamic.

and I hope to go through quite a bit of it here. But not today! Today I just want to look at a pathetically simple, yet still interesting, example.

### Exponential growth

Suppose the fitness of each species is independent of the populations of all the species. In other words, suppose each fitness $f_i(P)$ is actually a constant, say $f_i.$ Then the replicator equation reduces to

$\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }$

so it’s easy to solve:

$P_i(t) = e^{t f_i} P_i(0)$

You don’t need a detailed calculation to see what’s going to happen to the probabilities

$\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)}}$

The most fit species present will eventually take over! If one species, say the $i$th one, has a fitness greater than the rest, then the population of this species will eventually grow faster than all the rest, at least if its population starts out greater than zero. So as $t \to +\infty,$ we’ll have

$p_i(t) \to 1$

and

$p_j(t) \to 0 \quad \mathrm{for} \quad j \ne i$

Thus the probability distribution $p$ will become more sharply peaked, and its entropy will eventually approach zero.

With a bit more thought you can see that even if more than one species shares the maximum possible fitness, the entropy will eventually decrease, though not approach zero.
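Here is a minimal sketch of the tied case, reusing the exact constant-fitness solution. The numbers are made up for illustration: three equally common species, two of which share the top fitness. The entropy starts at $\ln 3$ and should settle near $\ln 2$, since the probability distribution approaches $(0, \tfrac{1}{2}, \tfrac{1}{2})$.

```python
import math

def probabilities(f, P0, t):
    """p_i(t) from P_i(t) = exp(t f_i) P_i(0), computed in log space (assumes P0[i] > 0)."""
    logs = [t * fi + math.log(Pi) for fi, Pi in zip(f, P0)]
    m = max(logs)
    weights = [math.exp(x - m) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical example: species 2 and 3 tie for the highest fitness.
f = [1.0, 2.0, 2.0]
P0 = [1.0, 1.0, 1.0]
for t in [0, 1, 5, 20, 40]:
    print(t, round(entropy(probabilities(f, P0, t)), 6))
```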

In other words, the biodiversity will eventually drop as all but the most fit species are overwhelmed. Of course, this is only true in our simple idealization. In reality, biodiversity behaves in more complex ways, in part because species interact, and in part because mutation tends to smear out the probability distribution $p_i.$ We’re not looking at these effects yet. They’re extremely important… in ways we can only fully understand if we start by looking at what happens when they’re not present.

In still other words, the population will absorb information from its environment. This should make intuitive sense: the process of natural selection resembles ‘learning’. As fitter organisms become more common and less fit ones die out, the environment puts its stamp on the probability distribution $p.$ So, this probability distribution should gain information.

While intuitively clear, this last claim also follows more rigorously from thinking of entropy as negative information. Admittedly, it’s always easy to get confused by minus signs when relating entropy and information. A while back I said the entropy

$\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }$

was the average information required to say which species a randomly chosen organism belongs to. If this entropy is going down, isn’t the population losing information?

No, this is a classic sign error. It’s like the concept of ‘work’ in physics. We can talk about the work some system does on its environment, or the work done by the environment on the system, and these are almost the same… except one is minus the other!

When you are very ignorant about some system—say, some rolled dice—your estimated probabilities $p_i$ for its various possible states are very smeared-out, so the entropy $S(p)$ is large. As you gain information, you revise your probabilities and they typically become more sharply peaked, so $S(p)$ goes down. When you know as much as you possibly can, $S(p)$ equals zero.

So, the entropy $S(p)$ is the amount of information you have left to learn: the amount of information you lack, not the amount you have. As you gain information, this goes down. There’s no paradox here.

It works the same way with our population of replicators—at least in the special case where the fitness of each species is independent of its population. The probability distribution $p$ is like a ‘hypothesis’ assigning to each species $i$ the probability $p_i$ that it’s the best at self-replicating. As some replicators die off while others prosper, they gather information about their environment, and this hypothesis gets refined. So, the entropy $S(p)$ drops.

### Next time

Of course, to make closer contact to reality, we need to go beyond the special case where the fitness of each species is a constant! Marc Harper does this, and I want to talk about his work someday, but first I have a few more remarks to make about the pathetically simple special case I’ve been focusing on. I’ll save these for next time, since I’ve probably strained your patience already.

## Dolphins and Manatees of Amazonia

11 March, 2012

No, these aren’t mermaids. They’re sirenians!

Sirenians or ‘sea cows’ are aquatic mammals found in four places in the world. The three places shown here are home to three species called ‘manatees’:

For example, the sirenians shown above are West Indian manatees, Trichechus manatus, which live in the Caribbean. There’s also a big region stretching from the western Pacific Ocean to the eastern coast of Africa that’s home to the ‘dugong’.

Right now there’s just one species of sirenian in each place. But once there were many more species, and it’s just been discovered that there often used to be several species living in the same place:

• Multiple species of sea cows once coexisted, Science Daily, 8 March 2012.

The closest living relatives of the sirenians are elephants! They kind of look similar, no? More importantly, they share some unusual features. They keep growing new teeth throughout their life, molars that slowly move to the front of the mouth as the teeth in front wear out. And quite unlike cows, say, the females have two teats—located between their front limbs.

Here’s an evolutionary tree of sirenians:

You’ll see they got their start about 50 million years ago and blossomed in the late Oligocene, about 25 million years ago. Later the Earth got colder, and they gradually retreated to their present ranges.

You’ll also notice that three branches of the tree seem to reach the present day:

Trichechus, which includes all the manatees,

Dugong, which (surprise!) is the dugong… and

Hydrodamalis, which is another name for Steller’s sea cow.

Steller’s sea cow was discovered in the North Pacific in 1741, and hunted to extinction shortly thereafter. Ouch! It took 24 million years of evolution to refine and polish the information in that species, and it was wiped out without trace in just 27 years.

The Amazonian manatee, Trichechus inunguis, is of special interest to me today because it lives in many branches of the Amazon river:

How did it get there? Why does it live in rivers? Its nearest living neighbor, the West Indian Manatee, likes coastal waters but can also go up rivers. Another clue might be the wonderful Amazon river dolphin, Inia geoffrensis.

It’s also called a pink dolphin. Here’s why:

There are some interesting myths about it… one of which connects it with the manatee!

“In traditional Amazon River folklore, at night, an Amazon river dolphin becomes a handsome young man who seduces girls, impregnates them, and then returns to the river in the morning to become a dolphin again. This dolphin shapeshifter is called an encantado. It has been suggested that the myth arose partly because dolphin genitalia bear a resemblance to those of humans. Others believe the myth served (and still serves) as a way of hiding the incestuous relations which are quite common in some small, isolated communities along the river. In the area, there are tales that it is bad luck to kill a dolphin. Legend also states that if a person makes eye contact with an Amazon river dolphin, he or she will have lifelong nightmares. Local legends also state that the dolphin is the guardian of the Amazonian manatee, and that, should one wish to find a manatee, one must first make peace with the dolphin.”

Indeed, the range of the Amazon river dolphin, shown here, is similar to that of the Amazonian manatee:

Dolphins and other cetaceans are not closely related to sirenians. Dolphins are carnivores, but sirenians only eat plants. But they both started as land-dwelling mammals, and both took to the seas at roughly the same time. And it seems the Amazon river dolphin became a river dweller around 15 million years ago. Why? As sea levels dropped, what once was an inland ocean in South America gradually turned into what’s now the Amazon! According to the Wikipedia article:

It seems this species separated from its oceanic relatives during the Miocene epoch. Sea levels were higher at that time, says biologist Healy Hamilton of the California Academy of Sciences in San Francisco, and large parts of South America, including the Amazon Basin, may have been flooded by shallow, more or less brackish water. When this inland sea retreated, Hamilton hypothesizes, the Amazon dolphins remained in the river basin…

So maybe the manatees did the same thing. I don’t know. But I find the idea of an inland sea gradually turning into a river-filled jungle, and life adapting to this change, very intriguing and romantic!

This shows what South America may have looked like during the early-middle Miocene, when the Amazon river dolphin was just getting its start. The upper Amazon Basin drained into the Orinoco Basin at left, while the lower Amazon Basin drained directly to the Atlantic Ocean at right. This is from a paper on megafans, which are huge regions covered with river sediment:

• M. Justin Wilkinson, Larry G. Marshall, and John G. Lundberg, River behavior on megafans and potential influences on diversification and distribution of aquatic organisms, Journal of South American Earth Sciences 21 (2006), 151–172.

Almost needless to say, we’ll need to work a bit to protect the dolphins and manatees of Amazonia if we want them to survive. Check out this Amazon river dolphin in action:

This guy is swimming in the Rio Negro, a large tributary of the Amazon. But there are also Amazon river dolphins in the Orinoco, another huge river in South America, not connected to the Amazon! You can see it just north of the Rio Negro:

Was it ever connected to the Amazon? If not, what’s the story about how the same species of dolphins live in both river basins?

By the way, my joke about mermaids comes from the etymology of the word ‘sirenian’. There’s a legend that lonely sailors—very lonely, it seems—mistook sea cows for mermaids, also known as ‘sirens’.

## Azimuth on Google Plus (Part 5)

1 January, 2012

Happy New Year! I’m back from Laos. Here are seven items, mostly from the Azimuth Circle on Google Plus:

1) Phil Libin is the boss of a Silicon Valley startup. When he’s off travelling, he uses a telepresence robot to keep an eye on things. It looks like a stick figure on wheels. Its bulbous head has two eyes, which are actually a camera and a laser. On its forehead is a screen, where you can see Libin’s face. It’s made by a company called Anybots, and it costs just $15,000.

I predict that within my lifetime we’ll be using things like this to radically cut travel costs and carbon emissions for business and for conferences. It seems weird now, but so did telephones. Future models will be better to look at. But let’s try it soon!

• Laura Sydell, No excuses: robots put you in two places at once, Weekend Edition Saturday, 31 December 2011.

Bruce Bartlett and I are already planning for me to use telepresence to give a lecture on mathematics and the environment at Stellenbosch University in South Africa. But we’d been planning to use old-fashioned videoconferencing technology. Anybots is located in Mountain View, California. That’s near Google’s main campus. Can anyone help me set up a talk on energy and the environment at Google, where I use an Anybot? (Or, for that matter, anywhere else around there?)

2) A study claims to have found a correlation between weather and the day of the week! The claim is that there are more tornados and hailstorms in the eastern USA during weekdays. One possible mechanism could be that aerosols from car exhaust help seed clouds.

I make no claims that this study is correct. But at the very least, it would be interesting to examine their use of statistics and see if it’s convincing or flawed:

• Thomas Bell and Daniel Rosenfeld, Why do tornados and hailstorms rest on weekends?, Journal of Geophysical Research 116 (2011), D20211.

Abstract.
This study shows for the first time statistical evidence that when anthropogenic aerosols over the eastern United States during summertime are at their weekly mid-week peak, tornado and hailstorm activity there is also near its weekly maximum. The weekly cycle in summertime storm activity for 1995–2009 was found to be statistically significant and unlikely to be due to natural variability. It correlates well with previously observed weekly cycles of other measures of storm activity. The pattern of variability supports the hypothesis that air pollution aerosols invigorate deep convective clouds in a moist, unstable atmosphere, to the extent of inducing production of large hailstones and tornados. This is caused by the effect of aerosols on cloud drop nucleation, making cloud drops smaller and hydrometeors larger. According to simulations, the larger ice hydrometeors contribute to more hail. The reduced evaporation from the larger hydrometeors produces weaker cold pools. Simulations have shown that too cold and fast-expanding pools inhibit the formation of tornados. The statistical observations suggest that this might be the mechanism by which the weekly modulation in pollution aerosols is causing the weekly cycle in severe convective storms during summer over the eastern United States. Although we focus here on the role of aerosols, they are not a primary atmospheric driver of tornados and hailstorms but rather modulate them in certain conditions. Here’s a discussion of it: • Bob Yirka, New research may explain why serious thunderstorms and tornados are less prevalent on the weekends, PhysOrg, 22 December 2011. 3) And if you like to check how people use statistics, here’s a paper that would be incredibly important if its findings were correct: • Joseph J. Mangano and Janette D. 
Sherman, An unexpected mortality increase in the United States follows arrival of the radioactive plume from Fukushima: is there a correlation?, International Journal of Health Services 42 (2012), 47–64. The title has a question mark in it, but it’s been cited in very dramatic terms in many places, for example this video entitled “Peer reviewed study shows 14,000 U.S. deaths from Fukushima”: Starting at 1:31 you’ll see an interview with one of the paper’s authors, Janette Sherman. 14,000 deaths in the US due to Fukushima? Wow! How did they get that figure? This quote from the paper explains how: During weeks 12 to 25 [after the Fukushima disaster began], total deaths in 119 U.S. cities increased from 148,395 (2010) to 155,015 (2011), or 4.46 percent. This was nearly double the 2.34 percent rise in total deaths (142,006 to 145,324) in 104 cities for the prior 14 weeks, significant at p < 0.000001 (Table 2). This difference between actual and expected changes of +2.12 percentage points (+4.46% – 2.34%) translates to 3,286 “excess” deaths (155,015 × 0.0212) nationwide. Assuming a total of 2,450,000 U.S. deaths will occur in 2011 (47,115 per week), then 23.5 percent of deaths are reported (155,015/14 = 11,073, or 23.5% of 47,115). Dividing 3,286 by 23.5 percent yields a projected 13,983 excess U.S. deaths in weeks 12 to 25 of 2011. Hmm. Can you think of some potential problems with this analysis? In the interview, Janette Sherman also mentions increased death rates of children in British Columbia. Here’s the evidence the paper presents for that: Shortly after the report [another paper by the authors] was issued, officials from British Columbia, Canada, proximate to the northwestern United States, announced that 21 residents had died of sudden infant death syndrome (SIDS) in the first half of 2011, compared with 16 SIDS deaths in all of the prior year. 
Moreover, the number of deaths from SIDS rose from 1 to 10 in the months of March, April, May, and June 2011, after Fukushima fallout arrived, compared with the same period in 2010. While officials could not offer any explanation for the abrupt increase, it coincides with our findings in the Pacific Northwest.

4) For the first time in 87 years, a wild gray wolf was spotted in California:

• Stephen Messenger, First gray wolf in 80 years enters California, Treehugger, 29 December 2011.

Researchers have been tracking this juvenile male using a GPS-enabled collar since it departed northern Oregon. In just a few weeks, it walked some 730 miles to California. It was last seen surfing off Malibu. Here is a photograph:

5) George Musser left the Centre for Quantum Technologies and returned to New Jersey, but not before writing a nice blog article explaining how the GRACE satellite uses the Earth’s gravitational field to measure the melting of glaciers:

• George Musser, Melting glaciers muck up Earth’s gravitational field, Scientific American, 22 December 2011.

6) The American Physical Society has started a new group: a Topical Group on the Physics of Climate! If you’re a member of the APS, and care about climate issues, you should join this.

7) Finally, here’s a cool picture taken in the Gulf of Alaska by Kent Smith:

He believes this was caused by fresher water meeting more salty water, but it doesn’t sound like he’s sure. Can anyone figure out what’s going on? The foam where the waters meet is especially intriguing.

## The Global Amphibian Crisis

8 December, 2011

There’s a fungus that infects many kinds of amphibians. Some get wiped out entirely—but it’s harbored harmlessly by others, so it’s impossible to eradicate. Over a hundred species have disappeared in the last 20 years!

You’ve got to read this:

• Joseph R. Mendelson III, Lessons of the lost, American Scientist 99 (November-December 2011), 438.

The fungus causes a disease called chytridiomycosis.
The effects are gruesome: when spores land on a susceptible amphibian, they quickly sprout and form a vase-shaped structure that harvests energy from the animal’s skin. This produces more spores, which swim around using flagella and spread. The disease progresses as these reinfect the host. The victim may become lethargic, lose skin over its body, go into convulsions, and die.

Amphibian populations have been dropping rapidly worldwide since the 1980s. There were about 6500 species, but now 30% of these are endangered, about 130 are ‘missing’, and about 30 are extinct in the wild. There were many theories about the cause of this decline, but now we know this disease is playing a big role. As Mendelson says:

Herpetologists and wildlife biologists began observing inexplicable disappearances of amphibians around the globe in the mid-1970s and especially by the mid-1980s but were at a complete loss to explain them. Finally, in the late 1990s, an insightful team of pathologists at the U.S. National Zoo, led by Don Nichols, collaborated with one of the few chytrid fungus scholars in the world, Joyce Longcore, and identified this quite unusual new genus and species. Conservationists and disease ecologists were unprepared for the reality of a pathogen capable of directly and rapidly—mere months!—causing the elimination of a population or an entire species that was otherwise robust. Classical host-pathogen theory held that such dramatic consequences to the host population or species were only realized when the host population was already drastically reduced in size or otherwise compromised. The concept of a lightning extinction was foreign to researchers and conservationists, and we argued vehemently about it throughout the 1990s at symposia worldwide.
In retrospect, the scenario of a spreading pathogen is parsimonious and clear, but in the midst of the massacre we were entangled in logical quagmires along these lines: “The disappearances cannot be the result of disease; diseases are not capable of such.” Not to mention the fact that the smoking gun, the pathogen itself, was not described until 1999. While we were debating the issue, a terrible lesson was playing out for us around the world as an unknown disease decimated amphibian populations.

What are the ‘lessons’ that Mendelson is talking about? Here are some:

Our powerlessness in this terrible crisis must be balanced by increased efforts in realms that we can control, such as reducing carbon emissions to protect what habitat remains from chemical and physical disruption. We can go further and restore what has been wounded but can still be salvaged. We need to inspire and fund truly innovative research on pathogens in order to better predict and thwart emerging infectious diseases. The lessons we learn here will extend far beyond the amphibians. We must support funding for programs such as the Amphibian Ark and the Amphibian Survival Alliance. We must keep looking for species gone missing, and continue biodiversity surveys, despite the sometimes paralyzing depression that both activities can induce in this era. But especially, we need to pay close attention to the lessons that legions of dead amphibians are teaching us.

I note with some satisfaction that our colleagues in bat research and conservation did not spend a decade arguing whether the fungus that causes white-nose syndrome could possibly eliminate entire colonies of bats in a single season. Our colleagues assumed that it was possible and reacted quickly. We can thank the amphibians for leaving us that lesson, but at such cost.

Yes, millions of bats in America have died from a new fungal disease called white-nose syndrome.

What role, if any, do people play in the spread of these new diseases?
Why are they happening now?

In the case of amphibians, people helped spread American bullfrogs. These are resistant to the disease, but carry it. They’ve largely taken over here in Singapore. Global warming seems not to be responsible, because the worst outbreaks happen at high elevations, where it’s cool: that’s where the fungus thrives.

As for the bats, the same fungus that’s killing bats in America is found in healthy bats in Europe, which suggests the disease spread from there. People might carry spores on their clothes from infected caves to not-yet-infected ones, so visitors to caves with bats are being asked to limit their activities, and disinfect clothing and equipment. It’s completely against the rules to visit some caves now.

There have been reports of successful attempts to cure some amphibians of chytridiomycosis:

• Reid Harris of James Madison University claims that coating frogs with Janthinobacterium lividum protects them from chytridiomycosis.

• A team of scientists published a paper claiming that Archey’s frog (Leiopelma archeyi), a critically endangered species in New Zealand, was successfully cured of chytridiomycosis by applying chloramphenicol topically.

• Don Nichols claims to have cured several species of frogs using a drug called itraconazole.

• Jay Redmond at WWT Slimbridge, Gloucestershire claims that raising poison dart frogs in water containing Rooibos tea (Aspalathus linearis) wards off chytridiomycosis.

The Amphibian Ark is trying to keep captive populations alive for species that have died out in the wild. They have a list of suggestions on what you can do to help. For starters:

• Don’t ever release pet amphibians into the wild.

• Build a frog pond: here’s how. Even in arid places like Riverside, California, our friends who built some ponds soon found them occupied by sweetly chirping frogs.

• Get involved in collaborations that promote sustainable breeding and management, like the Amphibian Steward Network.

• Figure things out.
Zoos don’t even know how to breed common toads without using artificial hormone injections! If you could find a way, maybe the same technique could be used with threatened species.

• If you’re a student, go to James Madison University and work with Reid Harris, or go to the University of Maine and work with Joyce Longcore, or find a university closer to you with someone leading a group that studies chytridiomycosis!

I thank Allen Knutson for pointing out the American Scientist article. This is the best popular science magazine in the English language, but I let my subscription lapse when I came to Singapore!

## Wild Cats of Arizona

5 December, 2011

Here’s a quick follow-up to our discussion of the wild cats of Sumatra caught on camera by the World Wildlife Fund. Recently there have been sightings of rare wild cats in Arizona!

• Marc Lacey, In southern Arizona, rare sightings of ocelots and jaguars send shivers, New York Times, 4 December 2011.

• Guide describes roaring, powerful jaguar, Arizona Daily Star, 23 November 2011.

For example, consider Donnie Fenn, who specializes in hunting and killing mountain lions (also known as cougars or pumas). He was taking his 10-year-old daughter out on her first hunt when his pack of hounds took off and cornered something in a tree. He then saw with the telephoto lens of his camera that it wasn’t a mountain lion—it was a jaguar, which is about twice as big!

“It’s the most amazing thing that’s ever happened to me,” said Fenn, who leads hunters to mountain lions with his dogs. “To be honest with you—I got to see it in real life, my daughter got to see it, but I hope never to encounter it again.

“I was nervous, scared, everything. It was just the aggressiveness—the power it had, the snarling. It wasn’t a snarl like a lion. It was a roar. I’ve never heard anything like it.”

Fenn was thrilled as well as scared.
He had never expected to see such a large, endangered cat so early in his life, at age 32, he said. A lifelong hunter and Benson resident, he runs the mountain lion guide service as a sideline while working full time in an excavating business. He described his one-hour encounter with the jaguar as “a dream come true.” He came away respectful of its power, speed and size. “All my dogs took a pretty good beating. They had puncture wounds. … I got to see it in real life, and I’m glad, but I hope to never encounter it again,” he repeated.

He crept up close and took photos and a video of the jaguar:

He also notified state wildlife officials, who were later able to find hair samples left behind by the animal and a tree trunk that showed signs of being climbed by a large clawed animal. They believe he saw an adult male jaguar that weighed about 90 kilograms.

The jaguar, Panthera onca, is the third-largest cat in the world, outranked only by the lion and tiger. It’s the only surviving New World member of the genus Panthera. For example, there was once an American lion, but that went extinct 10,000 years ago, along with a lot of other large mammals, after people showed up. DNA evidence shows that the lion, tiger, leopard, jaguar, snow leopard, and clouded leopard share a common ancestor, and that this whole gang is between 6 and 10 million years old. (The so-called ‘mountain lion’, Puma concolor, is not in this group.)

Jaguars have mostly been killed off in the United States, but they survive from Mexico to Central and South America all the way down to Paraguay and northern Argentina. They are listed as ‘near threatened’ by the IUCN, or International Union for Conservation of Nature.

The Arizona Game and Fish Department has also announced two reliable sightings of ocelots this year! The ocelot, Leopardus pardalis, is a much smaller fellow, about the size of a domestic cat.
Ocelots live in many parts of South and Central America and Mexico, and they’re listed as being of ‘least concern’ by the IUCN. Once their range extended up into the chaparral thickets of the Gulf Coast of south and eastern Texas, as well as parts of Arizona, Louisiana, and Arkansas. But by now they are very hard to find in the United States. They seem to eke out an existence only in several small areas of dense thicket in South Texas… and, we now know, Arizona!

## Wild Cats of Sumatra

17 November, 2011

Sumatra is just half an hour from here. I’ve never visited it, but I’m awfully curious. So, I was excited to hear today that the World Wildlife Fund put cameras in the forest there and caught pictures of 5 species of wild cats! You can see them here:

• World Wildlife Fund, Remarkable images of big cats urge forest protection.

I’ve been curious about the smaller wild cats of Asia ever since I met this absurdly sweet thing which belongs to a friend of mine named Julia Strauss:

Julia lives in London, but she went all the way to Wales to buy this cat. Why? Because it’s a Bengal. That means it’s a crossbreed of an ordinary domestic cat with a leopard cat!

The leopard cat, Prionailurus bengalensis, is the most widespread of the Asian small cats. It has a huge range, from the Amur region in the Russian Far East through Korea, China, Indochina, India… all the way to Pakistan in the west… and to the Philippines and some islands in Indonesia in the south. It’s listed as being of ‘least concern’ by the IUCN. So, it’s not surprising that leopard cats are one of the kinds the WWF saw in Sumatra. Here’s one in the Berlin Zoo, photographed by F. Spangenberg:

Ain’t it cute? It’s about the size of a domestic cat, but it has a different number of chromosomes than Felis domesticus, so it’s a bit remarkable that they can interbreed. The resulting Bengals share some traits with the leopard cat: for example, leopard cats like to fish, and Bengals like to play around in their water bowls!

Another cat the WWF saw in Sumatra is the marbled cat, Pardofelis marmorata. It’s again about the same size as a house cat, but it likes to hunt while climbing around in trees! This feisty fellow is listed as ‘vulnerable’: there are probably about 10,000 of them in the world, not counting kittens. They live from the Himalayan foothills westward into Nepal and eastward into southwest China, and also on Sumatra and Borneo.

Then there’s the Asian golden cat, Pardofelis temminckii. These guys are two to three times as big as a domestic cat! I saw one at the Night Safari in Singapore—a kind of zoo for nocturnal animals. Here’s a picture of one taken by Karen Stout:

They live all the way from Tibet, Nepal, and India to Thailand, Cambodia, Laos, and Vietnam, to down here around Malaysia and Sumatra. However, they’re listed as ‘near threatened’, due to hunting and habitat loss. They’re hunted for the illegal wildlife trade, and some people kill them for eating poultry—and also supposedly sheep, goats and buffalo calves.

Moving further up the size ladder, we meet the Sunda clouded leopard, Neofelis diardi. Here’s a great photo by ‘spencer77’:

These guys are special! They only live on Borneo and Sumatra, they’re listed as “vulnerable”, and scientists only realized they’re a separate species in 2006! Before that, people thought they were the same as the ordinary kind of clouded leopard, Neofelis nebulosa. But genetic testing showed that they diverged from that species about 1.4 million years ago, after having crossed a now-submerged land bridge to reach Borneo and Sumatra.

I’ve seen the ordinary kind of clouded leopard at the Night Safari. But calling them ‘ordinary’ is not really fair: they’re beautiful, mysterious, well-camouflaged beasts—very hard to see even if you know they’re right in front of you! Indeed, very little is known about either kind of clouded leopard, because they’re so elusive and reclusive.

And finally, the biggest kitty on the island: the Sumatran tiger, Panthera tigris sumatrae! It’s a subspecies of tiger that only lives on Sumatra. It’s listed as “critically endangered”. The World Wildlife Fund estimates there are fewer than 500 of these tigers left in the wild—maybe a lot fewer. This beauty was photographed in the Berlin Zoo by ‘Captain Herbert’:

Sumatran tigers have webbing between their toes, which makes them really good swimmers! They get up to 2.5 meters long, but they’re the smallest of tigers, as you might expect from an animal living on a hot tropical island. (The biggest is the Siberian tiger, which I talked about earlier.)

There are lots of palm oil plantations in Sumatra. People burn down the jungle to plant palms, and the smoke sometimes creates a thick smelly haze even here in Singapore. It’s horrible. This deforestation is the main threat to the Sumatran tiger. Also, many tigers are killed every year by poachers.

On the bright side, in 2006 the Indonesia Forestry Service, the Natural Resources and Conservational Agency, and the Sumatran Tiger Conservation Program sat down with companies including Asia Pulp & Paper and set up the Senepis Buluhala Tiger Sanctuary, which is 106,000 hectares in size. There’s also a large area for tigers called the Tambling Wildlife Nature Conservation on the southern tip of Sumatra. And the Australia Zoo has a program of reintroducing tigers to their natural habitat in Sumatra.

Okay, that’s it for now. I’ve got you all softened up, and I’m not even going to ask you for donations. Just be nice to cats, okay? Or better yet, be nice to life in general. We’re all in this together.

## Azimuth on Google Plus (Part 4)

11 November, 2011

Again, some eye candy to start the show. Stare fixedly at the + sign here until the pink dots completely disappear:

In a semiconductor, a ‘hole’ is the absence of an electron, and it can move around as if it were a particle.
If you have a hole moving to the right, in reality you have electrons moving to the left. Here pink dots moving counterclockwise look like a green dot moving clockwise!

A related puzzle: what happens when you hold a helium balloon on a string while you’re driving in a car with the windows closed… and then you make a sharp right turn? I’ve done it, so I know from experience.

Now for the real stuff:

• Tom Murphy, a physics professor at U.C. San Diego, has a blog worth visiting: Do the Math. He uses physics and math to make informed guesses about the future of energy production. Try out his overview of ‘peak oil’.

• Hundreds of top conservation scientists took a survey, and 99.5% felt that a serious loss of biodiversity is either ‘likely’, ‘very likely’, or ‘virtually certain’. Tropical coral ecosystems were perceived as the most seriously affected. A slim majority think we need to decide on rules for ‘triage’: deciding which species to save and which to give up on.

• Climate change is causing a massive change in tree species across the western USA. “Ecosystems are always changing at the landscape level, but normally the rate of change is too slow for humans to notice,” said Steven Running of the University of Montana, a co-author of a study on this. “Now the rate of change is fast enough we can see it.” The study used remote sensing of large areas over a four-year period.

• The James Dyson Award calls on design and engineering students to create innovative, practical, elegant solutions to the challenges that face us. This year, Edward Linacre won for a self-powering device that extracts water from the air for irrigation purposes. Linacre comes from the drought-afflicted continent of Australia. But his invention borrows some tricks from the Namib beetle, which survives in one of the driest deserts in Africa by harvesting the moisture that condenses on its back during the early morning. That’s called biomimicry.

• The New York Times has a great profile of Jeremy Grantham. He heads a successful firm managing \$100 billion in assets, and now he’s 72. So why is he saying this?

… it’s very important to me to make a lot of money now, much more than when I was 40 or 50.

Not because he has a brand new gold-digger ‘trophy wife’ or spendthrift heirs. No, he puts all the money into the Grantham Foundation for the Protection of the Environment. He’s famous for his quarterly letters on future trends—you can read them free online! And thanks to this, he has some detailed ideas about what’s coming up, and what we should do about it:

Energy “will give us serious and sustained problems” over the next 50 years as we make the transition from hydrocarbons—oil, coal, gas—to solar, wind, nuclear and other sources, but we’ll muddle through to a solution to Peak Oil and related challenges. Peak Everything Else will prove more intractable for humanity. Metals, for instance, “are entropy at work . . . from wonderful metal ores to scattered waste,” and scarcity and higher prices “will slowly increase forever,” but if we scrimp and recycle, we can make do for another century before tight constraint kicks in.

Agriculture is more worrisome. Local water shortages will cause “persistent irritation”—wars, famines. Of the three essential macro nutrient fertilizers, nitrogen is relatively plentiful and recoverable, but we’re running out of potassium and phosphorus, finite mined resources that are “necessary for all life.” Canada has large reserves of potash (the source of potassium), which is good news for Americans, but 50 to 75 percent of the known reserves of phosphate (the source of phosphorus) are located in Morocco and the western Sahara. Assuming a 2 percent annual increase in phosphorus consumption, Grantham believes the rest of the world’s reserves won’t last more than 50 years, so he expects “gamesmanship” from the phosphate-rich.

And he rates soil erosion as the biggest threat of all. The world’s population could reach 10 billion within half a century—perhaps twice as many human beings as the planet’s overtaxed resources can sustainably support, perhaps six times too many.
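The phosphorus estimate in the quote assumes that consumption keeps growing by 2 percent a year. Compound growth shortens a reserve's life a lot compared with the naive "reserves divided by current use" figure. The sketch below is a hypothetical illustration, not Grantham's own calculation. It assumes smooth exponential growth at rate $g$. The 100-year "static lifetime" is an invented input, since the quote gives no reserve figures. If consumption is $C_0 e^{g t}$ and the reserve is $R$, the reserve runs out when $C_0 (e^{g T} - 1)/g = R$, so $T = \ln(1 + g R / C_0)/g$.

```python
import math


def years_until_exhausted(static_lifetime, growth_rate):
    """Years until a reserve runs out when consumption grows exponentially.

    static_lifetime: reserves / current annual consumption, i.e. how many
                     years the reserve would last at a constant rate.
    growth_rate:     continuous annual growth rate g (0.02 means 2%).
    Solves integral_0^T e^(g t) dt = static_lifetime, giving
    T = ln(1 + g * static_lifetime) / g.
    """
    if growth_rate == 0:
        return float(static_lifetime)  # no growth: the naive ratio is exact
    return math.log1p(growth_rate * static_lifetime) / growth_rate


# Hypothetical input: a reserve that would last 100 years at today's rate.
for g in (0.0, 0.01, 0.02, 0.03):
    print(f"growth {g:.0%}: lasts about {years_until_exhausted(100, g):.0f} years")
```

Under these assumptions, 2 percent annual growth cuts a 100-year reserve down to $\ln 3 / 0.02 \approx 55$ years.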

It’s not that he doesn’t take climate change seriously. However, he seems to have almost given up on the US political establishment doing anything about it. So he’s shifted his focus:

Grantham put his own influence and money behind the climate-change bill passed by the House in 2009. “But even \$100 million wouldn’t have gotten it through the Senate,” he said. “The recession more or less ruled it out. It pushed anything having to do with the environment down 10 points, across the board. Unemployment and interest in environmental issues move inversely.”

Having missed a once-in-a-generation legislative opportunity to address climate change, American environmentalists are looking for new strategies. Grantham believes that the best approach may be to recast global warming, which depresses crop yields and worsens soil erosion, as a factor contributing to resource depletion. “People are naturally much more responsive to finite resources than they are to climate change,” he said. “Global warming is bad news. Finite resources is investment advice.” He believes this shift in emphasis plays to Americans’ strength. “Americans are just about the worst at dealing with long-term problems, down there with Uzbekistan,” he said, “but they respond to a market signal better than almost anyone. They roll the dice bigger and quicker than most.”

Let’s wrap up with some more fun stuff: impressive volcanos!

Morgan Abbou explains:

Volcanic lightning photograph by Francisco Negroni. In a scene no human could have witnessed, an apocalyptic agglomeration of lightning bolts illuminates an ash cloud above Chile’s Puyehue volcano in June 2011. The minutes-long exposure shows individual bolts as if they’d all occurred at the same moment and, due to the Earth’s rotation, renders stars (left) as streaks. Lightning to the right of the ash cloud appears to have illuminated nearby clouds, hence the apparent absence of stars on that side of the picture. After an ominous series of earthquakes on the previous day, the volcano erupted that afternoon, convincing authorities to evacuate some 3,500 area residents. Eruptions over the course of the weekend resulted in heavy ashfalls, including in Argentine towns 60 miles (a hundred kilometers) away.

Here’s another shot of the same volcano:

And here’s Mount Etna blowing out a smoke ring in March of 2000. By its shadow, this ring was estimated to be 200 meters in diameter!

## Measuring Biodiversity

7 November, 2011

guest post by Tom Leinster

Even if there weren’t a global biodiversity crisis, we’d want to know how to put a number on biodiversity. As Lord Kelvin famously put it:

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.

In this post, I’ll talk about what happens when you take a mass of biological data and try to turn it into a single number, intended to measure biodiversity.

There have been more than 50 years of debate about how to measure diversity. While the idea of putting a number on biological diversity goes back to the 1940s at least, the debate really seems to have got going in the wake of pioneering work by the great ecologist Robert Whittaker in the 1960s.

There followed several decades in which progress was made… but there was a lot of talking at cross-purposes. In fact, there was so much confusion that some people gave up on the diversity concept altogether. The mood is summed up by the title of an excellent and much-cited paper of Stuart Hurlbert:

• S. H. Hurlbert, The nonconcept of species diversity: A critique and alternative parameters. Ecology 52:577–586, 1971.

So why all the confusion?

One reason is that the word “diversity” is used by different people in many different ways. We all know that diversity is important: so if you found a quantity that seemed to measure biological variation in a sensible way, you might be tempted to call it “diversity” and publish a paper promoting your quantity over all other quantities that have ever been given that name. There are literally dozens of measures of diversity in the literature. Here are two simple ones:

• Species richness is simply the number of species in the community concerned.
• The Shannon entropy is $-\sum_{i = 1}^S p_i \log(p_i)$, where our community consists of $S$ species in proportions $p_1, \ldots, p_S$.

Which quantity should we call “diversity”? Do all these quantities really measure the same kind of thing? If community A has greater species richness than community B, but lower Shannon entropy, what does it mean?
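Such pairs of communities really exist. Here is a minimal Python sketch that computes both quantities. The relative abundances are invented for illustration. Entropy uses the natural logarithm, and species with $p_i = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math


def species_richness(p):
    """Number of species actually present (relative abundance > 0)."""
    return sum(1 for x in p if x > 0)


def shannon_entropy(p):
    """Shannon entropy -sum_i p_i log(p_i), natural log, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Hypothetical relative abundances, invented for illustration.
community_a = [0.7, 0.1, 0.1, 0.1]   # four species, one of them dominant
community_b = [1 / 3, 1 / 3, 1 / 3]  # three species, evenly balanced

for name, p in (("A", community_a), ("B", community_b)):
    print(name, species_richness(p), round(shannon_entropy(p), 3))
# A 4 0.94
# B 3 1.099
```

A wins on richness and B wins on entropy. Which one is "more diverse" depends on which quantity you decide deserves the name.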

Another cause for confusion is a blurring between the questions

Which quantities deserve to be called diversity?

and

Which quantities are we capable of measuring experimentally?

For example, we might all agree that species richness is an important quantity, but that doesn’t mean that species richness is easy to measure in practice. (In fact, it’s not; more on that below.) My own view is that the two questions should be kept separate:

The statistical problem of designing appropriate estimators becomes relevant only after the measure to be estimated is accepted to be meaningful.

(Hans-Rolf Gregorius, Elizabeth M. Gillet, Generalized Simpson-diversity, Ecological Modelling 211:90–96, 2008.)

The problems involved in quantifying diversity are of three types: practical, statistical and conceptual. I’ll say a little about the first two, and rather more about the third.

**Practical.** Suppose that you’re doing a survey of the vertebrates in a forest. Perhaps one important species is brightly coloured and noisy, while another is silent, shy, and well-camouflaged. How do you prevent the first from being recorded disproportionately?

Or suppose that you’re carrying out a survey, with multiple people doing the fieldwork. Different people have a tendency to spot different things: for example, one person might be short-sighted and another long-sighted. How do you ensure that this doesn’t affect your results?

**Statistical.** Imagine that you want to know how many distinct species of insect live in a particular area: the “species richness”, in the terminology introduced above. You go out collecting, and you come back with 100 specimens representing 10 species.

But your survey might have missed some species altogether, so you go out and get a bigger sample. This time, you get 200 specimens representing 15 species. Does this help you discover how many species there really are?

Logically, not at all. The only certainty is that there are at least 15 species. Maybe there are thousands of species, but almost all of them are extremely rare. Or maybe there are really only 15. Unless you collect all the insects, you’ll never know for sure exactly how many species there are.

However, it may be that you can make reasonable assumptions about the frequency distribution of the species. People sometimes do exactly this, to try to overcome the difficulty of estimating species richness.
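One well-known estimator of this kind is Chao1. The post does not mention it; it is named here only as an illustration. Chao1 uses the number of species seen exactly once ($f_1$) and exactly twice ($f_2$) in a sample to guess how many species were missed. The sketch below implements the bias-corrected form $S_{\text{obs}} + f_1(f_1 - 1)/(2(f_2 + 1))$ on an invented sample. Like any such estimator, it rests on assumptions about the abundance distribution. It is usually read as a lower-bound estimate, so the true richness may still be much larger.

```python
from collections import Counter


def chao1(specimens):
    """Bias-corrected Chao1 estimate of species richness from one sample.

    specimens: one species label per collected individual.
    f1 = species seen exactly once, f2 = species seen exactly twice.
    Estimate = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)).
    """
    counts = Counter(specimens)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical sample: 100 insects belonging to 12 observed species.
abundances = {"a": 50, "b": 20, "c": 10, "d": 5, "e": 4, "f": 3,
              "g": 2, "h": 2, "i": 1, "j": 1, "k": 1, "l": 1}
sample = [species for species, n in abundances.items() for _ in range(n)]
print(len(sample), chao1(sample))  # 100 14.0
```

Four singletons and two doubletons push the estimate two species above the 12 actually seen.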

**Conceptual.** This is what I really want to talk about.

I mentioned earlier that different people mean different things by “diversity”. Here’s an example.

Consider two bird communities. The first looks like this:

It contains four species, one of which is responsible for most of the population, and three of which are quite rare. The second looks like this:

It has only three species, but they’re evenly balanced.

Which community is the more diverse? It’s a matter of opinion. In the press, and in many scholarly articles too, “biodiversity” is often used as a synonym for “species richness”. On this count, the first community is more diverse. But if you’re more concerned with the healthy functioning of the whole community, the presence of rare species might not be particularly important: it’s balance that matters, and the second community has more of that.
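The disagreement can be made concrete with the Hill numbers, a standard one-parameter family of diversity measures (not named in this paragraph): $D_q(\mathbf{p}) = (\sum_i p_i^q)^{1/(1-q)}$. The case $q = 1$ is taken as the limit $\exp(-\sum_i p_i \log p_i)$. Order $q = 0$ gives species richness, and larger $q$ pays less attention to rare species. The proportions below are invented stand-ins for the two pictured communities.

```python
import math


def hill_number(p, q):
    """Hill number of order q >= 0: (sum_i p_i^q)^(1/(1-q)).

    The case q = 1 is the limit exp(-sum_i p_i log p_i).
    Absent species (p_i = 0) are dropped first.
    """
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


community_a = [0.7, 0.1, 0.1, 0.1]   # hypothetical: four species, one dominant
community_b = [1 / 3, 1 / 3, 1 / 3]  # hypothetical: three species, balanced

for q in (0, 0.5, 1, 2):
    a, b = hill_number(community_a, q), hill_number(community_b, q)
    print(f"q = {q}: A = {a:.2f}, B = {b:.2f}")
```

Community A comes out ahead at $q = 0$ and $q = 0.5$ but behind from $q = 1$ onward. Community B scores 3 at every order because its species are evenly balanced.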

Different people using the word “diversity” attach different amounts of significance to rare species. There’s a spectrum of points of view, ranging from those who give rare species the same weight as common ones (as in the definition of species richness) to those who are only interested in the most common species of all. Every point on this spectrum of viewpoints is reasonable. None should have a monopoly on the word “diversity”.

At least, that’s what Christina Cobbold and I argue in our new paper:

• Tom Leinster, Christina A. Cobbold, Measuring diversity: the importance of species similarity, Ecology, in press (doi:10.1890/10-2402.1).

But that’s not actually our main point. As the title suggests, the real purpose of our paper is to show how to measure diversity in a way that reflects the varying differences between species. I’ll explain.

Most of the existing approaches to measuring biodiversity go like this.

We have a “community” of organisms — the fish in a lake, the fungi in a forest, or the bacteria on your skin. This community is divided into $S$ groups, conventionally called species, though they needn’t be species in the ordinary sense.

We assume that we know the relative abundances, or relative frequencies, of the species. Write them as $p_1, \ldots, p_S$. Thus, $p_i$ is the proportion of the total population that belongs to the $i$th species, where “proportion” is measured in any way you think sensible (number of individuals, total mass, etc).

We only care about relative abundances here, not absolute abundances: so $p_1 + \cdots + p_S = 1$. If half of a forest is destroyed, it might be a catastrophe, but on the (unrealistic) assumption that all the flora and fauna in the forest were distributed homogeneously, it won’t actually change the biodiversity. (That’s not a statement about what’s important in life; it’s only a statement about the usage of a word.)
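Here is a tiny sketch of that normalization, with invented counts. Scaling every population by the same factor leaves $p_1, \ldots, p_S$ unchanged. That is why halving an idealized, homogeneous forest changes nothing that is computed from the $p_i$ alone.

```python
def relative_abundances(amounts):
    """Turn absolute amounts (counts, biomass, ...) into proportions summing to 1."""
    total = sum(amounts)
    if total <= 0:
        raise ValueError("need a positive total")
    return [a / total for a in amounts]


# Hypothetical counts for three species, before and after every population
# is halved uniformly (the post's idealized homogeneous forest).
before = [120, 60, 20]
after = [a / 2 for a in before]
print(relative_abundances(before))  # [0.6, 0.3, 0.1]
print(relative_abundances(after))   # [0.6, 0.3, 0.1]
```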

This model is common but crude. It can’t detect the difference between a community of six dramatically different species and a community consisting of six species of barnacle.

So, Christina and I use a refined model, as follows. We assume that we also have a measure of the similarity between each pair of species. This is a real number between 0 and 1, with 0 indicating that the species are as dissimilar as could be, and 1 indicating that they’re identical. Writing the similarity between the $i$th and $j$th species as $Z_{ij}$, this gives an $S \times S$ matrix $\mathbf{Z}$. Our only assumption on $\mathbf{Z}$ is that its diagonal entries are all 1: every species is identical to itself.

There are many ways of measuring inter-species similarity. Probably the most familiar approach is genetic, as in “you share 98% of your DNA with a chimpanzee”. But there are many other possibilities: functional, phylogenetic, morphological, taxonomic, …. Diversity is a measure of the variety of life; having to choose a measure of similarity forces you to get clear about exactly what you mean by “variety”.

Christina and I are by no means the first people to incorporate species similarity into the model of an ecological community. The main new thing in our paper is this measure of the community’s diversity:

${}^q D^{\mathbf{Z}}(\mathbf{p}) = ( \sum_i p_i (\mathbf{Z}\mathbf{p})_i^{q - 1} )^{1/(1 - q)}.$

What does this mean?

• ${}^q D^{\mathbf{Z}}(\mathbf{p})$ is what we call the diversity of order $q$ of the community. Here $q$ is a parameter between $0$ and $\infty$, which you get to choose. Different values of $q$ represent different points on the spectrum of viewpoints described above. Small values of $q$ give high importance to rare species; large values of $q$ give high importance to common species.
• $\mathbf{p}$ is shorthand for the relative abundances $p_1, \ldots, p_S$, and $\mathbf{Z}$ is the matrix of similarities.
• $(\mathbf{Z}\mathbf{p})_i$ means $\sum_j Z_{ij} p_j$.
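To make the definition concrete, here is a minimal Python sketch of ${}^q D^{\mathbf{Z}}(\mathbf{p})$ for finite $q \geq 0$ (the $q = \infty$ case is left out). It is an illustration, not the authors' code. Two assumptions are worth flagging. First, the function name `diversity` is hypothetical. Second, the sum runs only over species with $p_i > 0$, since a term with $p_i = 0$ would otherwise take the form $0 \cdot 0^{q-1}$. At $q = 1$ the sketch switches to the limiting formula $\exp(-\sum_i p_i \log(\mathbf{Z p})_i)$.

```python
import math

def diversity(p, Z, q):
    """Diversity of order q, ^qD^Z(p), for finite q >= 0.

    p: list of S relative abundances summing to 1.
    Z: S x S similarity matrix (list of lists), entries in [0, 1], Z[i][i] == 1.
    """
    S = len(p)
    # (Zp)_i = sum_j Z_ij p_j: average similarity of species i to a randomly chosen individual
    Zp = [sum(Z[i][j] * p[j] for j in range(S)) for i in range(S)]
    # Assumption of this sketch: only species actually present (p_i > 0) contribute.
    support = [i for i in range(S) if p[i] > 0]
    if math.isclose(q, 1.0):
        # q = 1 is defined as a limit: exp(-sum_i p_i log (Zp)_i)
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in support))
    total = sum(p[i] * Zp[i] ** (q - 1) for i in support)
    return total ** (1 / (1 - q))

# Sanity checks on a hypothetical four-species community.
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
uniform = [0.25] * 4
print(diversity(uniform, I4, 2))    # 4.0: four equally common, totally dissimilar species
print(diversity(uniform, ones, 2))  # 1.0: every species treated as identical
```

The two checks illustrate the role of $\mathbf{Z}$: the same abundances give diversity 4 when the species have nothing in common, but diversity 1 when every pair is declared identical.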

The expression is undefined at $q = 1$ and at $q = \infty$, but it can be given a meaning there by taking limits. For $q = 1$, this gives

${}^1 D^{\mathbf{Z}}(\mathbf{p}) = \frac{1}{(\mathbf{Z p})_1^{p_1} (\mathbf{Z p})_2^{p_2} \cdots (\mathbf{Z p})_S^{p_S}} = \exp\big(-\sum_i p_i \log(\mathbf{Z p})_i\big).$
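A quick numerical check of this limit, as a Python sketch on a hypothetical three-species community in which species 0 and 1 have similarity 0.5: evaluating the general formula at $q$ close to 1, from either side, should approach $\exp(-\sum_i p_i \log(\mathbf{Z p})_i)$. The function names are our own, and the sums again skip species with $p_i = 0$.

```python
import math

def diversity_general(p, Z, q):
    """The formula for ^qD^Z(p) as written, valid for q != 1 (sum over p_i > 0)."""
    S = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(S)) for i in range(S)]
    total = sum(p[i] * Zp[i] ** (q - 1) for i in range(S) if p[i] > 0)
    return total ** (1 / (1 - q))

def diversity_order_one(p, Z):
    """The q = 1 limit: exp(-sum_i p_i log (Zp)_i)."""
    S = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(S)) for i in range(S)]
    return math.exp(-sum(p[i] * math.log(Zp[i]) for i in range(S) if p[i] > 0))

# Hypothetical example: species 0 and 1 are half-similar, species 2 is unrelated.
p = [0.5, 0.3, 0.2]
Z = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
for q in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    print(q, diversity_general(p, Z, q))
print("limit at q = 1:", diversity_order_one(p, Z))
```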

If you want to know the value at $q = \infty$, or any of the other mathematical details, you can read this post at the n-Category Café, or of course our paper. In both places, you’ll also find an explanation of what motivates this formula. What’s more, you’ll see that many existing measures of diversity are special cases of ours, obtained by taking particular values for $q$ and/or $\mathbf{Z}$.

But I won’t talk about any of that here. Instead, I’ll tell you how taking species similarity into account can radically alter the assessment of diversity.

I’ll do this using an example: butterflies of subfamily Charaxinae at a site in an Ecuadorian rainforest. The data is from here:

• P. J. DeVries, D. Murray, R. Lande, Species diversity in vertical, horizontal and temporal dimensions of a fruit-feeding butterfly community in an Ecuadorian rainforest. Biological Journal of the Linnean Society 62:343–364, 1997.

They measured the butterfly abundances in both the canopy (top level) and understorey (lower level) at this site, with the following results:

| Species | Canopy | Understorey |
|---|---|---|
| Prepona laertes | 15 | 0 |
| Archaeoprepona demophon | 14 | 37 |
| Zaretis itys | 25 | 11 |
| Memphis arachne | 89 | 23 |
| Memphis offa | 21 | 3 |
| Memphis xenocles | 32 | 8 |

Which is more diverse: canopy or understorey?

We’ve already seen that the answer is going to depend on what exactly we mean by “diverse”.

First let’s answer the question under the (crude!) assumption that different species have nothing whatsoever in common. This means taking our similarity matrix $\mathbf{Z}$ to be the identity matrix: if $i \neq j$ then $Z_{ij} = 0$ (totally dissimilar), and if $i = j$ then $Z_{ii} = 1$ (totally identical).
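In this identity case, a short derivation (not spelled out in the post) shows what the general formula reduces to. Writing $I$ for the identity matrix, we have $(\mathbf{Z}\mathbf{p})_i = p_i$, so the diversity of order $q$ becomes the family known in ecology as Hill numbers. Here the sums are taken over species with $p_i > 0$:

$\displaystyle{ (I\mathbf{p})_i = \sum_j \delta_{ij}\, p_j = p_i, \qquad {}^q D^{I}(\mathbf{p}) = \Big( \sum_{i:\, p_i > 0} p_i^{\,q} \Big)^{1/(1 - q)} }$

$\displaystyle{ {}^0 D^{I}(\mathbf{p}) = \#\{ i : p_i > 0 \}, \qquad {}^1 D^{I}(\mathbf{p}) = \exp\Big( -\sum_i p_i \log p_i \Big), \qquad {}^2 D^{I}(\mathbf{p}) = \frac{1}{\sum_i p_i^2} }$

So order 0 simply counts the species present, order 1 is the exponential of the entropy, and order 2 is the reciprocal of the probability that two individuals chosen at random (with replacement) belong to the same species.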

Now, remember that there’s a spectrum of viewpoints on how much importance to give to rare species when measuring diversity. Rather than choosing a particular viewpoint, we’ll calculate the diversity from all viewpoints, and display it on a graph. In other words, we’ll draw the graph of ${}^q D^{\mathbf{Z}}(\mathbf{p})$ (the diversity of order $q$) against $q$ (the viewpoint). Here’s what we get:

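(Figure: the diversity of order $q$ of the canopy and of the understorey, plotted against $q$ on the horizontal axis, with $\mathbf{Z}$ the identity matrix.)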

Conclusion: from all viewpoints, the butterfly population in the canopy is at least as diverse as that in the understorey.
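A Python sketch that evaluates ${}^q D^{\mathbf{Z}}(\mathbf{p})$ with $\mathbf{Z}$ the identity on the table's counts, at a handful of values of $q$ chosen for illustration. It does not reproduce the plotted curves, and $q = \infty$ is omitted. The helper name `naive_diversity` is hypothetical. At each of these sample points the canopy value is the larger one.

```python
import math

# Abundance counts from the DeVries et al. (1997) table, in table order.
canopy      = [15, 14, 25, 89, 21, 32]
understorey = [0, 37, 11, 23, 3, 8]

def naive_diversity(counts, q):
    """^qD^Z(p) with Z the identity: (sum_i p_i^q)^(1/(1-q)), over species present."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # limiting case: exponential of the entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

for q in (0, 0.5, 1, 2, 4):
    print(f"q={q}: canopy {naive_diversity(canopy, q):.2f}, "
          f"understorey {naive_diversity(understorey, q):.2f}")
```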

Now let’s do it again, but this time taking account of the varying similarities between species of butterflies. We don’t have much to go on: how do we know whether Prepona laertes is very similar to, or very different from, Archaeoprepona demophon? With only the data above, we don’t. So what can we do?

All we have to go on is the taxonomy. Remember your high school biology: for the butterfly Prepona laertes, the genus is Prepona and the species is laertes. We’d expect species in the same genus to have more in common than species in different genera. So let’s define the similarity between two species as follows:

• the similarity is 1 if the species are the same
• the similarity is 0.5 if the species are different but in the same genus
• the similarity is 0 if they are not even in the same genus.
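One way to encode this rule, as a Python sketch: read the genus off the first word of each species name and fill in $\mathbf{Z}$ accordingly. The function `taxonomic_similarity` and its `same_genus` parameter (defaulting to the 0.5 above) are hypothetical names introduced here.

```python
# Species in the order of the table; the genus is the first word of each name.
species = ["Prepona laertes", "Archaeoprepona demophon", "Zaretis itys",
           "Memphis arachne", "Memphis offa", "Memphis xenocles"]

def taxonomic_similarity(names, same_genus=0.5):
    """Z_ij = 1 for the same species, same_genus for the same genus only, else 0."""
    genera = [name.split()[0] for name in names]
    n = len(names)
    return [[1.0 if i == j else (same_genus if genera[i] == genera[j] else 0.0)
             for j in range(n)]
            for i in range(n)]

Z = taxonomic_similarity(species)
for name, row in zip(species, Z):
    print(f"{name:25s}", row)
```

Only the three Memphis species end up with off-diagonal entries of 0.5; Prepona and Archaeoprepona count as different genera, so their similarity is 0.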

This is still crude, but in the absence of further information, it’s about the best we can do. And it’s better than the first approach, where we ignored the taxonomy entirely. Throwing away biologically relevant information is unlikely to lead to a better assessment of diversity.

Using this taxonomic matrix $\mathbf{Z}$, and the same abundances, the diversity graphs become:

This is more interesting! For $q > 1$, the understorey looks more diverse than the canopy — the opposite conclusion to our first approach.
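A self-contained Python check of this reversal at a few values of $q$, hard-coding the taxonomic $\mathbf{Z}$ for the six species in table order (the last three are the Memphis species). As in the earlier sketches, the sum runs only over species present, which is our assumption, and the function covers $q \neq 1$ only. At $q = 2$ and $q = 3$ the understorey comes out ahead, while at $q = 0$ the canopy still does.

```python
canopy      = [15, 14, 25, 89, 21, 32]
understorey = [0, 37, 11, 23, 3, 8]
# Taxonomic similarities: 1 on the diagonal, 0.5 among the three Memphis species.
Z = [[1, 0, 0, 0,   0,   0  ],
     [0, 1, 0, 0,   0,   0  ],
     [0, 0, 1, 0,   0,   0  ],
     [0, 0, 0, 1,   0.5, 0.5],
     [0, 0, 0, 0.5, 1,   0.5],
     [0, 0, 0, 0.5, 0.5, 1  ]]

def diversity(counts, Z, q):
    """^qD^Z(p) for q != 1, with p the relative abundances; sums over p_i > 0 only."""
    total = sum(counts)
    p = [c / total for c in counts]
    n = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    s = sum(p[i] * Zp[i] ** (q - 1) for i in range(n) if p[i] > 0)
    return s ** (1 / (1 - q))

for q in (0, 2, 3):
    print(q, round(diversity(canopy, Z, q), 2), round(diversity(understorey, Z, q), 2))
```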

It’s not hard to see why. Look again at the table of abundances, but paying attention to the genera of the butterflies. In the canopy, nearly three-quarters of the butterflies are of genus Memphis. So when we take into account the fact that species in the same genus tend to be somewhat similar, the canopy looks much less diverse than it did before. In the understorey, however, the species are spread more evenly between genera, so taking similarity into account leaves its diversity relatively unchanged.
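The “nearly three-quarters” is easy to check by pooling the table's counts within each genus; a short Python sketch:

```python
# Counts from the table, pooled by genus.
canopy      = {"Prepona": 15, "Archaeoprepona": 14, "Zaretis": 25, "Memphis": 89 + 21 + 32}
understorey = {"Prepona": 0,  "Archaeoprepona": 37, "Zaretis": 11, "Memphis": 23 + 3 + 8}

for name, counts in (("canopy", canopy), ("understorey", understorey)):
    total = sum(counts.values())
    shares = {genus: round(n / total, 2) for genus, n in counts.items()}
    print(name, total, shares)
```

The canopy's Memphis share comes to about 0.72 of 196 butterflies, while the understorey's 82 butterflies split roughly 0.45, 0.13 and 0.41 among the three genera it contains.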

Taking account of species similarity opens up a world of uncertainty. How should we measure similarity? There are as many possibilities as there are quantifiable characteristics of living organisms. It’s much more reassuring to stay in the black-and-white world where distinct species are always assigned a similarity of 0, no matter how similar they might actually be. (This is, effectively, what most existing measures do.) But that’s just hiding from reality.

Maybe you disagree! If so, try the Discussion section of our paper, where we lay out our arguments in more detail. Or let me know by leaving a comment.

## Bioremediation and Ecological Restoration Job

23 October, 2011

The University of Texas – Pan American (UTPA) Department of Biology is trying to fill an Assistant Professor Faculty position in Bioremediation and Ecological Restoration, which will start in Fall 2012 pending budget approval. They’re looking for someone whose area of research is bioremediation and/or ecological restoration, and they’re especially interested in candidates whose research focuses on environmental issues relevant to the Lower Rio Grande Valley.

For more details, go here.