## The Mathematics of Biodiversity (Part 1)

Research Program on the Mathematics of Biodiversity, June-July 2012, Centre de Recerca Matemàtica, Barcelona, Spain. Organized by Ben Allen, Silvia Cuadrado, Tom Leinster, Richard Reeve and John Woolliams.

We’re having daily informal talks and there’s no way I can blog about all of them, talk to people here, and still get enough work done. So, I’ll just mention a few things that strike me! For example, this morning Lou Jost told me about an interesting paper by I. J. Good.

I’d known of I. J. Good as one of the guys who came up with the concept of a ‘technological singularity’. In 1963 he wrote:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

He was a British mathematician who worked as a cryptologist at Bletchley Park with Alan Turing. After World War II, he continued to work with Turing on the design of computers and Bayesian statistics at the University of Manchester. Later he moved to the US. In 1968, thanks to his interest in artificial intelligence, he served as consultant for Stanley Kubrick’s film 2001: A Space Odyssey. He died in 2009.

Good was also a big chess enthusiast, and worked on writing programs to play chess. He’s the guy in front here:

But the paper Lou Jost mentioned is on a rather different subject:

• Irving John Good, The population frequency of species and the estimation of population parameters, Biometrika 40 (1953), 237–264.

Let me just state one result, sloppily, without any details or precise hypotheses!

Puzzle: Suppose you go into the jungles of Ecuador and start collecting orchids. You count the number of orchids of each different species that you find. You get a list of numbers, something like this:

$14, 10, 8, 6, 2, 1, 1, 1$

What is the chance that the next orchid you find will belong to a new species?

Good gives a rule of thumb for solving problems of this type:

$\displaystyle{ \frac{n_1}{N} }$

Here $N$ is the total number of orchid you collected, and $n_i$ is the number of species for which you found exactly $i$ orchids of that species. In our example,

$n_1 = 3$

since we found just one orchid of three different species: those are the three 1′s at the end of our list. Furthermore,

$N = 14 + 10 + 8 + 6 + 2 + 1 + 1 = 42$

So here is Good’s estimate the chance that the next orchid you collect will be of a new species:

$\displaystyle{ \frac{n_1}{N} = \frac{3}{42} }$

Good’s argument is nontrivial—and of course it depends on some assumptions on the nature of the distribution of populations of different species! Since he doesn’t state these assumptions succinctly and I haven’t read the paper carefully yet, I’m afraid you’ll have to read the paper to find out what they are.

Of course the math works for samples of anything that comes in distinct types, not just species of organisms! Good considers four examples:

• moths captured in a light-trap at Rothamsted, England,

• words in American newspapers,

• nouns in Macaulay’s essay on Bacon,

• chess openings in games published by the British Chess Magazine in 1951.

By comparing a small sample to a bigger one, he studies how well his rule works in practice, and apparently it does okay.

In his paper, I. J. Good thanks Alan Turing for coming up with the basic idea. In fact he says Turing gave an ‘intuitive demonstration’ of it—but he doesn’t give this intuitive demonstration, and according to Lou Jost he actually admits somewhere that he forgot it.

By the way, Lou Jost is not only an expert on biodiversity and its relation to entropy! He lives in the jungles of Ecuador and has discovered over 60 new species of orchids, including the world’s smallest:

He found it in Ecuador, and the petals are just a few cells thick! (Typically, the news reports say he found it in Bolivia and the petals are just one cell thick.)

He said:

I found it among the roots of another plant that I had collected, another small orchid which I took back to grow in my greenhouse to get it to flower. A few months later I saw that down among the roots was a tiny little plant that I realised was more interesting than the bigger orchid. Looking at the flower is often the best way to be able to identify which species of orchid you’ve got hold of – and can tell you whether you’re looking at an unknown species or not.

### 24 Responses to The Mathematics of Biodiversity (Part 1)

1. Tim van Beek says:

Alan Turing, some years before he died, wrote down rules for a computer chess program on paper. He played one game of chess against a human player, where he – Turing – acted as human CPU calculating the next move according to his program.

2. Fascinating tidbits…and I don’t know how you do all the outreach, travel, blog, research, etc, John! But I am very glad that somehow you do.

• John Baez says:

Thanks! I really can’t help doing it: it’s so fun. Of course when I return to Riverside this fall and start teaching I’ll do less, but (judging from past experience) still a fair amount. What always puzzles me is why more people don’t talk more online about their work.

3. memming says:

Nice article. For entropy estimation, there’s a recent paper:

• Zhiyi Zhang, Entropy estimation in Turing’s perspective, Neural Computation, Vol. 24, No. 5. (1 February 2012), pp. 1368-1389, doi:10.1162/NECO_a_00266.

But it does not cite Jost.

BTW, the link on “biodiversity and its relation to entropy” paper doesn’t work.

• John Baez says:

Thanks for the reference! We are in fact discussing the problem of estimating entropy from finite samples—not just Shannon entropy, but the Rényi entropies for all values of $q$, which give rise to different measures of entropy, called Hill numbers.

I’ve fixed the broken link, which was to this paper:

• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.

The Hill numbers are defined and discussed here. When $q= 1$, the Hill number is just the total number of species. This is the hardest one to estimate, which is why our conversation drifted toward Good’s estimate of the chance that the next species you see will be a new one!

4. John Baez says:

After I blogged about this, the whole team here went to lunch. I suddenly realized I could use I. J. Good’s formula to estimate the probability that we’d go out to a new restaurant!

We’d been to 3 restaurants so far, and a new one each time. So, according to Good’s formula, the probability we’d go out to a new restaurant would be about

$\displaystyle{ \frac{n_1}{N} = \frac{3}{3} = 1}$

Wow! A 100% chance that we’d go to a new restaurant!

This seemed like an extreme case where Good’s rule must break down… especially because we didn’t actually know a new restaurant. We all laughed about it and went out to lunch…

… and then we found a new restaurant, and went there!

We’ll see what happens next time.

• Just perfect!

• Colin says:

It was probably a very biased decision after doing the calculation I suppose? Glad to see you chose the right answer.

• John Baez says:

Actually I don’t think that decision was biased: the people who chose that new restaurant didn’t seem to be paying attention to my joke. But of course someday we’ll go to the same restaurant again—we’re staying here for several weeks and there aren’t many choices. So then my winning streak will break.

5. John Baez says:

Over on Google+, Fernando Pereira pointed me to this paper:

• David McAllester and Robert E. Schapire, On the convergence rate of Good-Turing estimators, February 17, 2000.

This quote does a better job than the abstract at conveying the idea:

Perhaps the most significant Good-Turing estimator is the estimate of the missing mass. We assume that there is some unknown underlying distributions on a some unknown set of objects, e.g., an unknown frequency for each word in English. We assume that a sample is constructing by drawing objects independently according to this unknown distribution. If the number of objects with nonzero probability is infinite then for any finite sample there will be objects of nonzero probability that do not occur in the sample. It is well known that in any sample of English text there will be English words not occurring in the sample. The missing mass of a sample is the total probability mass of the objects not occurring in the sample. Good published a proof that the fraction of the sample consisting of objects that occur only once in the sample is a nearly unbiased estimate of the missing mass independent of the underlying distribution. However, it appears that the convergence rate of this estimator has never been formally analyzed. Let $G_0$ be the fraction of the sample consisting of words that occur only once in the sample and let $M_0$ be the actual missing mass, i.e., the total probability mass of the items not occurring in the sample. Here we prove that with probability at least $1 - \delta$ over the choice of the sample we have that $M_0$ is no larger than

$\displaystyle{ G_0 + O\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) }$

where $m$ is the size of the sample. This is true independent of the underlying distribution. We also give a somewhat weaker PAC lower bound on $M_0$ and PAC bounds on the true total probability mass of the set of words occurring $k$ times in the sample.

I have no idea what a ‘PAC’ lower bound is. I hate it when people don’t define acronyms the first time they use them: the authors saves a tiny bit of work and makes many readers do more work.

But the paper looks interesting!

• Matías says:

PAC may stand for ‘probably approximately correct’ that means something is correct with high probability (ie probably), within some prespecified tolerance (ie approximately). I’m not sure what is it that should be PAC, maybe the confidence level of the bound being right or the bound itself (probably the later).

Luckily I couldn´t follow the link to I. J. Good’s paper (for it sent me on a search for it and there are many other interesting papers of his, besides what seems an inclination to publicate ‘corrections’ of previous papers).

Your joke about the restaurant-choosing got me thinking: imagine taking a sample of a hundred elements everyone of which turns out to be of a unique species, the guess of 1 for the probablity of finding a new specie next sample doesn’t appear to be so unreliable anymore. My guess (intuition and ignorance mixed together) is that the estimator is asymptotically unbiased and an n=3 is to small a sample for its bias to get small.

As usual, great post. And a great thing also this seminar, just hope I can participate in things like this someday.

• John Baez says:

I fixed the link to I. J. Good’s paper. Firefox has stopped including the “http://” in its display of URLs. This is why two links in this blog entry didn’t work. But it’s weird, because until very recently, Firefox included the “http://” when I cut-and-pasted URLs from that display, despite not showing the “http://”.

My guess (intuition and ignorance mixed together) is that the estimator is asymptotically unbiased and an n=3 is to small a sample for its bias to get small.

That sounds right. Theorem 1 in the paper by Macallister and Schapire seems to say this estimator is asymptotically unbiased. (My main uncertainty here is that I’m just guessing the official definition of ‘asymptotically unbiased
estimator’. The theorem certainly gives a sense in which the estimator gets less biased as the sample size increases.)

By the way, after having gone to different restaurants for 4 days, I skipped lunch on the 5th, but the rest of the gang went to yet another new restaurant.

6. Boris Borcic says:

An almost trivial remark: Good’s estimate is what you immediately obtain if you time-reverse your sampling procedure, e.g., if you ask for the probability that there is a change in the number of species in your sample when you randomly remove a specimen from it.

Can’t this be turned into a generic advice: to hone one’s expectations for what an expanded horizon can reveal, study what its contraction can hide?

• John Baez says:

Boris wrote:

Good’s estimate is what you immediately obtain if you time-reverse your sampling procedure, e.g., if you ask for the probability that there is a change in the number of species in your sample when you randomly remove a specimen from it.

Excellent! This ‘almost trivial remark’ might be Alan Turing’s original ‘intuitive demonstration’, which I. J. Good forgot. We can’t tell if this original demonstration was supposed to be a proof or just a heuristic argument.

Can’t this be turned into a generic advice: to hone one’s expectations for what an expanded horizon can reveal, study what its contraction can hide?

Sounds good. Or, to make it sound almost tautologous: to understand the future, you need to think about the past.

• Boris Borcic says:

To understand the future, you need to think about the past, sure, but I meant something more like using imagination to reverse the gears of learning (twice in a row over a short sequence, if need be).

Meanwhile, I see 20/20 hindsight - a phenomenon acknowledged enough to get a nickname - as proof that people don’t generally understand the contemplation of the past to involve similar gear-reversals.

• Graham says:

I like the time-reversed idea, but I think the most natural way of getting an estimate from the reversed process is like this. Having observed N orchids and got your list of numbers summing to N, you find all the lists summing to N+1 that can produce the observation by removing one orchid, and weight these lists by the number of ways they produce the observation. Example:

15, 10, 8, 6, 1, 1, 1 (15)
14, 11, 8, 6, 1, 1, 1 (11)
14, 10, 9, 6, 1, 1, 1 (9)
14, 10, 8, 7, 1, 1, 1 (7)
14, 10, 8, 6, 2, 1, 1 (2)
14, 10, 8, 6, 1, 1, 1, 1 (4)

So you estimate P(43rd orchid is a new species) = 4/48 = 1/12. This is not What Good does, but is not far off, and seems reasonable. If you have observed N different species, the estimate is (N+1)/(N+3). This gives 2/3 in the restaurant example, which I think is better than 1.

• Boris Borcic says:

I find something a bit unclear about your counting procedure. It would seem from the line ending in (2), that you treat repeated counts in such a way that the estimate becomes quite sensitive to a circumstance that shouldn’t impact IMO – that of two large counts happening to be equal. Also, it would help if you’d named number of species and number of specimens in a more distinctive way. My first perception is that the greater simplicity of my version of “reversing” makes it easier to bring assumptions into focus.

• Graham says:

Yes, two large counts happening to be equal is a problem with my scheme.

• davidtweed says:

Actually there topic of “reversibility arguments” brings up something I’ve been meaning to mention. A month or so ago the BBC radio series “More or less” had a piece interviewing some scientist (name forgotten) who was claiming most biodiversity loss estimates were inaccurate overestimates of loss. Apparently, if I understand it correctly, there’s two steps typically used:

1. Select a series of nested pieces of land (of some type) increasing size and “count” the number of species in each. (The estimation in the count doesn’t really matter.) From this construct a curve of how the number of species varies with area.

2. Take an existing area with a “counted” number of species and a projected area reduction and use the curve to produce an estimate of the new number of species.

The argument was that an implied reversibility symmetry isn’t there: species are more “robust to loss in the presence of declining area” than they are “prone to speciation in the presence of increasing area”, so simple reversing doesn’t work. The scientist wasn’t claiming that biodiversity loss wasn’t happening, just that many numerical predictions were dramatic overestimates.

(Alert readers will have spotted that this isn’t an exact reversal; ideally step 1 would actively expand the area and watch speciation happen rather than just be a “measurement artifact”. The radio piece, which I was half listening to while stripping wallpaper, did stress that a non-applicable reversibility was being assumed in standard models.)

• davidtweed says:

Turns out this item was actually written up on the BBC website here.

• Boris Borcic says:

The meat of the argument seems to be a variation on the observation that to establish the presence of a species in a patch of territory you only need to explore the patch until the first specimen while to establish its absence you need to explore the whole patch. As presented at BBC or in the abstract I don’t find it convincingly relevant, but the details of the Nature paper are available for 30\$.

Species–area relationships always overestimate extinction rates from habitat loss

Fangliang He & Stephen P. Hubbell
Nature 473, 368–371 (19 May 2011) doi:10.1038/nature09985

7. romain says:

Just a comment about the asymptotical behavior when the number of collected individuals N becomes infinite.

Assume there is a finite number of possible species and note the probability of finding an individual of the species i $p_i$. Then there exists a number $N^*$ such that

$N^* \mathrm{min}(p_i)>1.$

For all numbers of collected individuals larger than $N^*$, the probability that a newly collected individual belongs to a new species is 0. So (assuming a finite number of species) Good’s formula works well asymptotically.

However, if the number of species is infinite, this “proof” does not hold.

8. [...] Last time I mentioned the Good–Turing estimate for how likely it is that the next thing we encounter is one of a brand new kind. The discussion that blog entry triggered has been very helpful! Among other things, it got Lou Jost more interested in this subject. Two days ago, he showed me the following simple argument for the Good–Turing estimate. [...]

9. [...] They computed the first-order bias of the information and then used a Bayesian technique to estimate the number of responses not included in the sample but that would be in an infinite sample (a goal similar to that of Good’s rule of thumb).