## Maximum Entropy and Ecology

I already talked about John Harte’s book on how to stop global warming. Since I’m trying to apply information theory and thermodynamics to ecology, I was also interested in this book of his:

John Harte, Maximum Entropy and Ecology, Oxford U. Press, Oxford, 2011.

There’s a lot in this book, and I haven’t absorbed it all, but let me try to briefly summarize his maximum entropy theory of ecology. This aims to be “a comprehensive, parsimonious, and testable theory of the distribution, abundance, and energetics of species across spatial scales”. One great thing is that he makes quantitative predictions using this theory and compares them to a lot of real-world data. But let me just tell you about the theory.

It’s heavily based on the principle of maximum entropy (MaxEnt for short), and there are two parts:

Two MaxEnt calculations are at the core of the theory: the first yields all the metrics that describe abundance and energy distributions, and the second describes the spatial scaling properties of species’ distributions.

### Abundance and energy distributions

The first part of Harte’s theory is all about a conditional probability distribution

$R(n,\epsilon | S_0, N_0, E_0)$

which he calls the ecosystem structure function. Here:

$S_0$: the total number of species under consideration in some area.

$N_0$: the total number of individuals under consideration in that area.

$E_0$: the total rate of metabolic energy consumption of all these individuals.

Given this,

$R(n,\epsilon | S_0, N_0, E_0) \, d \epsilon$

is the probability that given $S_0, N_0, E_0,$ if a species is picked from the collection of species, then it has $n$ individuals, and if an individual is picked at random from that species, then its rate of metabolic energy consumption is in the interval $(\epsilon, \epsilon + d \epsilon).$

Here of course $d \epsilon$ is ‘infinitesimal’, meaning that we take a limit where it goes to zero to make this idea precise (if we’re doing analytical work) or take it to be very small (if we’re estimating $R$ from data).

I believe that when we ‘pick a species’ we’re treating them all as equally probable, not weighting them according to their number of individuals.

Clearly $R$ obeys some constraints. First, since it’s a probability distribution, it obeys the normalization condition:

$\displaystyle{ \sum_n \int d \epsilon \; R(n,\epsilon | S_0, N_0, E_0) = 1 }$

Second, since the average number of individuals per species is $N_0/S_0,$ we have:

$\displaystyle{ \sum_n \int d \epsilon \; n R(n,\epsilon | S_0, N_0, E_0) = N_0 / S_0 }$

Third, since the average over species of the total rate of metabolic energy consumption of individuals within the species is $E_0/ S_0,$ we have:

$\displaystyle{ \sum_n \int d \epsilon \; n \epsilon R(n,\epsilon | S_0, N_0, E_0) = E_0 / S_0 }$

Harte’s theory is that $R$ maximizes entropy subject to these three constraints. Here entropy is defined by

$\displaystyle{ - \sum_n \int d \epsilon \; R(n,\epsilon | S_0, N_0, E_0) \ln(R(n,\epsilon | S_0, N_0, E_0)) }$
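Here is a sketch of what this maximization gives. It uses simplifying assumptions made here for illustration, not necessarily Harte’s: $n$ runs over all positive integers and $\epsilon$ over $[0, \infty)$. Harte works with finite ranges, so his formulas agree with these only approximately. With Lagrange multipliers $\lambda_1$ and $\lambda_2$ for the second and third constraints, the maximizer has the form

$\displaystyle{ R(n,\epsilon | S_0, N_0, E_0) = \frac{1}{Z} e^{-\lambda_1 n - \lambda_2 n \epsilon} }$

Integrating over $\epsilon$ and imposing the normalization condition gives

$\displaystyle{ \int_0^\infty d \epsilon \; R(n,\epsilon | S_0, N_0, E_0) = \frac{e^{-\lambda_1 n}}{\lambda_2 Z \, n}, \qquad \lambda_2 Z = \sum_{n \ge 1} \frac{e^{-\lambda_1 n}}{n} = \ln \frac{1}{1 - e^{-\lambda_1}} }$

So, under these assumptions, the distribution of abundances is a log-series in $n.$ Since $\int_0^\infty \epsilon \, e^{-\lambda_2 n \epsilon} \, d\epsilon = 1/(\lambda_2 n)^2,$ the energy constraint collapses to

$\displaystyle{ \sum_n \int d \epsilon \; n \epsilon R(n,\epsilon | S_0, N_0, E_0) = \frac{1}{\lambda_2} \quad \Longrightarrow \quad \lambda_2 = \frac{S_0}{E_0} }$

The abundance constraint then fixes $\lambda_1$ through

$\displaystyle{ \frac{N_0}{S_0} = \frac{e^{-\lambda_1}/(1 - e^{-\lambda_1})}{\ln \big( 1/(1 - e^{-\lambda_1}) \big)} }$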

Harte uses this theory to calculate $R,$ and tests the results against data from about 20 ecosystems. For example, he predicts the abundance of species as a function of their rank, with rank 1 being the most abundant, rank 2 being the second most abundant, and so on. And he gets results like this:

The data here are from:

• Green, Harte, and Ostling’s work on a serpentine grassland,

• surveys of a 10.24-hectare tropical forest plot at Luquillo, and

• surveys of a 2-hectare wet tropical forest plot at Cocoli.

The fit looks good to me… but I should emphasize that I haven’t had time to study these matters in detail. For more, you can read this paper, at least if your institution subscribes to this journal:

• J. Harte, T. Zillio, E. Conlisk and A. Smith, Maximum entropy and the state-variable approach to macroecology, Ecology 89 (2008), 2700–2711.
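For readers who want to experiment, here is a minimal Python sketch that computes a MaxEnt $R$ numerically. It rests on a simplifying assumption made here for illustration, not necessarily Harte’s: $n$ runs over all positive integers and $\epsilon$ over $[0, \infty)$. Under that assumption the maximizer has the form $R \propto e^{-\lambda_1 n - \lambda_2 n \epsilon},$ and the energy constraint gives $\lambda_2 = S_0/E_0$ exactly. The code then finds $\lambda_1$ by bisection on the abundance constraint. The function names and the state variables in the example are hypothetical.

```python
import math

def mean_abundance(lam1: float) -> float:
    """Mean of n under the log-series Phi(n) ~ e^{-lam1 n}/n, n = 1, 2, 3, ..."""
    x = math.exp(-lam1)
    # sum_n e^{-lam1 n} = x/(1-x);  sum_n e^{-lam1 n}/n = -ln(1-x)
    return (x / (1.0 - x)) / (-math.log1p(-x))

def solve_lam1(S0: float, N0: float) -> float:
    """Bisection for lam1 so that the mean abundance equals N0/S0 (needs N0 > S0)."""
    target = N0 / S0
    lo, hi = 1e-12, 30.0          # mean_abundance falls from very large values toward 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_abundance(mid) > target:
            lo = mid              # mean still too large: increase lam1
        else:
            hi = mid
    return 0.5 * (lo + hi)

def maxent_R(S0: float, N0: float, E0: float):
    """Return (lam1, lam2, R) with R(n, eps) = exp(-lam1*n - lam2*n*eps) / Z."""
    lam1 = solve_lam1(S0, N0)
    lam2 = S0 / E0                # energy constraint, assuming eps ranges over [0, inf)
    Z = -math.log1p(-math.exp(-lam1)) / lam2   # normalization over n and eps
    def R(n: int, eps: float) -> float:
        return math.exp(-lam1 * n - lam2 * n * eps) / Z
    return lam1, lam2, R

# Made-up state variables: 100 species, 10^4 individuals, E0 = 10^6.
lam1, lam2, R = maxent_R(100, 10_000, 1e6)
# Abundance distribution: Phi(n) = integral of R over eps = R(n, 0) / (lam2 * n).
phi = [R(n, 0.0) / (lam2 * n) for n in range(1, 40_001)]
print(f"lam1 = {lam1:.6f}, lam2 = {lam2:.1e}, sum of Phi = {sum(phi):.6f}")
```

Bisection is used only because it is simple and cannot diverge. With Harte’s finite ranges for $n$ and $\epsilon$, both multipliers would have to be solved for together.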

### Spatial abundance distribution

The second part of Harte’s theory is all about a conditional probability distribution

$\Pi(n | A, n_0, A_0)$

This is the probability that $n$ individuals of a species are found in a region of area $A$ given that it has $n_0$ individuals in a larger region of area $A_0.$

$\Pi$ obeys two constraints. First, since it’s a probability distribution, it obeys the normalization condition:

$\displaystyle{ \sum_n \Pi(n | A, n_0, A_0) = 1 }$

Second, since the mean value of $n$ across regions of area $A$ equals $n_0 A/A_0,$ we have

$\displaystyle{ \sum_n n \Pi(n | A, n_0, A_0) = n_0 A/A_0 }$

Harte’s theory is that $\Pi$ maximizes entropy subject to these two constraints. Here entropy is defined by

$\displaystyle{- \sum_n \Pi(n | A, n_0, A_0)\ln(\Pi(n | A, n_0, A_0)) }$
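Maximizing this entropy subject to only these two constraints gives $\Pi(n | A, n_0, A_0) \propto e^{-\lambda n}$ for $n = 0, 1, \dots, n_0$. That is a geometric distribution cut off at $n_0$, with $\lambda$ fixed by the required mean. Below is a minimal Python sketch that finds $\lambda$ by bisection. The function names are hypothetical, and the sketch assumes $0 < A < A_0$. When $A = A_0/2$ the required mean is $n_0/2$, which forces $\lambda = 0$, so every value of $n$ from $0$ to $n_0$ is then equally likely.

```python
import math

def truncated_geometric(lam: float, n0: int) -> list[float]:
    """Pi(n) proportional to exp(-lam * n) for n = 0..n0 (normalized, overflow-safe)."""
    logs = [-lam * n for n in range(n0 + 1)]
    top = max(logs)                        # subtract the max before exponentiating
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [x / total for x in w]

def maxent_pi(A: float, n0: int, A0: float) -> list[float]:
    """MaxEnt Pi(n | A, n0, A0) on n = 0..n0 with mean n0 * A / A0 (0 < A < A0)."""
    target = n0 * A / A0
    lo, hi = -50.0, 50.0                   # the mean decreases as lam increases
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        p = truncated_geometric(mid, n0)
        mean = sum(n * q for n, q in enumerate(p))
        if mean > target:
            lo = mid                       # mean too large: raise lam
        else:
            hi = mid
    return truncated_geometric(0.5 * (lo + hi), n0)

quarter = maxent_pi(1.0, 100, 4.0)   # a cell 1/4 the area of A0, with n0 = 100
half = maxent_pi(2.0, 100, 4.0)      # A = A0/2: mean n0/2, so lam = 0 (uniform)
print(quarter[:3], half[:3])
```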

Harte explains two approaches to using this idea to derive ‘scaling laws’ for how $n$ varies with $A$. And again, he compares his predictions to real-world data, and gets results that look good to my (amateur, hasty) eye!

I hope sometime I can dig deeper into this subject. Do you have any ideas, or knowledge about this stuff?

### 26 Responses to Maximum Entropy and Ecology

1. arch1 says:

0) Is there a simple explanation as to why the MaxEnt distributions are all concave upward at the high-abundance end?

I’m a little confused about the spatial abundance distribution:

1) It looks as though the definition could be reworded “..a random region of area $A$ …larger containing region of area $A_0$.” Right? If so…

2) Can the containing region be disconnected? If not, can the contained region be disconnected, e.g. could a model for its choice be: Divide the containing region into a fine grid of n equal cells, then randomly pick $n (A/A_0)$ of them?

• John Baez says:

arch1 wrote:

0) Is there a simple explanation as to why the MaxEnt distributions are all concave upward at the high-abundance end?

Here I guess you’re talking about the so-called ‘rank-abundance’ curves I showed in my blog article:

These plot the logarithm of the abundance of the nth most abundant species as a function of n. They’re indeed concave upward at the high-abundance end. Why?

Like I said, I don’t really understand this stuff. But I’ll try to make a little progress….

Harte and his coauthors compute their predictions of these curves starting from $\Phi(n)$, which is the fraction of species that have abundance $n$.

In terms of the function $R$ I described in my article, $\Phi(n)$ is given by

$\displaystyle{ \Phi(n) = \int d \epsilon \; R(n,\epsilon | S_0, N_0, E_0) }$

They claim that using their MaxEnt hypothesis on $R,$ they can show

$\displaystyle{ \Phi(n) \approx \frac{1}{\ln(1/\lambda)} \frac{e^{-\lambda n}}{n} }$

for some number $\lambda.$ I haven’t checked this.

Then they write:

A geometric distribution, $\Phi(n) = c/n,$ results in a rank-abundance graph that is a straight line when plotted as log(abundance) vs. rank, with a negative slope if low rank corresponds to high abundance; the exponential term in the log series distribution bends that straight line upward at low rank.

The “concave upward at the high-abundance end” effect you’re wondering about is what they call “bending the straight line upwards at low rank”. They’re saying that it comes from deviations from the simple law

$\displaystyle{ \Phi(n) = \frac{c}{n} }$

So, they’re saying the exponential term here:

$\displaystyle{ \Phi(n) \approx \frac{1}{\ln(1/\lambda)} \frac{e^{-\lambda n}}{n} }$

is to blame for the effect you’re wondering about.

One could check these statements using some math, but this is all I have the energy for at this moment!
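One way to check the claim numerically is to build a rank-abundance curve from the log-series and compare slopes at low and moderate ranks. The sketch below assumes a recipe chosen here for illustration, not necessarily Harte’s: the species of rank $r$ gets the largest abundance $n$ for which the expected number of species with abundance at least $n,$ namely $S_0 \sum_{m \ge n} \Phi(m),$ is at least $r - 1/2.$ It normalizes the log-series exactly by $\ln(1/(1 - e^{-\lambda})).$ The parameters $\lambda = 0.002$ and $S_0 = 50$, and the function names, are hypothetical.

```python
import math

def log_series_phi(lam: float, nmax: int) -> list[float]:
    """phi[k] = Phi(k + 1) for Phi(n) = e^{-lam n} / (n * ln(1/(1 - e^{-lam})))."""
    norm = -math.log1p(-math.exp(-lam))   # exact sum over n >= 1 of e^{-lam n}/n
    return [math.exp(-lam * n) / (n * norm) for n in range(1, nmax + 1)]

def rank_abundance(phi: list[float], S0: int) -> list[int]:
    """Abundance of rank r = 1..S0: the largest n for which the expected number of
    species with abundance >= n, S0 * sum_{m >= n} Phi(m), is at least r - 1/2."""
    at_least = [0.0] * len(phi)
    tail = 0.0
    for k in range(len(phi) - 1, -1, -1):   # cumulative sums from the top down
        tail += phi[k]
        at_least[k] = S0 * tail
    return [max(k + 1 for k, g in enumerate(at_least) if g >= r - 0.5)
            for r in range(1, S0 + 1)]

# Made-up parameters for illustration: 50 species, lambda = 0.002.
abund = rank_abundance(log_series_phi(0.002, 6000), 50)
drop_1_2 = math.log(abund[1]) - math.log(abund[0])     # ranks 1 -> 2
drop_10_11 = math.log(abund[10]) - math.log(abund[9])  # ranks 10 -> 11
print(abund[:3], round(drop_1_2, 3), round(drop_10_11, 3))
```

With these parameters the drop in log abundance from rank 1 to rank 2 is markedly steeper than from rank 10 to rank 11, which is the upward bend. For $\Phi(n) = c/n$ the expected count of species with abundance at least $n$ falls off roughly like $\ln(1/n)$, so the drop per rank would be roughly the same everywhere.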

• John Baez says:

arch1 wrote:

2) Can the containing region be disconnected? If not, can the contained region be disconnected [...?]

I don’t know, but John Harte calls these regions ‘cells’, which suggests the answer to both questions is no. Or at least, I don’t think he’s giving us permission to use disconnected regions. I imagine these ecologists are using some sort of grid with square or rectangular ‘cells’ divided into smaller cells.

• It is correct that Harte is not allowing for disconnected regions in this context. Disconnected cells would affect the implicit spatial autocorrelation that is present in the solution. I asked him about this because it would of course make testing easier if we could use disconnected data, but sadly no.

• John Baez says:

Thanks for clearing this up! That’s very helpful!

2. Richard Brown says:

I don’t think that the paper is freely available – I know I had to use an institutional login to get it.

I am likewise not particularly well versed in this area, although I have read a reasonable chunk of Jaynes’ book (http://www.amazon.com/Probability-Theory-Science-T-Jaynes/dp/0521592712) — as much for the entertaining and inflammatory style as anything else!

The principle is very compelling. I like the notion that you could understand the relative importance (or perhaps dependency) of different underlying constraints for a problem by testing different combinations of them, and seeing which combinations give rise to a MaxEnt distribution that fits well with observed data. Of course taking this kind of approach may well be abusing the whole underlying philosophy…

Some results I find counterintuitive, though. I’m trying to wrap my head around the fact that, given a single constraint of a mean abundance of a species $\lambda = n_0 A / A_0$ in a cell of size $A$, the “least informative” distribution of species abundance $p^{(n_0)}_A(n)$ is not in fact $\text{Poiss}(n; \lambda)$ (corresponding to random placement in $A_0$), but rather some other distribution corresponding to a more regular arrangement.

• John Baez says:

Richard wrote:

I don’t think that the paper is freely available – I know I had to use an institutional login to get it.

Whoops, I guess I was using it without knowing it. Too bad! John Harte doesn’t seem to make his technical work freely available… I thought I’d lucked out. I’ll fix my post. And I’ll talk about your other, more interesting point after I’ve had some time to think about it!

• Richard Brown says:

In fact, I think there’s a bad assumption here. If you are assuming that for a cell of size $A$, there will be a mean density $n_0 A / A_0$, and that this property is true for all cell sizes $A$, independent of their location, then you have defined a Poisson process, and the distribution must necessarily be Poisson.

You’re not free to use MaxEnt and scale your mean with your cell size.

• Richard Brown says:

Now that I think about it I was a bit hasty with this comment. The assumption of mean depending on cell size alone is of course not enough to imply that we’re dealing with a homogeneous Poisson process. Apologies for monopolising the comments thread!

• John Baez says:

I don’t mind you posting a lot of comments as long as you answer your own questions! In fact it’s great, because I haven’t thought hard about this stuff and need all the help I can get.

• Richard Brown says:

Right, I’ve had a weekend (in NZ) to think about this a bit more, and I’ve managed to pin down a little more precisely what is bothering me.

Consider a 1D interval of length 1 with a known mean number of points $\mu$. We can then pose a MaxEnt problem in two ways:
1. Consider the distribution of distances $f(x)$ between points, subject to the known mean distance between points $1/\mu$. Then the maximum entropy distribution subject to the mean and normalization constraints is the exponential $f(x) = \mu e^{-\mu x}$. This gives rise to a Poisson-distributed number of points in the interval, with mean $\mu$.
2. Consider the discrete distribution of the number of points in the interval. The maximum entropy discrete distribution, as described in the paper, is a discrete exponential (geometric) distribution
$p(n) = A e^{-\alpha n}$, with $\alpha = -\log \left(\mu / (1 + \mu)\right)$ (with $A$ a normalisation constant).

Going into 2D the problem is more complicated, because it’s less clear what the interpretation of 1. is. However, my point is that these to me seem to be two valid ways of posing the problem, which give quite different results. To me, 1. feels like the more fundamental way of stating it, and it gives rise to the standard Poisson distribution that we know corresponds to random scattering of points.

The question that interests me is, how often is there a seemingly arbitrary choice in the way that the problem is posed that affects the maximum entropy solution?
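The difference between the two formulations in this 1D example is easy to see numerically. The sketch below builds a Poisson distribution and the discrete exponential $p(n) = (1-q) q^n$ with $q = e^{-\alpha} = \mu/(1+\mu)$. Both have mean $\mu$. It then compares their variances and Shannon entropies. The function names and the choice $\mu = 3$ are hypothetical.

```python
import math

def poisson_pmf(mu: float, nmax: int) -> list[float]:
    """Poisson(mu) probabilities for n = 0..nmax, built by p(n) = p(n-1) * mu / n."""
    p = [math.exp(-mu)]
    for n in range(1, nmax + 1):
        p.append(p[-1] * mu / n)
    return p

def geometric_pmf(mu: float, nmax: int) -> list[float]:
    """Discrete exponential p(n) = (1 - q) q^n with q = e^{-alpha} = mu / (1 + mu)."""
    q = mu / (1.0 + mu)
    return [(1.0 - q) * q ** n for n in range(nmax + 1)]

def mean(p: list[float]) -> float:
    return sum(n * x for n, x in enumerate(p))

def variance(p: list[float]) -> float:
    m = mean(p)
    return sum((n - m) ** 2 * x for n, x in enumerate(p))

def entropy(p: list[float]) -> float:
    """Shannon entropy in nats, skipping any zero entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

mu = 3.0
P, G = poisson_pmf(mu, 200), geometric_pmf(mu, 200)
print(mean(P), mean(G), variance(P), variance(G), entropy(P), entropy(G))
```

Both distributions have mean 3. The geometric one has variance $\mu + \mu^2 = 12$ instead of 3, and it has the larger entropy. It must, because among distributions on $\{0, 1, 2, \dots\}$ with a given mean, the geometric distribution maximizes entropy. The Poisson answer comes from maximizing entropy over a different variable, the gaps between points.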

• Yes, the Poisson is one of the natural distributions that emerges from MaxEnt in this context, and it is (as in most modeling approaches) fairly common for the details of the way in which you set up the problem to influence the outcome. However, the changes that result from what appear to be small differences in setup in MaxEnt can sometimes be substantial. For a good description of this in the context of the spatial predictions of Harte et al.’s theory, I recommend Haegeman and Etienne 2010.

3. Arrow says:

How do those graphs compare to an appropriate random distribution (Poisson?) of a given number of individuals among a given number of species?

• John Baez says:

Of course the MaxEnt philosophy John Harte is using says that everything is as boringly random as possible subject to the constraints of what we know.

So, his predictions should completely match the “appropriate random distribution” for a given number of individuals among a given number of species…. at least if we define “appropriate” in an appropriate way.

In other words, his predictions should only fail if some interesting effect is happening.

Come to think of it, this philosophy should suit you quite nicely.

Anyway, his theory says the probability of a species having $n$ individuals is proportional to

$\displaystyle{ \frac{e^{-\lambda n}}{n} }$

for some constant $\lambda.$

• Richard Brown says:

If you have access to the paper, the curves marked “Random” in Fig. 2 are the appropriate Poisson distribution (I double-checked by generating the plots myself).

4. Jim Cliborn says:

For sheer fun of the best kind let me direct your attention to Harte’s two earlier books: Consider a Spherical Cow (1988) and Consider a Cylindrical Cow (2001). The man loves his subject!

5. davidtweed says:

I hate to come over all grinchy, but does the book describe precisely how the predictions/tests were done? Unfortunately it’s quite common for “big simple principle” works to do mostly explanation (here’s some data I looked at a lot, now I know what parameters to use to fit my model to it really well) and retrodiction (here’s a dataset; I haven’t looked at it in detail, but I know metadata suggesting it should be compatible with my theory; I check, and indeed it is). If actual prediction (here’s something I think ought to be true that hasn’t been measured before; after I said this someone collected the data and it does indeed match) is being done, then I’m much more interested in investing time learning about this theory.

• davidtweed says:

For some reason the html didn’t come out in that comment

• John Baez says:

David wrote:

I hate to come over all grinchy, but does the book describe precisely how the predictions/tests were done?

I don’t have it on me anymore, but it seems fairly precise, and so does the paper I mentioned.

I don’t think it explains how, for example, the researchers at Luquillo combed through a 10.24-hectare patch of tropical forest and kept track of every specimen of every species they found! This info would be in the original paper.

There are standard methods for doing such things, which ecologists use. And there are standard problems: for example, the rarest species can easily escape detection, and so can nocturnal species, species that are underground, etcetera.

Unfortunately it’s quite common for “big simple principle” works to do mostly explanation (here’s some data I looked at a lot, now I know what parameters to use to fit my model to it really well) [...]

Here I think you’re being a bit grinchy. The beauty of the theory I explained here is that it contains no adjustable parameters except those that are actually measured.

$R(n,\epsilon | S_0, N_0, E_0) \, d \epsilon$

This is the probability that given

$S_0$: the total number of species under consideration in some area.

$N_0$: the total number of individuals under consideration in that area.

$E_0$: the total rate of metabolic energy consumption of all these individuals.

if a species is picked from the collection of species, then it has $n$ individuals, and if an individual is picked at random from that species, then its rate of metabolic energy consumption is in the interval $(\epsilon, \epsilon + d \epsilon).$

The theory I explained lets you calculate this function $R$ given $S_0, N_0$ and $E_0$ with no further adjustable parameters. It’s just the probability distribution that maximizes entropy subject to these three constraints.

And, it seems to work pretty well on the data John Harte looks at.

Of course, not being an expert, I can’t rule out the possibility that he’s cherry-picked examples where his formula works! And I don’t think people have yet gone out after he wrote his book, counted species in new places, and checked to see how well this new data fits his formula. But they should, and I bet they will.

• Arrow says:

There is also the issue of microorganisms, both prokaryotic (bacteria, archaea) and eukaryotic (protists, fungi, etc.). There are probably many thousands if not millions of bacterial species alone in the 10.24-hectare patch of tropical forest (plus the whole concept of a species is pretty ill-defined in this case).

• John Baez says:

Good point. Nonetheless the whole theory still makes sense if we explicitly say we’re restricting attention to, say, vertebrates, or any other chosen collection of organisms that we have the ability to catalogue. And then we can test it.

• davetweed says:

Yeah, I wasn’t really expressing any doubt about the physical collection, just about possible “unconscious preknowledge bias”. I’m just a bit hard on this point where people try to do something using pure statistics on existing data only. One of the pluses, and also soul-crushing minuses, of doing some of my work on computer performance is that “new” experiments are really quite easy to do. It’s amazing how often I can persuade myself that some gathered data is really, amazingly well-explained by some simple model, code up a new experiment that I’m sure I know will prove my point and then find out that at best reality is more complex than my model and at worst my model is wrong.

My experience is that, when you’re restricted to existing data, it’s much more reliable if your model is motivated by some “physical” reasoning, or, if it’s purely statistical, if you validate it on never-seen data.

It’d be interesting if Harte’s work is well supported; I’ll have to try to dig out more work and read up on it.

6. [...] 2013/02/21: J.C. Baez: Maximum Entropy and Ecology [...]

7. I would apply stochastic Petri nets to relative abundance distributions (RADs). Just vary the rates of growth across related species with a MaxEnt uncertainty, and most of the RAD characteristics will pop out as a result of the dispersion.

8. linasv says:

FYI, there was some recent hubbub about a PRL by Wissner-Gross. He suggests maximizing the “causal entropy”, defined as entropy over a set of paths. I’ve got this vague impression that perhaps this might resolve some of the contradictory issues with MEP principles: any one single path is minimizing (e.g. minimizing the action) but the ensemble of all paths maximizes the (causal) entropy. In his words, systems move to a state that maximizes the causal access to the largest number of possible future states.

http://www.alexwg.org/publications/PhysRevLett_110-168702.pdf

and it splattered across the popular press.

• John Baez says:

Thanks, I’ll look at that, and add it to this database:

Extremal principles in non-equilibrium thermodynamics, Azimuth Wiki.

Over there you’ll see that a smart fellow nicknamed Tomate writes:

Funnily enough, there exist a “minimum entropy production principle” and a “maximum entropy production principle”. The apparent clash is due to the fact that while minimum entropy production is an ensemble property, that is, it holds on a macroscopic scale, the maximum entropy production principle is believed to hold for single trajectories, single “histories”. I think the first is well-established, indeed a classical result due to Prigogine, while the second is still speculative and sloppy; it is believed to have important ecological applications. A similar confusion arises when one defines entropy as an ensemble property (Gibbs entropy) or else as a microstate property (Boltzmann entropy).

and also:

Let me explain my own take on the minEP vs. maxEP problem and on similar problems (such as Boltzmann vs. Gibbs entropy increase). It might help sort out ideas.

By “state” we mean very different things in NESM, among which: 1) the (micro)state which a single history of a system occupies at given times 2) the trajectory itself 3) the density of microstates which an ensemble of a large number of trajectories occupies at a given time (a macrostate). One can define entropy production at all levels of discussion (for the mathematically-inclined, Markovian master equation systems offer the best set up where all is nice and defined). So, for example, the famous “fluctuation theorem” is a statement about microscopic entropy production along a trajectory, while Onsager’s reciprocity relations are a statement about macroscopic entropy production. By “steady state”, we mean a stationary macrostate.

The minEP principle asserts that the distribution of macroscopic currents at a nonequilibrium steady state minimizes entropy production consistently with the constraints which prevent the system from reaching equilibrium.

As I understand it, maxEP is instead a property of single trajectories: most probable trajectories are those which have a maximum entropy production rate, consistently with constraints.

As a climate scientist, you should be interested in the second, as we do not have an ensemble of planets among which to maximize entropy or minimize entropy production. We have one single realization of the process, and we’d better make good use of it.

I have not yet gotten around to really understanding these remarks. I really want to understand them.