I already talked about John Harte’s book on how to stop global warming. Since I’m trying to apply information theory and thermodynamics to ecology, I was also interested in this book of his:
• John Harte, Maximum Entropy and Ecology, Oxford U. Press, Oxford, 2011.
There’s a lot in this book, and I haven’t absorbed it all, but let me try to briefly summarize his maximum entropy theory of ecology. This aims to be “a comprehensive, parsimonious, and testable theory of the distribution, abundance, and energetics of species across spatial scales”. One great thing is that he makes quantitative predictions using this theory and compares them to a lot of real-world data. But let me just tell you about the theory.
It’s heavily based on the principle of maximum entropy (MaxEnt for short), and there are two parts:
Two MaxEnt calculations are at the core of the theory: the first yields all the metrics that describe abundance and energy distributions, and the second describes the spatial scaling properties of species’ distributions.
Abundance and energy distributions
The first part of Harte’s theory is all about a conditional probability distribution
$$R(n, \epsilon \mid S_0, N_0, E_0)$$
which he calls the ecosystem structure function. Here:
• $S_0$: the total number of species under consideration in some area.
• $N_0$: the total number of individuals under consideration in that area.
• $E_0$: the total rate of metabolic energy consumption of all these individuals.
Given this,
$$R(n, \epsilon \mid S_0, N_0, E_0) \, d\epsilon$$
is the probability that, if a species is picked from the collection of species, it has $n$ individuals, and, if an individual is picked at random from that species, its rate of metabolic energy consumption is in the interval $(\epsilon, \epsilon + d\epsilon)$.
Here of course $d\epsilon$ is ‘infinitesimal’, meaning that we take a limit where it goes to zero to make this idea precise (if we’re doing analytical work) or take it to be very small (if we’re estimating $R$ from data).
I believe that when we ‘pick a species’ we’re treating them all as equally probable, not weighting them according to their number of individuals.
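Clearly $R$ obeys some constraints. First, since it’s a probability distribution, it obeys the normalization condition:
$$\sum_n \int d\epsilon \; R(n, \epsilon \mid S_0, N_0, E_0) = 1$$
Second, since the average number of individuals per species is $N_0/S_0$, we have:
$$\sum_n \int d\epsilon \; n \, R(n, \epsilon \mid S_0, N_0, E_0) = N_0 / S_0$$
Third, since the average over species of the total rate of metabolic energy consumption of individuals within the species is $E_0/S_0$, we have:
$$\sum_n \int d\epsilon \; n \epsilon \, R(n, \epsilon \mid S_0, N_0, E_0) = E_0 / S_0$$
Harte’s theory is that $R$ maximizes entropy subject to these three constraints. Here entropy is defined by
$$- \sum_n \int d\epsilon \; R(n, \epsilon \mid S_0, N_0, E_0) \, \ln R(n, \epsilon \mid S_0, N_0, E_0)$$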
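To see what kind of answer this produces, here is the standard Lagrange-multiplier argument, written as a sketch rather than as Harte’s own derivation. Here $R$ is the ecosystem structure function, $n$ the number of individuals in a species, $\epsilon$ the metabolic rate of an individual, and $S_0, N_0, E_0$ the total numbers of species and individuals and the total metabolic rate. The multiplier names $\lambda_0, \lambda_1, \lambda_2$ and the normalizing constant $Z$ are introduced here for illustration, and the ranges of the sum and the integral are deliberately left unspecified.

```latex
\begin{aligned}
S[R] &= -\sum_n \int d\epsilon \; R \ln R , \\
0 &= \frac{\delta}{\delta R} \Big[ S[R]
      - \lambda_0 \Big( \sum_n \int d\epsilon \, R - 1 \Big)
      - \lambda_1 \Big( \sum_n \int d\epsilon \, n R - \frac{N_0}{S_0} \Big)
      - \lambda_2 \Big( \sum_n \int d\epsilon \, n \epsilon R - \frac{E_0}{S_0} \Big) \Big] \\
  &= -\ln R - 1 - \lambda_0 - \lambda_1 n - \lambda_2 n \epsilon , \\
R(n, \epsilon \mid S_0, N_0, E_0) &= \frac{1}{Z} \, e^{-\lambda_1 n} \, e^{-\lambda_2 n \epsilon},
\qquad Z = \sum_n \int d\epsilon \; e^{-\lambda_1 n} e^{-\lambda_2 n \epsilon} .
\end{aligned}
```

So the maximizer is an exponential in $n$ and $n \epsilon$, with $\lambda_1$ and $\lambda_2$ fixed by the second and third constraints. If one assumes (my assumption, not a statement from the book) that $\epsilon$ ranges over $[\epsilon_{\min}, \infty)$, then integrating out $\epsilon$ gives something proportional to $e^{-(\lambda_1 + \lambda_2 \epsilon_{\min}) n} / n$, which is where a factor of $1/n$ in the abundance distribution would come from.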
Harte uses this theory to calculate $R$ and tests the results against data from about 20 ecosystems. For example, he predicts the abundance of species as a function of their rank, with rank 1 being the most abundant, rank 2 being the second most abundant, and so on. And he gets results like this:
The data here are from:
• Green, Harte, and Ostling’s work on a serpentine grassland,
• the Luquillo plot, a 10.24-hectare tropical forest, and
• the Cocoli plot, a 2-hectare wet tropical forest.
The fit looks good to me… but I should emphasize that I haven’t had time to study these matters in detail. For more, you can read this paper, at least if your institution subscribes to this journal:
• J. Harte, T. Zillio, E. Conlisk and A. Smith, Maximum entropy and the state-variable approach to macroecology, Ecology 89 (2008), 2700–2711.
Spatial abundance distribution
The second part of Harte’s theory is all about a conditional probability distribution
$$\Pi(n \mid A, n_0, A_0)$$
This is the probability that $n$ individuals of a species are found in a region of area $A$, given that it has $n_0$ individuals in a larger region of area $A_0$.
$\Pi$ obeys two constraints. First, since it’s a probability distribution, it obeys the normalization condition:
$$\sum_n \Pi(n \mid A, n_0, A_0) = 1$$
Second, since the mean value of $n$ across regions of area $A$ equals $n_0 A / A_0$, we have
$$\sum_n n \, \Pi(n \mid A, n_0, A_0) = n_0 A / A_0$$
Harte’s theory is that $\Pi$ maximizes entropy subject to these two constraints. Here entropy is defined by
$$- \sum_n \Pi(n \mid A, n_0, A_0) \, \ln \Pi(n \mid A, n_0, A_0)$$
Harte explains two approaches to using this idea to derive ‘scaling laws’ for how $n$ varies with $A$. And again, he compares his predictions to real-world data, and gets results that look good to my (amateur, hasty) eye!
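The spatial MaxEnt problem above is small enough to solve numerically. What follows is a minimal sketch, not Harte’s own procedure: it assumes the count in the small region ranges over 0, 1, …, n0, in which case maximizing entropy with only a mean constraint gives probabilities proportional to x^n for some x > 0, and it finds x by bisection. The function name `maxent_spatial` and the parameter `area_fraction` (the ratio of the small area to the large one) are made up for this illustration.

```python
import math

def maxent_spatial(n0, area_fraction, tol=1e-12):
    """MaxEnt distribution P(n) on n = 0..n0 with mean n0 * area_fraction.

    Assumes 0 < area_fraction < 1. The solution has the form
    P(n) = x**n / Z; we bisect on log(x) until the mean matches.
    """
    target = n0 * area_fraction

    def mean_and_probs(log_x):
        # Work with log-weights and subtract the maximum to avoid overflow.
        logs = [n * log_x for n in range(n0 + 1)]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        z = sum(weights)
        probs = [w / z for w in weights]
        mean = sum(n * p for n, p in enumerate(probs))
        return mean, probs

    lo, hi = -50.0, 50.0  # the mean increases monotonically with log(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_and_probs(mid)[0] < target:
            lo = mid
        else:
            hi = mid
    return mean_and_probs(0.5 * (lo + hi))[1]

# Example: 10 individuals in the large region, small region is 1/4 of it.
probs = maxent_spatial(10, 0.25)
```

One case is easy to check by hand: when the small region is exactly half of the large one, the mean constraint is satisfied by x = 1, so every count from 0 to n0 comes out equally likely.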
I hope sometime I can dig deeper into this subject. Do you have any ideas, or knowledge about this stuff?
0) Is there a simple explanation as to why the MaxEnt distributions are all concave upward at the high-abundance end?
I’m a little confused about the spatial abundance distribution:
1) It looks as though the definition could be reworded “..a random region of area $A$ …larger containing region of area $A_0$.” Right? If so…
2) Can the containing region be disconnected? If not, can the contained region be disconnected, e.g. could a model for its choice be: Divide the containing region into a fine grid of $n$ equal cells, then randomly pick $n A / A_0$ of them?
arch1 wrote:
Here I guess you’re talking about the so-called ‘rank-abundance’ curves I showed in my blog article:
These plot the logarithm of the abundance of the nth most abundant species as a function of n. They’re indeed concave upward at the high-abundance end. Why?
Like I said, I don’t really understand this stuff. But I’ll try to make a little progress….
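Harte and his coauthors compute their predictions of these curves starting from $\Phi(n)$, which is the fraction of species that have abundance $n$.
In terms of the function $R$ I described in my article, $\Phi(n)$ is given by
$$\Phi(n) = \int d\epsilon \; R(n, \epsilon \mid S_0, N_0, E_0)$$
They claim that using their MaxEnt hypothesis on $R$ they can show
$$\Phi(n) \propto \frac{e^{-\lambda n}}{n}$$
for some number $\lambda$. I haven’t checked this.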
Then they write:
The “concave upward at the high-abundance end” effect you’re wondering about is what they call “bending the straight line upwards at low rank”. They’re saying that it comes from deviations from the simple law
$$\Phi(n) \propto \frac{1}{n}$$
So, they’re saying the exponential term here:
$$\Phi(n) \propto \frac{e^{-\lambda n}}{n}$$
is to blame for the effect you’re wondering about.
One could check these statements using some math, but this is all I have the energy for at this moment!
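For anyone who wants to experiment, here is a rough numerical sketch of a rank-abundance curve built from an abundance distribution proportional to exp(-lam * n) / n. Everything in it is an assumption made for illustration: abundances run over n = 1, …, n0, the constant `lam` is fixed by requiring the mean abundance to equal n0 / s0, and the species of rank r is assigned the largest abundance n for which the expected number of species with abundance at least n is r − 1/2 or more. The function names are made up, and the example numbers are not taken from any of the data sets above.

```python
import math

def log_series(lam, n_max):
    """Phi(n) proportional to exp(-lam * n) / n on n = 1..n_max, normalised."""
    weights = [math.exp(-lam * n) / n for n in range(1, n_max + 1)]
    z = sum(weights)
    return [w / z for w in weights]

def solve_lambda(s0, n0, iterations=100):
    """Bisect for lam so that the mean of Phi equals n0 / s0.

    A solution in the bracket needs n0 / s0 to lie between 1 and the
    mean of the pure 1/n law on 1..n0.
    """
    target = n0 / s0

    def mean(lam):
        phi = log_series(lam, n0)
        return sum(n * p for n, p in enumerate(phi, start=1))

    lo, hi = 1e-9, 20.0  # the mean decreases as lam increases
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rank_abundance(s0, n0):
    """Return (lam, abundances), abundances[r - 1] being that of rank r."""
    lam = solve_lambda(s0, n0)
    phi = log_series(lam, n0)
    # tail[n - 1] = expected number of species with abundance >= n
    tail = [0.0] * n0
    running = 0.0
    for n in range(n0, 0, -1):
        running += s0 * phi[n - 1]
        tail[n - 1] = running
    abundances = []
    for r in range(1, s0 + 1):
        n = max(k for k in range(1, n0 + 1) if tail[k - 1] >= r - 0.5)
        abundances.append(n)
    return lam, abundances

# Hypothetical example: 20 species sharing 2000 individuals.
lam, abundances = rank_abundance(20, 2000)
log_abundances = [math.log(n) for n in abundances]
```

Plotting `log_abundances` against rank, next to the same curve computed with `lam` set very close to zero, is one way to see how much of the bend at low rank comes from the exponential factor.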
arch1 wrote:
I don’t, but John Harte calls these regions ‘cells’, which suggests the answer to both questions is no. Or at least, I don’t think he’s giving us permission to use disconnected regions. I imagine these ecologists are using some sort of grid with square or rectangular ‘cells’ divided into smaller cells.
It is correct that Harte is not allowing for disconnected regions in this context. Disconnected cells would affect the implicit spatial autocorrelation that is present in the solution. I asked him about this because it would of course make testing easier if we could use disconnected data, but sadly no.
Thanks for clearing this up! That’s very helpful!
I don’t think that the paper is freely available – I know I had to use an institutional login to get it.
I am likewise not particularly well versed in this area, although I have read a reasonable chunk of Jaynes’ book (http://www.amazon.com/Probability-Theory-Science-T-Jaynes/dp/0521592712) — as much for the entertaining and inflammatory style as anything else!
The principle is very compelling. I like the notion that you could understand the relative importance (or perhaps dependency) of different underlying constraints for a problem by testing different combinations of them, and seeing which combinations give rise to a MaxEnt distribution that fits well with observed data. Of course taking this kind of approach may well be abusing the whole underlying philosophy…
Some results I find counterintuitive though. I’m trying to wrap my head around the fact that, given a single constraint of a mean abundance $\bar{n}$ of a species in a cell of size $A$, the “least informative” distribution of species abundance $P(n)$ is not in fact the Poisson distribution (corresponding to random placement in $A$), but rather some other distribution corresponding to a more regular arrangement.
Richard wrote:
Whoops, I guess I was using it without knowing it. Too bad! John Harte doesn’t seem to make his technical work freely available… I thought I’d lucked out. I’ll fix my post. And I’ll talk about your other, more interesting point after I’ve had some time to think about it!
In fact, I think there’s a bad assumption here. If you are assuming that for a cell of size $A$, there will be a mean density $\rho$, and that this property is true for all cell sizes $A$, independent of their location, then you have defined a Poisson process, and the distribution must necessarily be Poisson.
You’re not free to use MaxEnt and scale your mean with your cell size.
Now that I think about it I was a bit hasty with this comment. The assumption of mean depending on cell size alone is of course not enough to imply that we’re dealing with a homogeneous Poisson process. Apologies for monopolising the comments thread!
I don’t mind you posting a lot of comments as long as you answer your own questions!
In fact it’s great, because I haven’t thought hard about this stuff and need all the help I can get.
Right, I’ve had a weekend (in NZ) and thought about this a bit more and have managed to slightly more precisely figure out what is bothering me.
Consider a 1D interval of length 1 with a known mean number of points $\bar{n}$. We can then pose a MaxEnt problem in two ways:
1. Consider the distribution of distances $x$ between points, subject to the known mean distance between points $1/\bar{n}$. Then the maximum entropy distribution subject to the mean and probability distribution constraints is the exponential $p(x) = \bar{n} e^{-\bar{n} x}$. This gives rise to a Poisson-distributed number of points in the interval with mean $\bar{n}$.
2. Consider the discrete distribution of the number of points in the interval. The maximum entropy discrete distribution, as described in the paper, is a discrete exponential distribution $P(n) = e^{-\lambda n} / Z$, with $\lambda$ fixed by the mean constraint (with $Z$ a normalisation constant).
Going into 2D the problem is more complicated, because it’s less clear what the interpretation of 1. is. However, my point is that these to me seem to be two valid ways of posing the problem, which give quite different results. To me, 1. feels like the more fundamental way of stating it, and it gives rise to the standard Poisson distribution that we know corresponds to random scattering of points.
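The gap between the two formulations is easy to see numerically. This is only an illustrative sketch: it compares a Poisson distribution (what formulation 1 leads to) with a discrete exponential, i.e. geometric, distribution on 0, 1, 2, … (formulation 2) having the same mean. The mean of 5, the truncation at 400 terms and the function names are arbitrary choices made here, not values from the paper.

```python
import math

def poisson_pmf(mean, n_max):
    """Poisson probabilities for n = 0..n_max, built up recursively."""
    probs = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mean / n)
    return probs

def geometric_pmf(mean, n_max):
    """Discrete exponential P(n) = (1 - q) * q**n with q = mean / (1 + mean).

    Writing q = exp(-lam), this is the MaxEnt distribution on the
    non-negative integers with the given mean.
    """
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q ** n for n in range(n_max + 1)]

def entropy(probs):
    """Shannon entropy in nats, skipping terms that underflowed to zero."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_and_variance(probs):
    m = sum(n * p for n, p in enumerate(probs))
    v = sum((n - m) ** 2 * p for n, p in enumerate(probs))
    return m, v

mean_points = 5.0  # arbitrary illustrative value
pois = poisson_pmf(mean_points, 400)
geom = geometric_pmf(mean_points, 400)

print("Poisson   mean, variance:", mean_and_variance(pois), "entropy:", entropy(pois))
print("Geometric mean, variance:", mean_and_variance(geom), "entropy:", entropy(geom))
```

Both distributions have the same mean, but the geometric one has the larger entropy and a variance of mean × (1 + mean) instead of mean. So the choice of which quantity to maximize entropy over really does change the predicted spread of counts.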
The question that interests me is, how often is there a seemingly arbitrary choice in the way that the problem is posed that affects the maximum entropy solution?
Yes, the Poisson is one of the natural distributions that emerges from MaxEnt in this context, and it is (as in most modeling approaches) fairly common for the details of the way in which you set up the problem to influence the outcome. However, the changes that can be seen from what appear to be small differences in setup in MaxEnt can sometimes be substantial. For a good description of this in the context of the spatial predictions of Harte et al.’s theory I recommend Haegeman and Etienne 2010.
How do those graphs compare to an appropriate random distribution (Poisson?) of a given number of individuals among a given number of species?
Of course the MaxEnt philosophy John Harte is using says that everything is as boringly random as possible subject to the constraints of what we know.
So, his predictions should completely match the “appropriate random distribution” for a given number of individuals among a given number of species…. at least if we define “appropriate” in an appropriate way.
In other words, his predictions should only fail if some interesting effect is happening.
Come to think of it, this philosophy should suit you quite nicely.
Anyway, his theory says the probability of a species having $n$ individuals is proportional to
$$\frac{e^{-\lambda n}}{n}$$
for some constant $\lambda$.
If you have access to the paper, the curves marked “Random” in Fig. 2 are the appropriate Poisson distribution (I double checked by generating the plots to make sure)
For sheer fun of the best kind let me direct your attention to Harte’s two earlier books: Consider a Spherical Cow (1988) and Consider a Cylindrical Cow (2001). The man loves his subject!
I hate to come over all grinchy, but does the book describe precisely how the predictions/tests were done? Unfortunately it’s quite common for “big simple principle” works to do mostly explanation (here’s some data I looked at a lot, now I know what parameters to use to fit my model to it really well) and retrodiction (here’s a dataset, I haven’t looked at it in detail but I know metadata that suggests it should be compatible with my theory; I do that and indeed it does). If actual prediction (here’s something I think ought to be true that hasn’t been measured before; after I said this someone collected the data and it does indeed match) is being done then I’m much more interested in investing time learning about this theory.
For some reason the html didn’t come out in that comment
David wrote:
I don’t have it on me anymore, but it seems fairly precise, and so does the paper I mentioned.
I don’t think it explains how, for example, the researchers at Luquillo combed through a 10.24-hectare patch of tropical forest and kept track of every specimen of every species they found! This info would be in the original paper.
There are standard methods for doing such things, which ecologists use. And there are standard problems: for example, the rarest species can easily escape detection, and so can nocturnal species, species that are underground, etcetera.
Here I think you’re being a bit grinchy. The beauty of the theory I explained here is that it contains no adjustable parameters except those that are actually measured.
For example, I talked about
$$R(n, \epsilon \mid S_0, N_0, E_0)$$
This is the probability that given
• $S_0$: the total number of species under consideration in some area,
• $N_0$: the total number of individuals under consideration in that area,
• $E_0$: the total rate of metabolic energy consumption of all these individuals,
and if a species is picked from the collection of species, then it has $n$ individuals, and if an individual is picked at random from that species, then its rate of metabolic energy consumption is in the interval $(\epsilon, \epsilon + d\epsilon)$.
The theory I explained lets you calculate this function $R$ given $S_0$, $N_0$ and $E_0$, with no further adjustable parameters. It’s just the probability distribution that maximizes entropy subject to these three constraints.
And, it seems to work pretty well on the data John Harte looks at.
Of course, not being an expert, I can’t rule out the possibility that he’s cherry-picked examples where his formula works! And I don’t think people have yet gone out after he wrote his book, counted species in new places, and checked to see how well this new data fits his formula. But they should, and I bet they will.
There is also the issue of microorganisms, both prokaryotic (bacteria, archaea) and eukaryotic (protists, fungi, etc). There are probably many thousands if not millions of bacterial species alone in the 10.24-hectare patch of tropical forest (plus the whole concept of a species is pretty ill-defined in this case).
Good point. Nonetheless the whole theory still makes sense if we explicitly say we’re restricting attention to, say, vertebrates, or any other chosen collection of organisms that we have the ability to catalogue. And then we can test it.
Yeah, I wasn’t really expressing any doubt about the physical collection, just about possible “unconscious preknowledge bias”. I’m just a bit hard on this point where people try to do something using pure statistics on existing data only. One of the pluses, and also soul-crushing minuses, of doing some of my work on computer performance is that “new” experiments are really quite easy to do. It’s amazing how often I can persuade myself that some gathered data is really, amazingly well-explained by some simple model, code up a new experiment that I’m sure I know will prove my point and then find out that at best reality is more complex than my model and at worst my model is wrong.
My experience is that, when you’re restricted to existing data, it’s much more reliable if your model is motivated by some “physical” reasoning, or, if it’s purely statistical, if you validate it by looking at never-seen data.
It’d be interesting if Harte’s work is well supported; I’ll have to try to dig out more work and read up on it.
I would apply the stochastic petri nets to relative abundance distribution. Just vary the rates of growth across related species with a MaxEnt uncertainty, and most of the RAD characteristics will pop out as a result of the dispersion.
FYI, there was some recent hubbub about a PRL paper from Wissner-Gross. He suggests maximizing the “causal entropy”, defined as entropy over a set of paths. I’ve got this vague impression that perhaps this might resolve some of the contradictory issues with MEP principles: any one single path is minimizing (e.g. minimizing the action) but the ensemble of all paths maximizes the (causal) entropy. In his words, systems move to a state that maximizes the causal access to the largest number of possible future states.
and it splattered on the popular press:
http://phys.org/news/2013-04-emergence-complex-behaviors-causal-entropic.html
http://www.physicscentral.com/buzz/blog/index.cfm?postid=3765851541979757837
Thanks, I’ll look at that, and add it to this database:
• Extremal principles in non-equilibrium thermodynamics, Azimuth Wiki.
Over there you’ll see a smart fellow nicknamed Tomate writes:
and also:
I have not yet gotten around to really understanding these remarks. I really want to understand them.
John Harte of U. C. Berkeley spoke about the maximum entropy method as a method of predicting patterns in ecology. Annette Ostling of the University of Michigan spoke about some competing theories, such as the ‘neutral model’ of biodiversity—a theory that sounds much too simple to be right, yet fits the data surprisingly well!
We managed to get a video of Ostling’s talk, but not Harte’s. Luckily, you can see the slides of both. You can also see a summary of Harte’s book Maximum Entropy and Ecology:
• John Baez, Maximum entropy and ecology, Azimuth, 21 February 2013.