John Harte

27 October, 2012

Earlier this week I gave a talk on the Mathematics of Planet Earth at the University of Southern California, and someone there recommended that I look into John Harte’s work on maximum entropy methods in ecology. He works at U.C. Berkeley.

I checked out his website and found that his goals resemble mine: save the planet and understand its ecosystems. He’s a lot further along than I am, since he comes from a long background in ecology while I’ve just recently blundered in from mathematical physics. I can’t really say what I think of his work since I’m just learning about it. But I thought I should point out its existence.

This free book is something a lot of people would find interesting:

• John and Mary Ellen Harte, Cool the Earth, Save the Economy: Solving the Climate Crisis Is EASY, 2008.

EASY? Well, it’s an acronym. Here’s the basic idea of the US-based plan described in this book:

Any proposed energy policy should include these two components:

Technical/Behavioral: What resources and technologies are to be used to supply energy? On the demand side, what technologies and lifestyle changes are to be proposed to consumers?

Incentives/Economic Policy: How are the desired supply and demand options to be encouraged or forced? Here the options include taxes, subsidies, regulations, permits, research and development, and education.

And a successful energy policy should satisfy the AAA criteria:

Availability. The climate crisis will rapidly become costly to society if we do not take action expeditiously. We need to adopt now those technologies that are currently available, provided they meet the following two additional criteria:

Affordability. Because of the central role of energy in our society, its cost to consumers should not increase significantly. In fact, a successful energy policy could ultimately save consumers money.

Acceptability. All energy strategies have environmental, land use, and health and safety implications; these must be acceptable to the public. Moreover, while some interest groups will undoubtedly oppose any particular energy policy, political acceptability at a broad scale is necessary.

Our strategy for preventing climate catastrophe and achieving energy independence includes:

Energy Efficient Technology at home and at the workplace. Huge reductions in home energy use can be achieved with available technologies, including more efficient appliances such as refrigerators, water heaters, and light bulbs. Home retrofits and new home design features such as “smart” window coatings, lighter-colored roofs where there are hot summers, better home insulation, and passive solar designs can also reduce energy use. Together, energy efficiency in homes and industry can save the U.S. approximately half of the energy currently consumed in those sectors, and at no net cost—just by making different choices. Sounds good, doesn’t it?

Automobile Fuel Efficiency. Phase in higher Corporate Average Fuel Economy (CAFE) standards for automobiles, SUVs and light trucks by requiring vehicles to go 35 miles per gallon of gas (mpg) by 2015, 45 mpg by 2020, and 60 mpg by 2030. This would rapidly wipe out our dependence on foreign oil and cut emissions from the vehicle sector by two-thirds. A combination of plug-in hybrid, lighter car body materials, re-design and other innovations could readily achieve these standards. This sounds good, too!
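Let’s check the two-thirds claim. Per-mile emissions scale inversely with fuel economy, so the claim works out if today’s fleet averages roughly 20 mpg (my ballpark figure, not the book’s) and miles driven stay about the same:

    fleet_today = 20.0  # rough U.S. light-duty fleet average in mpg (my assumption)
    fleet_2030 = 60.0   # the EASY target for 2030

    # Per-mile fuel use, and hence CO2, scales as 1/mpg:
    print(1 - fleet_today / fleet_2030)  # ≈ 0.67: the claimed two-thirds cut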

Solar and Wind Energy. Rooftop photovoltaic panels and solar water heating units should be phased in over the next 20 years, with the goal of solar installation on 75% of U.S. homes and commercial buildings by 2030. (Not all roofs receive sufficient sunlight to make solar panels practical for them.) Large wind farms, solar photovoltaic stations, and solar thermal stations should also be phased in so that by 2030, all U.S. electricity demand will be supplied by existing hydroelectric, existing and possibly some new nuclear, and, most importantly, new solar and wind units. This will require investment in expansion of the grid to bring the new supply to the demand, and in research and development to improve overnight storage systems. Achieving this goal would reduce our dependence on coal to practically zero. More good news!

You are part of the answer. Voting wisely for leaders who promote the first three components is one of the most important individual actions one can make. Other actions help, too. Just as molecules make up mountains, individual actions taken collectively have huge impacts. Improved driving skills, automobile maintenance, reusing and recycling, walking and biking, wearing sweaters in winter and light clothing in summer, installing timers on thermostats and insulating houses, carpooling, paying attention to energy efficiency labels on appliances, and many other simple practices and behaviors hugely influence energy consumption. A major education campaign, both in schools for youngsters and by the media for everyone, should be mounted to promote these consumer practices.

No part of EASY can be left out; all parts are closely integrated. Some parts might create much larger changes—for example, more efficient home appliances and automobiles—but all parts are essential. If, for example, we do not achieve the decrease in electricity demand that can be brought about with the E of EASY, then it is extremely doubtful that we could meet our electricity needs with the S of EASY.

It is equally urgent that once we start implementing the plan, we aggressively export it to other major emitting nations. We can reduce our own emissions all we want, but the planet will continue to warm if we can’t convince other major global emitters to reduce their emissions substantially, too.

What EASY will achieve. If no actions are taken to reduce carbon dioxide emissions, in the year 2030 the U.S. will be emitting about 2.2 billion tons of carbon in the form of carbon dioxide. This will be an increase of 25% from today’s emission rate of about 1.75 billion tons per year of carbon. By following the EASY plan, the U.S. share in a global effort to solve the climate crisis (that is, prevent catastrophic warming) will result in U.S. emissions of only about 0.4 billion tons of carbon by 2030, which represents a little less than 25% of 2007 carbon dioxide emissions. Stated differently, the plan provides a way to eliminate 1.8 billion tons per year of carbon by that date.
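Since I’ll be asking for help checking the numbers below, here is the arithmetic of that paragraph spelled out, in billions of tons of carbon per year:

    bau_2030 = 2.2    # business-as-usual emissions in 2030
    today = 1.75      # emission rate when the book was written
    easy_2030 = 0.4   # emissions in 2030 under the EASY plan

    print(bau_2030 / today - 1)   # ≈ 0.26, the stated ~25% increase
    print(easy_2030 / today)      # ≈ 0.23, 'a little less than 25%' of 2007 emissions
    print(bau_2030 - easy_2030)   # = 1.8, the tons per year eliminated

So at least this paragraph’s figures are internally consistent.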

We must act urgently: in the 14 months it took us to write this book, atmospheric CO2 levels rose by several billion tons of carbon, and more climatic consequences have been observed. Let’s assume that we conserve our forests and other natural carbon reservoirs at our current levels, as well as maintain our current nuclear and hydroelectric plants (or replace them with more solar and wind generators). Here’s what implementing EASY will achieve, as illustrated by Figure 3.1 of the book.

Please check out this book and help me figure out if the numbers add up! I could also use help understanding his research, for example:

• John Harte, Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics, Oxford University Press, Oxford, 2011.

The book is not free but the first chapter is.

This paper looks really interesting too:

• J. Harte, T. Zillio, E. Conlisk and A. B. Smith, Maximum entropy and the state-variable approach to macroecology, Ecology 89 (2008), 2700–2711.

Again, it’s not freely available—tut tut. Ecologists should follow physicists and make their work free online; if you’re serious about saving the planet you should let everyone know what you’re doing! However, the abstract is visible to all, and of course I can use my academic superpowers to get ahold of the paper for myself:

Abstract: The biodiversity scaling metrics widely studied in macroecology include the species-area relationship (SAR), the scale-dependent species-abundance distribution (SAD), the distribution of masses or metabolic energies of individuals within and across species, the abundance-energy or abundance-mass relationship across species, and the species-level occupancy distributions across space. We propose a theoretical framework for predicting the scaling forms of these and other metrics based on the state-variable concept and an analytical method derived from information theory. In statistical physics, a method of inference based on information entropy results in a complete macro-scale description of classical thermodynamic systems in terms of the state variables volume, temperature, and number of molecules. In analogy, we take the state variables of an ecosystem to be its total area, the total number of species within any specified taxonomic group in that area, the total number of individuals across those species, and the summed metabolic energy rate for all those individuals. In terms solely of ratios of those state variables, and without invoking any specific ecological mechanisms, we show that realistic functional forms for the macroecological metrics listed above are inferred based on information entropy. The Fisher log series SAD emerges naturally from the theory. The SAR is predicted to have negative curvature on a log-log plot, but as the ratio of the number of species to the number of individuals decreases, the SAR becomes better and better approximated by a power law, with the predicted slope z in the range of 0.14-0.20. Using the 3/4 power mass-metabolism scaling relation to relate energy requirements and measured body sizes, the Damuth scaling rule relating mass and abundance is also predicted by the theory. We argue that the predicted forms of the macroecological metrics are in reasonable agreement with the patterns observed from plant census data across habitats and spatial scales. While this is encouraging, given the absence of adjustable fitting parameters in the theory, we further argue that even small discrepancies between data and predictions can help identify ecological mechanisms that influence macroecological patterns.
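To get a feel for the kind of prediction involved, here is a toy computation of the Fisher log series from the two state variables S (number of species) and N (number of individuals). I’m using the classical Fisher relations S = \alpha \ln(1 + N/\alpha) and x = N/(N+\alpha) rather than Harte’s maximum entropy derivation, and the census numbers are invented:

    import numpy as np
    from scipy.optimize import brentq

    def fisher_log_series(S, N):
        # alpha solves S = alpha * ln(1 + N/alpha); then x = N/(N + alpha)
        alpha = brentq(lambda a: a * np.log(1.0 + N / a) - S, 1e-6, N)
        return alpha, N / (N + alpha)

    S, N = 50, 10_000                     # hypothetical plot census
    alpha, x = fisher_log_series(S, N)
    n = np.arange(1, 11)
    print(np.round(alpha * x**n / n, 2))  # expected number of species with abundance 1..10

The point of the paper, as I understand it, is that this distribution drops out of maximizing entropy subject to constraints built from the state variables, with no adjustable parameters.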


The Mathematical Origin of Irreversibility

8 October, 2012

guest post by Matteo Smerlak

Introduction

Thermodynamical dissipation and adaptive evolution are two faces of the same Markovian coin!

Consider this. The Second Law of Thermodynamics states that the entropy of an isolated thermodynamic system can never decrease; Landauer’s principle maintains that the erasure of information inevitably causes dissipation; Fisher’s fundamental theorem of natural selection asserts that any fitness difference within a population leads to adaptation in an evolution process governed by natural selection. Diverse as they are, these statements have two common characteristics:

1. they express the irreversibility of certain natural phenomena, and

2. the dynamical processes underlying these phenomena involve an element of randomness.

Doesn’t this suggest to you the following question: Could it be that thermal phenomena, forgetful information processing and adaptive evolution are governed by the same stochastic mechanism?

The answer is—yes! The key to this rather profound connection resides in a universal property of Markov processes discovered recently in the context of non-equilibrium statistical mechanics, and known as the ‘fluctuation theorem’. Typically stated in terms of ‘dissipated work’ or ‘entropy production’, this result can be seen as an extension of the Second Law of Thermodynamics to small systems, where thermal fluctuations cannot be neglected. But it is actually much more than this: it is the mathematical underpinning of irreversibility itself, be it thermodynamical, evolutionary, or else. To make this point clear, let me start by giving a general formulation of the fluctuation theorem that makes no reference to physics concepts such as ‘heat’ or ‘work’.

The mathematical fact

Consider a system randomly jumping between states a, b,\dots with (possibly time-dependent) transition rates \gamma_{a b}(t), where a is the state prior to the jump, while b is the state after the jump. I’ll assume that this dynamics defines a (continuous-time) Markov process: the off-diagonal rates \gamma_{a b} are non-negative, and if we set the diagonal entries to \gamma_{a a}:=-\sum_{b\neq a}\gamma_{a b}, each row of the matrix (\gamma_{a b}) sums to zero (so its transpose is an infinitesimal stochastic matrix).
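To fix conventions concretely, here is what such a rate matrix looks like for a three-state toy system (numbers invented):

    import numpy as np

    # gamma[a, b] is the rate of jumping from state a to state b:
    gamma = np.array([[0.0, 2.0, 1.0],
                      [0.5, 0.0, 3.0],
                      [1.5, 0.2, 0.0]])
    # Setting each diagonal entry to minus the total escape rate
    # makes every row sum to zero:
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    print(gamma.sum(axis=1))  # [0. 0. 0.]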

Now, each possible history \omega=(\omega_t)_{0\leq t\leq T} of this process can be characterized by the sequence of occupied states a_{j} (0\leq j\leq N) and by the times \tau_{j} at which the transitions a_{j-1}\longrightarrow a_{j} occur (1\leq j\leq N; it is convenient to set \tau_{0}:=0):

\omega=(\omega_{0}=a_{0}\overset{\tau_{1}}{\longrightarrow} a_{1} \overset{\tau_{2}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T}).

Define the skewness \sigma_{j}(\tau_{j}) of each of these transitions to be the logarithmic ratio of transition rates:

\displaystyle{\sigma_{j}(\tau_{j}):=\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}

Also define the self-information of the system in state a at time t by:

i_a(t):= -\ln\pi_{a}(t)

where \pi_{a}(t) is the probability that the system is in state a at time t, given some prescribed initial distribution \pi_{a}(0). This quantity is also sometimes called the surprisal, as it measures the ‘surprise’ of finding out that the system is in state a at time t.

Then the following identity—the detailed fluctuation theorem—holds:

\mathrm{Prob}[\Delta i-\Sigma=-A] = e^{-A}\;\mathrm{Prob}[\Delta i-\Sigma=A]

where

\displaystyle{\Sigma:=\sum_{j}\sigma_{j}(\tau_{j})}

is the cumulative skewness along a trajectory of the system, and

\Delta i= i_{a_N}(T)-i_{a_0}(0)

is the variation of self-information between the end points of this trajectory.

This identity has an immediate consequence: if \langle\,\cdot\,\rangle denotes the average over all realizations of the process, then we have the integral fluctuation theorem:

\langle e^{-\Delta i+\Sigma}\rangle=1,

which, by the convexity of the exponential and Jensen’s inequality, implies:

\langle \Delta i\rangle=\Delta S\geq\langle\Sigma\rangle.

In short: the mean variation of self-information, aka the variation of Shannon entropy

\displaystyle{ S(t):= \sum_{a}\pi_{a}(t)i_a(t) }

is bounded from below by the mean cumulative skewness of the underlying stochastic trajectory.

This is the fundamental mathematical fact underlying irreversibility. To unravel its physical and biological consequences, it suffices to consider the origin and interpretation of the ‘skewness’ term in different contexts. (By the way, people usually call \Sigma the ‘entropy production’ or ‘dissipation function’—but how tautological is that?)
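Since the theorem is exact for any Markov process, it is easy to test numerically. Here is a minimal Monte Carlo check for a two-state process with constant rates (all numbers invented): sample trajectories Gillespie-style, add up the skewness at each jump, compute \Delta i from the closed-form solution of the two-state master equation, and average e^{-\Delta i+\Sigma}:

    import numpy as np

    rng = np.random.default_rng(0)
    k01, k10 = 2.0, 0.5           # jump rates 0 -> 1 and 1 -> 0
    T = 1.0                       # time horizon
    pi0 = np.array([0.9, 0.1])    # initial distribution (deliberately not stationary)

    def pi(a, t):
        # closed-form solution of the two-state master equation
        r = k01 + k10
        p0 = k10 / r + (pi0[0] - k10 / r) * np.exp(-r * t)
        return p0 if a == 0 else 1.0 - p0

    esc = {0: k01, 1: k10}        # escape rates
    vals = []
    for _ in range(100_000):
        t, a = 0.0, rng.choice(2, p=pi0)
        a0, Sigma = a, 0.0
        while True:
            t += rng.exponential(1.0 / esc[a])    # waiting time in state a
            if t > T:
                break
            fwd, rev = (k01, k10) if a == 0 else (k10, k01)
            Sigma += np.log(rev / fwd)            # skewness of this jump
            a = 1 - a
        delta_i = np.log(pi(a0, 0.0)) - np.log(pi(a, T))
        vals.append(np.exp(-delta_i + Sigma))

    print(np.mean(vals))          # should be close to 1

The average comes out near 1, as the integral fluctuation theorem demands, even though individual trajectories give values scattered well above and below it.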

The physical and biological consequences

Consider first the standard stochastic-thermodynamic scenario where a physical system is kept in contact with a thermal reservoir at inverse temperature \beta and undergoes thermally induced transitions between states a, b,\dots. By virtue of the detailed balance condition:

\displaystyle{ e^{-\beta E_{a}(t)}\gamma_{a b}(t)=e^{-\beta E_{b}(t)}\gamma_{b a}(t),}

the skewness \sigma_{j}(\tau_{j}) of each such transition is \beta times the energy difference between the states a_{j} and a_{j-1}, namely the heat received from the reservoir during the transition. Hence, the mean cumulative skewness \langle \Sigma\rangle is nothing but \beta\langle Q\rangle, with Q the total heat received by the system along the process. It follows from the integral fluctuation theorem that

\langle e^{-\Delta i+\beta Q}\rangle=1

and therefore

\Delta S\geq\beta\langle Q\rangle

which is of course Clausius’ inequality. In a computational context where the control parameter is the entropy variation itself (such as in a bit-erasure protocol, where \Delta S=-\ln 2), this inequality in turn expresses Landauer’s principle: it is impossible to decrease the self-information of the system’s state without dissipating a minimal amount of heat into the environment (in this case -Q \geq k T\ln2, the ‘Landauer bound’). More general situations (several types of reservoirs, Maxwell-demon-like feedback controls) can be treated along the same lines, and the various forms of the Second Law derived from the detailed fluctuation theorem.
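For a sense of scale, the Landauer bound at room temperature is minuscule in everyday units:

    import math

    kB = 1.380649e-23            # Boltzmann constant in J/K
    T = 300.0                    # room temperature, chosen for illustration
    print(kB * T * math.log(2))  # ≈ 2.9e-21 J per erased bit

which is tiny compared with what today’s electronics actually dissipate per bit.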

Now, many would agree that evolutionary dynamics is a wholly different business from thermodynamics; in particular, notions such as ‘heat’ or ‘temperature’ are clearly irrelevant to Darwinian evolution. However, the stochastic framework of Markov processes is relevant to describe the genetic evolution of a population, and this fact alone has important consequences. As a simple example, consider the time evolution of mutant fixations x_{a} in a population, with a ranging over the possible genotypes. In a ‘symmetric mutation scheme’, which I understand is biological parlance for ‘reversible Markov process’, meaning one that obeys detailed balance, the ratio between the a\mapsto b and b\mapsto a transition rates is completely determined by the fitnesses f_{a} and f_b of a and b, according to

\displaystyle{\frac{\gamma_{a b}}{\gamma_{b a}} =\left(\frac{f_{b}}{f_{a}}\right)^{\nu} }

where \nu is a model-dependent function of the effective population size [Sella2005]. Along a given history of mutant fixations, the cumulative skewness \Sigma is therefore given by minus the fitness flux:

\displaystyle{\Phi=\nu\sum_{j}(\ln f_{a_j}-\ln f_{a_{j-1}}).}

The integral fluctuation theorem then becomes the fitness flux theorem:

\displaystyle{ \langle e^{-\Delta i -\Phi}\rangle=1}

discussed recently by Mustonen and Lässig [Mustonen2010] and implying Fisher’s fundamental theorem of natural selection as a special case. (Incidentally, the ‘fitness flux theorem’ derived in this reference is more general than this; for instance, it does not rely on the ‘symmetric mutation scheme’ assumption above.) The ensuing inequality

\langle \Phi\rangle\geq-\Delta S

shows that a positive fitness flux is “an almost universal evolutionary principle of biological systems” [Mustonen2010], with negative contributions limited to time intervals with a systematic loss of adaptation (\Delta S > 0). This statement may well be the closest thing to a version of the Second Law of Thermodynamics applying to evolutionary dynamics.
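Here too the identity can be checked by simulation. Below is a toy chain of four genotypes with invented fitnesses f_a, exponent \nu and attempt rate c, using rates \gamma_{a b}=c f_{b}^{\nu} so that the ‘symmetric mutation scheme’ ratio above holds; the Monte Carlo average of e^{-\Delta i-\Phi} should come out close to 1. This is only an illustration of the identity, not of the full Mustonen–Lässig model:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    f, nu, c, T = np.array([1.0, 1.4, 0.8, 1.1]), 2.0, 0.3, 5.0
    n = len(f)

    G = c * np.tile(f**nu, (n, 1))       # G[a, b] = c * f[b]**nu for a != b
    np.fill_diagonal(G, 0.0)
    np.fill_diagonal(G, -G.sum(axis=1))  # rows sum to zero

    pi0 = np.full(n, 1.0 / n)            # uniform initial distribution
    piT = pi0 @ expm(G * T)              # master-equation solution at time T
    esc = -np.diag(G)                    # escape rates

    vals = []
    for _ in range(50_000):
        t, a = 0.0, rng.choice(n, p=pi0)
        a0, Phi = a, 0.0
        while True:
            t += rng.exponential(1.0 / esc[a])
            if t > T:
                break
            w = G[a].clip(min=0.0)               # jump weights off the diagonal
            b = rng.choice(n, p=w / w.sum())
            Phi += nu * np.log(f[b] / f[a])      # fitness flux increment
            a = b
        delta_i = np.log(pi0[a0]) - np.log(piT[a])
        vals.append(np.exp(-delta_i - Phi))

    print(np.mean(vals))  # fitness flux theorem: should be close to 1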

It is really quite remarkable that thermodynamical dissipation and Darwinian evolution can be reduced to the same stochastic mechanism, and that notions such as ‘fitness flux’ and ‘heat’ can arise as two faces of the same mathematical coin, namely the ‘skewness’ of Markovian transitions. After all, the phenomenon of life is in itself a direct challenge to thermodynamics, isn’t it? While thermal phenomena tend to increase the world’s disorder, life strives to bring about and maintain exquisitely fine spatial and chemical structures—which is why Schrödinger famously proposed to define life as negative entropy. Could there be a more striking confirmation of his intuition—and a reconciliation of evolution and thermodynamics in one go—than the fundamental inequality of adaptive evolution \langle\Phi\rangle\geq-\Delta S?

Surely the detailed fluctuation theorem for Markov processes has other applications, pertaining neither to thermodynamics nor adaptive evolution. Can you think of any?

Proof of the fluctuation theorem

I am a physicist, but knowing that many readers of John’s blog are mathematicians, I’ll do my best to frame—and prove—the FT as an actual theorem.

Let (\Omega,\mathcal{T},p) be a probability space and (\,\cdot\,)^{\dagger}:\Omega\to \Omega a measurable involution of \Omega. Denote by p^{\dagger} the pushforward probability measure through this involution, and

\displaystyle{ R=\ln \frac{d p}{d p^\dagger} }

the logarithm of the corresponding Radon-Nikodym derivative (we assume p^\dagger and p are mutually absolutely continuous). Then the following lemmas are true, with (1)\Rightarrow(2)\Rightarrow(3):

Lemma 1. The detailed fluctuation relation:

\forall A\in\mathbb{R} \quad  p\big(R^{-1}(-A) \big)=e^{-A}p \big(R^{-1}(A) \big)

Lemma 2. The integral fluctuation relation:

\displaystyle{\int_{\Omega} d p(\omega)\,e^{-R(\omega)}=1 }

Lemma 3. The positivity of the Kullback-Leibler divergence:

D(p\,\Vert\, p^{\dagger}):=\int_{\Omega} d p(\omega)\,R(\omega)\geq 0.

These are basic facts which anyone can show: (2)\Rightarrow(3) by Jensen’s inequality, (1)\Rightarrow(2) trivially, and (1) follows from R(\omega^{\dagger})=-R(\omega) and the change of variables theorem, as follows,

\begin{array}{ccl} \displaystyle{ \int_{R^{-1}(-A)} d p(\omega)} &=& \displaystyle{ \int_{R^{-1}(A)}d p^{\dagger}(\omega) } \\ \\ &=& \displaystyle{ \int_{R^{-1}(A)} d p(\omega)\, e^{-R(\omega)} } \\ \\ &=& \displaystyle{ e^{-A} \int_{R^{-1}(A)} d p(\omega)} .\end{array}

But here is the beauty: if

• (\Omega,\mathcal{T},p) is actually a Markov process defined over some time interval [0,T] and valued in some (say discrete) state space, with the instantaneous probability \pi_{a}(t)=p\big(\{\omega_{t}=a\} \big) of each state a satisfying the master equation (aka Kolmogorov equation)

\displaystyle{ \frac{d\pi_{a}(t)}{dt}=\sum_{b\neq a}\Big(\gamma_{b a}(t)\pi_{b}(t)-\gamma_{a b}(t)\pi_{a}(t)\Big),}

and

• the dagger involution is time-reversal, that is \omega^{\dagger}_{t}:=\omega_{T-t},

then for a given path

\displaystyle{\omega=(\omega_{0}=a_{0}\overset{\tau_{1}}{\longrightarrow} a_{1} \overset{\tau_{2}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T})\in\Omega}

the logarithmic ratio R(\omega) decomposes into ‘variation of self-information’ and ‘cumulative skewness’ along \omega:

\displaystyle{ R(\omega)=\underbrace{\Big(\ln\pi_{a_0}(0)-\ln\pi_{a_N}(T) \Big)}_{\Delta i(\omega)}-\underbrace{\sum_{j=1}^{N}\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}_{\Sigma(\omega)}.}

This is easy to see if one writes the probability of a path explicitly as

\displaystyle{p(\omega)=\pi_{a_{0}}(0)\left[\prod_{j=1}^{N}\phi_{a_{j-1}}(\tau_{j-1},\tau_{j})\gamma_{a_{j-1}a_{j}}(\tau_{j})\right]\phi_{a_{N}}(\tau_{N},T)}

where

\displaystyle{ \phi_{a}(\tau,\tau')=\phi_{a}(\tau',\tau)=\exp\Big(-\sum_{b\neq a}\int_{\tau}^{\tau'}dt\, \gamma_{a b}(t)\Big)}

is the probability that the process remains in the state a between the times \tau and \tau'. (One caveat: with time-dependent rates, or when \pi(0)\neq\pi(T), getting exactly \pi_{a_N}(T) in \Delta i requires taking p^{\dagger} to be the time-reversal pushforward of the ‘backward’ process, run with the reversed protocol \gamma_{a b}(T-t) and started from the final distribution \pi(T); the integral version below holds regardless, since that backward law is still a probability measure.) It follows from the above lemmas that

Theorem. Let (\Omega,\mathcal{T},p) be a Markov process and let \Delta i,\Sigma:\Omega\rightarrow \mathbb{R} be defined as above. Then we have

1. The detailed fluctuation theorem:

\forall A\in\mathbb{R}, p\big((\Delta i-\Sigma)^{-1}(-A) \big)=e^{-A}p \big((\Delta i-\Sigma)^{-1}(A) \big)

2. The integral fluctuation theorem:

\int_{\Omega} d p(\omega)\,e^{-\Delta i(\omega)+\Sigma(\omega)}=1

3. The ‘Second Law’ inequality:

\displaystyle{ \Delta S:=\int_{\Omega} d p(\omega)\,\Delta i(\omega)\geq \int_{\Omega} d p(\omega)\,\Sigma(\omega)}

The same theorem can be formulated for other kinds of Markov processes as well, including diffusion processes (in which case it follows from the Girsanov theorem).
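To see the decomposition R=\Delta i-\Sigma at work, here is a direct check for a two-state process with constant rates (toy numbers): compute R from the explicit path-density formula (with the reference density built from the reversed path, the same rates, and the final distribution \pi(T), per the caveat above) and compare it with \Delta i-\Sigma computed from the definitions. The two numbers should agree up to floating-point rounding:

    import numpy as np

    gamma = np.array([[0.0, 2.0], [0.5, 0.0]])  # gamma[a, b]: rate a -> b
    esc = gamma.sum(axis=1)                     # escape rates
    T, pi_0 = 1.0, np.array([0.9, 0.1])

    def pi_t(t):                                # two-state master equation
        r = esc.sum()
        p0 = gamma[1, 0] / r + (pi_0[0] - gamma[1, 0] / r) * np.exp(-r * t)
        return np.array([p0, 1.0 - p0])

    def log_density(path, times, init):
        # log of: init * product over jumps of [ phi * gamma ] * final phi
        lp, t = np.log(init[path[0]]), 0.0
        for a, b, tau in zip(path, path[1:], times):
            lp += -esc[a] * (tau - t) + np.log(gamma[a, b])
            t = tau
        return lp - esc[path[-1]] * (T - t)

    path, times = [0, 1], [0.4]                 # one sample trajectory
    rev_path, rev_times = path[::-1], [T - s for s in times[::-1]]

    R = log_density(path, times, pi_0) - log_density(rev_path, rev_times, pi_t(T))
    Sigma = sum(np.log(gamma[b, a] / gamma[a, b]) for a, b in zip(path, path[1:]))
    delta_i = np.log(pi_0[path[0]]) - np.log(pi_t(T)[path[-1]])
    print(R, delta_i - Sigma)                   # the two agree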

References

Landauer’s principle was introduced here:

• [Landauer1961] R. Landauer, Irreversibility and heat generation in the computing process, IBM Journal of Research and Development 5 (1961), 183–191.

and is now being verified experimentally by various groups worldwide.

The ‘fundamental theorem of natural selection’ was derived by Fisher in his book:

• [Fisher1930] R. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930.

His derivation has long been considered obscure, even perhaps wrong, but apparently the theorem is now well accepted. I believe the first Markovian models of genetic evolution appeared here:

• [Fisher1922] R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922), 321–341.

• [Wright1931] S. Wright, Evolution in Mendelian populations, Genetics 16 (1931), 97–159.

Fluctuation theorems are reviewed here:

• [Sevick2008] E. Sevick, R. Prabhakar, S. R. Williams, and D. J. Searles, Fluctuation theorems, Ann. Rev. Phys. Chem. 59 (2008), 603–633.

Two of the key ideas for the ‘detailed fluctuation theorem’ discussed here are due to Crooks:

• [Crooks1999] Gavin Crooks, The entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60 (1999), 2721–2726.

who identified (E_{a_{j}}(\tau_{j})-E_{a_{j-1}}(\tau_{j})) as heat, and Seifert:

• [Seifert2005] Udo Seifert, Entropy production along a stochastic trajectory and an integral fluctuation theorem, Phys. Rev. Lett. 95 (2005), 040602.

who understood the relevance of the self-information in this context.

The connection between statistical physics and evolutionary biology is discussed here:

• [Sella2005] G. Sella and A.E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. USA 102 (2005), 9541–9546.

and the ‘fitness flux theorem’ is derived in

• [Mustonen2010] V. Mustonen and M. Lässig, Fitness flux and ubiquity of adaptive evolution, Proc. Nat. Acad. Sci. USA 107 (2010), 4248–4253.

Schrödinger’s famous discussion of the physical nature of life was published here:

• [Schrödinger1944] E. Schrödinger, What is Life?, Cambridge University Press, Cambridge, 1944.

