guest post by Matteo Smerlak
Thermodynamical dissipation and adaptive evolution are two faces of the same Markovian coin!
Consider this. The Second Law of Thermodynamics states that the entropy of an isolated thermodynamic system can never decrease; Landauer’s principle maintains that the erasure of information inevitably causes dissipation; Fisher’s fundamental theorem of natural selection asserts that any fitness difference within a population leads to adaptation in an evolution process governed by natural selection. Diverse as they are, these statements have two common characteristics:
1. they express the irreversibility of certain natural phenomena, and
2. the dynamical processes underlying these phenomena involve an element of randomness.
Doesn’t this suggest to you the following question: Could it be that thermal phenomena, forgetful information processing and adaptive evolution are governed by the same stochastic mechanism?
The answer is: yes! The key to this rather profound connection resides in a universal property of Markov processes discovered recently in the context of non-equilibrium statistical mechanics, and known as the ‘fluctuation theorem’. Typically stated in terms of ‘dissipated work’ or ‘entropy production’, this result can be seen as an extension of the Second Law of Thermodynamics to small systems, where thermal fluctuations cannot be neglected. But it is actually much more than this: it is the mathematical underpinning of irreversibility itself, be it thermodynamical, evolutionary, or otherwise. To make this point clear, let me start by giving a general formulation of the fluctuation theorem that makes no reference to physics concepts such as ‘heat’ or ‘work’.
The mathematical fact
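Consider a system randomly jumping between states $a, b, \dots$ with (possibly time-dependent) transition rates $\gamma_{ab}(t)$, where $a$ is the state prior to the jump, while $b$ is the state after the jump. I’ll assume that this dynamics defines a (continuous-time) Markov process, namely that the numbers $\gamma_{ab}$ are the matrix entries of an infinitesimal stochastic matrix, which means that its off-diagonal entries are non-negative and that $\sum_b \gamma_{ab}(t) = 0$ for every state $a$ (with the column index labelling the state prior to the jump, its columns sum up to zero).
Now, each possible history $\omega = (\omega_t)_{0 \le t \le T}$ of this process can be characterized by the sequence of occupied states $a_j$ $(0 \le j \le N)$ and by the times $\tau_j$ at which the transitions $a_{j-1} \to a_j$ occur $(1 \le j \le N)$:
$$\omega = \big(\omega_0 = a_0 \overset{\tau_1}{\longrightarrow} a_1 \overset{\tau_2}{\longrightarrow} \cdots \overset{\tau_N}{\longrightarrow} a_N = \omega_T\big).$$
Define the skewness $\sigma_j(\tau_j)$ of each of these transitions to be the logarithmic ratio of transition rates:
$$\sigma_j(\tau_j) := \ln \frac{\gamma_{a_j a_{j-1}}(\tau_j)}{\gamma_{a_{j-1} a_j}(\tau_j)}.$$
Also define the self-information of the system in state $a$ at time $t$ by:
$$i_a(t) := -\ln \pi_a(t),$$
where $\pi_a(t)$ is the probability that the system is in state $a$ at time $t$, given some prescribed initial distribution $\pi_a(0)$. This quantity is also sometimes called the surprisal, as it measures the ‘surprise’ of finding out that the system is in state $a$ at time $t$.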
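Then the following identity, the detailed fluctuation theorem, holds:
$$\mathrm{Prob}[\Delta i - \Sigma = -A] = e^{-A}\, \mathrm{Prob}[\Delta i - \Sigma = A],$$
where
$$\Sigma := \sum_{j=1}^{N} \sigma_j(\tau_j)$$
is the cumulative skewness along a trajectory of the system, and
$$\Delta i := i_{a_N}(T) - i_{a_0}(0)$$
is the variation of self-information between the end points of this trajectory.
This identity has an immediate consequence: if $\langle\,\cdot\,\rangle$ denotes the average over all realizations of the process, then we have the integral fluctuation theorem:
$$\langle e^{-\Delta i + \Sigma} \rangle = 1,$$
which, by the convexity of the exponential and Jensen’s inequality, implies:
$$\langle \Delta i \rangle = \Delta S \geq \langle \Sigma \rangle.$$
In short: the mean variation of self-information, aka the variation of Shannon entropy
$$S(t) := \sum_a \pi_a(t)\, i_a(t),$$
is bounded from below by the mean cumulative skewness of the underlying stochastic trajectory.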
This is the fundamental mathematical fact underlying irreversibility. To unravel its physical and biological consequences, it suffices to consider the origin and interpretation of the ‘skewness’ term in different contexts. (By the way, people usually call $\Sigma$ the ‘entropy production’ or ‘dissipation function’, but how tautological is that?)
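The statement that the average of $e^{-\Delta i + \Sigma}$ equals one, and that the mean variation of self-information dominates the mean cumulative skewness, can be tested numerically. The following is only a minimal sketch under simplifying assumptions that are not in the text above: a two-state chain with time-independent rates, for which $\pi_a(t)$ is known in closed form. The names and parameter values (`k01`, `k10`, `p0`, `T`, `check_ift`) are illustrative choices.

```python
import math
import random

def pi0(t, k01, k10, p0):
    """Probability of state 0 at time t for a two-state chain (closed form)."""
    p_inf = k10 / (k01 + k10)          # stationary probability of state 0
    return p_inf + (p0 - p_inf) * math.exp(-(k01 + k10) * t)

def sample_path(k01, k10, p0, T, rng):
    """Sample one trajectory on [0, T]; return (delta_i, Sigma)."""
    out_rate = (k01, k10)              # out_rate[a]: rate of leaving state a
    a = 0 if rng.random() < p0 else 1
    prob_start = p0 if a == 0 else 1.0 - p0
    t, Sigma = 0.0, 0.0
    while True:
        t += rng.expovariate(out_rate[a])   # exponential waiting time in a
        if t > T:
            break
        b = 1 - a
        # skewness of the jump a -> b: ln(gamma_ba / gamma_ab)
        Sigma += math.log(out_rate[b] / out_rate[a])
        a = b
    p_end0 = pi0(T, k01, k10, p0)
    prob_end = p_end0 if a == 0 else 1.0 - p_end0
    delta_i = -math.log(prob_end) + math.log(prob_start)
    return delta_i, Sigma

def check_ift(n=50_000, k01=2.0, k10=1.0, p0=0.9, T=1.0, seed=0):
    """Monte Carlo estimates of <exp(-delta_i + Sigma)>, <delta_i>, <Sigma>."""
    rng = random.Random(seed)
    acc_exp = acc_di = acc_sig = 0.0
    for _ in range(n):
        di, sig = sample_path(k01, k10, p0, T, rng)
        acc_exp += math.exp(-di + sig)
        acc_di += di
        acc_sig += sig
    return acc_exp / n, acc_di / n, acc_sig / n

if __name__ == "__main__":
    mean_exp, mean_di, mean_sig = check_ift()
    print(f"<exp(-di + Sigma)> = {mean_exp:.3f}")
    print(f"<di> = {mean_di:.3f}, <Sigma> = {mean_sig:.3f}")
```

Up to sampling noise, the first estimate should come out close to 1, and the mean of `delta_i` should not fall below the mean of `Sigma`. Note that the initial distribution is deliberately chosen far from stationarity.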
The physical and biological consequences
Consider first the standard stochastic-thermodynamic scenario where a physical system is kept in contact with a thermal reservoir at inverse temperature $\beta$ and undergoes thermally induced transitions between states $a, b, \dots$ with energies $E_a(t), E_b(t), \dots$. By virtue of the detailed balance condition:
$$e^{-\beta E_a(t)}\, \gamma_{ab}(t) = e^{-\beta E_b(t)}\, \gamma_{ba}(t),$$
the skewness of each such transition is $\beta$ times the energy difference between the states $a$ and $b$, namely the heat received from the reservoir during the transition. Hence, the mean cumulative skewness $\langle \Sigma \rangle$ is nothing but $\beta \langle Q \rangle$, with $Q$ the total heat received by the system along the process. It follows from the detailed fluctuation theorem that
$$\langle e^{-\Delta i + \beta Q} \rangle = 1, \quad \text{and therefore} \quad \Delta S \geq \beta \langle Q \rangle,$$
which is of course Clausius’ inequality. In a computational context where the control parameter is the entropy variation itself (such as in a bit-erasure protocol, where $\Delta S = -\ln 2$), this inequality in turn expresses Landauer’s principle: it is impossible to decrease the self-information of the system’s state without dissipating a minimal amount of heat into the environment (in this case $-\langle Q \rangle \geq k_B T \ln 2$ with $\beta = 1/(k_B T)$, the ‘Landauer bound’). More general situations (several types of reservoirs, Maxwell-demon-like feedback controls) can be treated along the same lines, and the various forms of the Second Law derived from the detailed fluctuation theorem.
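To get a feeling for the size of the Landauer bound, here is a small worked computation of $k_B T \ln 2$. It is only an illustration: the function name `landauer_bound` and the choice of 300 K as ‘room temperature’ are my own assumptions, while the value of the Boltzmann constant is the exact SI one.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def landauer_bound(temperature_kelvin, bits=1.0):
    """Minimal heat (in joules) dissipated when erasing `bits` bits at temperature T."""
    return bits * K_B * temperature_kelvin * math.log(2)

if __name__ == "__main__":
    print(f"{landauer_bound(300.0):.3e} J per bit at 300 K")
```

At 300 K this gives a bit less than $3 \times 10^{-21}$ joules per erased bit, which is why the bound only becomes visible in experiments on very small systems.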
Now, many would agree that evolutionary dynamics is a wholly different business from thermodynamics; in particular, notions such as ‘heat’ or ‘temperature’ are clearly irrelevant to Darwinian evolution. However, the stochastic framework of Markov processes is relevant to describe the genetic evolution of a population, and this fact alone has important consequences. As a simple example, consider the time evolution of mutant fixations in a population, with the state $a$ ranging over the possible genotypes. In a ‘symmetric mutation scheme’, which I understand is biological parlance for ‘reversible Markov process’, meaning one that obeys detailed balance, the ratio between the $a \to b$ and $b \to a$ transition rates is completely determined by the fitnesses $f_a$ and $f_b$ of $a$ and $b$, according to
$$\frac{\gamma_{ab}}{\gamma_{ba}} = \left(\frac{f_b}{f_a}\right)^{\nu},$$
where $\nu$ is a model-dependent function of the effective population size [Sella2005]. Along a given history of mutant fixations, the cumulated skewness $\Sigma$ is therefore given by minus the fitness flux:
$$\Phi := \nu \sum_{j=1}^{N} \big(\ln f_{a_j} - \ln f_{a_{j-1}}\big).$$
The integral fluctuation theorem then becomes the fitness flux theorem:
$$\langle e^{-\Delta i - \Phi} \rangle = 1,$$
discussed recently by Mustonen and Lässig [Mustonen2010] and implying Fisher’s fundamental theorem of natural selection as a special case. (Incidentally, the ‘fitness flux theorem’ derived in this reference is more general than this; for instance, it does not rely on the ‘symmetric mutation scheme’ assumption above.) The ensuing inequality
$$\langle \Phi \rangle \geq -\Delta S$$
shows that a positive fitness flux is “an almost universal evolutionary principle of biological systems” [Mustonen2010], with negative contributions limited to time intervals with a systematic loss of adaptation ($\Delta S > 0$). This statement may well be the closest thing to a version of the Second Law of Thermodynamics applying to evolutionary dynamics.
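To make the relation between fitness flux and skewness concrete, here is a minimal sketch that evaluates both along a given history of fixations. The fitness values and the exponent `nu` used in the example are made up for illustration, and the function names are hypothetical; the only input taken from the text is the symmetric mutation scheme, in which the ratio of the forward to the backward rate is the fitness ratio raised to the power `nu`.

```python
import math

def fitness_flux(fitnesses, nu):
    """Phi = nu * sum_j (ln f_{a_j} - ln f_{a_{j-1}}) along a fixation history."""
    return nu * sum(
        math.log(f_next) - math.log(f_prev)
        for f_prev, f_next in zip(fitnesses, fitnesses[1:])
    )

def cumulative_skewness(fitnesses, nu):
    """Sigma = sum_j ln(gamma_{a_j a_{j-1}} / gamma_{a_{j-1} a_j}), assuming
    gamma_ab / gamma_ba = (f_b / f_a)**nu for every pair of genotypes."""
    return sum(
        math.log((f_prev / f_next) ** nu)
        for f_prev, f_next in zip(fitnesses, fitnesses[1:])
    )

if __name__ == "__main__":
    history = [1.0, 1.2, 1.1, 1.5]   # hypothetical fitnesses of successive genotypes
    nu = 2.0                         # hypothetical value of the exponent
    print(fitness_flux(history, nu), cumulative_skewness(history, nu))
```

The two numbers are opposite to each other, and because the sum telescopes the flux depends only on the first and last fitness values of the history.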
It is really quite remarkable that thermodynamical dissipation and Darwinian evolution can be reduced to the same stochastic mechanism, and that notions such as ‘fitness flux’ and ‘heat’ can arise as two faces of the same mathematical coin, namely the ‘skewness’ of Markovian transitions. After all, the phenomenon of life is in itself a direct challenge to thermodynamics, isn’t it? Whereas thermal phenomena tend to increase the world’s disorder, life strives to bring about and maintain exquisitely fine spatial and chemical structures, which is why Schrödinger famously proposed to define life as negative entropy. Could there be a more striking confirmation of his intuition, and a reconciliation of evolution and thermodynamics in one go, than the fundamental inequality of adaptive evolution $\langle \Phi \rangle \geq -\Delta S$?
Surely the detailed fluctuation theorem for Markov processes has other applications, pertaining neither to thermodynamics nor adaptive evolution. Can you think of any?
Proof of the fluctuation theorem
I am a physicist, but knowing that many readers of John’s blog are mathematicians, I’ll do my best to frame—and prove—the FT as an actual theorem.
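Let $(\Omega, \mathcal{T}, p)$ be a probability space and $(\,\cdot\,)^\dagger$ a measurable involution of $\Omega$. Denote by $p^\dagger$ the pushforward probability measure through this involution, and by
$$R := \ln \frac{dp}{dp^\dagger}$$
the logarithm of the corresponding Radon-Nikodym derivative (we assume $p^\dagger$ and $p$ are mutually absolutely continuous). Then the following lemmas are true, with $(1) \Rightarrow (2) \Rightarrow (3)$:
Lemma 1. The detailed fluctuation relation:
$$\forall A \in \mathbb{R}, \quad p\big(R^{-1}(-A)\big) = e^{-A}\, p\big(R^{-1}(A)\big).$$
Lemma 2. The integral fluctuation relation:
$$\int_\Omega dp(\omega)\, e^{-R(\omega)} = 1.$$
Lemma 3. The positivity of the Kullback-Leibler divergence:
$$D(p \,\Vert\, p^\dagger) := \int_\Omega dp(\omega)\, R(\omega) \geq 0.$$
These are basic facts which anyone can show: $(2) \Rightarrow (3)$ by Jensen’s inequality, $(1) \Rightarrow (2)$ trivially, and $(1)$ follows from $R(\omega^\dagger) = -R(\omega)$ and the change of variables theorem, as follows:
$$\begin{array}{ccl} \displaystyle\int_{R^{-1}(-A)} dp(\omega) &=& \displaystyle\int_{R^{-1}(A)} dp^\dagger(\omega) \\ &=& \displaystyle\int_{R^{-1}(A)} dp(\omega)\, e^{-R(\omega)} \\ &=& \displaystyle e^{-A} \int_{R^{-1}(A)} dp(\omega). \end{array}$$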
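But here is the beauty: if
• $(\Omega, \mathcal{T}, p)$ is actually a Markov process defined over some time interval $[0, T]$ and valued in some (say discrete) state space $\mathcal{X}$, with the instantaneous probability $\pi_a(t) = p\big(\{\omega_t = a\}\big)$ of each state $a \in \mathcal{X}$ satisfying the master equation (aka Kolmogorov equation)
$$\frac{d\pi_a(t)}{dt} = \sum_{b \neq a} \Big(\gamma_{ba}(t)\, \pi_b(t) - \gamma_{ab}(t)\, \pi_a(t)\Big),$$
and
• the dagger involution is time-reversal, that is $\omega^\dagger_t := \omega_{T-t}$,
then for a given path
$$\omega = \big(\omega_0 = a_0 \overset{\tau_1}{\longrightarrow} a_1 \overset{\tau_2}{\longrightarrow} \cdots \overset{\tau_N}{\longrightarrow} a_N = \omega_T\big) \in \Omega$$
the logarithmic ratio $R(\omega)$ decomposes into ‘variation of self-information’ and ‘cumulative skewness’ along $\omega$:
$$R(\omega) = \underbrace{\Big(\ln \pi_{a_0}(0) - \ln \pi_{a_N}(T)\Big)}_{\Delta i(\omega)} - \underbrace{\sum_{j=1}^{N} \ln \frac{\gamma_{a_j a_{j-1}}(\tau_j)}{\gamma_{a_{j-1} a_j}(\tau_j)}}_{\Sigma(\omega)}.$$
This is easy to see if one writes the probability of a path explicitly as
$$p(\omega) = \pi_{a_0}(0) \left[\prod_{j=1}^{N} \phi_{a_{j-1}}(\tau_{j-1}, \tau_j)\, \gamma_{a_{j-1} a_j}(\tau_j)\right] \phi_{a_N}(\tau_N, T),$$
where $\tau_0 := 0$ and
$$\phi_a(\tau, \tau') := \exp\Big(-\sum_{b \neq a} \int_\tau^{\tau'} dt\, \gamma_{ab}(t)\Big)$$
is the probability that the process remains in the state $a$ between the times $\tau$ and $\tau' \geq \tau$. It follows from the above lemma that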
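Theorem. Let $(\Omega, \mathcal{T}, p)$ be a Markov process and let $\Delta i, \Sigma : \Omega \to \mathbb{R}$ be defined as above. Then we have
1. The detailed fluctuation theorem:
$$\forall A \in \mathbb{R}, \quad p\big((\Delta i - \Sigma)^{-1}(-A)\big) = e^{-A}\, p\big((\Delta i - \Sigma)^{-1}(A)\big).$$
2. The integral fluctuation theorem:
$$\int_\Omega dp(\omega)\, e^{-\Delta i(\omega) + \Sigma(\omega)} = 1.$$
3. The ‘Second Law’ inequality:
$$\Delta S := \int_\Omega dp(\omega)\, \Delta i(\omega) \geq \int_\Omega dp(\omega)\, \Sigma(\omega).$$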
The same theorem can be formulated for other kinds of Markov processes as well, including diffusion processes (in which case it follows from the Girsanov theorem).
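The key step of the proof, namely that the waiting-time factors cancel between a path and its time reversal so that only the end-point probabilities and the rate ratios survive, can be checked on a single path. The sketch below is not part of the proof: it assumes time-independent rates, it weights the reversed path by the distribution at the final time (which is how the decomposition of the logarithmic ratio is used here), and all function names and numerical values are hypothetical.

```python
import math

def log_path_weight(start_prob, states, jump_times, rates, T):
    """Log path weight for time-independent rates.

    states     : a_0, ..., a_N
    jump_times : tau_1 < ... < tau_N, all inside (0, T)
    rates[a][b]: rate of the jump a -> b (off-diagonal entries only)
    """
    log_w = math.log(start_prob)
    t_prev = 0.0
    for a, b, tau in zip(states, states[1:], jump_times):
        escape = sum(rates[a].values())      # total rate of leaving a
        log_w += -escape * (tau - t_prev)    # no jump between t_prev and tau
        log_w += math.log(rates[a][b])       # jump a -> b at time tau
        t_prev = tau
    log_w += -sum(rates[states[-1]].values()) * (T - t_prev)
    return log_w

def log_ratio(states, jump_times, rates, T, pi_start, pi_end):
    """Forward log weight minus the log weight of the time-reversed path."""
    rev_states = states[::-1]
    rev_times = [T - tau for tau in reversed(jump_times)]
    forward = log_path_weight(pi_start[states[0]], states, jump_times, rates, T)
    backward = log_path_weight(pi_end[states[-1]], rev_states, rev_times, rates, T)
    return forward - backward

def delta_i_minus_sigma(states, rates, pi_start, pi_end):
    """Variation of self-information minus cumulative skewness along the path."""
    delta_i = math.log(pi_start[states[0]]) - math.log(pi_end[states[-1]])
    sigma = sum(math.log(rates[b][a] / rates[a][b])
                for a, b in zip(states, states[1:]))
    return delta_i - sigma

if __name__ == "__main__":
    rates = {"x": {"y": 1.0, "z": 0.5},
             "y": {"x": 2.0, "z": 1.5},
             "z": {"x": 0.7, "y": 0.2}}
    pi_start = {"x": 0.5, "y": 0.3, "z": 0.2}    # hypothetical pi(0)
    pi_end = {"x": 0.4, "y": 0.25, "z": 0.35}    # hypothetical pi(T)
    path, times, T = ["x", "y", "z", "x"], [0.3, 0.7, 1.4], 2.0
    print(log_ratio(path, times, rates, T, pi_start, pi_end))
    print(delta_i_minus_sigma(path, rates, pi_start, pi_end))
```

The two printed numbers agree up to rounding, whatever the jump times are, because each waiting interval appears with the same state in the forward and in the reversed path.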
Landauer’s principle was introduced here:
• [Landauer1961] R. Landauer, Irreversibility and heat generation in the computing process, IBM Journal of Research and Development 5 (1961), 183–191.
and is now being verified experimentally by various groups worldwide.
The ‘fundamental theorem of natural selection’ was derived by Fisher in his book:
• [Fisher1930] R. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930.
His derivation has long been considered obscure, even perhaps wrong, but apparently the theorem is now well accepted. I believe the first Markovian models of genetic evolution appeared here:
• [Fisher1922] R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922), 321–341.
• [Wright1931] S. Wright, Evolution in Mendelian populations, Genetics 16 (1931), 97–159.
Fluctuation theorems are reviewed here:
• [Sevick2008] E. Sevick, R. Prabhakar, S. R. Williams, and D. J. Searles, Fluctuation theorems, Ann. Rev. Phys. Chem. 59 (2008), 603–633.
Two of the key ideas for the ‘detailed fluctuation theorem’ discussed here are due to Crooks:
• [Crooks1999] Gavin Crooks, The entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60 (1999), 2721–2726.
who identified $\Sigma$ as heat, and Seifert:
• [Seifert2005] Udo Seifert, Entropy production along a stochastic trajectory and an integral fluctuation theorem, Phys. Rev. Lett. 95 (2005), 040602.
who understood the relevance of the self-information in this context.
The connection between statistical physics and evolutionary biology is discussed here:
• [Sella2005] G. Sella and A.E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. USA 102 (2005), 9541–9546.
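and the ‘fitness flux theorem’ is derived in:
• [Mustonen2010] V. Mustonen and M. Lässig, Fitness flux and ubiquity of adaptive evolution, Proc. Nat. Acad. Sci. USA 107 (2010), 4248–4253.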
Schrödinger’s famous discussion of the physical nature of life was published here:
• [Schrödinger1944] E. Schrödinger, What is Life?, Cambridge University Press, Cambridge, 1944.