Biology as Information Dynamics (Part 2)

27 April, 2017

Here’s a video of the talk I gave at the Stanford Complexity Group:

You can see slides here:

Biology as information dynamics.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Liebler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.

I’d given a version of this talk earlier this year at a workshop on Quantifying biological complexity, but I’m glad this second try got videotaped and not the first, because I was a lot happier about my talk this time. And as you’ll see at the end, there were a lot of interesting questions.

Information Geometry (Part 16)

1 February, 2017

This week I’m giving a talk on biology and information:

• John Baez, Biology as information dynamics, talk for Biological Complexity: Can it be Quantified?, a workshop at the Beyond Center, 2 February 2017.

While preparing this talk, I discovered a cool fact. I doubt it’s new, but I haven’t exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of ‘Fisher’s fundamental theorem of natural selection’. I won’t start by explaining that theorem, since my version looks rather different than Fisher’s, and I came up with mine precisely because I had trouble understanding his. I’ll say a bit more about this at the end.

Here’s my version:

The square of the rate at which a population learns information is the variance of its fitness.

This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!

The setup

Let’s start by assuming we have n different kinds of self-replicating entities with populations P_1, \dots, P_n. As usual, these could be all sorts of things:

• molecules of different chemicals
• organisms belonging to different species
• genes of different alleles
• restaurants belonging to different chains
• people with different beliefs
• game-players with different strategies
• etc.

I’ll call them replicators of different species.

Let’s suppose each population P_i is a function of time that grows at a rate equal to this population times its ‘fitness’. I explained the resulting equation back in Part 9, but it’s pretty simple:

\displaystyle{ \frac{d}{d t} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t)   }

Here f_i is a completely arbitrary smooth function of all the populations! We call it the fitness of the ith species.

This equation is important, so we want a short way to write it. I’ll often write f_i(P_1(t), \dots, P_n(t)) simply as f_i, and P_i(t) simply as P_i. With these abbreviations, which any red-blooded physicist would take for granted, our equation becomes simply this:

\displaystyle{ \frac{dP_i}{d t}  = f_i \, P_i   }

Next, let p_i(t) be the probability that a randomly chosen organism is of the ith species:

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }

Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

\displaystyle{ \frac{d}{d t} p_i(t)  = ( f_i - \langle f \rangle ) \, p_i(t) }

Here \langle f \rangle is the average fitness of all the replicators, or mean fitness:

\displaystyle{ \langle f \rangle = \sum_j f_j(P_1(t), \dots, P_n(t)) \, p_j(t)  }

In what follows I’ll abbreviate the replicator equation as follows:

\displaystyle{ \frac{dp_i}{d t}  = ( f_i - \langle f \rangle ) \, p_i }

The result

Okay, now let’s figure out how fast the probability distribution

p(t) = (p_1(t), \dots, p_n(t))

changes with time. For this we need to choose a way to measure the length of the vector

\displaystyle{  \frac{dp}{dt} = (\frac{d}{dt} p_1(t), \dots, \frac{d}{dt} p_n(t)) }

And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.

I’ve talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution p(t) changes with time, its speed

\displaystyle{  \left\| \frac{dp}{dt} \right\|}

as measured using the Fisher information metric can be seen as the rate at which information is learned. I’ll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose v and w are two tangent vectors to the point p in the space of probability distributions. Then the Fisher information metric is given as follows:

\displaystyle{ \langle v, w \rangle = \sum_i \frac{1}{p_i} \, v_i w_i }

Using this we can calculate the speed at which p(t) moves when it obeys the replicator equation. Actually the square of the speed is simpler:

\begin{array}{ccl}  \displaystyle{ \left\| \frac{dp}{dt}  \right\|^2 } &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2 } \\ \\  &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( ( f_i - \langle f \rangle ) \, p_i \right)^2 } \\ \\  &=& \displaystyle{ \sum_i  ( f_i - \langle f \rangle )^2 p_i }   \end{array}

The answer has a nice meaning, too! It’s just the variance of the fitness: that is, the square of its standard deviation.

So, if you’re willing to buy my claim that the speed \|dp/dt\| is the rate at which our population learns new information, then we’ve seen that the square of the rate at which a population learns information is the variance of its fitness!

Fisher’s fundamental theorem

Now, how is this related to Fisher’s fundamental theorem of natural selection? First of all, what is Fisher’s fundamental theorem? Here’s what Wikipedia says about it:

It uses some mathematical notation but is not a theorem in the mathematical sense.

It states:

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Or in more modern terminology:

“The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time”.

Largely as a result of Fisher’s feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher’s theorem was indeed correct (and that Fisher’s proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection

Price’s paper is here:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 36 (1972), 129–140.

I don’t find it very clear, perhaps because I didn’t spend enough time on it. But I think I get the idea.

My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher’s original equation: the variance of the fitness.

But my left-hand side is different: it’s the square of the speed of the corresponding probability distribution, where speed is measured using the ‘Fisher information metric’. This metric was discovered by the same guy, Ronald Fisher, but I don’t think he used it in his work on the fundamental theorem!

Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Josef Hofbauer and Karl Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.

However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher’s fundamental theorem. Moreover, he assumes that the probability distribution p(t) flows along the gradient of a function, and I’m not assuming that. Indeed, my version applies to situations where the probability distribution moves round and round in periodic orbits!

Relative information and the Fisher information metric

The key to generalizing Fisher’s fundamental theorem is thus to focus on the speed at which p(t) moves, rather than the increase in fitness. Why do I call this speed the ‘rate at which the population learns information’? It’s because we’re measuring this speed using the Fisher information metric, which is closely connected to relative information, also known as relative entropy or the Kullback–Leibler divergence.

I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here’s a faster one, which I created while preparing my talk.

The information of a probability distribution q relative to a probability distribution p is

\displaystyle{     I(q,p) = \sum_{i =1}^n q_i \log\left(\frac{q_i}{p_i}\right) }

It says how much information you learn if you start with a hypothesis p saying that the probability of the ith situation was p_i, and then update this to a new hypothesis q.

Now suppose you have a hypothesis that’s changing with time in a smooth way, given by a time-dependent probability p(t). Then a calculation shows that

\displaystyle{ \left.\frac{d}{dt} I(p(t),p(t_0)) \right|_{t = t_0} = 0 }

for all times t_0. This seems paradoxical at first. I like to jokingly put it this way:

To first order, you’re never learning anything.

However, as long as the velocity \frac{d}{dt}p(t_0) is nonzero, we have

\displaystyle{ \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} > 0 }

so we can say

To second order, you’re always learning something… unless your opinions are fixed.

This lets us define a ‘rate of learning’—that is, a ‘speed’ at which the probability distribution p(t) moves. And this is precisely the speed given by the Fisher information metric!

In other words:

\displaystyle{ \left\|\frac{dp}{dt}(t_0)\right\|^2 =  \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} }

where the length is given by Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.

In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a ‘rate at which information is learned’, we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it’s evolving according to the replicator equation, this speed is also just the variance of the fitness!

Biology as Information Dynamics (Part 1)

31 January, 2017

This is my talk for the workshop Biological Complexity: Can It Be Quantified?

• John Baez, Biology as information dynamics, 2 February 2017.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’—a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clean general formulation of Fisher’s fundamental theorem of natural selection.

For more, read:

• Marc Harper, The replicator equation as an inference dynamic.

• Marc Harper, Information geometry and evolutionary game theory.

• Barry Sinervo and Curt M. Lively, The rock-paper-scissors game and the evolution of alternative male strategies, Nature 380 (1996), 240–243.

• John Baez, Diversity, entropy and thermodynamics.

• John Baez, Information geometry.

The last reference contains proofs of the equations shown in red in my slides.
In particular, Part 16 contains a proof of my updated version of Fisher’s fundamental theorem.

Compositional Frameworks for Open Systems

27 November, 2016


Here are the slides of Blake Pollard’s talk at the Santa Fe Institute workshop on Statistical Physics, Information Processing and Biology:

• Blake Pollard, Compositional frameworks for open systems, 17 November 2016.

He gave a really nice introduction to how we can use categories to study open systems, with his main example being ‘open Markov processes’, where probability can flow in and out of the set of states. People liked it a lot!


Jarzynksi on Non-Equilibrium Statistical Mechanics

18 November, 2016


Here at the Santa Fe Institute we’re having a workshop on Statistical Physics, Information Processing and Biology. Unfortunately the talks are not being videotaped, so it’s up to me to spread the news of what’s going on here.

Christopher Jarzynski is famous for discovering the Jarzynski equality. It says

\displaystyle{ e^ { -\Delta F / k T} = \langle e^{ -W/kT } \rangle }

where k is Boltzmann’s consstant and T is the temperature of a system that’s in equilibrium before some work is done on it. \Delta F is the change in free energy, W is the amount of work, and the angle brackets represent an average over the possible options for what takes place—this sort of process is typically nondeterministic.

We’ve seen a good quick explanation of this equation here on Azimuth:

• Eric Downes, Crooks’ Fluctuation Theorem, Azimuth, 30 April 2011.

We’ve also gotten a proof, where it was called the ‘integral fluctuation theorem’:

• Matteo Smerlak, The mathematical origin of irreversibility, Azimuth, 8 October 2012.

It’s a fundamental result in nonequilibrium statistical mechanics—a subject where inequalities are so common that this equation is called an ‘equality’.

Two days ago, Jarzynski gave an incredibly clear hour-long tutorial on this subject, starting with the basics of thermodynamics and zipping forward to modern work. With his permission, you can see the slides here:

• Christopher Jarzynski, A brief introduction to the delights of non-equilibrium statistical physics.

Also try this review article:

• Christopher Jarzynski, Equalities and inequalities: irreversibility and the Second Law of thermodynamics at the nanoscale, Séminaire Poincaré XV Le Temps (2010), 77–102.

Azimuth News (Part 5)

11 June, 2016

I’ve been rather quiet about Azimuth projects lately, because I’ve been too busy actually working on them. Here’s some of what’s happening:

Jason Erbele is finishing his thesis, entitled Categories in Control: Applied PROPs. He successfully gave his thesis defense on Wednesday June 8th, but he needs to polish it up some more. Building on the material in our paper “Categories in control”, he’s defined a category where the morphisms are signal flow diagrams. But interestingly, not all the diagrams you can draw are actually considered useful in control theory! So he’s also found a subcategory where the morphisms are the ‘good’ signal flow diagrams, the ones control theorists like. For these he studies familiar concepts like controllability and observability. When his thesis is done I’ll announce it here.

Brendan Fong is also finishing his thesis, called The Algebra of Open and Interconnected Systems. Brendan has already created a powerful formalism for studying open systems: the decorated cospan formalism. We’ve applied it to two examples: electrical circuits and Markov processes. Lately he’s been developing the formalism further, and this will appear in his thesis. Again, I’ll talk about it when he’s done!

Blake Pollard and I are writing a paper called “A compositional framework for open chemical reaction networks”. Here we take our work on Markov processes and throw in two new ingredients: dynamics and nonlinearity. Of course Markov processes have a dynamics, but in our previous paper when we ‘black-boxed’ them to study their external behaviour, we got a relation between flows and populations in equilibrium. Now we explain how to handle nonequilibrium situations as well.

Brandon Coya, Franciscus Rebro and I are writing a paper that might be called “The algebra of networks”. I’m not completely sure of the title, nor who the authors will be: Brendan Fong may also be a coauthor. But the paper explores the technology of PROPs as a tool for describing networks. As an application, we’ll give a new shorter proof of the functoriality of black-boxing for electrical circuits. This new proof also applies to nonlinear circuits. I’m really excited about how the theory of PROPs, first introduced in algebraic topology, is catching fire with all the new applications to network theory.

I expect all these projects to be done by the end of the summer. Near the end of June I’ll go to the Centre for Quantum Technologies, in Singapore. This will be my last summer there. My main job will be to finish up the two papers that I’m supposed to be writing.

There’s another paper that’s already done:

Kenny Courser has written a paper “A bicategory of decorated cospans“, pushing Brendan’s framework from categories to bicategories. I’ll explain this very soon here on this blog! One goal is to understand things like the coarse-graining of open systems: that is, the process of replacing a detailed description by a less detailed description. Since we treat open systems as morphisms, coarse-graining is something that goes from one morphism to another, so it’s naturally treated as a 2-morphism in a bicategory.

So, I’ve got a lot of new ideas to explain here, and I’ll start soon! I also want to get deeper into systems biology.

In the fall I’ve got a couple of short trips lined up:

• Monday November 14 – Friday November 18, 2016 – I’ve been invited by Yoav Kallus to visit the Santa Fe Institute. From the 16th to 18th I’ll attend a workshop on Statistical Physics, Information Processing and Biology.

• Monday December 5 – Friday December 9 – I’ve been invited to Berkeley for a workshop on Compositionality at the Simons Institute for the Theory of Computing, organized by Samson Abramsky, Lucien Hardy, and Michael Mislove. ‘Compositionality’ is a name for how you describe the behavior of a big complicated system in terms of the behaviors of its parts, so this is closely connected to my dream of studying open systems by treating them as morphisms that can be composed to form bigger open systems.

Here’s the announcement:

The compositional description of complex objects is a fundamental feature of the logical structure of computation. The use of logical languages in database theory and in algorithmic and finite model theory provides a basic level of compositionality, but establishing systematic relationships between compositional descriptions and complexity remains elusive. Compositional models of probabilistic systems and languages have been developed, but inferring probabilistic properties of systems in a compositional fashion is an important challenge. In quantum computation, the phenomenon of entanglement poses a challenge at a fundamental level to the scope of compositional descriptions. At the same time, compositionally has been proposed as a fundamental principle for the development of physical theories. This workshop will focus on the common structures and methods centered on compositionality that run through all these areas.

I’ll say more about both these workshops when they take place.

Statistical Laws of Darwinian Evolution

18 April, 2016

guest post by Matteo Smerlak

Biologists like Steven J. Gould like to emphasize that evolution is unpredictable. They have a point: there is absolutely no way an alien visiting the Earth 400 million years ago could have said:

Hey, I know what’s gonna happen here. Some descendants of those ugly fish will grow wings and start flying in the air. Others will walk the surface of the Earth for a few million years, but they’ll get bored and they’ll eventually go back to the oceans; when they do, they’ll be able to chat across thousands of kilometers using ultrasound. Yet others will grow arms, legs, fur, they’ll climb trees and invent BBQ, and, sooner or later, they’ll start wondering “why all this?”.

Nor can we tell if, a week from now, the flu virus will mutate, become highly pathogenic and forever remove the furry creatures from the surface of the Earth.

Evolution isn’t gravity—we can’t tell in which directions things will fall down.

One reason we can’t predict the outcomes of evolution is that genomes evolve in a super-high dimensional combinatorial space, which a ginormous number of possible turns at every step. Another is that living organisms interact with one another in a massively non-linear way, with, feedback loops, tipping points and all that jazz.

Life’s a mess, if you want my physicist’s opinion.

But that doesn’t mean that nothing can be predicted. Think of statistics. Nobody can predict who I’ll vote for in the next election, but it’s easy to tell what the distribution of votes in the country will be like. Thus, for continuous variables which arise as sums of large numbers of independent components, the central limit theorem tells us that the distribution will always be approximately normal. Or take extreme events: the max of N independent random variables is distributed according to a member of a one-parameter family of so-called “extreme value distributions”: this is the content of the famous Fisher–Tippett–Gnedenko theorem.

So this is the problem I want to think about in this blog post: is evolution ruled by statistical laws? Or, in physics terms: does it exhibit some form of universality?

Fitness distributions are the thing

One lesson from statistical physics is that, to uncover universality, you need to focus on relevant variables. In the case of evolution, it was Darwin’s main contribution to figure out the main relevant variable: the average number of viable offspring, aka fitness, of an organism. Other features—physical strength, metabolic efficiency, you name it—matter only insofar as they are correlated with fitness. If we further assume that fitness is (approximately) heritable, meaning that descendants have the same fitness as their ancestors, we get a simple yet powerful dynamical principle called natural selection: in a given population, the lineage with the highest fitness eventually dominates, i.e. its fraction goes to one over time. This principle is very general: it applies to genes and species, but also to non-living entities such as algorithms, firms or language. The general relevance of natural selection as a evolutionary force is sometimes referred to as “Universal Darwinism”.

The general idea of natural selection is pictured below (reproduced from this paper):

It’s not hard to write down an equation which expresses natural selection in general terms. Consider an infinite population in which each lineage grows with some rate x. (This rate is called the log-fitness or Malthusian fitness to contrast it with the number of viable offspring w=e^{x\Delta t} with \Delta t the lifetime of a generation. It’s more convenient to use x than w in what follows, so we’ll just call x “fitness”). Then the distribution of fitness at time t satisfies the equation

\displaystyle{ \frac{\partial p_t(x)}{\partial t} =\left(x-\int d y\, y\, p_t(y)\right)p_t(x) }

whose explicit solution in terms of the initial fitness distribution p_0(x):

\displaystyle{ p_t(x)=\frac{e^{x t}p_0(x)}{\int d y\, e^{y t}p_0(y)} }

is called the Cramér transform of p_0(x) in large deviations theory. That is, viewed as a flow in the space of probability distributions, natural selection is nothing but a time-dependent exponential tilt. (These equations and the results below can be generalized to include the effect of mutations, which are critical to maintain variation in the population, but we’ll skip this here to focus on pure natural selection. See my paper referenced below for more information.)

An immediate consequence of these equations is that the mean fitness \mu_t=\int dx\, x\, p_t(x) grows monotonically in time, with a rate of growth given by the variance \sigma_t^2=\int dx\, (x-\mu_t)^2\, p_t(x):

\displaystyle{ \frac{d\mu_t}{dt}=\sigma_t^2\geq 0 }

The great geneticist Ronald Fisher (yes, the one in the extreme value theorem!) was very impressed with this relationship. He thought it amounted to an biological version of the second law of thermodynamics, writing in his 1930 monograph

Professor Eddington has recently remarked that “The law that entropy always increases—the second law of thermodynamics—holds, I think, the supreme position among the laws of nature”. It is not a little instructive that so similar a law should hold the supreme position among the biological sciences.

Unfortunately, this excitement hasn’t been shared by the biological community, notably because this Fisher “fundamental theorem of natural selection” isn’t predictive: the mean fitness \mu_t grows according to the fitness variance \sigma_t^2, but what determines the evolution of \sigma_t^2? I can’t use the identity above to predict the speed of evolution in any sense. Geneticists say it’s “dynamically insufficient”.

Two limit theorems

But the situation isn’t as bad as it looks. The evolution of p_t(x) may be decomposed into the evolution of its mean \mu_t, of its variance \sigma_t^2, and of its shape or type

\overline{p}_t(x)=\sigma_t p_t(\sigma_t x+\mu_t).

(We also call \overline{p}_t(x) the “standardized fitness distribution”.) With Ahmed Youssef we showed that:

• If p_0(x) is supported on the whole real line and decays at infinity as

-\ln\int_x^{\infty}p_0(y)d y\underset{x\to\infty}{\sim} x^{\alpha}

for some \alpha > 1, then \mu_t\sim t^{\overline{\alpha}-1}, \sigma_t^2\sim t^{\overline{\alpha}-2} and \overline{p}_t(x) converges to the standard normal distribution as t\to\infty. Here \overline{\alpha} is the conjugate exponent to \alpha, i.e. 1/\overline{\alpha}+1/\alpha=1.

• If p_0(x) has a finite right-end point x_+ with

p(x)\underset{x\to x_+}{\sim} (x_+-x)^\beta

for some \beta\geq0, then x_+-\mu_t\sim t^{-1}, \sigma_t^2\sim t^{-2} and \overline{p}_t(x) converges to the flipped gamma distribution

\displaystyle{ p^*_\beta(x)= \frac{(1+\beta)^{(1+\beta)/2}}{\Gamma(1+\beta)} \Theta[x-(1+\beta)^{1/2}] }

\displaystyle { e^{-(1+\beta)^{1/2}[(1+\beta)^{1/2}-x]}\Big[(1+\beta)^{1/2}-x\Big]^\beta }

Here and below the symbol \sim means “asymptotically equivalent up to a positive multiplicative constant”; \Theta(x) is the Heaviside step function. Note that p^*_\beta(x) becomes Gaussian in the limit \beta\to\infty, i.e. the attractors of cases 1 and 2 form a continuous line in the space of probability distributions; the other extreme case, \beta\to0, corresponds to a flipped exponential distribution.

The one-parameter family of attractors p_\beta^*(x) is plotted below:

These results achieve two things. First, they resolve the dynamical insufficiency of Fisher’s fundamental theorem by giving estimates of the speed of evolution in terms of the tail behavior of the initial fitness distribution. Second, they show that natural selection is indeed subject to a form of universality, whereby the relevant statistical structure turns out to be finite dimensional, with only a handful of “conserved quantities” (the \alpha and \beta exponents) controlling the late-time behavior of natural selection. This amounts to a large reduction in complexity and, concomitantly, an enhancement of predictive power.

(For the mathematically-oriented reader, the proof of the theorems above involves two steps: first, translate the selection equation into a equation for (cumulant) generating functions; second, use a suitable Tauberian theorem—the Kasahara theorem—to relate the behavior of generating functions at large values of their arguments to the tail behavior of p_0(x). Details in our paper.)

It’s useful to consider the convergence of fitness distributions to the attractors p_\beta^*(x) for 0\leq\beta\leq \infty in the skewness-kurtosis plane, i.e. in terms of the third and fourth cumulants of p_t(x).

The red curve is the family of attractors, with the normal at the bottom right and the flipped exponential at the top left, and the dots correspond to numerical simulations performed with the classical Wright–Fisher model and with a simple genetic algorithm solving a linear programming problem. The attractors attract!

Conclusion and a question

Statistics is useful because limit theorems (the central limit theorem, the extreme value theorem) exist. Without them, we wouldn’t be able to make any population-level prediction. Same with statistical physics: it only because matter consists of large numbers of atoms, and limit theorems hold (the H-theorem, the second law), that macroscopic physics is possible in the first place. I believe the same perspective is useful in evolutionary dynamics: it’s true that we can’t predict how many wings birds will have in ten million years, but we can tell what shape fitness distributions should have if natural selection is true.

I’ll close with an open question for you, the reader. In the central limit theorem as well as in the second law of thermodynamics, convergence is driven by a Lyapunov function, namely entropy. (In the case of the central limit theorem, it’s a relatively recent result by Arstein et al.: the entropy of the normalized sum of n i.i.d. random variables, when it’s finite, is a monotonically increasing function of n.) In the case of natural selection for unbounded fitness, it’s clear that entropy will also be eventually monotonically increasing—the normal is the distribution with largest entropy at fixed variance and mean.

Yet it turns out that, in our case, entropy isn’t monotonic at all times; in fact, the closer the initial distribution p_0(x) is to the normal distribution, the later the entropy of the standardized fitness distribution starts to increase. Or, equivalently, the closer the initial distribution p_0(x) to the normal, the later its relative entropy with respect to the normal. Why is this? And what’s the actual Lyapunov function for this process (i.e., what functional of the standardized fitness distribution is monotonic at all times under natural selection)?

In the plots above the blue, orange and green lines correspond respectively to

\displaystyle{ p_0(x)\propto e^{-x^2/2-x^4}, \quad p_0(x)\propto e^{-x^2/2-.01x^4}, \quad p_0(x)\propto e^{-x^2/2-.001x^4} }


• S. J. Gould, Wonderful Life: The Burgess Shale and the Nature of History, W. W. Norton & Co., New York, 1989.

• M. Smerlak and A. Youssef, Limiting fitness distributions in evolutionary dynamics, 2015.

• R. A. Fisher, The Genetical Theory of Natural Selection, Oxford University Press, Oxford, 1930.

• S. Artstein, K. Ball, F. Barthe and A. Naor, Solution of Shannon’s problem on the monotonicity of entropy, J. Am. Math. Soc. 17 (2004), 975–982.