Information Geometry (Part 16)

1 February, 2017

This week I’m giving a talk on biology and information:

• John Baez, Biology as information dynamics, talk for Biological Complexity: Can It Be Quantified?, a workshop at the Beyond Center, 2 February 2017.

While preparing this talk, I discovered a cool fact. I doubt it’s new, but I haven’t exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of ‘Fisher’s fundamental theorem of natural selection’. I won’t start by explaining that theorem, since my version looks rather different than Fisher’s, and I came up with mine precisely because I had trouble understanding his. I’ll say a bit more about this at the end.

Here’s my version:

The square of the rate at which a population learns information is the variance of its fitness.

This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!

The setup

Let’s start by assuming we have n different kinds of self-replicating entities with populations P_1, \dots, P_n. As usual, these could be all sorts of things:

• molecules of different chemicals
• organisms belonging to different species
• genes of different alleles
• restaurants belonging to different chains
• people with different beliefs
• game-players with different strategies
• etc.

I’ll call them replicators of different species.

Let’s suppose each population P_i is a function of time that grows at a rate equal to this population times its ‘fitness’. I explained the resulting equation back in Part 9, but it’s pretty simple:

\displaystyle{ \frac{d}{d t} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t)   }

Here f_i is a completely arbitrary smooth function of all the populations! We call it the fitness of the ith species.

This equation is important, so we want a short way to write it. I’ll often write f_i(P_1(t), \dots, P_n(t)) simply as f_i, and P_i(t) simply as P_i. With these abbreviations, which any red-blooded physicist would take for granted, our equation becomes simply this:

\displaystyle{ \frac{dP_i}{d t}  = f_i \, P_i   }

Next, let p_i(t) be the probability that a randomly chosen organism is of the ith species:

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }

Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

\displaystyle{ \frac{d}{d t} p_i(t)  = ( f_i - \langle f \rangle ) \, p_i(t) }

Here \langle f \rangle is the average fitness of all the replicators, or mean fitness:

\displaystyle{ \langle f \rangle = \sum_j f_j(P_1(t), \dots, P_n(t)) \, p_j(t)  }

In what follows I’ll abbreviate the replicator equation as follows:

\displaystyle{ \frac{dp_i}{d t}  = ( f_i - \langle f \rangle ) \, p_i }
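
If you like seeing such things in code, here’s a minimal numerical sketch in Python: it integrates the population equations for two made-up fitness functions and checks, along the trajectory, that the probabilities p_i obey the replicator equation. The fitness functions are arbitrary smooth functions chosen purely for illustration.

import numpy as np

# Integrate dP_i/dt = f_i P_i by forward Euler for two hypothetical
# fitness functions, and check that p_i = P_i / sum_j P_j obeys the
# replicator equation dp_i/dt = (f_i - <f>) p_i.

def fitness(P):
    # arbitrary smooth functions of the populations, purely for illustration
    return np.array([2.0 - 0.01 * P.sum(),
                     1.0 + 0.005 * P[0] - 0.01 * P.sum()])

P = np.array([10.0, 20.0])
dt = 1e-4
for _ in range(20000):
    f = fitness(P)
    p = P / P.sum()
    replicator_rhs = (f - np.dot(f, p)) * p
    P_next = P + dt * f * P                    # Euler step for the populations
    dp_dt = (P_next / P_next.sum() - p) / dt   # finite-difference dp/dt
    assert np.allclose(dp_dt, replicator_rhs, atol=1e-3)
    P = P_next

print("replicator equation holds along the trajectory, up to O(dt) error")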

The result

Okay, now let’s figure out how fast the probability distribution

p(t) = (p_1(t), \dots, p_n(t))

changes with time. For this we need to choose a way to measure the length of the vector

\displaystyle{  \frac{dp}{dt} = (\frac{d}{dt} p_1(t), \dots, \frac{d}{dt} p_n(t)) }

And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.

I’ve talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution p(t) changes with time, its speed

\displaystyle{  \left\| \frac{dp}{dt} \right\|}

as measured using the Fisher information metric can be seen as the rate at which information is learned. I’ll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose v and w are two tangent vectors to the point p in the space of probability distributions. Then the Fisher information metric is given as follows:

\displaystyle{ \langle v, w \rangle = \sum_i \frac{1}{p_i} \, v_i w_i }

Using this we can calculate the speed at which p(t) moves when it obeys the replicator equation. Actually the square of the speed is simpler:

\begin{array}{ccl}  \displaystyle{ \left\| \frac{dp}{dt}  \right\|^2 } &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2 } \\ \\  &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( ( f_i - \langle f \rangle ) \, p_i \right)^2 } \\ \\  &=& \displaystyle{ \sum_i  ( f_i - \langle f \rangle )^2 p_i }   \end{array}

The answer has a nice meaning, too! It’s just the variance of the fitness: that is, the square of its standard deviation.

So, if you’re willing to buy my claim that the speed \|dp/dt\| is the rate at which our population learns new information, then we’ve seen that the square of the rate at which a population learns information is the variance of its fitness!
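
If you don’t feel like checking the algebra by hand, here’s a short Python check with arbitrary numbers: for any probability distribution p and any fitness values f, the Fisher-metric squared speed of the replicator flow equals the variance of the fitness.

import numpy as np

# Check the identity: the Fisher-metric squared speed of the replicator
# flow equals the variance of the fitness. The numbers are random.

rng = np.random.default_rng(0)
p = rng.random(5)
p /= p.sum()                             # a random probability distribution
f = rng.normal(size=5)                   # arbitrary fitness values

mean_f = np.dot(f, p)
dp_dt = (f - mean_f) * p                 # replicator equation
speed_squared = np.sum(dp_dt**2 / p)     # Fisher information metric
variance = np.dot((f - mean_f)**2, p)    # variance of the fitness

assert np.isclose(speed_squared, variance)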

Fisher’s fundamental theorem

Now, how is this related to Fisher’s fundamental theorem of natural selection? First of all, what is Fisher’s fundamental theorem? Here’s what Wikipedia says about it:

It uses some mathematical notation but is not a theorem in the mathematical sense.

It states:

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Or in more modern terminology:

“The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time”.

Largely as a result of Fisher’s feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher’s theorem was indeed correct (and that Fisher’s proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection.

Price’s paper is here:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 36 (1972), 129–140.

I don’t find it very clear, perhaps because I didn’t spend enough time on it. But I think I get the idea.

My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher’s original equation: the variance of the fitness.

But my left-hand side is different: it’s the square of the speed of the corresponding probability distribution, where speed is measured using the ‘Fisher information metric’. This metric was discovered by the same guy, Ronald Fisher, but I don’t think he used it in his work on the fundamental theorem!

Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Josef Hofbauer and Karl Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.

However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher’s fundamental theorem. Moreover, he assumes that the probability distribution p(t) flows along the gradient of a function, and I’m not assuming that. Indeed, my version applies to situations where the probability distribution moves round and round in periodic orbits!

Relative information and the Fisher information metric

The key to generalizing Fisher’s fundamental theorem is thus to focus on the speed at which p(t) moves, rather than the increase in fitness. Why do I call this speed the ‘rate at which the population learns information’? It’s because we’re measuring this speed using the Fisher information metric, which is closely connected to relative information, also known as relative entropy or the Kullback–Leibler divergence.

I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here’s a faster one, which I created while preparing my talk.

The information of a probability distribution q relative to a probability distribution p is

\displaystyle{     I(q,p) = \sum_{i =1}^n q_i \log\left(\frac{q_i}{p_i}\right) }

It says how much information you learn if you start with a hypothesis p saying that the probability of the ith situation was p_i, and then update this to a new hypothesis q.
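
In code this is almost a one-liner. Here’s a tiny Python sketch with made-up distributions, measuring information in nats (that is, using the natural logarithm):

import numpy as np

# Relative information (Kullback-Leibler divergence) of q relative to p, in nats.

def relative_information(q, p):
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

p = [0.5, 0.3, 0.2]                 # original hypothesis
q = [0.6, 0.3, 0.1]                 # updated hypothesis
print(relative_information(q, p))   # about 0.04 nats learned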

Now suppose you have a hypothesis that’s changing with time in a smooth way, given by a time-dependent probability p(t). Then a calculation shows that

\displaystyle{ \left.\frac{d}{dt} I(p(t),p(t_0)) \right|_{t = t_0} = 0 }

for all times t_0. This seems paradoxical at first. I like to jokingly put it this way:

To first order, you’re never learning anything.

However, as long as the velocity \frac{d}{dt}p(t_0) is nonzero, we have

\displaystyle{ \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} > 0 }

so we can say

To second order, you’re always learning something… unless your opinions are fixed.

This lets us define a ‘rate of learning’—that is, a ‘speed’ at which the probability distribution p(t) moves. And this is precisely the speed given by the Fisher information metric!

In other words:

\displaystyle{ \left\|\frac{dp}{dt}(t_0)\right\|^2 =  \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} }

where the length is given by the Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.
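
Here’s a quick sketch of that calculation, using nothing beyond a Taylor expansion and the fact that the p_i(t) always sum to 1. Write \epsilon = t - t_0 and expand p_i(t) = p_i + \dot{p}_i \epsilon + \frac{1}{2} \ddot{p}_i \epsilon^2 + O(\epsilon^3), where the dots denote time derivatives at t_0 and p_i means p_i(t_0). Then

\begin{array}{ccl}  I(p(t),p(t_0)) &=& \displaystyle{ \sum_i p_i(t) \log\left(\frac{p_i(t)}{p_i}\right) } \\ \\  &=& \displaystyle{ \sum_i \left( \dot{p}_i \epsilon + \frac{1}{2} \ddot{p}_i \epsilon^2 + \frac{1}{2} \frac{\dot{p}_i^2}{p_i} \epsilon^2 \right) + O(\epsilon^3) } \\ \\  &=& \displaystyle{ \frac{1}{2} \epsilon^2 \sum_i \frac{\dot{p}_i^2}{p_i} + O(\epsilon^3) }   \end{array}

since \sum_i \dot{p}_i = \sum_i \ddot{p}_i = 0. So the first time derivative of I(p(t),p(t_0)) vanishes at t = t_0, while the second is \sum_i \dot{p}_i^2/p_i, which is exactly the formula for \|dp/dt\|^2 given earlier.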

In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a ‘rate at which information is learned’, we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it’s evolving according to the replicator equation, this speed is also just the variance of the fitness!


Biology as Information Dynamics

31 January, 2017

This is my talk for the workshop Biological Complexity: Can It Be Quantified?

• John Baez, Biology as information dynamics, 2 February 2017.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’—a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clean general formulation of Fisher’s fundamental theorem of natural selection.

For more, read:

• Marc Harper, The replicator equation as an inference dynamic.

• Marc Harper, Information geometry and evolutionary game theory.

• Barry Sinervo and Curt M. Lively, The rock-paper-scissors game and the evolution of alternative male strategies, Nature 380 (1996), 240–243.

• John Baez, Diversity, entropy and thermodynamics.

• John Baez, Information geometry.

The last reference contains proofs of the equations shown in red in my slides.
In particular, Part 16 contains a proof of my updated version of Fisher’s fundamental theorem.


Quantifying Biological Complexity

23 January, 2017

Next week I’m going to this workshop:

Biological Complexity: Can It Be Quantified?, 1–3 February 2017, Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona. Organized by Paul Davies.

I haven’t heard that any of it will be made publicly available, but I’ll see if there’s something I can show you. Here’s the schedule:

Wednesday February 1st

9:00 – 9:30 am Paul Davies

Brief welcome address, outline of the subject and aims of the meeting

Session 1. Life: do we know it when we see it?

9:30 – 10:15 am: Chris McKay, “Mission to Enceladus”

10:15 – 10:45 am: Discussion

10:45 – 11:15 am: Tea/coffee break

11:15 – 12:00 pm: Kate Adamala, “Alive but not life”

12:00 – 12:30 pm: Discussion

12:30 – 2:00 pm: Lunch

Session 2. Quantifying life

2:00 – 2:45 pm: Lee Cronin, “The living and the dead: molecular signatures of life”

2:45 – 3:30 pm: Sara Walker, “Can we build a life meter?”

3:30 – 4:00 pm: Discussion

4:00 – 4:30 pm: Tea/coffee break

4:30 – 5:15 pm: Manfred Laubichler, “Complexity is smaller than you think”

5:15 – 5:30 pm: Discussion

The Beyond Annual Lecture

7:00 – 8:30 pm: Sean Carroll, “Our place in the universe”

Thursday February 2nd

Session 3: Life, information and the second law of thermodynamics

9:00 – 9:45 am: James Crutchfield, “Vital bits: the fuel of life”

9:45 – 10:00 am: Discussion

10:00 – 10:45 am: John Baez, “Information and entropy in biology”

10:45 – 11:00 am: Discussion

11:00 – 11:30 am: Tea/coffee break

11:30 – 12:15 pm: Chris Adami, “What is biological information?”

12:15 – 12:30 pm: Discussion

12:30 – 2:00 pm: Lunch

Session 4: The emergence of agency

2:00 – 2:45 pm: Olaf Khang Witkowski, “When do autonomous agents act collectively?”

2:45 – 3:00 pm: Discussion

3:00 – 3:45 pm: William Marshall, “When macro beats micro”

3:45 – 4:00 pm: Discussion

4:00 – 4:30 pm: Tea/coffee break

4:30 – 5:15 pm: Alexander Boyd, “Biology’s demons”

5:15 – 5:30 pm: Discussion

Friday February 3rd

Session 5: New physics?

9:00 – 9:45 am: Sean Carroll, “Laws of complexity, laws of life?”

9:45 – 10:00 am: Discussion

10:00 – 10:45 am: Andreas Wagner, “The arrival of the fittest”

10:45 – 11:00 am: Discussion

11:00 – 11:30 am: Tea/coffee break

11:30 – 12:30 pm: George Ellis, “Top-down causation demands new laws”

12:30 – 2:00 pm: Lunch


Information Processing and Biology

7 November, 2016


The Santa Fe Institute, in New Mexico, is a place for studying complex systems. I’ve never been there! Next week I’ll go there to give a colloquium on network theory, and also to participate in this workshop:

Statistical Mechanics, Information Processing and Biology, November 16–18, Santa Fe Institute. Organized by David Krakauer, Michael Lachmann, Manfred Laubichler, Peter Stadler, and David Wolpert.

Abstract. This workshop will address a fundamental question in theoretical biology: Does the relationship between statistical physics and the need of biological systems to process information underpin some of their deepest features? It recognizes that a core feature of biological systems is that they acquire, store and process information (i.e., perform computation). However to manipulate information in this way they require a steady flux of free energy from their environments. These two, inter-related attributes of biological systems are often taken for granted; they are not part of standard analyses of either the homeostasis or the evolution of biological systems. In this workshop we aim to fill in this major gap in our understanding of biological systems, by gaining deeper insight in the relation between the need for biological systems to process information and the free energy they need to pay for that processing.

The goal of this workshop is to address these issues by focusing on a set of three specific questions: 1) How has the fraction of free energy flux on earth that is used by biological computation changed with time? 2) What is the free energy cost of biological computation or functioning? 3) What is the free energy cost of the evolution of biological computation or functioning? In all of these cases we are interested in the fundamental limits that the laws of physics impose on various aspects of living systems as expressed by these three questions.

I think it’s not open to the public, but I will try to blog about it. The speakers include a lot of experts on information theory, statistical mechanics, and biology. Here they are:

Wednesday November 16: Chris Jarzynski, Seth Lloyd, Artemy Kolchinski, John Baez, Manfred Laubichler, Harold de Vladar, Sonja Prohaska, Chris Kempes.

Thursday November 17: Phil Ball, Matina C. Donaldson-Matasci, Sebastian Deffner, David Wolpert, Daniel Polani, Christoph Flamm, Massimiliano Esposito, Hildegard Meyer-Ortmanns, Blake Pollard, Mikhail Prokopenko, Peter Stadler, Ben Machta.

Friday November 18: Jim Crutchfield, Sara Walker, Hyunju Kim, Takahiro Sagawa, Michael Lachmann, Wojciech Zurek, Christian Van den Broeck, Susanne Still, Chris Stephens.


Frigatebirds

18 July, 2016

 

Frigatebirds are amazing!

They have the largest ratio of wing area to body weight of any bird. This lets them fly very long distances while only rarely flapping their wings. They often stay in the air for weeks at a time. And one being tracked by satellite in the Indian Ocean stayed aloft for two months.

Surprisingly for sea birds, they don’t go into the water. Their feathers aren’t waterproof. They are true creatures of the air. They snatch fish from the ocean surface using their long, hooked bills—and they often eat flying fish! They clean themselves in flight by flying low and wetting themselves at the water’s surface before preening themselves.

They live a long time: often over 35 years.

But here’s the cool new discovery:

Since the frigatebird spends most of its life at sea, its habits outside of when it breeds on land aren’t well-known—until researchers started tracking them around the Indian Ocean. What the researchers discovered is that the birds’ flying ability almost defies belief.

Ornithologist Henri Weimerskirch put satellite tags on a couple of dozen frigatebirds, as well as instruments that measured body functions such as heart rate. When the data started to come in, he could hardly believe how high the birds flew.

“First, we found, ‘Whoa, 1,500 meters. Wow. Excellent, fantastique,’ ” says Weimerskirch, who is with the National Center for Scientific Research in Paris. “And after 2,000, after 3,000, after 4,000 meters — OK, at this altitude they are in freezing conditions, especially surprising for a tropical bird.”

Four thousand meters is more than 12,000 feet, or as high as parts of the Rocky Mountains. “There is no other bird flying so high relative to the sea surface,” he says.

Weimerskirch says that kind of flying should take a huge amount of energy. But the instruments monitoring the birds’ heartbeats showed that the birds weren’t even working up a sweat. (They wouldn’t, actually, since birds don’t sweat, but their heart rate wasn’t going up.)

How did they do it? By flying into a cloud.

“It’s the only bird that is known to intentionally enter into a cloud,” Weimerskirch says. And not just any cloud—a fluffy, white cumulus cloud. Over the ocean, these clouds tend to form in places where warm air rises from the sea surface. The birds hitch a ride on the updraft, all the way up to the top of the cloud.

[…]

“Absolutely incredible,” says Curtis Deutsch, an oceanographer at the University of Washington. “They’re doing it right through these cumulus clouds. You know, if you’ve ever been on an airplane, flying through turbulence, you know it can be a little bit nerve-wracking.”

One of the tagged birds soared 40 miles without a wing-flap. Several covered more than 300 miles a day on average, and flew continuously for weeks.

• Christopher Joyce, Nonstop flight: how the frigatebird can soar for weeks without stopping, All Things Considered, National Public Radio, 30 June 2016.

Frigatebirds aren’t admirable in every way. They’re kleptoparasites—now there’s a word you don’t hear every day! That’s a name for animals that steal food:

Frigatebirds will rob other seabirds such as boobies, particularly the red-footed booby, tropicbirds, shearwaters, petrels, terns, gulls and even ospreys of their catch, using their speed and maneuverability to outrun and harass their victims until they regurgitate their stomach contents. They may either assail their targets after they have caught their food or circle high over seabird colonies waiting for parent birds to return laden with food.

Frigatebird, Wikipedia.


Bleaching of the Great Barrier Reef

22 April, 2016


The chatter of gossip distracts us from the really big story, the Anthropocene: the new geological era we are bringing about. Here’s something that should be dominating the headlines: Most of the Great Barrier Reef, the world’s largest coral reef system, now looks like a ghostly graveyard.

Most corals are colonies of tiny genetically identical animals called polyps. Over centuries, their skeletons build up reefs, which are havens for many kinds of sea life. Some polyps catch their own food using stingers. But most get their food by symbiosis! They cooperate with single-celled organisms called zooxanthellae. Zooxanthellae get energy from the sun’s light. They actually live inside the polyps, and provide them with food. Most of the color of a coral reef comes from these zooxanthellae.

When a polyp is stressed, the zooxanthellae living inside it may decide to leave. This can happen when the sea water gets too hot. Without its zooxanthellae, the polyp is transparent and the coral’s white skeleton is revealed—as you see here. We say the coral is bleached.

After they bleach, the polyps begin to starve. If conditions return to normal fast enough, the zooxanthellae may come back. If they don’t, the coral will die.

The Great Barrier Reef, off the northeast coast of Australia, contains over 2,900 reefs and 900 islands. It’s huge: 2,300 kilometers long, with an area of about 340,000 square kilometers. It can be seen from outer space!

With global warming, this reef has been starting to bleach. Parts of it bleached in 1998 and again in 2002. But this year, with a big El Niño pushing world temperatures to new record highs, is the worst.

Scientists have been flying over the Great Barrier Reef to study the damage, and divers have looked at some of the reefs in detail. Of the 522 reefs surveyed in the northern sector, over 80% are severely bleached and less than 1% are not bleached at all. The damage is less further south where the water is cooler—but most of the reefs are in the north:



The top expert on coral reefs in Australia, Terry Hughes, wrote:

I showed the results of aerial surveys of bleaching on the Great Barrier Reef to my students. And then we wept.

Imagine devoting your life to studying and trying to protect coral reefs, and then seeing this.

Some of the bleached reefs may recover. But as oceans continue to warm, the prospects look bleak. The last big El Niño was in 1998. With a lot of hard followup work, scientists showed that in the end, 16% of the world’s corals died in that event.

This year is quite a bit hotter.

So, global warming is not a problem for the future: it’s a problem now. It’s not good enough to cut carbon emissions eventually. We’ve got to get serious now.

I need to recommit myself to this. For example, I need to stop flying around to conferences. I’ve cut back, but I need to do much better. Future generations, living in the damaged world we’re creating, will not have much sympathy for our excuses.


Statistical Laws of Darwinian Evolution

18 April, 2016

guest post by Matteo Smerlak

Biologists like Stephen Jay Gould like to emphasize that evolution is unpredictable. They have a point: there is absolutely no way an alien visiting the Earth 400 million years ago could have said:

Hey, I know what’s gonna happen here. Some descendants of those ugly fish will grow wings and start flying in the air. Others will walk the surface of the Earth for a few million years, but they’ll get bored and they’ll eventually go back to the oceans; when they do, they’ll be able to chat across thousands of kilometers using ultrasound. Yet others will grow arms, legs, fur, they’ll climb trees and invent BBQ, and, sooner or later, they’ll start wondering “why all this?”.

Nor can we tell if, a week from now, the flu virus will mutate, become highly pathogenic and forever remove the furry creatures from the surface of the Earth.

Evolution isn’t gravity—we can’t tell in which directions things will fall down.

One reason we can’t predict the outcomes of evolution is that genomes evolve in a super-high dimensional combinatorial space, with a ginormous number of possible turns at every step. Another is that living organisms interact with one another in a massively non-linear way, with feedback loops, tipping points and all that jazz.

Life’s a mess, if you want my physicist’s opinion.

But that doesn’t mean that nothing can be predicted. Think of statistics. Nobody can predict who I’ll vote for in the next election, but it’s easy to tell what the distribution of votes in the country will be like. Thus, for continuous variables which arise as sums of large numbers of independent components, the central limit theorem tells us that the distribution will always be approximately normal. Or take extreme events: the max of N independent random variables is distributed according to a member of a one-parameter family of so-called “extreme value distributions”: this is the content of the famous Fisher–Tippett–Gnedenko theorem.

So this is the problem I want to think about in this blog post: is evolution ruled by statistical laws? Or, in physics terms: does it exhibit some form of universality?

Fitness distributions are the thing

One lesson from statistical physics is that, to uncover universality, you need to focus on relevant variables. In the case of evolution, it was Darwin’s main contribution to figure out the relevant variable: the average number of viable offspring, aka fitness, of an organism. Other features—physical strength, metabolic efficiency, you name it—matter only insofar as they are correlated with fitness. If we further assume that fitness is (approximately) heritable, meaning that descendants have the same fitness as their ancestors, we get a simple yet powerful dynamical principle called natural selection: in a given population, the lineage with the highest fitness eventually dominates, i.e. its fraction goes to one over time. This principle is very general: it applies to genes and species, but also to non-living entities such as algorithms, firms or language. The general relevance of natural selection as an evolutionary force is sometimes referred to as “Universal Darwinism”.

The general idea of natural selection is pictured below (reproduced from this paper):

It’s not hard to write down an equation which expresses natural selection in general terms. Consider an infinite population in which each lineage grows with some rate x. (This rate is called the log-fitness or Malthusian fitness to contrast it with the number of viable offspring w=e^{x\Delta t} with \Delta t the lifetime of a generation. It’s more convenient to use x than w in what follows, so we’ll just call x “fitness”). Then the distribution of fitness at time t satisfies the equation

\displaystyle{ \frac{\partial p_t(x)}{\partial t} =\left(x-\int d y\, y\, p_t(y)\right)p_t(x) }

whose explicit solution in terms of the initial fitness distribution p_0(x):

\displaystyle{ p_t(x)=\frac{e^{x t}p_0(x)}{\int d y\, e^{y t}p_0(y)} }

is called the Cramér transform of p_0(x) in large deviations theory. That is, viewed as a flow in the space of probability distributions, natural selection is nothing but a time-dependent exponential tilt. (These equations and the results below can be generalized to include the effect of mutations, which are critical to maintain variation in the population, but we’ll skip this here to focus on pure natural selection. See my paper referenced below for more information.)
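
If you want to play with this, here’s a small numerical sketch in Python (not taken from the paper): it evolves a discretized fitness distribution with the time-dependent exponential tilt and prints the mean and variance. The Gaussian initial condition is an arbitrary choice; for a Gaussian p_0 the tilt just translates the distribution, so \mu_t = t and \sigma_t^2 = 1, consistent with the identity discussed next.

import numpy as np

# Evolve a fitness distribution on a grid via the exponential tilt
# p_t(x) proportional to exp(x t) p_0(x), starting from a Gaussian p_0.

x = np.linspace(-6.0, 8.0, 2801)
dx = x[1] - x[0]
p0 = np.exp(-x**2 / 2)
p0 /= p0.sum() * dx              # normalize on the grid

def tilt(t):
    w = np.exp(x * t) * p0       # time-dependent exponential tilt
    return w / (w.sum() * dx)

for t in [0.0, 0.5, 1.0, 2.0]:
    pt = tilt(t)
    mu = (x * pt).sum() * dx
    var = ((x - mu)**2 * pt).sum() * dx
    print(f"t = {t}: mean fitness = {mu:.3f}, fitness variance = {var:.3f}")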

An immediate consequence of these equations is that the mean fitness \mu_t=\int dx\, x\, p_t(x) grows monotonically in time, with a rate of growth given by the variance \sigma_t^2=\int dx\, (x-\mu_t)^2\, p_t(x):

\displaystyle{ \frac{d\mu_t}{dt}=\sigma_t^2\geq 0 }
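
Spelled out, the derivation is one line: differentiate \mu_t under the integral sign, use the selection equation, and note that \int dx\,(x-\mu_t)\,p_t(x)=0:

\displaystyle{ \frac{d\mu_t}{dt}=\int dx\, x\,\frac{\partial p_t(x)}{\partial t}=\int dx\, x\,(x-\mu_t)\,p_t(x)=\int dx\,(x-\mu_t)^2\,p_t(x)=\sigma_t^2 }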

The great geneticist Ronald Fisher (yes, the one in the extreme value theorem!) was very impressed with this relationship. He thought it amounted to a biological version of the second law of thermodynamics, writing in his 1930 monograph:

Professor Eddington has recently remarked that “The law that entropy always increases—the second law of thermodynamics—holds, I think, the supreme position among the laws of nature”. It is not a little instructive that so similar a law should hold the supreme position among the biological sciences.

Unfortunately, this excitement hasn’t been shared by the biological community, notably because this Fisher “fundamental theorem of natural selection” isn’t predictive: the mean fitness \mu_t grows according to the fitness variance \sigma_t^2, but what determines the evolution of \sigma_t^2? I can’t use the identity above to predict the speed of evolution in any sense. Geneticists say it’s “dynamically insufficient”.

Two limit theorems

But the situation isn’t as bad as it looks. The evolution of p_t(x) may be decomposed into the evolution of its mean \mu_t, of its variance \sigma_t^2, and of its shape or type

\overline{p}_t(x)=\sigma_t p_t(\sigma_t x+\mu_t).

(We also call \overline{p}_t(x) the “standardized fitness distribution”.) With Ahmed Youssef we showed that:

• If p_0(x) is supported on the whole real line and decays at infinity as

-\ln\int_x^{\infty}p_0(y)d y\underset{x\to\infty}{\sim} x^{\alpha}

for some \alpha > 1, then \mu_t\sim t^{\overline{\alpha}-1}, \sigma_t^2\sim t^{\overline{\alpha}-2} and \overline{p}_t(x) converges to the standard normal distribution as t\to\infty. Here \overline{\alpha} is the conjugate exponent to \alpha, i.e. 1/\overline{\alpha}+1/\alpha=1.

• If p_0(x) has a finite right-end point x_+ with

p_0(x)\underset{x\to x_+}{\sim} (x_+-x)^\beta

for some \beta\geq0, then x_+-\mu_t\sim t^{-1}, \sigma_t^2\sim t^{-2} and \overline{p}_t(x) converges to the flipped gamma distribution

\displaystyle{ p^*_\beta(x)= \frac{(1+\beta)^{(1+\beta)/2}}{\Gamma(1+\beta)}\, \Theta\big[(1+\beta)^{1/2}-x\big]\, e^{-(1+\beta)^{1/2}[(1+\beta)^{1/2}-x]}\,\Big[(1+\beta)^{1/2}-x\Big]^\beta }

Here and below the symbol \sim means “asymptotically equivalent up to a positive multiplicative constant”; \Theta(x) is the Heaviside step function. Note that p^*_\beta(x) becomes Gaussian in the limit \beta\to\infty, i.e. the attractors of cases 1 and 2 form a continuous line in the space of probability distributions; the other extreme case, \beta\to0, corresponds to a flipped exponential distribution.

The one-parameter family of attractors p_\beta^*(x) is plotted below:

These results achieve two things. First, they resolve the dynamical insufficiency of Fisher’s fundamental theorem by giving estimates of the speed of evolution in terms of the tail behavior of the initial fitness distribution. Second, they show that natural selection is indeed subject to a form of universality, whereby the relevant statistical structure turns out to be finite dimensional, with only a handful of “conserved quantities” (the \alpha and \beta exponents) controlling the late-time behavior of natural selection. This amounts to a large reduction in complexity and, concomitantly, an enhancement of predictive power.
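
Here’s a quick numerical sanity check of the second case in Python. Take p_0(x) \propto (-x)^\beta on [-1,0] with \beta = 1 (just one convenient choice), so the right end-point is x_+ = 0, evolve it with the exponential tilt, and watch the predicted scalings x_+ - \mu_t \sim t^{-1} and \sigma_t^2 \sim t^{-2} emerge: both printed quantities should approach 1 + \beta = 2.

import numpy as np

# Bounded-fitness case: p_0(x) proportional to (-x)**beta on [-1, 0],
# so the right end-point is x_+ = 0. Evolve with the exponential tilt
# and check that t*(x_+ - mu_t) and t^2 * sigma_t^2 approach 1 + beta.

beta = 1.0
x = np.linspace(-1.0, 0.0, 200001)
dx = x[1] - x[0]
p0 = (-x)**beta

for t in [10.0, 100.0, 1000.0]:
    w = np.exp(x * t) * p0
    pt = w / (w.sum() * dx)
    mu = (x * pt).sum() * dx
    var = ((x - mu)**2 * pt).sum() * dx
    print(f"t = {t:6.0f}: t*(x_+ - mu_t) = {-t*mu:.3f}, t^2*sigma_t^2 = {t*t*var:.3f}")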

(For the mathematically-oriented reader, the proof of the theorems above involves two steps: first, translate the selection equation into an equation for (cumulant) generating functions; second, use a suitable Tauberian theorem—the Kasahara theorem—to relate the behavior of generating functions at large values of their arguments to the tail behavior of p_0(x). Details in our paper.)

It’s useful to consider the convergence of fitness distributions to the attractors p_\beta^*(x) for 0\leq\beta\leq \infty in the skewness-kurtosis plane, i.e. in terms of the third and fourth cumulants of p_t(x).

The red curve is the family of attractors, with the normal at the bottom right and the flipped exponential at the top left, and the dots correspond to numerical simulations performed with the classical Wright–Fisher model and with a simple genetic algorithm solving a linear programming problem. The attractors attract!

Conclusion and a question

Statistics is useful because limit theorems (the central limit theorem, the extreme value theorem) exist. Without them, we wouldn’t be able to make any population-level prediction. Same with statistical physics: it’s only because matter consists of large numbers of atoms, and limit theorems hold (the H-theorem, the second law), that macroscopic physics is possible in the first place. I believe the same perspective is useful in evolutionary dynamics: it’s true that we can’t predict how many wings birds will have in ten million years, but we can tell what shape fitness distributions should have if natural selection is true.

I’ll close with an open question for you, the reader. In the central limit theorem as well as in the second law of thermodynamics, convergence is driven by a Lyapunov function, namely entropy. (In the case of the central limit theorem, it’s a relatively recent result by Artstein et al.: the entropy of the normalized sum of n i.i.d. random variables, when it’s finite, is a monotonically increasing function of n.) In the case of natural selection for unbounded fitness, it’s clear that entropy will also be eventually monotonically increasing—the normal is the distribution with largest entropy at fixed variance and mean.

Yet it turns out that, in our case, entropy isn’t monotonic at all times; in fact, the closer the initial distribution p_0(x) is to the normal distribution, the later the entropy of the standardized fitness distribution starts to increase. Or, equivalently, the closer the initial distribution p_0(x) is to the normal, the later its relative entropy with respect to the normal starts to decrease. Why is this? And what’s the actual Lyapunov function for this process (i.e., what functional of the standardized fitness distribution is monotonic at all times under natural selection)?

In the plots above the blue, orange and green lines correspond respectively to

\displaystyle{ p_0(x)\propto e^{-x^2/2-x^4}, \quad p_0(x)\propto e^{-x^2/2-.01x^4}, \quad p_0(x)\propto e^{-x^2/2-.001x^4} }

References

• S. J. Gould, Wonderful Life: The Burgess Shale and the Nature of History, W. W. Norton & Co., New York, 1989.

• M. Smerlak and A. Youssef, Limiting fitness distributions in evolutionary dynamics, 2015.

• R. A. Fisher, The Genetical Theory of Natural Selection, Oxford University Press, Oxford, 1930.

• S. Artstein, K. Ball, F. Barthe and A. Naor, Solution of Shannon’s problem on the monotonicity of entropy, J. Am. Math. Soc. 17 (2004), 975–982.