Fisher’s Fundamental Theorem (Part 4)

13 July, 2021

I wrote a paper that summarizes my work connecting natural selection to information theory:

• John Baez, The fundamental theorem of natural selection.

Check it out! If you have any questions or see any mistakes, please let me know.

Just for fun, here’s the abstract and introduction.

Abstract. Suppose we have n different types of self-replicating entity, with the population P_i of the ith type changing at a rate equal to P_i times the fitness f_i of that type. Suppose the fitness f_i is any continuous function of all the populations P_1, \dots, P_n. Let p_i be the fraction of replicators that are of the ith type. Then p = (p_1, \dots, p_n) is a time-dependent probability distribution, and we prove that its speed as measured by the Fisher information metric equals the variance in fitness. In rough terms, this says that the speed at which information is updated through natural selection equals the variance in fitness. This result can be seen as a modified version of Fisher’s fundamental theorem of natural selection. We compare it to Fisher’s original result as interpreted by Price, Ewens and Edwards.

Introduction

In 1930, Fisher stated his “fundamental theorem of natural selection” as follows:

The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.

Some tried to make this statement precise as follows:

The time derivative of the mean fitness of a population equals the variance of its fitness.

But this is only true under very restrictive conditions, so a controversy was ignited.

An interesting resolution was proposed by Price, and later amplified by Ewens and Edwards. We can formalize their idea as follows. Suppose we have n types of self-replicating entity, and idealize the population of the ith type as a real-valued function P_i(t). Suppose

\displaystyle{ \frac{d}{dt} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t) }

where the fitness f_i is a differentiable function of the populations of every type of replicator. The mean fitness at time t is

\displaystyle{ \overline{f}(t) = \sum_{i=1}^n p_i(t) \, f_i(P_1(t), \dots, P_n(t)) }

where p_i(t) is the fraction of replicators of the ith type:

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_{j = 1}^n P_j(t)} }

By the product rule, the rate of change of the mean fitness is the sum of two terms:

\displaystyle{ \frac{d}{dt} \overline{f}(t) = \sum_{i=1}^n \dot{p}_i(t) \, f_i(P_1(t), \dots, P_n(t)) \; + \; \sum_{i=1}^n p_i(t) \,\frac{d}{dt} f_i(P_1(t), \dots, P_n(t)) }

The first of these two terms equals the variance of the fitness at time t. We give the easy proof in Theorem 1. Unfortunately, the conceptual significance of this first term is much less clear than that of the total rate of change of mean fitness. Ewens concluded that “the theorem does not provide the substantial biological statement that Fisher claimed”.

But there is another way out, based on an idea Fisher himself introduced in 1922: Fisher information. Fisher information gives rise to a Riemannian metric on the space of probability distributions on a finite set, called the ‘Fisher information metric’—or in the context of evolutionary game theory, the ‘Shahshahani metric’. Using this metric we can define the speed at which a time-dependent probability distribution changes with time. We call this its ‘Fisher speed’. Under just the assumptions already stated, we prove in Theorem 2 that the Fisher speed of the probability distribution

p(t) = (p_1(t), \dots, p_n(t))

is the variance of the fitness at time t.

As explained by Harper, natural selection can be thought of as a learning process, and studied using ideas from information geometry—that is, the geometry of the space of probability distributions. As p(t) changes with time, the rate at which information is updated is closely connected to its Fisher speed. Thus, our revised version of the fundamental theorem of natural selection can be loosely stated as follows:

As a population changes with time, the rate at which information is updated equals the variance of fitness.

The precise statement, with all the hypotheses, is in Theorem 2. But one lesson is this: variance in fitness may not cause ‘progress’ in the sense of increased mean fitness, but it does cause change!
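
If you like seeing theorems checked in code, here's a minimal numerical sketch of this result, with made-up fitness functions (any differentiable choice works the same way): at each time step it checks the identity g(\dot{p}, \dot{p}) = \mathrm{Var}(f(P)) described above.

```python
import numpy as np

# Made-up differentiable fitness functions: each depends on all populations.
def fitness(P):
    return np.array([1.0 - 0.01 * P.sum(),
                     0.5 + 0.002 * P[0],
                     0.8 - 0.005 * P[2],
                     0.3])

P = np.array([10.0, 20.0, 5.0, 15.0])
dt = 1e-4
for _ in range(2000):
    f = fitness(P)
    p = P / P.sum()                        # probability distribution p_i
    fbar = np.sum(f * p)                   # mean fitness
    var = np.sum((f - fbar)**2 * p)        # variance in fitness
    pdot = (f - fbar) * p                  # replicator equation
    fisher_speed_sq = np.sum(pdot**2 / p)  # g(pdot, pdot): squared Fisher speed
    assert abs(fisher_speed_sq - var) < 1e-12
    P += dt * f * P                        # Euler step for dP_i/dt = f_i P_i
```

The identity is exact, so the assertion holds at every time step up to floating-point rounding.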

For more details in a user-friendly blog format, read the whole series:

Part 1: the obscurity of Fisher’s original paper.

Part 2: a precise statement of Fisher’s fundamental theorem of natural selection, and conditions under which it holds.

Part 3: a modified version of the fundamental theorem of natural selection, which holds much more generally.

Part 4: my paper on the fundamental theorem of natural selection.


Nonequilibrium Thermodynamics in Biology (Part 2)

16 June, 2021

Larry Li, Bill Cannon and I ran a session on non-equilibrium thermodynamics in biology at SMB2021, the annual meeting of the Society for Mathematical Biology. You can see talk slides here!

Here’s the basic idea:

Since Lotka, physical scientists have argued that living things belong to a class of complex and orderly systems that exist not despite the second law of thermodynamics, but because of it. Life and evolution, through natural selection of dissipative structures, are based on non-equilibrium thermodynamics. The challenge is to develop an understanding of what the respective physical laws can tell us about flows of energy and matter in living systems, and about growth, death and selection. This session addresses current challenges including understanding emergence, regulation and control across scales, and entropy production, from metabolism in microbes to evolving ecosystems.

Click on the links to see slides for most of the talks:

Persistence, permanence, and global stability in reaction network models: some results inspired by thermodynamic principles
Gheorghe Craciun, University of Wisconsin–Madison

The standard mathematical model for the dynamics of concentrations in biochemical networks is called mass-action kinetics. We describe mass-action kinetics and discuss the connection between special classes of mass-action systems (such as detailed balanced and complex balanced systems) and the Boltzmann equation. We also discuss the connection between the ‘global attractor conjecture’ for complex balanced mass-action systems and Boltzmann’s H-theorem. We also describe some implications for biochemical mechanisms that implement noise filtering and cellular homeostasis.

The principle of maximum caliber of nonequilibria
Ken Dill, Stony Brook University

Maximum Caliber is a principle for inferring pathways and rate distributions of kinetic processes. The structure and foundations of MaxCal are much like those of Maximum Entropy for static distributions. We have explored how MaxCal may serve as a general variational principle for nonequilibrium statistical physics—giving well-known results, such as the Green-Kubo relations, Onsager’s reciprocal relations and Prigogine’s Minimum Entropy Production principle near equilibrium, but is also applicable far from equilibrium. I will also discuss some applications, such as finding reaction coordinates in molecular simulations, non-linear dynamics in gene circuits, power-law-tail distributions in ‘social-physics’ networks, and others.

Nonequilibrium biomolecular information processes
Pierre Gaspard, Université libre de Bruxelles

Nearly 70 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of genetic information processing in DNA replication, transcription, and translation remain poorly understood. These template-directed copolymerization processes are running away from equilibrium, being powered by extracellular energy sources. Recent advances show that their kinetic equations can be exactly solved in terms of so-called iterated function systems. Remarkably, iterated function systems can determine the effects of genome sequence on replication errors, up to a million times faster than kinetic Monte Carlo algorithms. With these new methods, fundamental links can be established between molecular information processing and the second law of thermodynamics, shedding a new light on genetic drift, mutations, and evolution.

Nonequilibrium dynamics of disturbed ecosystems
John Harte, University of California, Berkeley

The Maximum Entropy Theory of Ecology (METE) predicts the shapes of macroecological metrics in relatively static ecosystems, across spatial scales, taxonomic categories, and habitats, using constraints imposed by static state variables. In disturbed ecosystems, however, with time-varying state variables, its predictions often fail. We extend macroecological theory from static to dynamic, by combining the MaxEnt inference procedure with explicit mechanisms governing disturbance. In the static limit, the resulting theory, DynaMETE, reduces to METE but also predicts a new scaling relationship among static state variables. Under disturbances, expressed as shifts in demographic, ontogenic growth, or migration rates, DynaMETE predicts the time trajectories of the state variables as well as the time-varying shapes of macroecological metrics such as the species abundance distribution and the distribution of metabolic rates over individuals. An iterative procedure for solving the dynamic theory is presented. Characteristic signatures of the deviation from static predictions of macroecological patterns are shown to result from different kinds of disturbance. By combining MaxEnt inference with explicit dynamical mechanisms of disturbance, DynaMETE is a candidate theory of macroecology for ecosystems responding to anthropogenic or natural disturbances.

Stochastic chemical reaction networks
Supriya Krishnamurthy, Stockholm University

The study of chemical reaction networks (CRNs) is a very active field. Earlier well-known results (Feinberg Chem. Eng. Sci. 42 2229 (1987), Anderson et al Bull. Math. Biol. 72 1947 (2010)) identify a topological quantity called deficiency, easy to compute for CRNs of any size, which, when exactly equal to zero, leads to a unique factorized (non-equilibrium) steady-state for these networks. No general results exist however for the steady states of non-zero-deficiency networks. In recent work, we show how to write the full moment-hierarchy for any non-zero-deficiency CRN obeying mass-action kinetics, in terms of equations for the factorial moments. Using these, we can recursively predict values for lower moments from higher moments, reversing the procedure usually used to solve moment hierarchies. We show, for non-trivial examples, that in this manner we can predict any moment of interest, for CRNs with non-zero deficiency and non-factorizable steady states. It is however an open question how scalable these techniques are for large networks.

Heat flows adjust local ion concentrations in favor of prebiotic chemistry
Christof Mast, Ludwig-Maximilians-Universität München

Prebiotic reactions often require certain initial concentrations of ions. For example, the activity of RNA enzymes requires a lot of divalent magnesium salt, whereas too much monovalent sodium salt leads to a reduction in enzyme function. However, it is known from leaching experiments that prebiotically relevant geomaterial such as basalt releases mainly a lot of sodium and only little magnesium. A natural solution to this problem is heat fluxes through thin rock fractures, through which magnesium is actively enriched and sodium is depleted by thermogravitational convection and thermophoresis. This process establishes suitable conditions for ribozyme function from a basaltic leach. It can take place in a spatially distributed system of rock cracks and is therefore particularly stable to natural fluctuations and disturbances.

Deficiency of chemical reaction networks and thermodynamics
Matteo Polettini, University of Luxembourg

Deficiency is a topological property of a Chemical Reaction Network linked to important dynamical features, in particular of deterministic fixed points and of stochastic stationary states. Here we link it to thermodynamics: in particular we discuss the validity of a strong vs. weak zeroth law, the existence of time-reversed mass-action kinetics, and the possibility to formulate marginal fluctuation relations. Finally we illustrate some subtleties of the Python module we created for MCMC stochastic simulation of CRNs, soon to be made public.

Large deviations theory and emergent landscapes in biological dynamics
Hong Qian, University of Washington

The mathematical theory of large deviations provides a nonequilibrium thermodynamic description of complex biological systems that consist of heterogeneous individuals. In terms of the notions of stochastic elementary reactions and pure kinetic species, the continuous-time, integer-valued Markov process dictates a thermodynamic structure that generalizes (i) Gibbs’ macroscopic chemical thermodynamics of equilibrium matters to nonequilibrium small systems such as living cells and tissues; and (ii) Gibbs’ potential function to the landscapes for biological dynamics, such as those of C. H. Waddington and S. Wright.

Using the maximum entropy production principle to understand and predict microbial biogeochemistry
Joseph Vallino, Marine Biological Laboratory, Woods Hole

Natural microbial communities contain billions of individuals per liter and can exceed a trillion cells per liter in sediments, as well as harbor thousands of species in the same volume. The high species diversity contributes to extensive metabolic functional capabilities to extract chemical energy from the environment, such as methanogenesis, sulfate reduction, anaerobic photosynthesis, chemoautotrophy, and many others, most of which are only expressed by bacteria and archaea. Reductionist modeling of natural communities is problematic, as we lack knowledge on growth kinetics for most organisms and have even less understanding on the mechanisms governing predation, viral lysis, and predator avoidance in these systems. As a result, existing models that describe microbial communities contain dozens to hundreds of parameters, and state variables are extensively aggregated. Overall, the models are little more than non-linear parameter fitting exercises that have limited, to no, extrapolation potential, as there are few principles governing organization and function of complex self-assembling systems. Over the last decade, we have been developing a systems approach that models microbial communities as a distributed metabolic network that focuses on metabolic function rather than describing individuals or species. We use an optimization approach to determine which metabolic functions in the network should be up regulated versus those that should be down regulated based on the non-equilibrium thermodynamics principle of maximum entropy production (MEP). Derived from statistical mechanics, MEP proposes that steady state systems will likely organize to maximize free energy dissipation rate. We have extended this conjecture to apply to non-steady state systems and have proposed that living systems maximize entropy production integrated over time and space, while non-living systems maximize instantaneous entropy production. Our presentation will provide a brief overview of the theory and approach, as well as present several examples of applying MEP to describe the biogeochemistry of microbial systems in laboratory experiments and natural ecosystems.

Reduction and the quasi-steady state approximation
Carsten Wiuf, University of Copenhagen

Chemical reactions often occur at different time-scales. In applications of chemical reaction network theory it is often desirable to reduce a reaction network to a smaller reaction network by elimination of fast species or fast reactions. There exist various techniques for doing so, e.g. the Quasi-Steady-State Approximation or the Rapid Equilibrium Approximation. However, these methods are not always mathematically justifiable. Here, a method is presented for which (so-called) non-interacting species are eliminated by means of QSSA. It is argued that this method is mathematically sound. Various examples are given (Michaelis-Menten mechanism, two-substrate mechanism, …) and older related techniques from the 50s and 60s are briefly discussed.


Non-Equilibrium Thermodynamics in Biology (Part 1)

11 May, 2021

Together with William Cannon and Larry Li, I’m helping run a minisymposium as part of SMB2021, the annual meeting of the Society for Mathematical Biology:

• Non-equilibrium Thermodynamics in Biology: from Chemical Reaction Networks to Natural Selection, Monday June 14, 2021, beginning 9:30 am Pacific Time.

You can register for free here before May 31st, 11:59 pm Pacific Time. You need to register to watch the talks live on Zoom. I think the talks will be recorded.

Here’s the idea:

Abstract: Since Lotka, physical scientists have argued that living things belong to a class of complex and orderly systems that exist not despite the second law of thermodynamics, but because of it. Life and evolution, through natural selection of dissipative structures, are based on non-equilibrium thermodynamics. The challenge is to develop an understanding of what the respective physical laws can tell us about flows of energy and matter in living systems, and about growth, death and selection. This session will address current challenges including understanding emergence, regulation and control across scales, and entropy production, from metabolism in microbes to evolving ecosystems.

It’s exciting to me because I want to get back into work on thermodynamics and reaction networks, and we’ll have some excellent speakers on these topics. I think the talks will be in this order… later I will learn the exact schedule.

Christof Mast, Ludwig-Maximilians-Universität München

Coauthors: T. Matreux, K. LeVay, A. Schmid, P. Aikkila, L. Belohlavek, Z. Caliskanoglu, E. Salibi, A. Kühnlein, C. Springsklee, B. Scheu, D. B. Dingwell, D. Braun, H. Mutschler.

Title: Heat flows adjust local ion concentrations in favor of prebiotic chemistry

Abstract: Prebiotic reactions often require certain initial concentrations of ions. For example, the activity of RNA enzymes requires a lot of divalent magnesium salt, whereas too much monovalent sodium salt leads to a reduction in enzyme function. However, it is known from leaching experiments that prebiotically relevant geomaterial such as basalt releases mainly a lot of sodium and only little magnesium. A natural solution to this problem is heat fluxes through thin rock fractures, through which magnesium is actively enriched and sodium is depleted by thermogravitational convection and thermophoresis. This process establishes suitable conditions for ribozyme function from a basaltic leach. It can take place in a spatially distributed system of rock cracks and is therefore particularly stable to natural fluctuations and disturbances.

Supriya Krishnamurthy, Stockholm University

Coauthors: Eric Smith

Title: Stochastic chemical reaction networks

Abstract: The study of chemical reaction networks (CRNs) is a very active field. Earlier well-known results (Feinberg Chem. Eng. Sci. 42 2229 (1987), Anderson et al Bull. Math. Biol. 72 1947 (2010)) identify a topological quantity called deficiency, easy to compute for CRNs of any size, which, when exactly equal to zero, leads to a unique factorized (non-equilibrium) steady-state for these networks. No general results exist however for the steady states of non-zero-deficiency networks. In recent work, we show how to write the full moment-hierarchy for any non-zero-deficiency CRN obeying mass-action kinetics, in terms of equations for the factorial moments. Using these, we can recursively predict values for lower moments from higher moments, reversing the procedure usually used to solve moment hierarchies. We show, for non-trivial examples, that in this manner we can predict any moment of interest, for CRNs with non-zero deficiency and non-factorizable steady states. It is however an open question how scalable these techniques are for large networks.

Pierre Gaspard, Université libre de Bruxelles

Title: Nonequilibrium biomolecular information processes

Abstract: Nearly 70 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of genetic information processing in DNA replication, transcription, and translation remain poorly understood. These template-directed copolymerization processes are running away from equilibrium, being powered by extracellular energy sources. Recent advances show that their kinetic equations can be exactly solved in terms of so-called iterated function systems. Remarkably, iterated function systems can determine the effects of genome sequence on replication errors, up to a million times faster than kinetic Monte Carlo algorithms. With these new methods, fundamental links can be established between molecular information processing and the second law of thermodynamics, shedding a new light on genetic drift, mutations, and evolution.

Carsten Wiuf, University of Copenhagen

Coauthors: Elisenda Feliu, Sebastian Walcher, Meritxell Sáez

Title: Reduction and the Quasi-Steady State Approximation

Abstract: Chemical reactions often occur at different time-scales. In applications of chemical reaction network theory it is often desirable to reduce a reaction network to a smaller reaction network by elimination of fast species or fast reactions. There exist various techniques for doing so, e.g. the Quasi-Steady-State Approximation or the Rapid Equilibrium Approximation. However, these methods are not always mathematically justifiable. Here, a method is presented for which (so-called) non-interacting species are eliminated by means of QSSA. It is argued that this method is mathematically sound. Various examples are given (Michaelis-Menten mechanism, two-substrate mechanism, …) and older related techniques from the 50s and 60s are briefly discussed.

Matteo Polettini, University of Luxembourg

Coauthor: Tobias Fishback

Title: Deficiency of chemical reaction networks and thermodynamics

Abstract: Deficiency is a topological property of a Chemical Reaction Network linked to important dynamical features, in particular of deterministic fixed points and of stochastic stationary states. Here we link it to thermodynamics: in particular we discuss the validity of a strong vs. weak zeroth law, the existence of time-reversed mass-action kinetics, and the possibility to formulate marginal fluctuation relations. Finally we illustrate some subtleties of the Python module we created for MCMC stochastic simulation of CRNs, soon to be made public.

Ken Dill, Stony Brook University

Title: The principle of maximum caliber of nonequilibria

Abstract: Maximum Caliber is a principle for inferring pathways and rate distributions of kinetic processes. The structure and foundations of MaxCal are much like those of Maximum Entropy for static distributions. We have explored how MaxCal may serve as a general variational principle for nonequilibrium statistical physics – giving well-known results, such as the Green-Kubo relations, Onsager’s reciprocal relations and Prigogine’s Minimum Entropy Production principle near equilibrium, but is also applicable far from equilibrium. I will also discuss some applications, such as finding reaction coordinates in molecular simulations, non-linear dynamics in gene circuits, power-law-tail distributions in “social-physics” networks, and others.

Joseph Vallino, Marine Biological Laboratory, Woods Hole

Coauthors: Ioannis Tsakalakis, Julie A. Huber

Title: Using the maximum entropy production principle to understand and predict microbial biogeochemistry

Abstract: Natural microbial communities contain billions of individuals per liter and can exceed a trillion cells per liter in sediments, as well as harbor thousands of species in the same volume. The high species diversity contributes to extensive metabolic functional capabilities to extract chemical energy from the environment, such as methanogenesis, sulfate reduction, anaerobic photosynthesis, chemoautotrophy, and many others, most of which are only expressed by bacteria and archaea. Reductionist modeling of natural communities is problematic, as we lack knowledge on growth kinetics for most organisms and have even less understanding on the mechanisms governing predation, viral lysis, and predator avoidance in these systems. As a result, existing models that describe microbial communities contain dozens to hundreds of parameters, and state variables are extensively aggregated. Overall, the models are little more than non-linear parameter fitting exercises that have limited, to no, extrapolation potential, as there are few principles governing organization and function of complex self-assembling systems. Over the last decade, we have been developing a systems approach that models microbial communities as a distributed metabolic network that focuses on metabolic function rather than describing individuals or species. We use an optimization approach to determine which metabolic functions in the network should be up regulated versus those that should be down regulated based on the non-equilibrium thermodynamics principle of maximum entropy production (MEP). Derived from statistical mechanics, MEP proposes that steady state systems will likely organize to maximize free energy dissipation rate. We have extended this conjecture to apply to non-steady state systems and have proposed that living systems maximize entropy production integrated over time and space, while non-living systems maximize instantaneous entropy production. Our presentation will provide a brief overview of the theory and approach, as well as present several examples of applying MEP to describe the biogeochemistry of microbial systems in laboratory experiments and natural ecosystems.

Gheorghe Craciun, University of Wisconsin–Madison

Title: Persistence, permanence, and global stability in reaction network models: some results inspired by thermodynamic principles

Abstract: The standard mathematical model for the dynamics of concentrations in biochemical networks is called mass-action kinetics. We describe mass-action kinetics and discuss the connection between special classes of mass-action systems (such as detailed balanced and complex balanced systems) and the Boltzmann equation. We also discuss the connection between the “global attractor conjecture” for complex balanced mass-action systems and Boltzmann’s H-theorem. We also describe some implications for biochemical mechanisms that implement noise filtering and cellular homeostasis.

Hong Qian, University of Washington

Title: Large deviations theory and emergent landscapes in biological dynamics

Abstract: The mathematical theory of large deviations provides a nonequilibrium thermodynamic description of complex biological systems that consist of heterogeneous individuals. In terms of the notions of stochastic elementary reactions and pure kinetic species, the continuous-time, integer-valued Markov process dictates a thermodynamic structure that generalizes (i) Gibbs’ macroscopic chemical thermodynamics of equilibrium matters to nonequilibrium small systems such as living cells and tissues; and (ii) Gibbs’ potential function to the landscapes for biological dynamics, such as those of C. H. Waddington and S. Wright.

John Harte, University of California, Berkeley

Coauthors: Micah Brush, Kaito Umemura

Title: Nonequilibrium dynamics of disturbed ecosystems

Abstract: The Maximum Entropy Theory of Ecology (METE) predicts the shapes of macroecological metrics in relatively static ecosystems, across spatial scales, taxonomic categories, and habitats, using constraints imposed by static state variables. In disturbed ecosystems, however, with time-varying state variables, its predictions often fail. We extend macroecological theory from static to dynamic, by combining the MaxEnt inference procedure with explicit mechanisms governing disturbance. In the static limit, the resulting theory, DynaMETE, reduces to METE but also predicts a new scaling relationship among static state variables. Under disturbances, expressed as shifts in demographic, ontogenic growth, or migration rates, DynaMETE predicts the time trajectories of the state variables as well as the time-varying shapes of macroecological metrics such as the species abundance distribution and the distribution of metabolic rates over individuals. An iterative procedure for solving the dynamic theory is presented. Characteristic signatures of the deviation from static predictions of macroecological patterns are shown to result from different kinds of disturbance. By combining MaxEnt inference with explicit dynamical mechanisms of disturbance, DynaMETE is a candidate theory of macroecology for ecosystems responding to anthropogenic or natural disturbances.


Serpentine Barrens

19 February, 2021

 

This is the Soldiers Delight Natural Environmental Area, a nature reserve in Maryland. The early colonial records of Maryland describe the area as a hunting ground for Native Americans. In 1693, rangers in the King’s service from a nearby garrison patrolled this area and named it Soldiers Delight, for some unknown reason.

It may not look like much, but that’s exactly the point! In this otherwise lush land, why does it look like nothing but grass and a few scattered trees are growing here?

It’s because this area is a serpentine barrens. Serpentine is a kind of rock: actually a class of closely related minerals which get their name from their smooth or scaly green appearance.

Soils formed from serpentine are toxic to many plants because they have lots of nickel, chromium, and cobalt! Plants are also discouraged by how these soils have little potassium and phosphorus, not much calcium, and too much magnesium. Serpentine, you see, is made of magnesium, silicon, iron, hydrogen and oxygen.

As a result, the plants that actually do well in serpentine barrens are very specialized: some small beautiful flowers, for example. Indeed, there are nature reserves devoted to protecting these! One of the most dramatic is the Tablelands of Gros Morne National Park in Newfoundland:

Scott Weidensaul writes this about the Tablelands:

These are hardly garden spots, and virtually no animals live here except for birds and the odd caribou passing through. Yet some plants manage to eke out a living. Balsam ragwort, a relative of the cat’s-paw ragwort of the shale barrens, has managed to cope with the toxins and can tolerate up to 12 percent of its dry weight in magnesium—a concentration that would level most flowers. Even the common pitcher-plant, a species normally associated with bogs, has a niche in this near-desert, growing along the edges of spring seeps where subsurface water brings up a little calcium. By supplementing soil nourishment with a diet of insects trapped in its upright tubes, the pitcher-plant is able to augment the Tablelands’ miserly offerings. Several other carnivorous plants, including sundews and butterwort, work the same trick on their environment.

In North America, serpentine barrens can be found in the Appalachian Mountains—Gros Morne is at the northern end of these, and further south are the Soldiers Delight Natural Environmental Area in Maryland, and the State Line Serpentine Barrens on the border of Maryland and Pennsylvania.

There are also serpentine barrens in the coastal ranges of California, Oregon, and Washington. Here are some well-adapted flowers in the Klamath-Siskiyou Mountains on the border of California and Oregon:

I first thought about serpentine when the Azimuth Project was exploring ways of sucking carbon dioxide from the air. If you grind up serpentine and get it wet, it will absorb carbon dioxide! A kilogram of serpentine can dispose of about two-thirds of a kilogram of carbon dioxide. So, people have suggested this as a way to fight global warming.

Unfortunately we’re putting out over 37 gigatonnes of carbon dioxide per year. To absorb all of this we’d need to grind up about 55 gigatonnes of serpentine every year, spread it around, and get it wet. There’s plenty of serpentine available, but this is over ten times the amount of worldwide cement production, so it would take a lot of work. Then there’s the question of where to put all the ground-up rock.
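
For the record, here's the arithmetic behind that figure: at two-thirds of a kilogram of carbon dioxide per kilogram of serpentine, we need 3/2 kilograms of serpentine per kilogram of carbon dioxide, so

\displaystyle{ 37 \textrm{ Gt } \mathrm{CO}_2 \times \tfrac{3}{2} \approx 55 \textrm{ Gt serpentine.} }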

And now I’ve learned that serpentine poses serious challenges to the growth of plant life! It doesn’t much matter, given that nobody seems eager to fight global warming by grinding up huge amounts of this rock. But it’s interesting.

Credits

The top picture of the Soldiers Delight Natural Environmental Area was taken by someone named Veggies. The picture of serpentine was apparently taken by Kluka. The Tablelands were photographed by Tango7174. All these are on Wikicommons. The quote comes from this wonderful book:

• Scott Weidensaul, Mountains of the Heart: A Natural History of the Appalachians, Fulcrum Publishing, 2016.

The picture of flowers in the Klamath-Siskiyous was taken by Susan Erwin and appears along with many other interesting things here:

Klamath-Siskiyou serpentines, U. S. Forest Service.

A quote:

It is crystal clear when you have entered the serpentine realm. There is no mistaking it, as the vegetation shift is sharp and dramatic. Full-canopied forests become sparse woodlands or barrens sometimes in a matter of a few feet. Dwarfed trees, low-lying shrubs, grassy patches, and rock characterize the dry, serpentine uplands. Carnivorous wetlands, meadows, and Port-Orford-cedar dominated riparian areas express the water that finds its way to the surface through fractured and faulted bedrock.

For more on serpentine, serpentinization, and serpentine barrens, try this blog article:

Serpentine, Hiker’s Notebook.

It’s enjoyable despite its misuse of the word ‘Weltanschauung’.


Shinise

2 December, 2020

 

The Japanese take pride in ‘shinise’: businesses that have lasted for hundreds or even thousands of years. This points out an interesting alternative to the goal of profit maximization: maximizing the time of survival.

• Ben Dooley and Hisako Ueno, This Japanese shop is 1,020 years old. It knows a bit about surviving crises, New York Times, 2 December 2020.

Such enterprises may be less dynamic than those in other countries. But their resilience offers lessons for businesses in places like the United States, where the coronavirus has forced tens of thousands into bankruptcy.

“If you look at the economics textbooks, enterprises are supposed to be maximizing profits, scaling up their size, market share and growth rate. But these companies’ operating principles are completely different,” said Kenji Matsuoka, a professor emeritus of business at Ryukoku University in Kyoto.

“Their No. 1 priority is carrying on,” he added. “Each generation is like a runner in a relay race. What’s important is passing the baton.”

Japan is an old-business superpower. The country is home to more than 33,000 businesses with at least 100 years of history — over 40 percent of the world’s total. Over 3,100 have been running for at least two centuries. Around 140 have existed for more than 500 years. And at least 19 claim to have been continuously operating since the first millennium.

(Some of the oldest companies, including Ichiwa, cannot definitively trace their history back to their founding, but their timelines are accepted by the government, scholars and — in Ichiwa’s case — the competing mochi shop across the street.)

The businesses, known as “shinise,” are a source of both pride and fascination. Regional governments promote their products. Business management books explain the secrets of their success. And entire travel guides are devoted to them.

Of course if some businesses try to maximize time of survival, they may be small compared to businesses that are mainly trying to become “big”—at least if size is not the best road to long-term survival, which apparently it’s not. So we’ll have short-lived dinosaurs tromping around, and, dodging their footsteps, long-lived mice.

The idea of different organisms pursuing different strategies is familiar in ecology, where people talk about r-selected and K-selected organisms. The former “emphasize high growth rates, typically exploit less-crowded ecological niches, and produce many offspring, each of which has a relatively low probability of surviving to adulthood.” The latter “display traits associated with living at densities close to carrying capacity and typically are strong competitors in such crowded niches, that invest more heavily in fewer offspring, each of which has a relatively high probability of surviving to adulthood.”

But the contrast between r-selection and K-selection seems different to me than the contrast between profit maximization and lifespan maximization. As far as I know, no organism except humans deliberately tries to maximize the lifetime of anything.

And amusingly, the theory of r-selection versus K-selection may also be nearing the end of its life:

When Stearns reviewed the status of the theory in 1992, he noted that from 1977 to 1982 there was an average of 42 references to the theory per year in the BIOSIS literature search service, but from 1984 to 1989 the average dropped to 16 per year and continued to decline. He concluded that r/K theory was a once useful heuristic that no longer serves a purpose in life history theory.

For newer thoughts, see:

• D. Reznick, M. J. Bryant and F. Bashey, r- and K-selection revisited: the role of population regulation in life-history evolution, Ecology 83 (2002) 1509–1520.

See also:

• Innan Sasaki, How to build a business that lasts more than 200 years—lessons from Japan’s shinise companies, The Conversation, 6 June 2019.

Among other things, she writes:

We also found there to be a dark side to the success of these age-old shinise firms. At least half of the 17 companies we interviewed spoke of hardships in maintaining their high social status. They experienced peer pressure not to innovate (and solely focus on maintaining tradition) and had to make personal sacrifices to maintain their family and business continuity.

As the vice president of Shioyoshiken, a sweets company established in 1884, told us:

In a shinise, the firm is the same as the family. We need to sacrifice our own will and our own feelings and what we want to do … Inheriting and continuing the household is very important … We do not continue the business because we particularly like that industry. The fact that our family makes sweets is a coincidence. What is important is to continue the household as it is.

Innovations were sometimes discouraged by either the earlier family generation who were keen on maintaining the tradition, or peer shinise firms who cared about maintaining the tradition of the industry as a whole. Ultimately, we found that these businesses achieve such a long life through long-term sacrifice at both the personal and organisational level.


Fisher’s Fundamental Theorem (Part 3)

8 October, 2020

Last time we stated and proved a simple version of Fisher’s fundamental theorem of natural selection, which says that under some conditions, the rate of increase of the mean fitness equals the variance of the fitness. But the conditions we gave were very restrictive: namely, that the fitness of each species of replicator is constant, not depending on how many of these replicators there are, or any other replicators.

To broaden the scope of Fisher’s fundamental theorem we need to do one of two things:

1) change the left side of the equation: talk about some quantity other than the rate of change of mean fitness.

2) change the right side of the equation: talk about some quantity other than the variance in fitness.

Or we could do both! People have spent a lot of time generalizing Fisher’s fundamental theorem. I don’t think there are, or should be, any hard rules on what counts as a generalization.

But today we’ll take alternative 1). We’ll show the square of something called the ‘Fisher speed’ always equals the variance in fitness. One nice thing about this result is that we can drop the restrictive condition I mentioned. Another nice thing is that the Fisher speed is a concept from information theory! It’s defined using the Fisher metric on the space of probability distributions.

And yes—that metric is named after the same guy who proved Fisher’s fundamental theorem! So, arguably, Fisher should have proved this generalization of Fisher’s fundamental theorem. But in fact it seems that I was the first to prove it, around February 1st, 2017. Some similar results were already known, and I will discuss those someday. But they’re a bit different.

A good way to think about the Fisher speed is that it’s ‘the rate at which information is being updated’. A population of replicators of different species gives a probability distribution. Like any probability distribution, this has information in it. As the populations of our replicators change, the Fisher speed measures the rate at which this information is being updated. So, in simple terms, we’ll show

The square of the rate at which information is updated is equal to the variance in fitness.

This is quite a change from Fisher’s original idea, namely:

The rate of increase of mean fitness is equal to the variance in fitness.

But it has the advantage of always being true… as long as the population dynamics are described by the general framework we introduced last time. So let me remind you of the general setup, and then prove the result!

The setup

We start out with population functions P_i \colon \mathbb{R} \to (0,\infty), one for each species of replicator i = 1,\dots,n, obeying the Lotka–Volterra equation

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) P_i }

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. The probability of a replicator being in the ith species is

\displaystyle{  p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }

Using the Lotka–Volterra equation we showed last time that these probabilities obey the replicator equation

\displaystyle{ \dot{p}_i = \left( f_i(P) - \overline f(P) \right)  p_i }

Here P is short for the whole list of populations (P_1(t), \dots, P_n(t)), and

\displaystyle{ \overline f(P) = \sum_j f_j(P) p_j  }

is the mean fitness.

The Fisher metric

The space of probability distributions on the set \{1, \dots, n\} is called the (n-1)-simplex

\Delta^{n-1} = \{ (x_1, \dots, x_n) : \; x_i \ge 0, \; \displaystyle{ \sum_{i=1}^n x_i = 1 } \}

It’s called \Delta^{n-1} because it’s (n-1)-dimensional. When n = 3 it looks like the letter \Delta:

The Fisher metric is a Riemannian metric on the interior of the (n-1)-simplex. That is, given a point p in the interior of \Delta^{n-1} and two tangent vectors v,w at this point, the Fisher metric gives a number

g(v,w) = \displaystyle{ \sum_{i=1}^n \frac{v_i w_i}{p_i}  }

Here we are describing the tangent vectors v,w as vectors in \mathbb{R}^n with the property that the sum of their components is zero: that’s what makes them tangent to the (n-1)-simplex. And we’re demanding that p be in the interior of the simplex to avoid dividing by zero, since on the boundary of the simplex we have p_i = 0 for at least one choice of i.

If we have a probability distribution p(t) moving around in the interior of the (n-1)-simplex as a function of time, its Fisher speed is

\displaystyle{ \sqrt{g(\dot{p}(t), \dot{p}(t))} = \sqrt{\sum_{i=1}^n \frac{\dot{p}_i(t)^2}{p_i(t)}} }

if the derivative \dot{p}(t) exists. This is the usual formula for the speed of a curve moving in a Riemannian manifold, specialized to the case at hand.

Now we’ve got all the formulas we’ll need to prove the result we want. But for those who don’t already know and love it, it’s worthwhile saying a bit more about the Fisher metric.

The factor of 1/p_i in the Fisher metric changes the geometry of the simplex so that it becomes round, like a portion of a sphere:

But the reason the Fisher metric is important, I think, is its connection to relative information. Given two probability distributions p, q \in \Delta^{n-1}, the information of q relative to p is

\displaystyle{ I(q,p) = \sum_{i = 1}^n q_i \ln\left(\frac{q_i}{p_i}\right)   }

You can show this is the expected amount of information gained if p was your prior distribution and you receive information that causes you to update your prior to q. So, sometimes it’s called the information gain. It’s also called relative entropy or—my least favorite, since it sounds so mysterious—the Kullback–Leibler divergence.

Suppose p(t) is a smooth curve in the interior of the (n-1)-simplex. We can ask the rate at which information is gained as time passes. Perhaps surprisingly, a calculation gives

\displaystyle{ \frac{d}{dt} I(p(t), p(t_0))\Big|_{t = t_0} = 0 }

That is, in some sense ‘to first order’ no information is being gained at any moment t_0 \in \mathbb{R}. However, we have

\displaystyle{  \frac{d^2}{dt^2} I(p(t), p(t_0))\Big|_{t = t_0} =  g(\dot{p}(t_0), \dot{p}(t_0))}

So, the square of the Fisher speed has a nice interpretation in terms of relative entropy!

For a derivation of these last two equations, see Part 7 of my posts on information geometry. For more on the meaning of relative entropy, see Part 6.
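
Here's a small numerical illustration of these last two equations, using an arbitrary smooth curve in the simplex: the first derivative of the relative information vanishes at t_0, while the second derivative matches g(\dot{p}, \dot{p}).

```python
import numpy as np

# An arbitrary smooth curve p(t) in the interior of the simplex.
def p(t):
    q = np.array([0.2, 0.5, 0.3]) * np.exp(np.array([1.0, -0.5, 0.2]) * t)
    return q / q.sum()

def I(q, r):  # information of q relative to r
    return np.sum(q * np.log(q / r))

t0, h = 0.3, 1e-4
p0 = p(t0)
d1 = (I(p(t0 + h), p0) - I(p(t0 - h), p0)) / (2 * h)                # ~ 0
d2 = (I(p(t0 + h), p0) - 2 * I(p0, p0) + I(p(t0 - h), p0)) / h**2   # ~ g(pdot, pdot)
pdot = (p(t0 + h) - p(t0 - h)) / (2 * h)
fisher_sq = np.sum(pdot**2 / p0)                                    # g(pdot, pdot)
print(d1, d2, fisher_sq)  # d1 is tiny; d2 and fisher_sq agree to O(h^2)
```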

The result

It’s now extremely easy to show what we want, but let me state it formally so all the assumptions are crystal clear.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the Lotka–Volterra equations:

\displaystyle{ \dot P_i = f_i(P) P_i}

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. Define probabilities and the mean fitness as above, and define the variance of the fitness by

\displaystyle{ \mathrm{Var}(f(P)) =  \sum_j ( f_j(P) - \overline f(P))^2 \, p_j }

Then if none of the populations P_i are zero, the square of the Fisher speed of the probability distribution p(t) = (p_1(t), \dots , p_n(t)) is the variance of the fitness:

g(\dot{p}, \dot{p})  = \mathrm{Var}(f(P))

Proof. The proof is near-instantaneous. We take the square of the Fisher speed:

\displaystyle{ g(\dot{p}, \dot{p}) = \sum_{i=1}^n \frac{\dot{p}_i(t)^2}{p_i(t)} }

and plug in the replicator equation:

\displaystyle{ \dot{p}_i = (f_i(P) - \overline f(P)) p_i }

We obtain:

\begin{array}{ccl} \displaystyle{ g(\dot{p}, \dot{p})} &=&  \displaystyle{ \sum_{i=1}^n \left( f_i(P) - \overline f(P) \right)^2 p_i } \\ \\  &=& \mathrm{Var}(f(P))  \end{array}

as desired.   █

It’s hard to imagine anything simpler than this. We see that given the Lotka–Volterra equation, what causes information to be updated is nothing more and nothing less than variance in fitness!


The whole series:

Part 1: the obscurity of Fisher’s original paper.

Part 2: a precise statement of Fisher’s fundamental theorem of natural selection, and conditions under which it holds.

Part 3: a modified version of the fundamental theorem of natural selection, which holds much more generally.

Part 4: my paper on the fundamental theorem of natural selection.


Markov Decision Processes

6 October, 2020

The National Institute for Mathematical and Biological Synthesis is having an online seminar on ‘adaptive management’. It should be fun for people who want to understand Markov decision processes—like me!

NIMBioS Adaptive Management Webinar Series, 2020 October 26-29 (Monday-Thursday).

Adaptive management seeks to determine sound management strategies in the face of uncertainty concerning the behavior of the system being managed. Specifically, it attempts to find strategies for managing dynamic systems while learning the behavior of the system. These webinars review the key concept of a Markov Decision Process (MDP) and demonstrate how quantitative adaptive management strategies can be developed using MDPs. Additional conceptual, computational and application aspects will be discussed, including dynamic programming and Bayesian formalization of learning.

Here are the topics:

Session 1: Introduction to decision problems
Session 2: Introduction to Markov decision processes (MDPs)
Session 3: Solving Markov decision processes (MDPs)
Session 4: Modeling beliefs
Session 5: Conjugacy and discrete model adaptive management (AM)
Session 6: More on AM problems (Dirichlet/multinomial and Gaussian prior/likelihood)
Session 7: Partially observable Markov decision processes (POMDPs)
Session 8: Frontier topics (projection methods, approximate DP, communicating solutions)
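
If you want a taste of the dynamic programming involved, here is a minimal value-iteration sketch for a tiny made-up MDP—the transition probabilities, rewards, and discount factor below are arbitrary illustrations, not anything from the webinar series:

```python
import numpy as np

# A tiny made-up MDP with 2 states and 2 actions.
# T[a][s, s2] = probability of moving from state s to s2 under action a.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],      # R[a][s] = expected reward for action a in state s
              [0.0, 2.0]])
gamma = 0.95                   # discount factor

V = np.zeros(2)
for _ in range(500):           # value iteration: V <- max_a (R_a + gamma * T_a V)
    V = (R + gamma * (T @ V)).max(axis=0)
policy = (R + gamma * (T @ V)).argmax(axis=0)
print(V, policy)               # optimal state values and a greedy policy
```

The repeated Bellman update here is the dynamic programming idea; the later sessions apply the same idea to beliefs rather than states, which is where POMDPs come in.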

 


Fisher’s Fundamental Theorem (Part 2)

29 September, 2020

Here’s how Fisher stated his fundamental theorem:

The rate of increase of fitness of any species is equal to the genetic variance in fitness.

But clearly this is only going to be true under some conditions!

A lot of early criticism of Fisher’s fundamental theorem centered on the fact that the fitness of a species can vary due to changing external conditions. For example: suppose the Sun goes supernova. The fitness of all organisms on Earth will suddenly drop. So the conclusions of Fisher’s theorem can’t hold under these circumstances.

I find this obvious and thus uninteresting. So, let’s tackle situations where the fitness changes due to changing external conditions later. But first let’s see what happens if the fitness isn’t changing for these external reasons.

What’s ‘fitness’, anyway? To define this we need a mathematical model of how populations change with time. We’ll start with a very simple, very general model. While it’s often used in population biology, it will have very little to do with biology per se. Indeed, the reason I’m digging into Fisher’s fundamental theorem is that it has a mathematical aspect that doesn’t require much knowledge of biology to understand. Applying it to biology introduces lots of complications and caveats, but that won’t be my main focus here. I’m looking for the simple abstract core.

The Lotka–Volterra equation

The Lotka–Volterra equation is a simplified model of how populations change with time. Suppose we have n different types of self-replicating entity. We will call these entities replicators. We will call the types of replicators species, but they do not need to be species in the biological sense!

For example, the replicators could be organisms of one single biological species, and the types could be different genotypes. Or the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains. In what follows these details won’t matter: we’ll just have different ‘species’ of ‘replicators’.

Let P_i(t), or just P_i for short, be the population of the ith species at time t. We will treat this population as a differentiable real-valued function of time, which is a reasonable approximation when the population is fairly large.

Let’s assume the population obeys the Lotka–Volterra equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each function f_i depends in a differentiable way on all the populations. Thus each population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it depends on the populations of all the species.

We call f_i the fitness function of the ith species. Note: we are assuming this function does not depend on time.

To write the Lotka–Volterra equation more concisely, we can create a vector whose components are all the populations:

P = (P_1, \dots , P_n).

Let’s call this the population vector. In terms of the population vector, the Lotka–Volterra equation becomes

\displaystyle{ \dot P_i = f_i(P) P_i}

where the dot stands for a time derivative.

To define concepts like ‘mean fitness’ or ‘variance in fitness’ we need to introduce probability theory, and the replicator equation.

The replicator equation

Starting from the populations P_i, we can work out the probability p_i that a randomly chosen replicator belongs to the ith species. More precisely, this is the fraction of replicators belonging to that species:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

As a mnemonic, remember that the big Population P_i is being normalized to give a little probability p_i. I once had someone scold me for two minutes during a talk I was giving on this subject, for using lower-case and upper-case P’s to mean different things. But it’s my blog and I’ll do what I want to.

How do these probabilities p_i change with time? We can figure this out using the Lotka–Volterra equation. We pull out the trusty quotient rule and calculate:

\displaystyle{ \dot{p}_i = \frac{\dot{P}_i \left(\sum_j P_j\right) - P_i \left(\sum_j \dot{P}_j \right)}{\left(  \sum_j P_j \right)^2 } }

Then the Lotka–Volterra equation gives

\displaystyle{ \dot{p}_i = \frac{ f_i(P) P_i \; \left(\sum_j P_j\right) - P_i \left(\sum_j f_j(P) P_j \right)} {\left(  \sum_j P_j \right)^2 } }

Using the definition of p_i this simplifies and we get

\displaystyle{ \dot{p}_i =  f_i(P) p_i  - \left( \sum_j f_j(P) p_j \right) p_i }

The expression in parentheses here has a nice meaning: it is the mean fitness. In other words, it is the average, or expected, fitness of a replicator chosen at random from the whole population. Let us write it thus:

\displaystyle{ \overline f(P) = \sum_j f_j(P) p_j  }

This gives the replicator equation in its classic form:

\displaystyle{ \dot{p}_i = \left( f_i(P) - \overline f(P) \right) \, p_i }

where the dot stands for a time derivative. Thus, for the fraction of replicators of the ith species to increase, their fitness must exceed the mean fitness.
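
It's easy to check this derivation numerically. Here's a minimal sketch with made-up fitness functions: take one small Euler step of the Lotka–Volterra equation and compare the resulting change in the probabilities with what the replicator equation predicts.

```python
import numpy as np

# Made-up differentiable fitness functions.
def fitness(P):
    return np.array([0.5,
                     1.0 - 0.001 * P.sum(),
                     0.2 + 0.01 * P[0]])

P = np.array([40.0, 30.0, 30.0])
dt = 1e-6
f = fitness(P)
p0 = P / P.sum()
fbar = np.sum(f * p0)                     # mean fitness
replicator_rhs = (f - fbar) * p0          # predicted pdot_i

P1 = P + dt * f * P                       # one Euler step of Lotka-Volterra
p1 = P1 / P1.sum()
finite_diff = (p1 - p0) / dt              # observed pdot_i

print(np.max(np.abs(finite_diff - replicator_rhs)))  # tiny: agreement to O(dt)
```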

The moral is clear:

To become numerous you have to be fit.
To become predominant you have to be fitter than average.

This picture by David Wakeham illustrates the idea:

The fundamental theorem

What does the fundamental theorem of natural selection say, in this context? It says the rate of increase in mean fitness is equal to the variance of the fitness. As an equation, it says this:

\displaystyle{ \frac{d}{d t} \overline f(P) = \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

The left hand side is the rate of increase in mean fitness—or decrease, if it’s negative. The right hand side is the variance of the fitness: the thing whose square root is the standard deviation. This can never be negative!

A little calculation suggests that there’s no way in the world that this equation can be true without extra assumptions!

We can start computing the left hand side:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j(P) p_j } \\  \\  &=& \displaystyle{ \sum_j \left( \frac{d f_j(P)}{d t} \, p_j \; + \; f_j(P) \, \frac{d p_j}{d t} \right) } \\ \\  &=& \displaystyle{ \sum_j \left( (\nabla f_j(P) \cdot \dot{P}) p_j \; + \; f_j(P) \dot{p}_j \right) }  \end{array}

Before your eyes glaze over, let’s look at the two terms and think about what they mean. The first term says: the mean fitness will change since the fitnesses f_j(P) depend on P, which is changing. The second term says: the mean fitness will change since the fraction p_j of replicators that are in the jth species is changing.

We could continue the computation by using the Lotka–Volterra equation for \dot{P} and the replicator equation for \dot{p}. But it already looks like we’re doomed without invoking an extra assumption. The left hand side of Fisher’s fundamental theorem involves the gradients of the fitness functions, \nabla f_j(P). The right hand side:

\displaystyle{ \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

does not!

This suggests an extra assumption we can make. Let’s assume those gradients \nabla f_j vanish!

In other words, let’s assume that the fitness of each replicator is a constant, independent of the populations:

f_j(P_1, \dots, P_n) = f_j

where f_j at right is just a number.

Then we can redo our computation of the rate of change of mean fitness. The gradient term doesn’t appear:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j p_j } \\  \\  &=& \displaystyle{ \sum_j f_j \dot{p}_j }  \end{array}

We can use the replicator equation for \dot{p}_j and get

\begin{array}{ccl} \displaystyle{ \frac{d}{d t} \overline f } &=&  \displaystyle{ \sum_j f_j \Big( f_j - \overline f \Big) p_j } \\ \\  &=& \displaystyle{ \sum_j \Big( f_j^2 p_j - f_j p_j \overline f \Big) } \\ \\  &=& \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

This is the mean of the squares of the f_j minus the square of their mean. And if you’ve done enough probability theory, you’ll recognize this as the variance! Remember, the variance is

\begin{array}{ccl} \displaystyle{ \sum_j \Big( f_j - \overline f \Big)^2 \, p_j  }  &=&  \displaystyle{ \sum_j \Big( f_j^2 \, p_j - 2 f_j \overline f \, p_j + \overline f^2 p_j \Big) } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 \, p_j\Big) - 2 \overline f^2 + \overline f^2 } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

Same thing.
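
Here is a quick numerical confirmation of this identity, with made-up fitnesses f_j and probabilities p_j:

import numpy as np

f = np.array([1.0, 2.0, 5.0])      # made-up fitnesses
p = np.array([0.2, 0.5, 0.3])      # probabilities summing to 1

fbar = np.dot(f, p)                # mean fitness
var1 = np.dot((f - fbar)**2, p)    # variance, by definition
var2 = np.dot(f**2, p) - fbar**2   # mean of squares minus square of mean
print(np.isclose(var1, var2))      # True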

So, we’ve gotten a simple version of Fisher’s fundamental theorem. Given all the confusion swirling around this subject, let’s summarize it very clearly.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the equations

\displaystyle{ \dot P_i = f_i P_i}

for some constants f_i. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f = \sum_j f_j p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f) =  \sum_j \Big( f_j - \overline f \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance of the fitness:

\displaystyle{  \frac{d}{d t} \overline f = \mathrm{Var}(f) }

This is nice—but as you can see, our extra assumption that the fitness functions are constants has trivialized the problem. The equations

\displaystyle{ \dot P_i = f_i P_i}

are easy to solve: all the populations change exponentially with time. We’re not seeing any of the interesting features of population biology, or even of dynamical systems in general. The theorem is just an observation about a collection of exponential functions growing or shrinking at different rates.
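
Still, it is reassuring to check the theorem numerically. Here is a sketch in Python using those explicit exponential solutions; the constants f_i and the initial populations are made up:

import numpy as np

f = np.array([0.3, -0.2, 0.1])     # made-up constant fitnesses
P0 = np.array([1.0, 4.0, 2.0])     # made-up initial populations

def mean_fitness(t):
    P = P0 * np.exp(f * t)         # exponential solutions of dP_i/dt = f_i P_i
    p = P / P.sum()
    return np.dot(f, p), p

t, dt = 0.7, 1e-6
fbar, p = mean_fitness(t)
fbar_later, _ = mean_fitness(t + dt)

lhs = (fbar_later - fbar) / dt          # d/dt of mean fitness, by finite differences
rhs = np.dot((f - fbar)**2, p)          # variance of the fitness
print(np.isclose(lhs, rhs, atol=1e-5))  # True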

So, we should look for a more interesting theorem in this vicinity! And we will.

Before I bid you adieu, let’s record a result we almost reached but didn’t quite state. It’s stronger than the one above: in this version we don’t assume the fitness functions are constant, so we keep the term involving their gradients.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the Lotka–Volterra equations:

\displaystyle{ \dot P_i = f_i(P) P_i}

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f(P)  = \sum_j f_j(P) p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f(P)) =  \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance plus an extra term involving the gradients of the fitness functions:

\displaystyle{\frac{d}{d t} \overline f(P)}  =  \displaystyle{ \mathrm{Var}(f(P)) + \sum_j (\nabla f_j(P) \cdot \dot{P}) p_j }

The proof just amounts to cobbling together the calculations we have already done, and not assuming the gradient term vanishes.
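
This stronger result can be checked numerically too. Here is a sketch using hypothetical Lotka–Volterra fitness functions f_i(P) = r_i - \sum_j A_{ij} P_j, with made-up growth rates r and interaction matrix A:

import numpy as np

r = np.array([1.0, 0.8, 0.6])          # made-up growth rates
A = np.array([[0.10, 0.02, 0.01],      # made-up interaction matrix
              [0.03, 0.08, 0.02],
              [0.01, 0.04, 0.09]])

def f(P):
    # Fitness functions f_i(P) = r_i - sum_j A_ij P_j; the gradient of f_i is -A[i].
    return r - A @ P

P = np.array([2.0, 1.0, 3.0])          # made-up populations
p = P / P.sum()
Pdot = f(P) * P                        # the Lotka–Volterra equations
fbar = np.dot(f(P), p)

# Left side: d/dt of the mean fitness, by finite differences.
dt = 1e-6
P_new = P + dt * Pdot
lhs = (np.dot(f(P_new), P_new / P_new.sum()) - fbar) / dt

# Right side: variance plus the gradient term sum_j (grad f_j . Pdot) p_j.
var = np.dot((f(P) - fbar)**2, p)
grad_term = np.dot(-A @ Pdot, p)
print(np.isclose(lhs, var + grad_term, atol=1e-4))  # True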

Acknowledgements

After writing this blog article I looked for a nice picture to grace it. I found one here:

• David Wakeham, Replicators and Fisher’s fundamental theorem, 30 November 2017.

I was mildly chagrined to discover that he said most of what I just said more simply and cleanly… in part because he went straight to the case where the fitness functions are constants. But my mild chagrin was instantly offset by this remark:

Fisher likened the result to the second law of thermodynamics, but there is an amusing amount of disagreement about what Fisher meant and whether he was correct. Rather than look at Fisher’s tortuous proof (or the only slightly less tortuous results of latter-day interpreters) I’m going to look at a simpler setup due to John Baez, and (unlike Baez) use it to derive the original version of Fisher’s theorem.

So, I’m just catching up with Wakeham, but luckily an earlier blog article of mine helped him avoid “Fisher’s tortuous proof” and the “only slightly less tortuous results of latter-day interpreters”. We are making progress here!

(By the way, a quiz show I listen to recently asked about the difference between “tortuous” and “torturous”. They mean very different things, but in this particular case either word would apply.)

My earlier blog article, in turn, was inspired by this paper:

• Marc Harper, Information geometry and evolutionary game theory.


The whole series:

Part 1: the obscurity of Fisher’s original paper.

Part 2: a precise statement of Fisher’s fundamental theorem of natural selection, and conditions under which it holds.

Part 3: a modified version of the fundamental theorem of natural selection, which holds much more generally.

Part 4: my paper on the fundamental theorem of natural selection.


Fisher’s Fundamental Theorem (Part 1)

29 September, 2020

There are various ‘fundamental theorems’ in mathematics. The fundamental theorem of arithmetic, the fundamental theorem of algebra, and the fundamental theorem of calculus are three of the most famous. These are gems of mathematics.

The statistician, biologist and eugenicist Ronald Fisher had his own fundamental theorem: the ‘fundamental theorem of natural selection’. But this one is different—it’s a mess! The theorem was based on confusing definitions, Fisher’s proofs were packed with typos and downright errors, and people don’t agree on what the theorem says, whether it’s correct, and whether it’s nontrivial. Thus, people keep trying to clarify and strengthen it.

This paper analyzes Fisher’s work:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 32 (1972), 129–140.

Price writes:

It has long been a mystery how Fisher (1930, 1941, 1958) derived his famous ‘fundamental theorem of Natural Selection’ and exactly what he meant by it. He stated the theorem in these words (1930, p. 35; 1958, p. 37): ‘The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.’ And also in these words (1930, p. 46; 1958, p. 50): ‘The rate of increase of fitness of any species is equal to the genetic variance in fitness.’ He compared this result to the second law of thermodynamics, and described it as holding ‘the supreme position among the biological sciences’. Also, he spoke of the ‘rigour’ of his derivation of the theorem and of ‘the ease of its interpretation’. But others have variously described his derivation as ‘recondite’ (Crow & Kimura, 1970), ‘very difficult’ (Turner, 1970), or ‘entirely obscure’ (Kempthorne, 1957). And no one has ever found any other way to derive the result that Fisher seems to state. Hence, many authors (not reviewed here) have maintained that the theorem holds only under very special conditions, while only a few (e.g. Edwards, 1967) have thought that Fisher may have been correct—if only we could understand what he meant!

It will be shown here that this latter view is correct. Fisher’s theorem does indeed hold with the generality that he claimed for it. The mystery and the controversy result from incomprehensibility rather than error.

I won’t try to explain the result here—or the various improved versions that people have invented, which may actually be more interesting than the original. I’ll try to do this in a series of posts. Right now I just want to quote Price’s analysis of why Fisher’s result is so difficult to understand! It’s amusing:

In addition to the central confusion resulting from the use of the word fitness in two highly different senses, Fisher’s three publications on his theorem contain an astonishing number of lesser obscurities, infelicities of expression, typographical errors, omissions of crucial explanations, and contradictions between different passages about the same point. It is necessary to clarify some of this confusion before explaining the derivation of the theorem.

He analyzes the problems in detail, calling one passage the “most confusing published scientific writing I know of”.

Part of the problem, though only part, is that Fisher rewrote part of his paper while not remembering to change the rest to make the terminology match. It reminds me a bit of how the typesetter accidentally omitted a line from one of Bohr’s papers on quantum mechanics, creating a sentence that made absolutely no sense—though in Bohr’s case, his writing was so obscure that nobody even noticed until many years later.

Given its legendary obscurity, I will not try to fight my way through Fisher’s original paper. I will start with some later work. Next time!




Diary, 2003-2020

8 August, 2020

I keep putting off organizing my written material, but with coronavirus I’m feeling more mortal than usual, so I’d like to get this out into the world now:

• John Baez, Diary, 2003–2020.

Go ahead and grab a copy!

It’s got all my best tweets and Google+ posts, mainly explaining math and physics, but also my travel notes and other things… starting in 2003 with my ruminations on economics and ecology. It’s too big to read all at once, but I think you can dip into it more or less anywhere and pull out something fun.

It goes up to July 2020. It’s 2184 pages long.

I fixed a few problems like missing pictures, but there are probably more. If you let me know about them, I’ll fix them (if it’s easy).