Compositional Game Theory and Climate Microeconomics

5 October, 2020

guest post by Jules Hedges

Hi all

This is a post I’ve been putting off for a long time until I was sure I was ready. I am the “lead developer” of a thing called compositional game theory (CGT). It’s an approach to game theory based on category theory, but we are now at the point where you don’t need to know that anymore: it’s an approach to game theory that has certain specific benefits over the traditional approach.

I would like to start a conversation about “using my powers for good”. I am hoping particularly that it is possible to model microeconomic aspects of climate science. This seems to be a very small field and I’m not really hopeful that anyone on Azimuth will have the right background, but it’s worth a shot. The kind of thing I’m imagining (possibly completely wrongly) is to create models that will suggest when a technically-feasible solution is not socially feasible. Social dilemmas and tragedies of the commons are at the heart of the climate crisis, and modelling instances of them is in scope.

I have a software tool (https://github.com/jules-hedges/open-games-hs) that is designed to be an assistant for game-theoretic modelling. This I can’t emphasise enough: A human with expertise in game-theoretic modelling is the most important thing, CGT is merely an assistant. (Right now the tool also probably can’t be used without me being in the loop, but that’s not an inherent thing.)

To give an idea of what sort of things CGT can do, my two current research collaborations are: (1) a social science project modelling examples of institution governance, and (2) a cryptoeconomics project modelling an attack against a protocol using bribes. On a technical level the best fit is for Bayesian games: finite-horizon games with common-knowledge priors and private knowledge, where agents do Bayesian updating.

A lot of the (believed) practical benefits of CGT come from the fact that the model is code (in a high level language designed specifically for expressing games) and thus the model can be structured according to existing wisdom for structuring code. Really stress-testing this claim is an ongoing research project. My tool does equilibrium-checking for all games (the technical term is “model checker”), and we’ve had some success doing other things by looping an equilibrium check over a parameter space. It makes no attempt to be an equilibrium solver, that is left for the human.
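To illustrate the “looping an equilibrium check over a parameter space” idea, here is a minimal sketch in Python. It deliberately does not use open-games-hs or its API; it is just a hand-rolled equilibrium check on a toy two-player social dilemma whose payoffs and parameter range are made up for illustration:

```python
def payoffs(temptation):
    """2x2 symmetric game; rows/columns are (Cooperate, Defect).
    Each entry is (row player's payoff, column player's payoff).
    The 'temptation' payoff is the made-up parameter we sweep."""
    R, S, T, P = 3.0, 0.0, temptation, 1.0
    return {('C', 'C'): (R, R), ('C', 'D'): (S, T),
            ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def is_equilibrium(profile, game):
    """A pure-strategy profile is a Nash equilibrium if neither player
    can gain by unilaterally deviating."""
    s1, s2 = profile
    u1, u2 = game[(s1, s2)]
    for d in ('C', 'D'):
        if game[(d, s2)][0] > u1:   # row player deviates to d
            return False
        if game[(s1, d)][1] > u2:   # column player deviates to d
            return False
    return True

# Loop an equilibrium check over the parameter space.
for temptation in [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]:
    game = payoffs(temptation)
    print(f"temptation={temptation:.1f}  "
          f"(C,C) equilibrium: {is_equilibrium(('C', 'C'), game)}  "
          f"(D,D) equilibrium: {is_equilibrium(('D', 'D'), game)}")
```

The real tool works with far richer games, but the pattern is the same: sweep a parameter and ask whether a given strategy profile is still an equilibrium.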

This is not me trying to push my pet project (I do that elsewhere) but me trying to find a niche where I can do some genuine good, even if small. If you are a microeconomist (or a social scientist who uses applied game theory) and share the goals of Azimuth, I would like to hear from you, even if it’s just for some discussion.


Stretched Water

3 October, 2020

The physics of water is endlessly fascinating. The phase diagram of water at positive temperature and pressure is already remarkably complex, as shown in this diagram by Martin Chaplin:

Click for a larger version. And read this post of mine for more:

Ice.

But it turns out there’s more: water is also interesting at negative pressure.

I don’t know why I never wondered about this! But people study stretched water, essentially putting a piston of water under tension and measuring its properties.

You probably know one weird thing about water: ice floats. Unlike most liquids, water at standard pressure reaches its maximum density above the freezing point, at about 4 °C. And for any fixed pressure, you can try to find the temperature at which water reaches its maximum density. You get a curve of density maxima in the pressure-temperature plane. And with stretched water experiments, you can even study this curve for negative pressures:

This graph is from here:

• Gaël Pallares, Miguel A. Gonzalez, Jose Luis F. Abascal, Chantal Valeriani, and Frédéric Caupin, Equation of state for water and its line of density maxima down to -120 MPa, Physical Chemistry Chemical Physics 18 (2016), 5896–5900.

-120 MPa is about -1200 times atmospheric pressure.

This is just the tip of the iceberg. I’m reading some papers and discovering lots of amazing things that I barely understand:

• Stacey L. Meadley and C. Austen Angell, Water and its relatives: the stable, supercooled and particularly the stretched, regimes.

• Jeremy C. Palmer, Peter H. Poole, Francesco Sciortino and Pablo G. Debenedetti, Advances in computational studies of the liquid–liquid transition in water and water-like models, Chemical Reviews 118 (2018), 9129–9151.

I would like to learn about some of these things and explain them. But for now, let me just quote the second paper to illustrate how strange water actually is:

Water is ubiquitous and yet also unusual. It is central to life, climate, agriculture, and industry, and an understanding of its properties is key in essentially all of the disciplines of the natural sciences and engineering. At the same time, and despite its apparent molecular simplicity, water is a highly unusual substance, possessing bulk properties that differ greatly, and often qualitatively, from those of other compounds. As a consequence, water has long been the subject of intense scientific scrutiny.

In this review, we describe the development and current status of the proposal that a liquid−liquid transition (LLT) occurs in deeply supercooled water. The focus of this review is on computational work, but we also summarize the relevant experimental and theoretical background. Since first proposed in 1992, this hypothesis has generated considerable interest and debate. In particular, in the past few years several works have challenged the evidence obtained from computer simulations of the ST2 model of water that support in principle the existence of an LLT, proposing instead that what was previously interpreted as an LLT is in fact ice crystallization. This challenge to the LLT hypothesis has stimulated a significant amount of new work aimed at resolving the controversy and to better understand the nature of an LLT in water-like computer models.

Unambiguously resolving this debate, it has been shown recently that the code used in the studies that most sharply challenge the LLT hypothesis contains a serious conceptual error that prevented the authors from properly characterizing the phase behavior of the ST2 water model. Nonetheless, the burst of renewed activity focusing on simulations of an LLT in water has yielded considerable new insights. Here, we review this recent work, which clearly demonstrates that an LLT is a well-defined and readily observed phenomenon in computer simulations of water-like models and is unambiguously distinguished from the crystal−liquid phase transition.

Yes, you heard that right: a phase transition between two phases of liquid water below the freezing point!

Both these phases are metastable: pretty quickly the water will freeze. But apparently it still makes some sense to talk about phases, and a phase transition between them!

What does this have to do with stretched water? I’m not sure, but apparently understanding this stuff is connected to understanding water at negative pressures. It also involves the ‘liquid-vapor spinodal’.

Huh?

The liquid-vapor spinodal is another curve in the pressure-temperature plane. As far as I can tell, it works like this: when the pressure drops below this curve, the liquid—which is already metastable: it would evaporate given time—suddenly becomes unstable and forms bubbles of vapor.

At negative pressures the liquid-vapor spinodal has a pretty intuitive meaning: if you stretch water too much, it breaks!

There’s a conjecture due to a guy named Robin J. Speedy which says that the liquid-vapor spinodal intersects the curve of density maxima! And it does so at negative pressure. I don’t really understand the significance of this, but it sounds cool. Super-cool.

Here’s what Palmer, Poole, Sciortino and Debenedetti have to say about this:

The development of a thermodynamically self-consistent picture of the behavior of the deeply supercooled liquid that correctly predicts these experimental observations remains at the center of research on water. While a number of competing scenarios have been advanced over the years, the fact that consensus continues to be elusive demonstrates the complexity of the theoretical problem and the difficulty of the experiments required to distinguish between scenarios.

One of the first of these scenarios, Speedy’s “stability limit conjecture” (SLC), exemplifies the challenge. As formulated by Speedy, and comprehensively analyzed by Debenedetti and D’Antonio, the SLC proposes that water’s line of density maxima in the P−T plane intersects the liquid−vapor spinodal at negative pressure. At such an intersection, thermodynamics requires that the spinodal pass through a minimum and reappear in the positive pressure region under deeply supercooled conditions. Interestingly, this scenario has recently been observed in a numerical study of model colloidal particles. The apparent power law behavior of water’s response functions is predicted by the SLC in terms of the approach to the line of thermodynamic singularities found at the spinodal.

Although the SLC has recently been shown to be thermodynamically incompatible with other features of the supercooled water phase diagram, it played a key role in the development of new scenarios. The SLC also pointed out the importance of considering the behavior of “stretched” water at negative pressure, a regime in which the liquid is metastable with respect to the nucleation of bubbles of the vapor phase. The properties of stretched water have been probed directly in several innovative experiments which continue to generate results that may help discriminate among the competing scenarios that have been formulated to explain the thermodynamic behavior of supercooled water.


Fock Space Techniques for Stochastic Physics

2 October, 2020

I’ve been fascinated for a long time by the relation between classical probability theory and quantum mechanics. This story took a strange new turn when people discovered that stochastic Petri nets, good for describing classical probabilistic models of interacting entities, can also be described using ideas from quantum field theory!

I’ll be talking about this at the online category theory seminar at UNAM, the National Autonomous University of Mexico, on Wednesday October 7th at 18:00 UTC (11 am Pacific Time):

Fock space techniques for stochastic physics

Abstract. Some ideas from quantum theory are beginning to percolate back to classical probability theory. For example, the master equation for a chemical reaction network—also known as a stochastic Petri net—describes particle interactions in a stochastic rather than quantum way. If we look at this equation from the perspective of quantum theory, this formalism turns out to involve creation and annihilation operators, coherent states and other well-known ideas—but with a few big differences.
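To give a tiny taste of what this looks like, here is a minimal sketch of mine, using conventions common in this line of work (the talk itself may set things up a bit differently). Consider one species of particle, born at rate \alpha and dying at rate \beta. Package the probabilities \psi_n of having n particles into a formal power series \Psi(z) = \sum_{n \ge 0} \psi_n z^n, and let the annihilation operator a be differentiation by z and the creation operator a^\dagger be multiplication by z. Then the master equation takes a Schrödinger-like form:

\displaystyle{ \frac{d}{d t} \Psi = H \Psi, \qquad H = \alpha \, (a^\dagger - 1) \; + \; \beta \, (a - a^\dagger a) }

This looks just like quantum field theory, except that H is infinitesimal stochastic rather than self-adjoint, which is one of the “big differences” the abstract alludes to.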

You can watch the talk here:

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

My students Joe Moeller and Jade Master will also be giving talks in this seminar—on Petri nets and structured cospans.



Reframing Superintelligence

30 September, 2020

Eric Drexler has a document calling for a view of superintelligent systems where instead of focusing on agents or minds we should focus on intelligent services. This is, of course, the approach taken by industrial AI so far. But the idea of a superintelligent agent with its own personality, desires and motivations still has a strong grip on our fantasies of the future.

• Eric Drexler, Reframing Superintelligence: Comprehensive AI Services as General Intelligence, Technical Report #2019-1, Future of Humanity Institute.

His abstract begins thus:

Studies of superintelligent-level systems have typically posited AI functionality that plays the role of a mind in a rational utility-directed agent, and hence employ an abstraction initially developed as an idealized model of human decision makers. Today, developments in AI technology highlight intelligent systems that are quite unlike minds, and provide a basis for a different approach to understanding them.

The desire to build independent, self-motivated superintelligent agents (“AGI”: artificial general intelligence) still beckons to many. But Drexler suggests treating this as a deviant branch we should avoid. He instead wants us to focus on “CAIS”: comprehensive AI services.

First, we don’t have any practical reason to want AGI:

In practical terms, we value potential AI systems for what they could do, whether driving a car, designing a spacecraft, caring for a patient, disarming an opponent, proving a theorem, or writing a symphony. Scientific curiosity and long-standing aspirations will encourage the development of AGI agents with open-ended, self-directed, human-like capabilities, but the more powerful drives of military competition, economic competition, and improving human welfare do not in themselves call for such agents. What matters in practical terms are the concrete AI services provided (their scope, quality, and reliability) and the ease or difficulty of acquiring them (in terms of time, cost, and human effort).

Second, it’s harder to create agents with their own motives than to create services. And third, they are more risky.

But there’s no sharp line between “AI as service” and “AI as agent”, so endless care is required if we want CAIS but not AGI:

There is no bright line between safe CAI services and unsafe AGI agents, and AGI is perhaps best regarded as a potential branch from an R&D-automation/CAIS path. To continue along safe paths from today’s early AI R&D automation to superintelligent-level CAIS calls for an improved understanding of the preconditions for AI risk, while for any given level of safety, a better understanding of risk will widen the scope of known-safe system architectures and capabilities. The analysis presented above suggests that CAIS models of the emergence of superintelligent-level AI capabilities, including AGI, should be given substantial and arguably predominant weight in considering questions of AI safety and strategy.

Although it is important to distinguish between pools of AI services and classic conceptions of integrated, opaque, utility-maximizing agents, we should be alert to the potential for coupled AI services to develop emergent, unintended, and potentially risky agent-like behaviors. Because there is no bright line between agents and non-agents, or between rational utility maximization and reactive behaviors shaped by blind evolution, avoiding risky behaviors calls for at least two complementary perspectives: both (1) design-oriented studies that can guide implementation of systems that will provide requisite degrees of e.g., stability, reliability, and transparency, and (2) agent-oriented studies support design by exploring the characteristics of systems that could display emergent, unintended, and potentially risky agent-like behaviors. The possibility (or likelihood) of humans implementing highly-adaptive agents that pursue open-ended goals in the world (e.g., money-maximizers) presents particularly difficult problems.

Perhaps “slippage” toward agency is a bigger risk than the deliberate creation of a superintelligent agent. I feel extremely unconfident in the ability of humans to successfully manage anything, except for short periods of time. I’m not confident in any superintelligence being better at this, either: they could be better at managing things, but they’d have more to manage. Drexler writes:

Superintelligent-level aid in understanding and implementing solutions to the AGI control problem could greatly improve our strategic position.

and while this is true, this offers even more opportunity for “slippage”.

I suspect that whatever can go wrong eventually does. Luckily a lot goes right, too. We fumble, stumble and tumble forward into the future.


Fisher’s Fundamental Theorem (Part 2)

29 September, 2020

Here’s how Fisher stated his fundamental theorem:

The rate of increase of fitness of any species is equal to the genetic variance in fitness.

But clearly this is only going to be true under some conditions!

A lot of early criticism of Fisher’s fundamental theorem centered on the fact that the fitness of a species can vary due to changing external conditions. For example: suppose the Sun goes supernova. The fitness of all organisms on Earth will suddenly drop. So the conclusions of Fisher’s theorem can’t hold under these circumstances.

I find this obvious and thus uninteresting. So, let’s tackle situations where the fitness changes due to changing external conditions later. But first let’s see what happens if the fitness isn’t changing for these external reasons.

What’s ‘fitness’, anyway? To define this we need a mathematical model of how populations change with time. We’ll start with a very simple, very general model. While it’s often used in population biology, it will have very little to do with biology per se. Indeed, the reason I’m digging into Fisher’s fundamental theorem is that it has a mathematical aspect that doesn’t require much knowledge of biology to understand. Applying it to biology introduces lots of complications and caveats, but that won’t be my main focus here. I’m looking for the simple abstract core.

The Lotka–Volterra equation

The Lotka–Volterra equation is a simplified model of how populations change with time. Suppose we have n different types of self-replicating entity. We will call these entities replicators. We will call the types of replicators species, but they do not need to be species in the biological sense!

For example, the replicators could be organisms of one single biological species, and the types could be different genotypes. Or the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains. In what follows these details won’t matter: we’ll just have different ‘species’ of ‘replicators’.

Let P_i(t), or just P_i for short, be the population of the ith species at time t. We will treat this population as a differentiable real-valued function of time, which is a reasonable approximation when the population is fairly large.

Let’s assume the population obeys the Lotka–Volterra equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each function f_i depends in a differentiable way on all the populations. Thus each population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it depends on the populations of all the species.

We call f_i the fitness function of the ith species. Note: we are assuming this function does not depend on time.

To write the Lotka–Volterra equation more concisely, we can create a vector whose components are all the populations:

P = (P_1, \dots , P_n).

Let’s call this the population vector. In terms of the population vector, the Lotka–Volterra equation becomes

\displaystyle{ \dot P_i = f_i(P) P_i}

where the dot stands for a time derivative.

To define concepts like ‘mean fitness’ or ‘variance in fitness’ we need to introduce probability theory, and the replicator equation.

The replicator equation

Starting from the populations P_i, we can work out the probability p_i that a randomly chosen replicator belongs to the ith species. More precisely, this is the fraction of replicators belonging to that species:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

As a mnemonic, remember that the big Population P_i is being normalized to give a little probability p_i. I once had someone scold me for two minutes during a talk I was giving on this subject, for using lower-case and upper-case P’s to mean different things. But it’s my blog and I’ll do what I want to.

How do these probabilities p_i change with time? We can figure this out using the Lotka–Volterra equation. We pull out the trusty quotient rule and calculate:

\displaystyle{ \dot{p}_i = \frac{\dot{P}_i \left(\sum_j P_j\right) - P_i \left(\sum_j \dot{P}_j \right)}{\left(  \sum_j P_j \right)^2 } }

Then the Lotka–Volterra equation gives

\displaystyle{ \dot{p}_i = \frac{ f_i(P) P_i \; \left(\sum_j P_j\right) - P_i \left(\sum_j f_j(P) P_j \right)} {\left(  \sum_j P_j \right)^2 } }

Using the definition of p_i this simplifies and we get

\displaystyle{ \dot{p}_i =  f_i(P) p_i  - \left( \sum_j f_j(P) p_j \right) p_i }

The expression in parentheses here has a nice meaning: it is the mean fitness. In other words, it is the average, or expected, fitness of a replicator chosen at random from the whole population. Let us write it thus:

\displaystyle{ \overline f(P) = \sum_j f_j(P) p_j  }

This gives the replicator equation in its classic form:

\displaystyle{ \dot{p}_i = \left( f_i(P) - \overline f(P) \right) \, p_i }

where the dot stands for a time derivative. Thus, for the fraction of replicators of the ith species to increase, their fitness must exceed the mean fitness.

The moral is clear:

To become numerous you have to be fit.
To become predominant you have to be fitter than average.

This picture by David Wakeham illustrates the idea:
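Here is a quick numerical illustration of the same moral (my own sketch, not from the post): with constant fitnesses, the replicator equation pushes each species’ share up or down according to whether its fitness is above or below the current mean, and the fittest species eventually predominates. The fitness values and initial shares below are made up.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])      # constant fitnesses (made up)
p = np.array([0.8, 0.15, 0.05])    # initial shares, summing to 1 (made up)

dt = 0.01
for step in range(1001):
    mean_fitness = np.dot(f, p)
    if step % 250 == 0:
        print(f"t={step*dt:5.2f}  shares={np.round(p, 3)}  mean fitness={mean_fitness:.3f}")
    # replicator equation: dp_i/dt = (f_i - mean fitness) p_i
    p = p + dt * (f - mean_fitness) * p
    p = p / p.sum()                # guard against numerical drift
```

The least fit species starts out most numerous, but its share shrinks while the fittest species takes over, and the mean fitness climbs steadily toward 3.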

The fundamental theorem

What does the fundamental theorem of natural selection say, in this context? It says the rate of increase in mean fitness is equal to the variance of the fitness. As an equation, it says this:

\displaystyle{ \frac{d}{d t} \overline f(P) = \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

The left hand side is the rate of increase in mean fitness—or decrease, if it’s negative. The right hand side is the variance of the fitness: the thing whose square root is the standard deviation. This can never be negative!

A little calculation suggests that there’s no way in the world that this equation can be true without extra assumptions!

We can start computing the left hand side:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j(P) p_j } \\  \\  &=& \displaystyle{ \sum_j  \frac{d f_j(P)}{d t} \, p_j \; + \; f_j(P) \, \frac{d p_j}{d t} } \\ \\  &=& \displaystyle{ \sum_j (\nabla f_j(P) \cdot \dot{P}) p_j \; + \; f_j(P) \dot{p}_j }  \end{array}

Before your eyes glaze over, let’s look at the two terms and think about what they mean. The first term says: the mean fitness will change since the fitnesses f_j(P) depend on P, which is changing. The second term says: the mean fitness will change since the fraction p_j of replicators that are in the jth species is changing.

We could continue the computation by using the Lotka–Volterra equation for \dot{P} and the replicator equation for \dot{p}. But it already looks like we’re doomed without invoking an extra assumption. The left hand side of Fisher’s fundamental theorem involves the gradients of the fitness functions, \nabla f_j(P). The right hand side:

\displaystyle{ \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

does not!

This suggests an extra assumption we can make. Let’s assume those gradients \nabla f_j vanish!

In other words, let’s assume that the fitness of each replicator is a constant, independent of the populations:

f_j(P_1, \dots, P_n) = f_j

where f_j at right is just a number.

Then we can redo our computation of the rate of change of mean fitness. The gradient term doesn’t appear:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j p_j } \\  \\  &=& \displaystyle{ \sum_j f_j \dot{p}_j }  \end{array}

We can use the replicator equation for \dot{p}_j and get

\begin{array}{ccl} \displaystyle{ \frac{d}{d t} \overline f } &=&  \displaystyle{ \sum_j f_j \Big( f_j - \overline f \Big) p_j } \\ \\  &=& \displaystyle{ \sum_j f_j^2 p_j - f_j p_j  \overline f  } \\ \\  &=& \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

This is the mean of the squares of the f_j minus the square of their mean. And if you’ve done enough probability theory, you’ll recognize this as the variance! Remember, the variance is

\begin{array}{ccl} \displaystyle{ \sum_j \Big( f_j - \overline f \Big)^2 \, p_j  }  &=&  \displaystyle{ \sum_j f_j^2 \, p_j - 2 f_j \overline f \, p_j + \overline f^2 p_j } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 \, p_j\Big) - 2 \overline f^2 + \overline f^2 } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

Same thing.

So, we’ve gotten a simple version of Fisher’s fundamental theorem. Given all the confusion swirling around this subject, let’s summarize it very clearly.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the equations

\displaystyle{ \dot P_i = f_i P_i}

for some constants f_i. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f = \sum_j f_j p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f) =  \sum_j \Big( f_j - \overline f \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance of the fitness:

\displaystyle{  \frac{d}{d t} \overline f = \mathrm{Var}(f) }

This is nice—but as you can see, our extra assumption that the fitness functions are constants has trivialized the problem. The equations

\displaystyle{ \dot P_i = f_i P_i}

are easy to solve: all the populations change exponentially with time. We’re not seeing any of the interesting features of population biology, or even of dynamical systems in general. The theorem is just an observation about a collection of exponential functions growing or shrinking at different rates.
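For concreteness, here is the explicit solution (not spelled out above, but immediate from the constant-fitness equations):

\displaystyle{ P_i(t) = P_i(0) \, e^{f_i t}, \qquad p_i(t) = \frac{P_i(0) \, e^{f_i t}}{\sum_j P_j(0) \, e^{f_j t}} }

As t \to \infty the probability concentrates on the species with the largest fitness, and the mean fitness climbs monotonically toward that largest value, just as the theorem demands, since the variance is nonnegative.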

So, we should look for a more interesting theorem in this vicinity! And we will.

Before I bid you adieu, let’s record a result we almost reached, but didn’t yet state. It’s stronger than the one I just stated. In this version we don’t assume the fitness functions are constant, so we keep the term involving their gradient.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the Lotka–Volterra equations:

\displaystyle{ \dot P_i = f_i(P) P_i}

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f(P)  = \sum_j f_j(P) p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f(P)) =  \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance plus an extra term involving the gradients of the fitness functions:

\displaystyle{\frac{d}{d t} \overline f(P)}  =  \displaystyle{ \mathrm{Var}(f(P)) + \sum_j (\nabla f_j(P) \cdot \dot{P}) p_j }

The proof just amounts to cobbling together the calculations we have already done, and not assuming the gradient term vanishes.
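If you want to check this numerically, here is a short sketch of mine (not from the post). It uses made-up fitness functions f_i(P) = r_i - \sum_j A_{ij} P_j, whose gradients are simply the rows of -A, and compares a finite-difference approximation of the left hand side with the right hand side:

```python
import numpy as np

# Hypothetical Lotka-Volterra system with f_i(P) = r_i - sum_j A_ij P_j.
r = np.array([1.0, 0.5, -0.2])            # made-up growth rates
A = np.array([[0.010, 0.002, 0.000],      # made-up interaction matrix
              [0.000, 0.010, 0.003],
              [0.004, 0.000, 0.010]])

def fitness(P):
    return r - A @ P                       # vector of f_i(P)

def mean_fitness(P):
    p = P / P.sum()
    return np.dot(fitness(P), p)

P = np.array([10.0, 8.0, 5.0])             # made-up populations
Pdot = fitness(P) * P                       # the Lotka-Volterra equation

# Left hand side: d/dt of the mean fitness, via a small finite-difference step.
dt = 1e-6
lhs = (mean_fitness(P + dt * Pdot) - mean_fitness(P)) / dt

# Right hand side: variance of the fitness plus the gradient term.
p = P / P.sum()
f = fitness(P)
fbar = np.dot(f, p)
variance = np.dot((f - fbar) ** 2, p)
gradient_term = np.dot((-A) @ Pdot, p)      # sum_j (grad f_j . Pdot) p_j
rhs = variance + gradient_term

print(lhs, rhs)                             # agree up to O(dt)
```

The two printed numbers agree up to the O(dt) error of the finite-difference step.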

Acknowledgements

After writing this blog article I looked for a nice picture to grace it. I found one here:

• David Wakeham, Replicators and Fisher’s fundamental theorem, 30 November 2017.

I was mildly chagrined to discover that he said most of what I just said more simply and cleanly… in part because he went straight to the case where the fitness functions are constants. But my mild chagrin was instantly offset by this remark:

Fisher likened the result to the second law of thermodynamics, but there is an amusing amount of disagreement about what Fisher meant and whether he was correct. Rather than look at Fisher’s tortuous proof (or the only slightly less tortuous results of latter-day interpreters) I’m going to look at a simpler setup due to John Baez, and (unlike Baez) use it to derive the original version of Fisher’s theorem.

So, I’m just catching up with Wakeham, but luckily an earlier blog article of mine helped him avoid “Fisher’s tortuous proof” and the “only slightly less tortuous results of latter-day interpreters”. We are making progress here!

(By the way, a quiz show I listen to recently asked about the difference between “tortuous” and “torturous”. They mean very different things, but in this particular case either word would apply.)

My earlier blog article, in turn, was inspired by this paper:

• Marc Harper, Information geometry and evolutionary game theory.


Fisher’s Fundamental Theorem (Part 1)

29 September, 2020

There are various ‘fundamental theorems’ in mathematics. The fundamental theorem of arithmetic, the fundamental theorem of algebra, and the fundamental theorem of calculus are three of the most famous. These are gems of mathematics.

The statistician, biologist and eugenicist Ronald Fisher had his own fundamental theorem: the ‘fundamental theorem of natural selection’. But this one is different—it’s a mess! The theorem was based on confusing definitions, Fisher’s proofs were packed with typos and downright errors, and people don’t agree on what the theorem says, whether it’s correct, and whether it’s nontrivial. Thus, people keep trying to clarify and strengthen it.

This paper analyzes Fisher’s work:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 32 (1972), 129–140.

Price writes:

It has long been a mystery how Fisher (1930, 1941, 1958) derived his famous ‘fundamental theorem of Natural Selection’ and exactly what he meant by it. He stated the theorem in these words (1930, p. 35; 1958, p. 37): ‘The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.’ And also in these words (1930, p. 46; 1958, p. 50): ‘The rate of increase of fitness of any species is equal to the genetic variance in fitness.’ He compared this result to the second law of thermodynamics, and described it as holding ‘the supreme position among the biological sciences’. Also, he spoke of the ‘rigour’ of his derivation of the theorem and of ‘the ease of its interpretation’. But others have variously described his derivation as ‘recondite’ (Crow & Kimura, 1970), ‘very difficult’ (Turner, 1970), or ‘entirely obscure’ (Kempthorne, 1957). And no one has ever found any other way to derive the result that Fisher seems to state. Hence, many authors (not reviewed here) have maintained that the theorem holds only under very special conditions, while only a few (e.g. Edwards, 1967) have thought that Fisher may have been correct—if only we could understand what he meant!

It will be shown here that this latter view is correct. Fisher’s theorem does indeed hold with the generality that he claimed for it. The mystery and the controversy result from incomprehensibility rather than error.

I won’t try to explain the result here—or the various improved versions that people have invented, which may actually be more interesting than the original. I’ll try to do this in a series of posts. Right now I just want to quote Price’s analysis of why Fisher’s result is so difficult to understand! It’s amusing:

In addition to the central confusion resulting from the use of the word fitness in two highly different senses, Fisher’s three publications on his theorem contain an astonishing number of lesser obscurities, infelicities of expression, typographical errors, omissions of crucial explanations, and contradictions between different passages about the same point. It is necessary to clarify some of this confusion before explaining the derivation of the theorem.

He analyzes the problems in detail, calling one passage the “most confusing published scientific writing I know of”.

Part of the problem, though only part, is that Fisher rewrote part of his paper while not remembering to change the rest to make the terminology match. It reminds me a bit of how the typesetter accidentally omitted a line from one of Bohr’s papers on quantum mechanics, creating a sentence that made absolutely no sense—though in Bohr’s case, his writing was so obscure that nobody even noticed until many years later.

Given its legendary obscurity, I will not try to fight my way through Fisher’s original paper. I will start with some later work. Next time!


Banning Lead in Wetlands

27 September, 2020

A European Union commission has voted to ban the use of lead ammunition near wetlands and waterways! The proposal now needs to be approved by the European Parliament and Council. They are expected to approve the ban. If so, it will go into effect in 2022. The same commission, called REACH, may debate a complete ban on lead ammunition and fishing weights later this year.

Why does this matter? The European Chemicals Agency has estimated that as many as 1.5 million aquatic birds die annually from lead poisoning because they swallow some of the 5000 tonnes of lead shot that land in European wetlands each year. Water birds are more likely to be poisoned by lead because they mistake small lead shot pellets for stones they deliberately ingest to help grind their food.

In fact, about 20,000 tonnes of lead shot is fired each year in the EU, and 60,000 in the US. Eating game shot with lead is not good for you—but also, even low levels of lead in the environment can cause health damage and negative changes in behavior.

How much lead is too much? This is a tricky question, so I’ll just give some data. In the U.S., the geometric mean of the blood lead level among adults was 1.2 micrograms per deciliter (μg/dL) in 2009–2010. Blood lead concentrations in poisoning victims range from 30–80 µg/dL in children exposed to lead paint in older houses, 80–100 µg/dL in people working with pottery glazes, 90–140 µg/dL in individuals consuming contaminated herbal medicines, 110–140 µg/dL in indoor shooting range instructors, and as high as 330 µg/dL in those drinking fruit juices from glazed earthenware containers!

The amount of lead that US children are exposed to has been dropping, thanks to improved regulations:

However, what seem like low levels now may be high in the grand scheme of things. The amount of lead has increased by a factor of about 300 in the Greenland ice sheet during the past 3000 years. Most of this is due to industrial emissions:

• Amy Ng and Clair Patterson, Natural concentrations of lead in ancient Arctic and Antarctic ice, Geochimica et Cosmochimica Acta 45 (1981), 2109–2121.


Electric Cars

24 September, 2020

Some good news! According to this article, we’re rapidly approaching the tipping point when, even without subsidies, it will be as cheap, and maybe cheaper, to own an electric car as one that burns fossil fuels.

• Jack Ewing, The age of electric cars is dawning ahead of schedule, New York Times, September 20, 2020.

FRANKFURT — An electric Volkswagen ID.3 for the same price as a Golf. A Tesla Model 3 that costs as much as a BMW 3 Series. A Renault Zoe electric subcompact whose monthly lease payment might equal a nice dinner for two in Paris.

As car sales collapsed in Europe because of the pandemic, one category grew rapidly: electric vehicles. One reason is that purchase prices in Europe are coming tantalizingly close to the prices for cars with gasoline or diesel engines.

At the moment this near parity is possible only with government subsidies that, depending on the country, can cut more than $10,000 from the final price. Carmakers are offering deals on electric cars to meet stricter European Union regulations on carbon dioxide emissions. In Germany, an electric Renault Zoe can be leased for 139 euros a month, or $164.

Electric vehicles are not yet as popular in the United States, largely because government incentives are less generous. Battery-powered cars account for about 2 percent of new car sales in America, while in Europe the market share is approaching 5 percent. Including hybrids, the share rises to nearly 9 percent in Europe, according to Matthias Schmidt, an independent analyst in Berlin.

As electric cars become more mainstream, the automobile industry is rapidly approaching the tipping point when, even without subsidies, it will be as cheap, and maybe cheaper, to own a plug-in vehicle than one that burns fossil fuels. The carmaker that reaches price parity first may be positioned to dominate the segment.

A few years ago, industry experts expected 2025 would be the turning point. But technology is advancing faster than expected, and could be poised for a quantum leap. Elon Musk is expected to announce a breakthrough at Tesla’s “Battery Day” event on Tuesday that would allow electric cars to travel significantly farther without adding weight.

The balance of power in the auto industry may depend on which carmaker, electronics company or start-up succeeds in squeezing the most power per pound into a battery, what’s known as energy density. A battery with high energy density is inherently cheaper because it requires fewer raw materials and less weight to deliver the same range.

“We’re seeing energy density increase faster than ever before,” said Milan Thakore, a senior research analyst at Wood Mackenzie, an energy consultant which recently pushed its prediction of the tipping point ahead by a year, to 2024.

However, the article also points out that this tipping point concerns the overall lifetime cost of the vehicle! The sticker price of electric cars will still be higher for a while. And there aren’t nearly enough charging stations!

My next car will be electric. But first I’m installing solar power for my house. I’m working on it now.


Ascendancy vs. Reserve

22 September, 2020

Why is biodiversity ‘good’? To what extent is this sort of goodness even relevant to ecosystems—as opposed to us humans? I’d like to study this mathematically.

To do this, we’d need to extract some answerable questions out of the morass of subtlety and complexity. For example: what role does biodiversity play in the ability of ecosystems to be robust under sudden changes of external conditions? This is already plenty hard to study mathematically, since it requires understanding ‘biodiversity’ and ‘robustness’.

Luckily there has already been a lot of work on the mathematics of biodiversity and its connection to entropy. For example:

• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.

But how does biodiversity help robustness?

There’s been a lot of work on this. This paper has some inspiring passages:

• Robert E. Ulanowicz, Sally J. Goerner, Bernard Lietaer and Rocio Gomez, Quantifying sustainability: Resilience, efficiency and the return of information theory, Ecological Complexity 6 (2009), 27–36.

I’m not sure the math lives up to their claims, but I like these lines:

In other words, (14) says that the capacity for a system to undergo evolutionary change or self-organization consists of two aspects: It must be capable of exercising sufficient directed power (ascendancy) to maintain its integrity over time. Simultaneously, it must possess a reserve of flexible actions that can be used to meet the exigencies of novel disturbances. According to (14) these two aspects are literally complementary.

The two aspects are ‘ascendancy’, which is something like efficiency or being optimized, and ‘reserve capacity’, which is something like random junk that might come in handy if something unexpected comes up.

You know those gadgets you kept in the back of your kitchen drawer and never needed… until you did? If you’re aiming for ‘ascendancy’ you’d clear out those drawers. But if you keep that stuff, you’ve got more ‘reserve capacity’. They both have their good points. Ideally you want to strike a wise balance. You’ve probably sensed this every time you clean out your house: should I keep this thing because I might need it, or should I get rid of it?

I think it would be great to make these concepts precise. The paper at hand attempts this by taking a matrix of nonnegative numbers T_{i j} to describe flows in an ecological network. They define a kind of entropy for this matrix, very similar in look to Shannon entropy. Then they write this as a sum of two parts: a part closely analogous to mutual information, and a part closely analogous to conditional entropy. This decomposition is standard in information theory. This is their equation (14).
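To spell out that decomposition (in my own notation, and ignoring the paper’s overall scaling by the total flow): normalize the flow matrix to a joint distribution p_{ij} = T_{ij}/\sum_{kl} T_{kl}, with marginals p_{i \cdot} = \sum_j p_{ij} and p_{\cdot j} = \sum_i p_{ij}. Then the joint entropy splits as

\displaystyle{ -\sum_{i j} p_{i j} \log p_{i j} \;=\; \sum_{i j} p_{i j} \log \frac{p_{i j}}{p_{i \cdot} \, p_{\cdot j}} \;-\; \sum_{i j} p_{i j} \log \frac{p_{i j}^2}{p_{i \cdot} \, p_{\cdot j}} }

The first term on the right is the mutual information (their ‘ascendancy’, up to scaling) and the second is the sum of the two conditional entropies (their ‘reserve capacity’).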

If you want to learn more about the underlying math, click on this picture:

The new idea of these authors is that in the context of an ecological network, the mutual information can be understood as ‘ascendancy’, while the conditional entropy can be understood as ‘reserve capacity’.

I don’t know if I believe this! But I like the general idea of a balance between ascendancy and reserve capacity.

They write:

While the dynamics of this dialectic interaction can be quite subtle and highly complex, one thing is boldly clear—systems with either vanishingly small ascendancy or insignificant reserves are destined to perish before long. A system lacking ascendancy has neither the extent of activity nor the internal organization needed to survive. By contrast, systems that are so tightly constrained and honed to a particular environment appear ‘‘brittle’’ in the sense of Holling (1986) or ‘‘senescent’’ in the sense of Salthe (1993) and are prone to collapse in the face of even minor novel disturbances. Systems that endure—that is, are sustainable—lie somewhere between these extremes. But, where?


Enayat on Nonstandard Numbers

21 September, 2020

Michael Weiss and I have been carrying on a dialog on nonstandard models of arithmetic, and after a long break we’re continuing, here:

• Michael Weiss and John Baez, Non-standard models of arithmetic (Part 18).

In this part we reach a goal we’ve been heading toward for a long time! We’ve been reading this paper:

• Ali Enayat, Standard models of arithmetic.

and we’ve finally gone through the main theorems and explained what they say. We’ll talk about the proofs later.

The simplest one is this:

• Every ZF-standard model of PA that is not V-standard is recursively saturated.

What does this theorem mean, roughly? Let me be very sketchy here, to keep things simple and give just a flavor of what’s going on.

Peano arithmetic is a well-known axiomatic theory of the natural numbers. People study different models of Peano arithmetic in some universe of sets, say U. If we fix our universe U there is typically one ‘standard’ model of Peano arithmetic, built using the set

\omega = \{0,\{0\},\{0,\{0\}\}, \dots \}

or in other words

\omega = \{0,1,2, \dots \}

All models of Peano arithmetic not isomorphic to this one are called ‘nonstandard’. You can show that any model of Peano arithmetic contains an isomorphic copy of the standard model as an initial segment. This uniquely characterizes the standard model.

But different axioms for set theory give different concepts of U, the universe of sets. So the uniqueness of the standard model of Peano arithmetic is relative to that choice!

Let’s fix a choice of axioms for set theory: the Zermelo–Fraenkel or ‘ZF’ axioms are a popular choice. For the sake of discussion I’ll assume these axioms are consistent. (If they turn out not to be, I’ll correct this post.)

We can say the universe U is just what ZF is talking about, and only theorems of ZF count as things we know about U. Or, we can take a different attitude. After all, there are a couple of senses in which the ZF axioms don’t completely pin down the universe of sets.

First, there are statements in set theory that are neither provable nor disprovable from the ZF axioms. For any of these statements we’re free to assume it holds, or it doesn’t hold. We can add either it or its negation to the ZF axioms, and still get a consistent theory.

Second, a closely related sense in which the ZF axioms don’t uniquely pin down U is this: there are many different models of the ZF axioms.

Here I’m talking about models in some universe of sets, say V. This may seem circular! But it’s not really: first we choose some way to deal with set theory, and then we study models of the ZF axioms in this context. It’s a useful thing to do.

So fix this picture in mind. We start with a universe of sets V. Then we look at different models of ZF in V, each of which gives a universe U. U is sitting inside V, but from inside it looks like ‘the universe of all sets’.

Now, for each of these universes U we can study models of Peano arithmetic in U. And as I already explained, inside each U there will be a standard model of Peano arithmetic. But of course this depends on U.

So, we get lots of standard models of Peano arithmetic, one for each choice of U. Enayat calls these ZF-standard models of Peano arithmetic.

But there is one very special model of ZF in V, namely V itself. In other words, one choice of U is to take U = V. There’s a standard model of Peano arithmetic in V itself. This is an example of a ZF-standard model, but this is a very special one. Let’s call any model of Peano arithmetic isomorphic to this one V-standard.

Enayat’s theorem is about ZF-standard models of Peano arithmetic that aren’t V-standard. He shows that any ZF-standard model that’s not V-standard is ‘recursively saturated’.

What does it mean for a model M of Peano arithmetic to be ‘recursively saturated’? The idea is very roughly that ‘anything that can happen in any model, happens in M’.

Let me be a bit more precise. It means that if you write any computer program that prints out an infinite list of properties of an n-tuple of natural numbers, and there’s some model of Peano arithmetic that has an n-tuple with all these properties, then there’s an n-tuple of natural numbers in the model M with all these properties.

For example, there are models of Peano arithmetic that have a number x such that

0 < x
1 < x
2 < x
3 < x

and so on, ad infinitum. These are the nonstandard models. So a recursively saturated model must have such a number x. So it must be nonstandard.
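In the jargon, such a computer-printable infinite list of properties is a ‘recursive type’. In my notation, the example above is the type

\displaystyle{ p(x) = \{\, \underline{n} < x \; : \; n = 0, 1, 2, \dots \,\} }

where \underline{n} is the numeral for n. This type is finitely satisfiable in any model of Peano arithmetic, so a recursively saturated model must contain a number x realizing all of it at once: a nonstandard number.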

In short, Enayat has found that ZF-standard models of Peano arithmetic in the universe V come in two drastically different kinds. They are either ‘as standard as possible’, namely V-standard. Or, they are ‘extremely rich’, containing n-tuples with all possible lists of consistent properties that you can print out with a computer program: they are recursively saturated.

I am probably almost as confused about this as you are. But Michael and I will dig into this more in our series of posts.

In fact we’ve been at this a while already. Here is a description of the whole series of posts so far:

Posts 1–10 are available as pdf files, formatted for small and medium screens.

Non-standard Models of Arithmetic 1: John kicks off the series by asking about recursively saturated models, and Michael says a bit about n-types and the overspill lemma. He also mentions the arithmetical hierarchy.

Non-standard Models of Arithmetic 2: John mentions some references, and sets a goal: to understand this paper:

• Ali Enayat, Standard models of arithmetic.

John describes his dream: to show that “the” standard model is a much more nebulous notion than many seem to believe. He says a bit about the overspill lemma, and Joel David Hamkins gives a quick overview of saturation.

Non-standard Models of Arithmetic 3: A few remarks on the ultrafinitists Alexander Yessenin-Volpin and Edward Nelson; also Michael’s grad-school friend who used to argue that 7 might be nonstandard.

Non-standard Models of Arithmetic 4: Some back-and-forth on Enayat’s term “standard model” (or “ZF-standard model”) for the omega of a model of ZF. Philosophy starts to rear its head.

Non-standard Models of Arithmetic 5: Hamlet and Polonius talk math, and Michael holds forth on his philosophies of mathematics.

Non-standard Models of Arithmetic 6: John weighs in with why he finds “the standard model of Peano arithmetic” a problematic phrase. The Busy Beaver function is mentioned.

Non-standard Models of Arithmetic 7: We start on Enayat’s paper in earnest. Some throat-clearing about Axiom SM, standard models of ZF, inaccessible cardinals, and absoluteness. “As above, so below”: how ZF makes its “gravitational field” felt in PA.

Non-standard Models of Arithmetic 8: A bit about the Paris-Harrington and Goodstein theorems. In preparation, the equivalence (of sorts) between PA and ZF¬∞. The universe V_ω of hereditarily finite sets and its correspondence with \mathbb{N}. A bit about Ramsey’s theorem (needed for Paris-Harrington). Finally, we touch on the different ways theories can be “equivalent”, thanks to a comment by Jeffrey Ketland.

Non-standard Models of Arithmetic 9: Michael sketches the proof of the Paris-Harrington theorem.

Non-Standard Models of Arithmetic 10: Ordinal analysis, the function growth hierarchies, and some fragments of PA. Some questions that neither of us knows how to answer.

Non-standard Models of Arithmetic 11: Back to Enayat’s paper: his definition of PA^T for a recursive extension T of ZF. This uses the translation of formulas of PA into formulas of ZF, \varphi\mapsto \varphi^\mathbb{N}. Craig’s trick and Rosser’s trick.

Non-standard Models of Arithmetic 12: The strength of PA^T for various T’s. PA^{ZF} is equivalent to PA^{ZFC+GCH}, but PA^{ZFI} is strictly stronger than PA^{ZF}. (ZFI = ZF + “there exists an inaccessible cardinal”.)

Non-standard Models of Arithmetic 13: Enayat’s “natural” axiomatization of PA^T, and his proof that this works. A digression into Tarski’s theorem on the undefinability of truth, and how to work around it. For example, while truth is not definable, we can define truth for statements with at most a fixed number of quantifiers.

Non-standard Models of Arithmetic 14: The previous post showed that PA^T implies Φ_T, where Φ_T is Enayat’s “natural” axiomatization of PA^T. Here we show the converse. We also interpret Φ_T as saying, “Trust T”.

Non-standard Models of Arithmetic 15: We start to look at Truth (aka Satisfaction). Tarski gave a definition of Truth, and showed that Truth is undefinable. Less enigmatically put, there is no formula True(x) in the language of Peano arithmetic (L(PA)) that holds exactly for the Gödel numbers of true sentences of Peano arithmetic. On the other hand, Truth for Peano arithmetic can be formalized in the language of set theory (L(ZF)), and there are other work-arounds. We give an analogy with the Cantor diagonal argument.

Non-standard Models of Arithmetic 16: We look at the nitty-gritty of True_d(x), the formula in L(PA) that expresses truth in PA for formulas with parse-tree depth at most d. We see how the quantifiers “bleed through”, and why this prevents us from combining the whole series of True_d(x)’s into a single formula True(x). We also look at the variant Sat_{Σ_n}(x,y).

Non-standard Models of Arithmetic 17: More about how True_d evades Tarski’s undefinability theorem. The difference between True_d and Sat_{Σ_n}, and how it doesn’t matter for us. How True_d captures Truth for models of arithmetic: PA ⊢ True_d(⌜φ⌝) ↔ φ, for any φ of parse-tree depth at most d. Sketch of why this holds.

Non-standard Models of Arithmetic 18: The heart of Enayat’s paper: characterizing countable nonstandard T-standard models of PA (Prop. 6, Thm. 7, Cor. 8). Refresher on types. Meaning of ‘recursively saturated’. Standard meaning of ‘nonstandard’; standard and nonstandard meanings of ‘standard’.

Non-standard Models of Arithmetic 19: We marvel a bit over Enayat’s Prop. 6, and especially Cor. 8. The triple-decker sandwich, aka three-layer cake: ω^U ⊆ U ⊆ V. More about why the omegas of standard models of ZF are standard. Refresher on Φ_T. The smug confidence of a ZF-standard model.

Non-standard Models of Arithmetic 20: We start to develop the proof of Enayat’s Prop. 6. We get as far as a related result: any nonstandard model of PA is recursively d-saturated. (‘Recursively d-saturated’ is our user-friendly version of the professional-grade concept: recursively Σ_n-saturated.)