Exponential Discounting

25 October, 2020

Most of us seem to agree that the promise of a dollar in the future is worth less to us than a dollar today, even if the promise is certain to be fulfilled. Economists often assume ‘exponential discounting’, which says that a dollar promised at some time s is worth

\exp(-\alpha(s - t))

dollars in hand at time t. The constant \alpha is connected to the ‘interest rate’.

Why are economists so wedded to exponential discounting? The main reason is probably that it’s mathematically simple. But one argument for it goes roughly like this: if your decisions today are to look rational at any future time, you need to use exponential discounting.

In practice, humans, pigeons and rats do not use exponential discounting. So, economists say they are ‘dynamically inconsistent’:

• Wikipedia, Dynamic inconsistency.

In economics, dynamic inconsistency or time inconsistency is a situation in which a decision-maker’s preferences change over time in such a way that a preference can become inconsistent at another point in time. This can be thought of as there being many different “selves” within decision makers, with each “self” representing the decision-maker at a different point in time; the inconsistency occurs when not all preferences are aligned.

I this ‘inconsistent’ could be a misleading term for what’s going on here. It suggests that something bad is happening. That may not be true.

Anyway, some of the early research on this was done by George Ainslie, and here is what he found:

Ainslie’s research showed that a substantial number of subjects reported that they would prefer $50 immediately rather than $100 in six months, but would NOT prefer $50 in 3 months rather than $100 in nine months, even though this was the same choice seen at 3 months’ greater distance. More significantly, those subjects who said they preferred $50 in 3 months to $100 in 9 months said they would NOT prefer $50 in 12 months to $100 in 18 months—again, the same pair of options at a different distance—showing that the preference-reversal effect did not depend on the excitement of getting an immediate reward. Nor does it depend on human culture; the first preference reversal findings were in rats and pigeons.

Let me give a mathematical argument for exponential discounting. Of course it will rely on some assumptions. I’m not claiming these assumptions are true! Far from it. I’m just claiming that if we don’t use exponential discounting, we are violating one or more of these assumptions… or breaking out of the whole framework of my argument. The widespread prevalence of ‘dynamic inconsistency’ suggests that the argument doesn’t apply to real life.

Here’s the argument:

Suppose the value to us at any time t of a dollar given to us at some other time s is V(t,s).

Let us assume:

1) The ratio

\displaystyle{ \frac{V(t,s_2)}{V(t,s_1)} }

is independent of t. E.g., the ratio of value of a “dollar on Friday” to “a dollar on Thursday” is the same if you’re computing it on Monday, or on Tuesday, or on Wednesday.

2) The quantity V(t,s) depends only on the difference s - t.

3) The quantity V(t,s) is a continuous function of s and t.

Then we can show

V(t,s) = k \exp(-\alpha(s-t))

for some constants \alpha and k. Typically we assume k = 1 since the value of a dollar given to us right now is 1. But let’s just see how we get this formula for V(t,s) out of assumptions 1), 2) and 3).

The proof goes like this. By 2) we know

V(t,s) = F(s-t)

for some function F. By 1) it follows that

\displaystyle{ \frac{F(s_2 - t)}{F(s_1 - t)} }

is independent of t, so

\displaystyle{ \frac{F(s_2 - t)}{F(s_1 - t)} =  \frac{F(s_2)}{F(s_1)} }

or in other words

F(s_2 - t) F(s_1) = F(s_2) F(s_1 - t)

Ugh! What next? Well, if we take s_1 = t, we get a simpler equation that’s probably still good enough to get the job done:

F(s_2 - t) F(t) = F(s_2) F(0)

Now let’s make up a variable t' = s_2 - t, so that s_2 = t + t'. Then we can rewrite our equation as

F(t') F(t) = F(t+t') F(0)


F(t) F(t') = F(t+t') F(0)

This is beautiful except for the constant F(0). Let’s call that k and factor it out by writing

F(t) = k G(t)

Then we get

G(t) G(t') = G(t+t')

A theorem of Cauchy implies that any continuous solution of this equation is of the form

G(t) = \exp(-\alpha t)

So, we get

F(t) = k \exp(-\alpha t)


V(t,s) = k \exp(-\alpha(s-t))

as desired!

By the way, we don’t need to assume G is continuous: it’s enough to assume G is measurable. You can get bizarre nonmeasurable solutions of G(t) G(t') = G(t+t') using the axiom of choice, but they are not of practical interest.

So, assumption 3) is not the assumption I’d want to attack in trying to argue against exponential discounting. In fact both assumptions 1) and 2) are open to quite a few objections. Can you name some? Here’s one: in real life the interest rate changes with time. There must be some reason.

By the way, nothing in the argument I gave shows that \alpha \ge 0. So there could be people who obey assumptions 1)–3) yet believe the promise of a dollar in the future is worth more than a dollar in hand today.

Also, nothing in my argument for the form of V(t,s) assumes that s \ge t. That is, my assumptions as stated also concern the value of a dollar that was promised in the past. So, you might have fun seeing what changes, or does not change, if you restrict the assumptions to say they only apply when s \ge t. The arrow of time seems to be built into economics, after all.

Also, you may enjoy finding the place in my derivation where I might have divided by zero, and figure out to do about that.

If you don’t like exponential discounting—for example, because people use it to argue against spending money now to fight climate change—you might prefer hyperbolic discounting:

• Wikipedia, Hyperbolic discounting.

Epidemiological Modeling With Structured Cospans

19 October, 2020

This is a wonderful development! Micah Halter and Evan Patterson have taken my work on structured cospans with Kenny Courser and open Petri nets with Jade Master, together with Joachim Kock’s whole-grain Petri nets, and turned them into a practical software tool!

Then they used that to build a tool for ‘compositional’ modeling of the spread of infectious disease. By ‘compositional’, I mean that they make it easy to build more complex models by sticking together smaller, simpler models.

Even better, they’ve illustrated the use of this tool by rebuilding part of the model that the UK has been using to make policy decisions about COVID19.

All this software was written in the programming language Julia.

I had expected structured cospans to be useful in programming and modeling, but I didn’t expect it to happen so fast!

For details, read this great article:

• Micah Halter and Evan Patterson, Compositional epidemiological modeling using structured cospans, 17 October 2020.

Abstract. The field of applied category theory (ACT) aims to put the compositionality inherent to scientific and engineering processes on a firm mathematical footing. In this post, we show how the mathematics of ACT can be operationalized to build complex epidemiological models in a compositional way. In the first two sections, we review the idea of structured cospans, a formalism for turning closed systems into open ones, and we illustrate its use in Catlab through the simple example of open graphs. Finally, we put this machinery to work in the setting of Petri nets and epidemiological models. We construct a portion of the COEXIST model for the COVID-19 pandemic and we simulate the resulting ODEs.

You can see related articles by James Fairbanks, Owen Lynch and Evan Patterson here:

AlgebraicJulia Blog.

Also try these videos:

• James Fairbanks, AlgebraicJulia: Applied category theory in Julia, 29 July 2020.

• Evan Patterson, Realizing applied category theory in Julia, 16 January 2020.

I’m biased, but I think this is really cool cutting-edge stuff. If you want to do work along these lines let me know here and I’ll get Patterson to take a look.

Here’s part of a network created using their software:

Open Petri Nets and Their Categories of Processes

14 October, 2020

My student Jade Master will be talking about her work on open Petri nets at the online category theory seminar at UNAM on Wednesday October 21st at 18:00 UTC (11 am Pacific Time):

Open Petri Nets and Their Categories of Processes

Abstract. In this talk we will discuss Petri nets from a categorical perspective. A Petri net freely generates a symmetric monoidal category whose morphisms represent its executions. We will discuss how to make Petri nets ‘open’—i.e., equip them with input and output boundaries where resources can flow in and out. Open Petri nets freely generate open symmetric monoidal categories: symmetric monoidal categories which can be glued together along a shared boundary. The mapping from open Petri nets to their open symmetric monoidal categories is functorial and this gives a compositional framework for reasoning about the executions of Petri nets.

You can see the talk live, or later recorded, here:

You can read more about this work here:

• John Baez and Jade Master, Open Petri nets.

• Jade Master, Generalized Petri nets.

You can see Jade’s slides for a related talk here:

Open Petri nets

Abstract. The reachability semantics for Petri nets can be studied using open Petri nets. For us an ‘open’ Petri net is one with certain places designated as inputs and outputs via a cospan of sets. We can compose open Petri nets by gluing the outputs of one to the inputs of another. Open Petri nets can be treated as morphisms of a category \mathsf{Open}(\mathsf{Petri}), which becomes symmetric monoidal under disjoint union. However, since the composite of open Petri nets is defined only up to isomorphism, it is better to treat them as morphisms of a symmetric monoidal double category \mathbb{O}\mathbf{pen}(\mathsf{Petri}). Various choices of semantics for open Petri nets can be described using symmetric monoidal double functors out of \mathbb{O}\mathbf{pen}(\mathsf{Petri}). Here we describe the reachability semantics, which assigns to each open Petri net the relation saying which markings of the outputs can be obtained from a given marking of the inputs via a sequence of transitions. We show this semantics gives a symmetric monoidal lax double functor from \mathbb{O}\mathbf{pen}(\mathsf{Petri}) to the double category of relations. A key step in the proof is to treat Petri nets as presentations of symmetric monoidal categories; for this we use the work of Meseguer, Montanari, Sassone and others.

Recursive Saturation

13 October, 2020

Woo-hoo! Michael Weiss is teaching me about ‘recursively saturated’ models of Peano arithmetic:

• John Baez and Michael Weiss, Non-standard models of arithmetic 20, 12 October 2020.

Suppose you have a model of Peano arithemetic. Suppose you have an infinite list of predicates in Peano arithmetic: \phi_1(x), \phi_2(x), \dots. Suppose that for any finite subset of these, your model of PA has an element x making them true. Is there an element x making all these predicates true?

Of course this doesn’t hold in the standard model of PA. Consider this example:

x > 0, x > 1, x > 2, \dots

For any finite collection of these inequalities, we can find a standard natural number x large enough to make them true. But there’s no standard natural number that makes all these inequalities true.

On the other hand, for any nonstandard model of PA we can find an element x that obeys all of these:

x > 0, x > 1, x > 2, \dots

In fact this is the defining feature of a nonstandard model. (To be clear, I mean x > n where n ranges over standard natural numbers. A model of PA is nonstandard iff it contains an element greater than all the standard natural numbers.)

For a more interesting example, consider these predicates:

2|n, 3|n, 5|n, 7|n, 11|n, \dots

For any finite set of primes we can find a natural number divisible by all the primes in this set. We can’t find a standard natural number divisible by every prime, of course. But remarkably, in any nonstandard model of PA there is an element divisible by every prime—or more precisely, every standard prime.

In fact, suppose we have a model of PA, and an infinite list of predicates \phi_i in PA, and for any finite subset of these our model has an element x obeying the predicates in that subset. Then there is an element x obeying all the predicates \phi_i if:

  1. the model is nonstandard,
  2. you can write a computer program that lists all these predicates \phi_1(x), \phi_2(x), \dots,
  3. there’s an upper bound on the number of alternating quantifiers \forall \exists \forall \exists \cdots in the predicates \phi_i.

Intuitively, this result says that nonstandard models are very ‘fat’, containing nonstandard numbers with a wide range of properties. More technically, 2. says the predicates can be ‘recursively enumerated’, and this remarkable result is summarized by saying every nonstandard model of PA is ‘recursively \Sigma_n-saturated’.

In our conversation, Michael led me through the proof of this result. To do this, we used the fact that despite Tarski’s theorem on the undefinability of truth, truth is arithmetically definable for statements with any fixed upper bound on their number of alternating quantifiers! Michael had explained this end run around the undefinability of truth earlier, in Part 15 and Part 16 of our conversation.

Next we’ll show that if our nonstandard model is ‘ZF-standard’—that is, if it’s the \omega in some model of ZF—we can drop condition 3. above. So, in technical jargon, we’ll show that any nonstandard but ZF-standard model is ‘recursively saturated’.

I’m really enjoying these explorations of logic!

Decimal Digits of 1/π²

10 October, 2020

This formula may let you compute a decimal digit of 1/\pi^2 without computing all the previous digits:

\displaystyle{ \frac{1}{\pi^2} = \frac{2^5}{3} \sum_{n=0}^\infty \frac{(6n)!}{n!^6} (532n^2 + 126n + 9) \frac{1}{1000^{2n+1}}  }

It was discovered here:

• Gert Almkvist and Jesús Guillera, Ramanujan-like series for 1/\pi^2 and string theory, Experimental Mathematics, 21 (2012), 223–234.

They give some sort of argument for it, but apparently not a rigorous proof. Experts seem to believe it:

• Tito Piezas III, A compilation of Ramanujan-type formulas for 1/\pi^m.

It’s reminiscent of the famous Bailey–Borwein–Plouffe formula for \pi:

\displaystyle{ \pi = \sum_{n = 0}^\infty  \frac{1}{16^n} \left( \frac{4}{8n + 1} - \frac{2}{8n + 4} - \frac{1}{8n + 5} - \frac{1}{8n + 6} \right)  }

This lets you compute the nth hexadecimal digit of \pi without computing all the previous ones. It takes cleverness to do this, due to all those fractions.

A similar formula was found by Bellard:

\begin{array}{ccl}  \pi &=& \displaystyle{ \frac{1}{2^6} \sum_{n=0}^\infty \frac{(-1)^n}{2^{10n}} \, \left(-\frac{2^5}{4n+1} - \frac{1}{4n+3} +  \right. } \\ \\  & & \displaystyle{ \left. \frac{2^8}{10n+1} - \frac{2^6}{10n+3} - \frac{2^2}{10n+5} - \frac{2^2}{10n+7} + \frac{1}{10n+9} \right) } \end{array}

Between 1998 and 2000, the distributed computing project PiHex used Bellard’s formula to compute the quadrillionth bit of \pi, which turned out to be… [drum roll]…

A lot of work for nothing!

No formula of this sort is known that lets you compute individual decimal digits of \pi, but it’s cool that we can do it for 1/\pi^2, at least if Almkvist and Guillera’s formula is true.

Someday I’d like to understand any one of these Ramanujan-type formulas. The search for lucid conceptual clarity that makes me love category theory runs into a big challenge when it meets the work of Ramanujan! But it’s a worthwhile challenge. I started here with one of Ramanujan’s easiest formulas:

• John Baez, Chasing the Tail of the Gaussian: Part 1 and Part 2, The n-Category Café, 28 August 28 and 3 September 2020.

But the ideas involved in this formula all predate Ramanujan. For more challenging examples one could try this paper:

• Srinivasa Ramanujan, Modular equations and approximations to \pi, Quarterly Journal of Mathematics, XLV (1914), 350–372.

Here Ramanujan gave 17 formulas for pi, without proof. A friendly-looking explanation of one is given here:

• J. M. Borwein, P. B. Borwein and D. H. Bailey, Ramanujan, modular equations, and approximations to pi or How to compute one billion digits of pi, American Mathematical Monthly 96 (1989), 201–221.

So, this is where I’ll start!

Roger Penrose’s Nobel Prize

8 October, 2020

Roger Penrose just won Nobel Prize in Physics “for the discovery that black hole formation is a robust prediction of the general theory of relativity.” He shared it with Reinhard Genzel and Andrea Ghez, who won it “for the discovery of a supermassive compact object at the centre of our galaxy.”

This is great news! It’s a pity that Stephen Hawking is no longer alive, because if he were he would surely have shared in this prize. Hawking’s most impressive piece of work—his prediction of black hole evaporation—was too far from being experimentally confirmed to win a Nobel prize before his death. It still is today. The Nobel Prize is conservative in this way: it doesn’t go to theoretical developments that haven’t been experimentally confirmed. That makes a lot of sense. But sometimes they go overboard: Einstein never won a Nobel for general relativity or even special relativity. I consider that a scandal!

I’m glad that the Penrose–Hawking singularity theorems are considered Nobel-worthy. Let me just say a little about what Penrose and Hawking proved.

The most dramatic successful predictions of general relativity are black holes and the Big Bang. According to general relativity, as you follow a particle back in time toward the Big Bang or forward in time as it falls into a black hole, spacetime becomes more and more curved… and eventually it stops! This is roughly what we mean by a singularity. Penrose and Hawking made this idea mathematically precise, and proved that under reasonable assumptions singularities are inevitable in general relativity.

General relativity does not take quantum mechanics into account, so while Penrose and Hawking’s results are settled theorems, their applicability to our universe is not a settled fact. Many physicists hope that a theory of quantum gravity will save physics from singularities! Indeed this is one of the reasons physicist are fascinated by quantum gravity. But we know very little for sure about quantum gravity. So, it makes a lot of sense to work with general relativity as a mathematically precise theory and see what it says. That is what Hawking and Penrose did in their singularity theorems.

Let’s start with a quick introduction to general relativity, and then get an idea of why this theory predicts singularities are inevitable in certain situations.

General relativity says that spacetime is a 4-dimensional Lorentzian manifold. Thus, it can be covered by patches equipped with coordinates, so that in each patch we can describe points by lists of four numbers. Any curve \gamma(s) going through a point then has a tangent vector v whose components are v^\mu = d \gamma^\mu(s)/ds. Furthermore, given two tangent vectors v,w at the same point we can take their inner product

g(v,w) = g_{\mu \nu} v^\mu w^\nu

where as usual we sum over repeated indices, and g_{\mu \nu} is a 4 \times 4 matrix called the metric, depending smoothly on the point. We require that at any point we can find some coordinate system where this matrix takes the usual Minkowski form:

\displaystyle{  g = \left( \begin{array}{cccc} -1 & 0 &0 & 0 \\ 0 & 1 &0 & 0 \\ 0 & 0 &1 & 0 \\ 0 & 0 &0 & 1 \\ \end{array}\right) }

However, as soon as we move away from our chosen point, the form of the matrix g in these particular coordinates may change.

General relativity says how the metric is affected by matter. It does this in a single equation, Einstein’s equation, which relates the ‘curvature’ of the metric at any point to the flow of energy-momentum through that point. To define the curvature, we need some differential geometry. Indeed, Einstein had to learn this subject from his mathematician friend Marcel Grossman in order to write down his equation. Here I will take some shortcuts and try to explain Einstein’s equation with a bare minimum of differential geometry.

Consider a small round ball of test particles that are initially all at rest relative to each other. This requires a bit of explanation. First, because spacetime is curved, it only looks like Minkowski spacetime—the world of special relativity—in the limit of very small regions. The usual concepts of ’round’ and ‘at rest relative to each other’ only make sense in this limit. Thus, all our forthcoming statements are precise only in this limit, which of course relies on the fact that spacetime is a continuum.

Second, a test particle is a classical point particle with so little mass that while it is affected by gravity, its effects on the geometry of spacetime are negligible. We assume our test particles are affected only by gravity, no other forces. In general relativity this means that they move along timelike geodesics. Roughly speaking, these are paths that go slower than light and bend as little as possible. We can make this precise without much work.

For a path in space to be a geodesic means that if we slightly vary any small portion of it, it can only become longer. However, a path \gamma(s) in spacetime traced out by particle moving slower than light must be ‘timelike’, meaning that its tangent vector v = \gamma'(s) satisfies g(v,v) < 0. We define the proper time along such a path from s = s_0 to s = s_1 to be

\displaystyle{  \int_{s_0}^{s_1} \sqrt{-g(\gamma'(s),\gamma'(s))} \, ds }

This is the time ticked out by a clock moving along that path. A timelike path is a geodesic if the proper time can only decrease when we slightly vary any small portion of it. Particle physicists prefer the opposite sign convention for the metric, and then we do not need the minus sign under the square root. But the fact remains the same: timelike geodesics locally maximize the proper time.

Actual particles are not test particles! First, the concept of test particle does not take quantum theory into account. Second, all known particles are affected by forces other than gravity. Third, any actual particle affects the geometry of the spacetime it inhabits. Test particles are just a mathematical trick for studying the geometry of spacetime. Still, a sufficiently light particle that is affected very little by forces other than gravity can be approximated by a test particle. For example, an artificial satellite moving through the Solar System behaves like a test particle if we ignore the solar wind, the radiation pressure of the Sun, and so on.

If we start with a small round ball consisting of many test particles that are initially all at rest relative to each other, to first order in time it will not change shape or size. However, to second order in time it can expand or shrink, due to the curvature of spacetime. It may also be stretched or squashed, becoming an ellipsoid. This should not be too surprising, because any linear transformation applied to a ball gives an ellipsoid.

Let V(t) be the volume of the ball after a time t has elapsed, where time is measured by a clock attached to the particle at the center of the ball. Then in units where c = 8 \pi G = 1, Einstein’s equation says:

\displaystyle{  \left.{\ddot V\over V} \right|_{t = 0} = -{1\over 2} \left( \begin{array}{l} {\rm flow \; of \;} t{\rm -momentum \; in \; the \;\,} t {\rm \,\; direction \;} + \\ {\rm flow \; of \;} x{\rm -momentum \; in \; the \;\,} x {\rm \; direction \;} + \\ {\rm flow \; of \;} y{\rm -momentum \; in \; the \;\,} y {\rm \; direction \;} + \\ {\rm flow \; of \;} z{\rm -momentum \; in \; the \;\,} z {\rm \; direction} \end{array} \right) }

These flows here are measured at the center of the ball at time zero, and the coordinates used here take advantage of the fact that to first order, at any one point, spacetime looks like Minkowski spacetime.

The flows in Einstein’s equation are the diagonal components of a 4 \times 4 matrix T called the ‘stress-energy tensor’. The components T_{\alpha \beta} of this matrix say how much momentum in the \alpha direction is flowing in the \beta direction through a given point of spacetime. Here \alpha and \beta range from 0 to 3, corresponding to the t,x,y and z coordinates.

For example, T_{00} is the flow of t-momentum in the t-direction. This is just the energy density, usually denoted \rho. The flow of x-momentum in the x-direction is the pressure in the x direction, denoted P_x, and similarly for y and z. You may be more familiar with direction-independent pressures, but it is easy to manufacture a situation where the pressure depends on the direction: just squeeze a book between your hands!

Thus, Einstein’s equation says

\displaystyle{ {\ddot V\over V} \Bigr|_{t = 0} = -{1\over 2} (\rho + P_x + P_y + P_z) }

It follows that positive energy density and positive pressure both curve spacetime in a way that makes a freely falling ball of point particles tend to shrink. Since E = mc^2 and we are working in units where c = 1, ordinary mass density counts as a form of energy density. Thus a massive object will make a swarm of freely falling particles at rest around it start to shrink. In short, gravity attracts.

Already from this, gravity seems dangerously inclined to create singularities. Suppose that instead of test particles we start with a stationary cloud of ‘dust’: a fluid of particles having nonzero energy density but no pressure, moving under the influence of gravity alone. The dust particles will still follow geodesics, but they will affect the geometry of spacetime. Their energy density will make the ball start to shrink. As it does, the energy density \rho will increase, so the ball will tend to shrink ever faster, approaching infinite density in a finite amount of time. This in turn makes the curvature of spacetime become infinite in a finite amount of time. The result is a ‘singularity’.

In reality, matter is affected by forces other than gravity. Repulsive forces may prevent gravitational collapse. However, this repulsion creates pressure, and Einstein’s equation says that pressure also creates gravitational attraction! In some circumstances this can overwhelm whatever repulsive forces are present. Then the matter collapses, leading to a singularity—at least according to general relativity.

When a star more than 8 times the mass of our Sun runs out of fuel, its core suddenly collapses. The surface is thrown off explosively in an event called a supernova. Most of the energy—the equivalent of thousands of Earth masses—is released in a ten-second burst of neutrinos, formed as a byproduct when protons and electrons combine to form neutrons. If the star’s mass is below 20 times that of our the Sun, its core crushes down to a large ball of neutrons with a crust of iron and other elements: a neutron star.

However, this ball is unstable if its mass exceeds the Tolman–Oppenheimer–Volkoff limit, somewhere between 1.5 and 3 times that of our Sun. Above this limit, gravity overwhelms the repulsive forces that hold up the neutron star. And indeed, no neutron stars heavier than 3 solar masses have been observed. Thus, for very heavy stars, the endpoint of collapse is not a neutron star, but something else: a black hole, an object that bends spacetime so much even light cannot escape.

If general relativity is correct, a black hole contains a singularity. Many physicists expect that general relativity breaks down inside a black hole, perhaps because of quantum effects that become important at strong gravitational fields. The singularity is considered a strong hint that this breakdown occurs. If so, the singularity may be a purely theoretical entity, not a real-world phenomenon. Nonetheless, everything we have observed about black holes matches what general relativity predicts.

The Tolman–Oppenheimer–Volkoff limit is not precisely known, because it depends on properties of nuclear matter that are not well understood. However, there are theorems that say singularities must occur in general relativity under certain conditions.

One of the first was proved by Raychauduri and Komar in the mid-1950’s. It applies only to ‘dust’, and indeed it is a precise version of our verbal argument above. It introduced the Raychauduri’s equation, which is the geometrical way of thinking about spacetime curvature as affecting the motion of a small ball of test particles. It shows that under suitable conditions, the energy density must approach infinity in a finite amount of time along the path traced out out by a dust particle.

The first required condition is that the flow of dust be initally converging, not expanding. The second condition, not mentioned in our verbal argument, is that the dust be ‘irrotational’, not swirling around. The third condition is that the dust particles be affected only by gravity, so that they move along geodesics. Due to the last two conditions, the Raychauduri–Komar theorem does not apply to collapsing stars.

The more modern singularity theorems eliminate these conditions. But they do so at a price: they require a more subtle concept of singularity! There are various possible ways to define this concept. They’re all a bit tricky, because a singularity is not a point or region in spacetime.

For our present purposes, we can define a singularity to be an ‘incomplete timelike or null geodesic’. As already explained, a timelike geodesic is the kind of path traced out by a test particle moving slower than light. Similarly, a null geodesic is the kind of path traced out by a test particle moving at the speed of light. We say a geodesic is ‘incomplete’ if it ceases to be well-defined after a finite amount of time. For example, general relativity says a test particle falling into a black hole follows an incomplete geodesic. In a rough-and-ready way, people say the particle ‘hits the singularity’. But the singularity is not a place in spacetime. What we really mean is that the particle’s path becomes undefined after a finite amount of time.

The first modern singularity theorem was proved by Penrose in 1965. It says that if space is infinite in extent, and light becomes trapped inside some bounded region, and no exotic matter is present to save the day, either a singularity or something even more bizarre must occur. This theorem applies to collapsing stars. When a star of sufficient mass collapses, general relativity says that its gravity becomes so strong that light becomes trapped inside some bounded region. We can then use Penrose’s theorem to analyze the possibilities.

Here is Penrose’s story of how he discovered this:

At that time I was at Birkbeck College, and a friend of mine, Ivor Robinson, whose an Englishman but he was working in Dallas, Texas at the time, and he was talking to me … I forget what it was … he was a very … he had a wonderful way with words and so he was talking to me, and we got to this crossroad and as we crossed the road he stopped talking as we were watching out for traffic. We got to the other side and then he started talking again. And then when he left I had this strange feeling of elation and I couldn’t quite work out why I was feeling like that. So I went through all the things that had happened to me during the day—you know, what I had for breakfast and goodness knows what—and finally it came to this point when I was crossing the street, and I realised that I had a certain idea, and this idea what the crucial characterisation of when a collapse had reached a point of no return, without assuming any symmetry or anything like that. So this is what I called a trapped surface. And this was the key thing, so I went back to my office and I sketched out a proof of the collapse theorem. The paper I wrote was not that long afterwards, which went to Physical Review Letters, and it was published in 1965 I think.

Shortly thereafter Hawking proved a second singularity theorem, which applies to the Big Bang. It says that if space is finite in extent, and no exotic matter is present, generically either a singularity or something even more bizarre must occur. The singularity here could be either a Big Bang in the past, a Big Crunch in the future, both—or possibly something else. Hawking also proved a version of his theorem that applies to certain Lorentzian manifolds where space is infinite in extent, as seems to be the case in our Universe. This version requires extra conditions.

There are some undefined phrases in my summary of the Penrose–Hawking singularity theorems, most notably these:

• ‘exotic matter’

• ‘something even more bizarre’.

In each case I mean something precise.

These singularity theorems precisely specify what is meant by ‘exotic matter’. All known forms of matter obey the ‘dominant energy condition’, which says that

|P_x|, \, |P_y|, \, |P_z| \le \rho

at all points and in all locally Minkowskian coordinates. Exotic matter is anything that violates this condition.

The Penrose–Hawking singularity theorems also say what counts as ‘something even more bizarre’. An example would be a closed timelike curve. A particle following such a path would move slower than light yet eventually reach the same point where it started—and not just the same point in space, but the same point in spacetime! If you could do this, perhaps you could wait, see if it would rain tomorrow, and then go back and decide whether to buy an umbrella today. There are certainly solutions of Einstein’s equation with closed timelike curves. The first interesting one was found by Einstein’s friend Gödel in 1949, as part of an attempt to probe the nature of time. However, closed timelike curves are generally considered less plausible than singularities.

In the Penrose–Hawking singularity theorems, ‘something even more bizarre’ means precisely this: spacetime is not ‘globally hyperbolic’. To understand this, we need to think about when we can predict the future or past given initial data. When studying field equations like Maxwell’s theory of electromagnetism or Einstein’s theory of gravity, physicists like to specify initial data on space at a given moment of time. However, in general relativity there is considerable freedom in how we choose a slice of spacetime and call it ‘space’. What should we require? For starters, we want a 3-dimensional submanifold S of spacetime that is ‘spacelike’: every vector v tangent to S should have g(v,v) > 0. However, we also want any timelike or null curve to hit S exactly once. A spacelike surface with this property is called a Cauchy surface, and a Lorentzian manifold containing a Cauchy surface is said to be globally hyperbolic. There are many theorems justifying the importance of this concept. Globally hyperbolicity excludes closed timelike curves, but also other bizarre behavior.

By now the original singularity theorems have been greatly generalized and clarified. Hawking and Penrose gave a unified treatment of both theorems in 1970, which you can read here:

• Stephen William Hawking and Roger Penrose, The singularities of gravitational collapse and cosmology, Proc. Royal Soc. London A 314 (1970), 529–548.

The 1973 textbook by Hawking and Ellis gives a systematic introduction to this subject. A paper by Garfinkle and Senovilla reviews the subject and its history up to 2015. Also try the first two chapters of this wonderful book:

• Stephen Hawking and Roger Penrose, The Nature of Space and Time, Princeton U. Press, 1996.

You can find the first chapter, by Hawking, here: it describes the singularity theorems. The second, by Penrose, discusses the nature of singlarities in general relativity.

I’m sure Penrose’s Nobel Lecture will also be worth watching. Three cheers to Roger Penrose!

Network Models

7 October, 2020

Good news: my student Joe Moeller will be taking a job at NIST, the National Institute of Standards and Technology! He’ll be working with Spencer Breiner and Eswaran Subrahmanian on categories in engineering and system design.

Joe Moeller will be talking about his work on ‘network models’ at the online category theory seminar at UNAM on Wednesday October 14th at 18:00 UTC (11 am Pacific Time):

Network Models

Abstract. Networks can be combined in various ways, such as overlaying one on top of another or setting two side by side. We introduce `network models’ to encode these ways of combining networks. Different network models describe different kinds of networks. We show that each network model gives rise to an operad, whose operations are ways of assembling a network of the given kind from smaller parts. Such operads, and their algebras, can serve as tools for designing networks. Technically, a network model is a lax symmetric monoidal functor from the free symmetric monoidal category on some set to Cat, and the construction of the corresponding operad proceeds via a symmetric monoidal version of the Grothendieck construction.

You can watch the talk here:

You can read more about network models here:

Complex adaptive system design (part 6).

and here’s the original paper:

• John Baez, John Foley, Blake Pollard and Joseph Moeller, Network models, Theory and Applications of Categories 35 (2020), 700–744.

Compositional Game Theory and Climate Microeconomics

5 October, 2020

guest post by Jules Hedges

Hi all

This is a post I’ve been putting off for a long time until I was sure I was ready. I am the “lead developer” of a thing called compositional game theory (CGT). It’s an approach to game theory based on category theory, but we are now at the point where you don’t need to know that anymore: it’s an approach to game theory that has certain specific benefits over the traditional approach.

I would like to start a conversation about “using my powers for good”. I am hoping particularly that it is possible to model microeconomic aspects of climate science. This seems to be a very small field and I’m not really hopeful that anyone on Azimuth will have the right background, but it’s worth a shot. The kind of thing I’m imagining (possibly completely wrongly) is to create models that will suggest when a technically-feasible solution is not socially feasible. Social dilemmas and tragedies of the commons are at the heart of the climate crisis, and modelling instances of them is in scope.

I have a software tool (https://github.com/jules-hedges/open-games-hs) that is designed to be an assistant for game-theoretic modelling. This I can’t emphasise enough: A human with expertise in game-theoretic modelling is the most important thing, CGT is merely an assistant. (Right now the tool also probably can’t be used without me being in the loop, but that’s not an inherent thing.)

To give an idea what sort of things CGT can do, my 2 current ongoing research collaborations are: (1) a social science project modelling examples of institution governance, and (2) a cryptoeconomics project modelling an attack against a protocol using bribes. On a technical level the best fit is for Bayesian games, which are finite-horizon, have common knowledge priors, and private knowledge with agents who do Bayesian updating.

A lot of the (believed) practical benefits of CGT come from the fact that the model is code (in a high level language designed specifically for expressing games) and thus the model can be structured according to existing wisdom for structuring code. Really stress-testing this claim is an ongoing research project. My tool does equilibrium-checking for all games (the technical term is “model checker”), and we’ve had some success doing other things by looping an equilibrium check over a parameter space. It makes no attempt to be an equilibrium solver, that is left for the human.

This is not me trying to push my pet project (I do that elsewhere) but me trying to find a niche where I can do some genuine good, even if small. If you are a microeconomist (or a social scientist who uses applied game theory) and share the goals of Azimuth, I would like to hear from you, even if it’s just for some discussion.

Fock Space Techniques for Stochastic Physics

2 October, 2020

I’ve been fascinated for a long time about the relation between classical probability theory and quantum mechanics. This story took a strange new turn when people discovered that stochastic Petri nets, good for describing classical probabilistic models of interacting entities, can also be described using ideas from the quantum field theory!

I’ll be talking about this at the online category theory seminar at UNAM, the National Autonomous University of Mexico, on Wednesday October 7th at 18:00 UTC (11 am Pacific Time):

Fock space techniques for stochastic physics

Abstract. Some ideas from quantum theory are beginning to percolate back to classical probability theory. For example, the master equation for a chemical reaction network—also known as a stochastic Petri net—describes particle interactions in a stochastic rather than quantum way. If we look at this equation from the perspective of quantum theory, this formalism turns out to involve creation and annihilation operators, coherent states and other well-known ideas—but with a few big differences.

You can watch the talk here:

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

My students Joe Moeller and Jade Master will also be giving talks in this seminar—on Petri nets and structured cospans.


Fisher’s Fundamental Theorem (Part 2)

29 September, 2020

Here’s how Fisher stated his fundamental theorem:

The rate of increase of fitness of any species is equal to the genetic variance in fitness.

But clearly this is only going to be true under some conditions!

A lot of early criticism of Fisher’s fundamental theorem centered on the fact that the fitness of a species can vary due to changing external conditions. For example: suppose the Sun goes supernova. The fitness of all organisms on Earth will suddenly drop. So the conclusions of Fisher’s theorem can’t hold under these circumstances.

I find this obvious and thus uninteresting. So, let’s tackle situations where the fitness changes due to changing external conditions later. But first let’s see what happens if the fitness isn’t changing for these external reasons.

What’s ‘fitness’, anyway? To define this we need a mathematical model of how populations change with time. We’ll start with a very simple, very general model. While it’s often used in population biology, it will have very little to do with biology per se. Indeed, the reason I’m digging into Fisher’s fundamental theorem is that it has a mathematical aspect that doesn’t require much knowledge of biology to understand. Applying it to biology introduces lots of complications and caveats, but that won’t be my main focus here. I’m looking for the simple abstract core.

The Lotka–Volterra equation

The Lotka–Volterra equation is a simplified model of how populations change with time. Suppose we have n different types of self-replicating entity. We will call these entities replicators. We will call the types of replicators species, but they do not need to be species in the biological sense!

For example, the replicators could be organisms of one single biological species, and the types could be different genotypes. Or the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains. In what follows these details won’t matter: we’ll have just have different ‘species’ of ‘replicators’.

Let P_i(t) or just P_i for short, be the population of the ith species at time t. We will treat this population as a differentiable real-valued function of time, which is a reasonable approximation when the population is fairly large.

Let’s assume the population obeys the Lotka–Volterra equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each function f_i depends in a differentiable way on all the populations. Thus each population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it depends on the populations of all the species.

We call f_i the fitness function of the ith species. Note: we are assuming this function does not depend on time.

To write the Lotka–Volterra equation more concisely, we can create a vector whose components are all the populations:

P = (P_1, \dots , P_n).

Let’s call this the population vector. In terms of the population vector, the Lotka–Volterra equation become

\displaystyle{ \dot P_i = f_i(P) P_i}

where the dot stands for a time derivative.

To define concepts like ‘mean fitness’ or ‘variance in fitness’ we need to introduce probability theory, and the replicator equation.

The replicator equation

Starting from the populations P_i, we can work out the probability p_i that a randomly chosen replicator belongs to the ith species. More precisely, this is the fraction of replicators belonging to that species:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

As a mnemonic, remember that the big Population P_i is being normalized to give a little probability p_i. I once had someone scold me for two minutes during a talk I was giving on this subject, for using lower-case and upper-case P’s to mean different things. But it’s my blog and I’ll do what I want to.

How do these probabilities p_i change with time? We can figure this out using the Lotka–Volterra equation. We pull out the trusty quotient rule and calculate:

\displaystyle{ \dot{p}_i = \frac{\dot{P}_i \left(\sum_j P_j\right) - P_i \left(\sum_j \dot{P}_j \right)}{\left(  \sum_j P_j \right)^2 } }

Then the Lotka–Volterra equation gives

\displaystyle{ \dot{p}_i = \frac{ f_i(P) P_i \; \left(\sum_j P_j\right) - P_i \left(\sum_j f_j(P) P_j \right)} {\left(  \sum_j P_j \right)^2 } }

Using the definition of p_i this simplifies and we get

\displaystyle{ \dot{p}_i =  f_i(P) p_i  - \left( \sum_j f_j(P) p_j \right) p_i }

The expression in parentheses here has a nice meaning: it is the mean fitness. In other words, it is the average, or expected, fitness of a replicator chosen at random from the whole population. Let us write it thus:

\displaystyle{ \overline f(P) = \sum_j f_j(P) p_j  }

This gives the replicator equation in its classic form:

\displaystyle{ \dot{p}_i = \left( f_i(P) - \overline f(P) \right) \, p_i }

where the dot stands for a time derivative. Thus, for the fraction of replicators of the ith species to increase, their fitness must exceed the mean fitness.

The moral is clear:

To become numerous you have to be fit.
To become predominant you have to be fitter than average.

This picture by David Wakeham illustrates the idea:

The fundamental theorem

What does the fundamental theorem of natural selection say, in this context? It says the rate of increase in mean fitness is equal to the variance of the fitness. As an equation, it says this:

\displaystyle{ \frac{d}{d t} \overline f(P) = \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

The left hand side is the rate of increase in mean fitness—or decrease, if it’s negative. The right hand side is the variance of the fitness: the thing whose square root is the standard deviation. This can never be negative!

A little calculation suggests that there’s no way in the world that this equation can be true without extra assumptions!

We can start computing the left hand side:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j(P) p_j } \\  \\  &=& \displaystyle{ \sum_j  \frac{d f_j(P)}{d t} \, p_j \; + \; f_j(P) \, \frac{d p_j}{d t} } \\ \\  &=& \displaystyle{ \sum_j (\nabla f_j(P) \cdot \dot{P}) p_j \; + \; f_j(P) \dot{p}_j }  \end{array}

Before your eyes glaze over, let’s look at the two terms and think about what they mean. The first term says: the mean fitness will change since the fitnesses f_j(P) depend on P, which is changing. The second term says: the mean fitness will change since the fraction p_j of replicators that are in the jth species is changing.

We could continue the computation by using the Lotka–Volterra equation for \dot{P} and the replicator equation for \dot{p}. But it already looks like we’re doomed without invoking an extra assumption. The left hand side of Fisher’s fundamental theorem involves the gradients of the fitness functions, \nabla f_j(P). The right hand side:

\displaystyle{ \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j  }

does not!

This suggests an extra assumption we can make. Let’s assume those gradients \nabla f_j vanish!

In other words, let’s assume that the fitness of each replicator is a constant, independent of the populations:

f_j(P_1, \dots, P_n) = f_j

where f_j at right is just a number.

Then we can redo our computation of the rate of change of mean fitness. The gradient term doesn’t appear:

\begin{array}{ccl} \displaystyle{\frac{d}{d t} \overline f(P)}  &=&  \displaystyle{ \frac{d}{d t} \sum_j f_j p_j } \\  \\  &=& \displaystyle{ \sum_j f_j \dot{p}_j }  \end{array}

We can use the replicator equation for \dot{p}_j and get

\begin{array}{ccl} \displaystyle{ \frac{d}{d t} \overline f } &=&  \displaystyle{ \sum_j f_j \Big( f_j - \overline f \Big) p_j } \\ \\  &=& \displaystyle{ \sum_j f_j^2 p_j - f_j p_j  \overline f  } \\ \\  &=& \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

This is the mean of the squares of the f_j minus the square of their mean. And if you’ve done enough probability theory, you’ll recognize this as the variance! Remember, the variance is

\begin{array}{ccl} \displaystyle{ \sum_j \Big( f_j - \overline f \Big)^2 \, p_j  }  &=&  \displaystyle{ \sum_j f_j^2 \, p_j - 2 f_j \overline f \, p_j + \overline f^2 p_j } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 \, p_j\Big) - 2 \overline f^2 + \overline f^2 } \\ \\  &=&  \displaystyle{ \Big(\sum_j f_j^2 p_j\Big) - \overline f^2  }  \end{array}

Same thing.

So, we’ve gotten a simple version of Fisher’s fundamental theorem. Given all the confusion swirling around this subject, let’s summarize it very clearly.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the equations

\displaystyle{ \dot P_i = f_i P_i}

for some constants f_i. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f = \sum_j f_j p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f) =  \sum_j \Big( f_j - \overline f \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance of the fitness:

\displaystyle{  \frac{d}{d t} \overline f = \mathrm{Var}(f) }

This is nice—but as you can see, our extra assumption that the fitness functions are constants has trivialized the problem. The equations

\displaystyle{ \dot P_i = f_i P_i}

are easy to solve: all the populations change exponentially with time. We’re not seeing any of the interesting features of population biology, or even of dynamical systems in general. The theorem is just an observation about a collection of exponential functions growing or shrinking at different rates.

So, we should look for a more interesting theorem in this vicinity! And we will.

Before I bid you adieu, let’s record a result we almost reached, but didn’t yet state. It’s stronger than the one I just stated. In this version we don’t assume the fitness functions are constant, so we keep the term involving their gradient.

Theorem. Suppose the functions P_i \colon \mathbb{R} \to (0,\infty) obey the Lotka–Volterra equations:

\displaystyle{ \dot P_i = f_i(P) P_i}

for some differentiable functions f_i \colon (0,\infty)^n \to \mathbb{R} called fitness functions. Define probabilities by

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Define the mean fitness by

\displaystyle{ \overline f(P)  = \sum_j f_j(P) p_j  }

and the variance of the fitness by

\displaystyle{ \mathrm{Var}(f(P)) =  \sum_j \Big( f_j(P) - \overline f(P) \Big)^2 \, p_j }

Then the time derivative of the mean fitness is the variance plus an extra term involving the gradients of the fitness functions:

\displaystyle{\frac{d}{d t} \overline f(P)}  =  \displaystyle{ \mathrm{Var}(f(P)) + \sum_j (\nabla f_j(P) \cdot \dot{P}) p_j }

The proof just amounts to cobbling together the calculations we have already done, and not assuming the gradient term vanishes.


After writing this blog article I looked for a nice picture to grace it. I found one here:

• David Wakeham, Replicators and Fisher’s fundamental theorem, 30 November 2017.

I was mildly chagrined to discover that he said most of what I just said more simply and cleanly… in part because he went straight to the case where the fitness functions are constants. But my mild chagrin was instantly offset by this remark:

Fisher likened the result to the second law of thermodynamics, but there is an amusing amount of disagreement about what Fisher meant and whether he was correct. Rather than look at Fisher’s tortuous proof (or the only slightly less tortuous results of latter-day interpreters) I’m going to look at a simpler setup due to John Baez, and (unlike Baez) use it to derive the original version of Fisher’s theorem.

So, I’m just catching up with Wakeham, but luckily an earlier blog article of mine helped him avoid “Fisher’s tortuous proof” and the “only slightly less tortuous results of latter-day interpreters”. We are making progress here!

(By the way, a quiz show I listen to recently asked about the difference between “tortuous” and “torturous”. They mean very different things, but this particular case either word would apply.)

My earlier blog article, in turn, was inspired by this paper:

• Marc Harper, Information geometry and evolutionary game theory.