In exponential discounting, we assume the promise of a dollar at time $s$ is worth $a^{s-t}$ dollars in hand at time $t$. The constant $a$ is connected to the ‘interest rate’.

Why are economists so wedded to exponential discounting? The main reason is probably that it’s mathematically simple. But one argument for it goes roughly like this: if your decisions today are to look rational at any future time, you need to use exponential discounting.

In practice, humans, pigeons and rats do *not* use exponential discounting. So, economists say they are ‘dynamically inconsistent’:

• Wikipedia, Dynamic inconsistency.

In economics, dynamic inconsistency or time inconsistency is a situation in which a decision-maker’s preferences change over time in such a way that a preference can become inconsistent at another point in time. This can be thought of as there being many different “selves” within decision makers, with each “self” representing the decision-maker at a different point in time; the inconsistency occurs when not all preferences are aligned.

I think ‘inconsistent’ could be a misleading term for what’s going on here. It suggests that something *bad* is happening. That may not be true.

Anyway, some of the early research on this was done by George Ainslie, and here is what he found:

Ainslie’s research showed that a substantial number of subjects reported that they would prefer $50 immediately rather than $100 in six months, but would NOT prefer $50 in 3 months rather than $100 in nine months, even though this was the same choice seen at 3 months’ greater distance. More significantly, those subjects who said they preferred $50 in 3 months to $100 in 9 months said they would NOT prefer $50 in 12 months to $100 in 18 months—again, the same pair of options at a different distance—showing that the preference-reversal effect did not depend on the excitement of getting an immediate reward. Nor does it depend on human culture; the first preference reversal findings were in rats and pigeons.

Let me give a mathematical argument for exponential discounting. Of course it will rely on some assumptions. I’m not claiming these assumptions are true! Far from it. I’m just claiming that if we *don’t* use exponential discounting, we are violating one or more of these assumptions… or breaking out of the whole framework of my argument. The widespread prevalence of ‘dynamic inconsistency’ suggests that the argument doesn’t apply to real life.

Here’s the argument:

Suppose the value to us at any time $t$ of a dollar given to us at some other time $s$ is $V(t,s)$.

Let us assume:

1) The ratio

$$\frac{V(t,s_2)}{V(t,s_1)}$$

is independent of $t$. E.g., the ratio of the value of “a dollar on Friday” to “a dollar on Thursday” is the same if you’re computing it on Monday, or on Tuesday, or on Wednesday.

2) The quantity $V(t,s)$ depends only on the difference $s - t$.

3) The quantity $V(t,s)$ is a continuous function of $t$ and $s$.

Then we can show

$$V(t,s) = c\, a^{s-t}$$

for some constants $a$ and $c$. Typically we assume $c = 1$, since the value of a dollar given to us right now is 1. But let’s just see how we get this formula for $V(t,s)$ out of assumptions 1), 2) and 3).

The proof goes like this. By 2) we know

$$V(t,s) = F(s-t)$$

for some function $F$. By 1) it follows that

$$\frac{F(s_2 - t)}{F(s_1 - t)}$$

is independent of $t$, so it must equal its value at $t = 0$:

$$\frac{F(s_2 - t)}{F(s_1 - t)} = \frac{F(s_2)}{F(s_1)}$$

or in other words

$$F(s_2 - t)\, F(s_1) = F(s_2)\, F(s_1 - t).$$

Ugh! What next? Well, if we take $s_1 = t$ we get a simpler equation that’s probably still good enough to get the job done:

$$F(s_2 - t)\, F(t) = F(s_2)\, F(0).$$

Now let’s make up a variable $u$ so that $s_2 = t + u$. Then we can rewrite our equation as

$$F(u)\, F(t) = F(t+u)\, F(0)$$

or

$$\frac{F(u)}{F(0)} \cdot \frac{F(t)}{F(0)} = \frac{F(t+u)}{F(0)}.$$

This is beautiful except for the constant $F(0)$. Let’s call that $c$ and factor it out by writing

$$F(t) = c\, G(t).$$

Then we get

$$G(u)\, G(t) = G(t+u).$$

A theorem of Cauchy implies that any continuous solution of this equation is of the form

$$G(t) = a^t$$

for some constant $a$. So, we get

$$F(t) = c\, a^t$$

or

$$V(t,s) = c\, a^{s-t}$$

as desired!
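Here’s a quick numerical sanity check of that conclusion in Python (the values of $c$ and $a$ are arbitrary illustrative choices, not dictated by the argument):

```python
import itertools

# Check that F(t) = c * a**t satisfies F(s2 - t) * F(s1) == F(s2) * F(s1 - t).
c, a = 1.0, 0.95

def F(t):
    return c * a**t

for t, s1, s2 in itertools.product([-2.0, 0.0, 1.5, 3.0], repeat=3):
    assert abs(F(s2 - t) * F(s1) - F(s2) * F(s1 - t)) < 1e-12, (t, s1, s2)

print("F(t) = c * a**t satisfies the functional equation at all sampled points")
```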

By the way, we don’t need to assume $G$ is continuous: it’s enough to assume $G$ is measurable. You can get bizarre nonmeasurable solutions of $G(u)G(t) = G(t+u)$ using the axiom of choice, but they are not of practical interest.

So, assumption 3) is not the assumption I’d want to attack in trying to argue against exponential discounting. In fact both assumptions 1) and 2) are open to quite a few objections. Can you name some? Here’s one: in real life the interest rate changes with time. There must be some reason.

By the way, nothing in the argument I gave shows that $a \le 1$. So there could be people who obey assumptions 1)–3) yet believe the promise of a dollar in the future is worth *more* than a dollar in hand today.

Also, nothing in my argument for the form of $V(t,s)$ assumes that $s \ge t$. That is, my assumptions as stated also concern the value of a dollar that was promised in the *past*. So, you might have fun seeing what changes, or does not change, if you restrict the assumptions to say they only apply when $s \ge t$. The arrow of time seems to be built into economics, after all.

Also, you may enjoy finding the place in my derivation where I might have divided by zero, and figuring out what to do about that.

If you don’t like exponential discounting—for example, because people use it to argue against spending money *now* to fight climate change—you might prefer hyperbolic discounting:

• Wikipedia, Hyperbolic discounting.
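To see concretely how hyperbolic discounting reproduces Ainslie’s preference reversals while exponential discounting cannot, here is a small Python sketch. It uses the standard one-parameter hyperbolic form $A/(1+kD)$ for a reward $A$ at delay $D$; the parameter values are made up for illustration, not fitted to any data:

```python
def exponential(amount, delay, a=0.9):
    # Value today of `amount` dollars arriving after `delay` months.
    return amount * a**delay

def hyperbolic(amount, delay, k=0.25):
    # Mazur-style hyperbolic discounting: value falls off like 1/(1 + k*delay).
    return amount / (1 + k * delay)

for discount in (exponential, hyperbolic):
    now   = discount(50, 0) > discount(100, 6)   # $50 now vs $100 in 6 months
    later = discount(50, 3) > discount(100, 9)   # the same choice, 3 months away
    print(f"{discount.__name__:12} prefers $50 now: {now}, prefers $50 in 3 months: {later}")
```

With exponential discounting the two answers always agree, whatever $a$ is: shifting both options 3 months multiplies both values by $a^3$, which cannot change their order. With hyperbolic discounting and $k = 0.25$, the preference reverses, just as in Ainslie’s experiments.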

There’s a lot of excitement about a new approach to fusion power:

• Henry Fountain, Compact nuclear fusion reactor is ‘very likely to work,’ studies suggest, *The New York Times*, 29 September 2020.

Scientists developing a compact version of a nuclear fusion reactor have shown in a series of research papers that it should work, renewing hopes that the long-elusive goal of mimicking the way the sun produces energy might be achieved and eventually contribute to the fight against climate change.

Construction of a reactor, called SPARC, which is being developed by researchers at the Massachusetts Institute of Technology and a spinoff company, Commonwealth Fusion Systems, is expected to begin next spring and take three or four years, the researchers and company officials said.

Although many significant challenges remain, the company said construction would be followed by testing and, if successful, building of a power plant that could use fusion energy to generate electricity, beginning in the next decade.

This ambitious timetable is far faster than that of the world’s largest fusion-power project, a multinational effort in Southern France called ITER, for International Thermonuclear Experimental Reactor. That reactor has been under construction since 2013 and, although it is not designed to generate electricity, is expected to produce a fusion reaction by 2035.

But fusion has been twenty years off since the 1950s. What’s the evidence that SPARC will work? I guess most of the evidence is here—a series of seven papers, which luckily are available open-access:

• Status of the SPARC physics basis, *Journal of Plasma Physics* **86** (2020).

I have not read these! And even if I did, since I’m not an expert on fusion reactors—obviously a tricky subject—I’m not sure how much my impression would help.

Do you know any commentary on SPARC from other experts on fusion reactors? The more detailed, the better. All I’ve seen so far are very sketchy remarks from people who don’t seem to know what they’re talking about.

Then they used *that* to build a tool for ‘compositional’ modeling of the spread of infectious disease. By ‘compositional’, I mean that they make it easy to build more complex models by sticking together smaller, simpler models.

Even better, they’ve illustrated the use of this tool by rebuilding part of the model that the UK has been using to make policy decisions about COVID-19.

All this software was written in the programming language Julia.

I had expected structured cospans to be useful in programming and modeling, but I didn’t expect it to happen so fast!

For details, read this great article:

• Micah Halter and Evan Patterson, Compositional epidemiological modeling using structured cospans, 17 October 2020.

Abstract. The field of applied category theory (ACT) aims to put the compositionality inherent to scientific and engineering processes on a firm mathematical footing. In this post, we show how the mathematics of ACT can be operationalized to build complex epidemiological models in a compositional way. In the first two sections, we review the idea of structured cospans, a formalism for turning closed systems into open ones, and we illustrate its use in Catlab through the simple example of open graphs. Finally, we put this machinery to work in the setting of Petri nets and epidemiological models. We construct a portion of the COEXIST model for the COVID-19 pandemic and we simulate the resulting ODEs.
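The real tool is written in Julia, using Catlab’s implementation of structured cospans. Just to convey the flavor of ‘compositional’ in plain code, here is a toy Python sketch (my own made-up representation, not the AlgebraicJulia API) in which two small epidemic models are glued along shared compartments to build a bigger one:

```python
# Toy sketch: a Petri net as a set of places plus transitions (inputs, outputs).
# Gluing identifies shared places by name -- a crude stand-in for composing
# structured cospans along a common boundary.

def glue(net1, net2, shared):
    assert shared <= net1["places"] and shared <= net2["places"]
    return {
        "places": net1["places"] | net2["places"],
        "transitions": {**net1["transitions"], **net2["transitions"]},
    }

sir = {
    "places": {"S", "I", "R"},
    "transitions": {
        "infect":  (("S", "I"), ("I", "I")),    # S + I -> I + I
        "recover": (("I",), ("R",)),            # I -> R
    },
}
waning = {
    "places": {"R", "S"},
    "transitions": {"wane": (("R",), ("S",))},  # immunity wanes: R -> S
}

sirs = glue(sir, waning, shared={"S", "R"})
print(sorted(sirs["places"]), sorted(sirs["transitions"]))
```

The point of the category-theoretic machinery is that this kind of gluing is not ad hoc: it is composition in a category of open Petri nets, so it automatically behaves well (for instance, it is associative up to isomorphism) and the semantics of a composite model is determined by the semantics of its parts.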

You can see related articles by James Fairbanks, Owen Lynch and Evan Patterson here:

Also try these videos:

• James Fairbanks, AlgebraicJulia: Applied category theory in Julia, 29 July 2020.

• Evan Patterson, Realizing applied category theory in Julia, 16 January 2020.

I’m biased, but I think this is really cool cutting-edge stuff. If you want to do work along these lines let me know here and I’ll get Patterson to take a look.

Here’s part of a network created using their software:


Open Petri Nets and Their Categories of Processes

Abstract. In this talk we will discuss Petri nets from a categorical perspective. A Petri net freely generates a symmetric monoidal category whose morphisms represent its executions. We will discuss how to make Petri nets ‘open’—i.e., equip them with input and output boundaries where resources can flow in and out. Open Petri nets freely generate open symmetric monoidal categories: symmetric monoidal categories which can be glued together along a shared boundary. The mapping from open Petri nets to their open symmetric monoidal categories is functorial and this gives a compositional framework for reasoning about the executions of Petri nets.

You can see the talk live, or later recorded, here:

You can read more about this work here:

• John Baez and Jade Master, Open Petri nets.

• Jade Master, Generalized Petri nets.

You can see Jade’s slides for a related talk here:


Abstract. The reachability semantics for Petri nets can be studied using open Petri nets. For us an ‘open’ Petri net is one with certain places designated as inputs and outputs via a cospan of sets. We can compose open Petri nets by gluing the outputs of one to the inputs of another. Open Petri nets can be treated as morphisms of a category which becomes symmetric monoidal under disjoint union. However, since the composite of open Petri nets is defined only up to isomorphism, it is better to treat them as morphisms of a symmetric monoidal *double* category. Various choices of semantics for open Petri nets can be described using symmetric monoidal double functors out of this double category. Here we describe the reachability semantics, which assigns to each open Petri net the relation saying which markings of the outputs can be obtained from a given marking of the inputs via a sequence of transitions. We show this semantics gives a symmetric monoidal lax double functor from this double category to the double category of relations. A key step in the proof is to treat Petri nets as presentations of symmetric monoidal categories; for this we use the work of Meseguer, Montanari, Sassone and others.
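To make ‘reachability semantics’ concrete, here is a minimal Python illustration (my own, not from the paper): a brute-force search over the markings a tiny Petri net can reach from a given initial marking.

```python
from collections import deque

# A marking is a tuple of token counts, one per place. Each transition is
# (tokens consumed, tokens produced), indexed by place. Tiny example with two
# places: one transition moves a token from place 0 to place 1.
transitions = [((1, 0), (0, 1))]

def fire(marking, consumed, produced):
    """Fire a transition if enough tokens are present, else return None."""
    if all(m >= c for m, c in zip(marking, consumed)):
        return tuple(m - c + p for m, c, p in zip(marking, consumed, produced))
    return None

def reachable(start):
    """Breadth-first search over all markings reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        m = queue.popleft()
        for consumed, produced in transitions:
            nxt = fire(m, consumed, produced)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable((2, 0))))   # [(0, 2), (1, 1), (2, 0)]
```

The reachability semantics of an *open* Petri net refines this: it records, for each marking of the input boundary, which markings of the output boundary can result, and the theorem says this assignment respects composition in a suitably lax sense.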

• John Baez and Michael Weiss, Non-standard models of arithmetic 20, 12 October 2020.

Suppose you have a model of Peano arithmetic. Suppose you have an infinite list of predicates in Peano arithmetic:

$$\phi_1(x), \quad \phi_2(x), \quad \phi_3(x), \quad \dots$$

Suppose that for any *finite* subset of these, your model of PA has an element making them true. Is there an element making *all* these predicates true?

Of course this doesn’t hold in the standard model of PA. Consider this example:

$$x > 1, \quad x > 2, \quad x > 3, \quad \dots$$

For any *finite* collection of these inequalities, we can find a standard natural number large enough to make them true. But there’s no standard natural number that makes *all* these inequalities true.

On the other hand, for any *nonstandard* model of PA we *can* find an element $x$ that obeys all of these:

$$x > 1, \quad x > 2, \quad x > 3, \quad \dots$$

In fact this is the defining feature of a nonstandard model. (To be clear, I mean $x > n$ where $n$ ranges over *standard* natural numbers. A model of PA is nonstandard iff it contains an element greater than all the standard natural numbers.)

For a more interesting example, consider these predicates, one for each prime:

$$2 \mid x, \quad 3 \mid x, \quad 5 \mid x, \quad 7 \mid x, \quad 11 \mid x, \quad \dots$$

For any *finite* set of primes we can find a natural number divisible by all the primes in this set. We can’t find a standard natural number divisible by *every* prime, of course. But remarkably, in any nonstandard model of PA there *is* an element divisible by every prime—or more precisely, every standard prime.
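The ‘finite subset’ half of this is easy to see in the standard model: for any finite set of primes, their product is a standard number divisible by all of them. A throwaway Python check:

```python
from math import prod

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = [False] * len(sieve[p*p::p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

ps = primes_up_to(30)
x = prod(ps)
assert all(x % p == 0 for p in ps)
print(f"{x} is divisible by every prime up to 30")
```

What fails in the standard model is only the passage to *all* the predicates at once: the witnesses grow without bound, so no single standard number works. In a nonstandard model there is room above all the standard numbers for a single witness.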

In fact, suppose we have a model of PA, and an infinite list of predicates in PA, and for any finite subset of these our model has an element obeying the predicates in that subset. Then there is an element obeying *all* the predicates if:

1. the model is nonstandard,
2. you can write a computer program that lists all these predicates, and
3. there’s an upper bound on the number of alternating quantifiers in the predicates.

Intuitively, this result says that nonstandard models are very ‘fat’, containing nonstandard numbers with a wide range of properties. More technically, condition 2 says the predicates can be ‘recursively enumerated’, and this remarkable result is summarized by saying every nonstandard model of PA is ‘$\Sigma_n$-recursively saturated’.

In our conversation, Michael led me through the proof of this result. To do this, we used the fact that despite Tarski’s theorem on the undefinability of truth, truth *is* arithmetically definable for statements with any fixed upper bound on their number of alternating quantifiers! Michael had explained this end run around the undefinability of truth earlier, in Part 15 and Part 16 of our conversation.

Next we’ll show that if our nonstandard model is ‘ZF-standard’—that is, if it’s the natural numbers in some model of ZF—we can drop condition 3 above. So, in technical jargon, we’ll show that any nonstandard but ZF-standard model is ‘recursively saturated’.

I’m really enjoying these explorations of logic!

It was discovered here:

• Gert Almkvist and Jesús Guillera, Ramanujan-like series for $1/\pi^2$ and string theory, *Experimental Mathematics* **21** (2012), 223–234.

They give some sort of argument for it, but apparently not a rigorous proof. Experts seem to believe it:

• Tito Piezas III, A compilation of Ramanujan-type formulas for $1/\pi^m$.

It’s reminiscent of the famous Bailey–Borwein–Plouffe formula for $\pi$:

$$\pi = \sum_{k=0}^\infty \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)$$

This lets you compute the *n*th hexadecimal digit of $\pi$ without computing all the previous ones. It takes cleverness to do this, due to all those fractions.
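Here is a short Python version of that clever trick (a standard implementation of BBP digit extraction, written from scratch here): the key point is that `pow(16, n - k, m)` computes $16^{n-k} \bmod m$ cheaply, so we can work with only the fractional part of $16^n \pi$ and never touch the earlier digits.

```python
def pi_hex_digit(n):
    """The (n+1)-th hexadecimal digit of pi after the point, via BBP."""
    def partial(j):
        # Fractional part of the sum over k >= 0 of 16**(n-k) / (8k + j).
        s = 0.0
        for k in range(n + 1):                  # terms where 16**(n-k) is an integer
            s = (s + pow(16, n - k, 8*k + j) / (8*k + j)) % 1.0
        k = n + 1                               # a few rapidly shrinking tail terms
        while 16.0 ** (n - k) > 1e-17:
            s = (s + 16.0 ** (n - k) / (8*k + j)) % 1.0
            k += 1
        return s

    x = (4*partial(1) - 2*partial(4) - partial(5) - partial(6)) % 1.0
    return "0123456789ABCDEF"[int(16 * x)]

print("".join(pi_hex_digit(i) for i in range(8)))   # 243F6A88, since pi = 3.243F6A88...
```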

A similar formula was found by Bellard:

$$\pi = \frac{1}{2^6} \sum_{n=0}^\infty \frac{(-1)^n}{2^{10n}} \left( -\frac{2^5}{4n+1} - \frac{1}{4n+3} + \frac{2^8}{10n+1} - \frac{2^6}{10n+3} - \frac{2^2}{10n+5} - \frac{2^2}{10n+7} + \frac{1}{10n+9} \right)$$

Between 1998 and 2000, the distributed computing project PiHex used Bellard’s formula to compute the quadrillionth bit of $\pi$, which turned out to be… [drum roll]… *zero*.

A lot of work for nothing!

No formula of this sort is known that lets you compute individual *decimal* digits of $\pi$, but it’s cool that we can do it for $1/\pi^2$, at least if Almkvist and Guillera’s formula is true.

Someday I’d like to understand any one of these Ramanujan-type formulas. The search for lucid conceptual clarity that makes me love category theory runs into a big challenge when it meets the work of Ramanujan! But it’s a worthwhile challenge. I started here with one of Ramanujan’s *easiest* formulas:

• John Baez, Chasing the Tail of the Gaussian: Part 1 and Part 2, *The n-Category Café*, 28 August and 3 September 2020.

But the ideas involved in this formula all predate Ramanujan. For more challenging examples one could try this paper:

• Srinivasa Ramanujan, Modular equations and approximations to $\pi$, *Quarterly Journal of Mathematics* **XLV** (1914), 350–372.

Here Ramanujan gave 17 formulas for $\pi$, without proof. A friendly-looking explanation of one is given here:

• J. M. Borwein, P. B. Borwein and D. H. Bailey, Ramanujan, modular equations, and approximations to pi or How to compute one billion digits of pi, *American Mathematical Monthly* **96** (1989), 201–221.

So, this is where I’ll start!

Roger Penrose just won the Nobel Prize in Physics “for the discovery that black hole formation is a robust prediction of the general theory of relativity.” He shared it with Reinhard Genzel and Andrea Ghez, who won it “for the discovery of a supermassive compact object at the centre of our galaxy.”

This is great news! It’s a pity that Stephen Hawking is no longer alive, because if he were he would surely have shared in this prize. Hawking’s most impressive piece of work—his prediction of black hole evaporation—was too far from being experimentally confirmed to win a Nobel prize before his death. It still is today. The Nobel Prize is conservative in this way: it doesn’t go to theoretical developments that haven’t been experimentally confirmed. That makes a lot of sense. But sometimes they go overboard: Einstein never won a Nobel for general relativity or even special relativity. I consider that a scandal!

I’m glad that the Penrose–Hawking singularity theorems are considered Nobel-worthy. Let me just say a little about what Penrose and Hawking proved.

The most dramatic successful predictions of general relativity are black holes and the Big Bang. According to general relativity, as you follow a particle back in time toward the Big Bang or forward in time as it falls into a black hole, spacetime becomes more and more curved… and eventually it *stops!* This is roughly what we mean by a singularity. Penrose and Hawking made this idea mathematically precise, and proved that under reasonable assumptions singularities are inevitable in general relativity.

General relativity does not take quantum mechanics into account, so while Penrose and Hawking’s results are settled theorems, their applicability to *our universe* is not a settled fact. Many physicists hope that a theory of quantum gravity will save physics from singularities! Indeed this is one of the reasons physicists are fascinated by quantum gravity. But we know very little for sure about quantum gravity. So, it makes a lot of sense to work with general relativity as a mathematically precise theory and see what it says. That is what Hawking and Penrose did in their singularity theorems.

Let’s start with a quick introduction to general relativity, and then get an idea of why this theory predicts singularities are inevitable in certain situations.

General relativity says that spacetime is a 4-dimensional Lorentzian manifold. Thus, it can be covered by patches equipped with coordinates, so that in each patch we can describe points by lists of four numbers. Any curve $\gamma(s)$ going through a point then has a tangent vector $v$ whose components are $v^\alpha = d\gamma^\alpha/ds$. Furthermore, given two tangent vectors $v$ and $w$ at the same point we can take their inner product

$$g(v,w) = g_{\alpha\beta}\, v^\alpha w^\beta$$

where as usual we sum over repeated indices, and $g_{\alpha\beta}$ is a $4 \times 4$ matrix called the metric, depending smoothly on the point. We require that at any point we can find some coordinate system where this matrix takes the usual Minkowski form:

$$g = \left( \begin{array}{cccc} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right).$$

However, as soon as we move away from our chosen point, the form of the matrix in these particular coordinates may change.

General relativity says how the metric is affected by matter. It does this in a single equation, Einstein’s equation, which relates the ‘curvature’ of the metric at any point to the flow of energy-momentum through that point. To define the curvature, we need some differential geometry. Indeed, Einstein had to learn this subject from his mathematician friend Marcel Grossmann in order to write down his equation. Here I will take some shortcuts and try to explain Einstein’s equation with a bare minimum of differential geometry.

Consider a small round ball of test particles that are initially all at rest relative to each other. This requires a bit of explanation. First, because spacetime is curved, it only looks like Minkowski spacetime—the world of special relativity—in the limit of very small regions. The usual concepts of ‘round’ and ‘at rest relative to each other’ only make sense in this limit. Thus, all our forthcoming statements are precise only in this limit, which of course relies on the fact that spacetime is a continuum.

Second, a test particle is a classical point particle with so little mass that while it is affected by gravity, its effects on the geometry of spacetime are negligible. We assume our test particles are affected only by gravity, no other forces. In general relativity this means that they move along timelike geodesics. Roughly speaking, these are paths that go slower than light and bend as little as possible. We can make this precise without much work.

For a path in *space* to be a geodesic means that if we slightly vary any small portion of it, it can only become longer. However, a path in *spacetime* traced out by a particle moving slower than light must be ‘timelike’, meaning that its tangent vector $v$ satisfies $g(v,v) < 0$. We define the proper time along such a path from $s = s_0$ to $s = s_1$ to be

$$\tau = \int_{s_0}^{s_1} \sqrt{-g(\dot\gamma(s), \dot\gamma(s))}\; ds.$$

This is the time ticked out by a clock moving along that path. A timelike path is a geodesic if the proper time can only *decrease* when we slightly vary any small portion of it. Particle physicists prefer the opposite sign convention for the metric, and then we do not need the minus sign under the square root. But the fact remains the same: timelike geodesics locally maximize the proper time.
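Here is a numerical illustration of that last statement in flat Minkowski spacetime, with $c = 1$ (a toy Python check, using coordinates where the metric takes the Minkowski form everywhere): the straight path between two events, which is the geodesic, accumulates more proper time than a wiggly path between the same events.

```python
import numpy as np

def proper_time(x, t):
    """Proper time along a path x(t) in 1+1 Minkowski spacetime (c = 1):
    the integral of sqrt(1 - v**2) dt, where v = dx/dt."""
    v = np.gradient(x, t)
    integrand = np.sqrt(1 - v**2)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule

t = np.linspace(0.0, 1.0, 10001)
straight = np.zeros_like(t)              # the geodesic: sit still at x = 0
wiggly = 0.1 * np.sin(2 * np.pi * t)     # same endpoints, but oscillating

print(proper_time(straight, t))   # 1.0
print(proper_time(wiggly, t))     # about 0.89: wiggling costs proper time
```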

Actual particles are not test particles! First, the concept of test particle does not take quantum theory into account. Second, all known particles are affected by forces other than gravity. Third, any actual particle affects the geometry of the spacetime it inhabits. Test particles are just a mathematical trick for studying the geometry of spacetime. Still, a sufficiently light particle that is affected very little by forces other than gravity can be approximated by a test particle. For example, an artificial satellite moving through the Solar System behaves like a test particle if we ignore the solar wind, the radiation pressure of the Sun, and so on.

If we start with a small round ball consisting of many test particles that are initially all at rest relative to each other, to first order in time it will not change shape or size. However, to second order in time it can expand or shrink, due to the curvature of spacetime. It may also be stretched or squashed, becoming an ellipsoid. This should not be too surprising, because any linear transformation applied to a ball gives an ellipsoid.

Let $V(t)$ be the volume of the ball after a proper time $t$ has elapsed, where time is measured by a clock attached to the particle at the center of the ball. Then, in units where $8 \pi G = c = 1$, Einstein’s equation says:

$$\left. \frac{\ddot V}{V} \right|_{t=0} = -\frac{1}{2} \left( \begin{array}{c} \text{flow of $t$-momentum in the $t$ direction} \; + \\ \text{flow of $x$-momentum in the $x$ direction} \; + \\ \text{flow of $y$-momentum in the $y$ direction} \; + \\ \text{flow of $z$-momentum in the $z$ direction} \end{array} \right)$$

These flows are measured at the center of the ball at time zero, and the coordinates used here take advantage of the fact that to first order, at any one point, spacetime looks like Minkowski spacetime.

The flows in Einstein’s equation are the diagonal components of a $4 \times 4$ matrix $T$ called the ‘stress-energy tensor’. The components $T_{\alpha\beta}$ of this matrix say how much momentum in the $\alpha$ direction is flowing in the $\beta$ direction through a given point of spacetime. Here $\alpha$ and $\beta$ range from $0$ to $3$, corresponding to the $t, x, y$ and $z$ coordinates.

For example, $T_{00}$ is the flow of $t$-momentum in the $t$-direction. This is just the energy density, usually denoted $\rho$. The flow of $x$-momentum in the $x$-direction is the pressure in the $x$ direction, denoted $P_x$, and similarly for $y$ and $z$. You may be more familiar with direction-independent pressures, but it is easy to manufacture a situation where the pressure depends on the direction: just squeeze a book between your hands!

Thus, Einstein’s equation says

$$\left. \frac{\ddot V}{V} \right|_{t=0} = -\frac{1}{2} \left( \rho + P_x + P_y + P_z \right).$$

It follows that positive energy density and positive pressure both curve spacetime in a way that makes a freely falling ball of point particles tend to shrink. Since $E = mc^2$ and we are working in units where $c = 1$, ordinary mass density counts as a form of energy density. Thus a massive object will make a swarm of freely falling particles at rest around it start to shrink. In short, *gravity attracts*.

Already from this, gravity seems dangerously inclined to create singularities. Suppose that instead of test particles we start with a stationary cloud of ‘dust’: a fluid of particles having nonzero energy density but no pressure, moving under the influence of gravity alone. The dust particles will still follow geodesics, but they will affect the geometry of spacetime. Their energy density will make the ball start to shrink. As it does, the energy density will increase, so the ball will tend to shrink ever faster, approaching infinite density in a finite amount of time. This in turn makes the curvature of spacetime become infinite in a finite amount of time. The result is a ‘singularity’.
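Here is a back-of-the-envelope version of that argument (a heuristic sketch, not the actual Raychaudhuri–Komar proof). For pressureless dust only the energy density term survives in Einstein’s equation above, and the mass $M = \rho V$ in our little ball stays constant, so

$$\frac{\ddot V}{V} = -\frac{1}{2}\, \rho = -\frac{M}{2V} \qquad \Longrightarrow \qquad \ddot V = -\frac{M}{2}.$$

If we heuristically treat this as holding at all times, a ball of dust initially at rest obeys $V(t) = V_0 - \tfrac{M}{4} t^2$, which reaches zero volume at $t = 2\sqrt{V_0/M}$. The real theorem requires care, since the equation above holds only instantaneously for a comoving ball, but the punchline survives: collapse to infinite density in a finite amount of time.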

In reality, matter is affected by forces other than gravity. Repulsive forces may prevent gravitational collapse. However, this repulsion creates pressure, and Einstein’s equation says that pressure also creates gravitational attraction! In some circumstances this can overwhelm whatever repulsive forces are present. Then the matter collapses, leading to a singularity—at least according to general relativity.

When a star more than 8 times the mass of our Sun runs out of fuel, its core suddenly collapses. The surface is thrown off explosively in an event called a supernova. Most of the energy—the equivalent of thousands of Earth masses—is released in a ten-second burst of neutrinos, formed as a byproduct when protons and electrons combine to form neutrons. If the star’s mass is below 20 times that of our Sun, its core crushes down to a large ball of neutrons with a crust of iron and other elements: a neutron star.

However, this ball is unstable if its mass exceeds the Tolman–Oppenheimer–Volkoff limit, somewhere between 1.5 and 3 times that of our Sun. Above this limit, gravity overwhelms the repulsive forces that hold up the neutron star. And indeed, no neutron stars heavier than 3 solar masses have been observed. Thus, for very heavy stars, the endpoint of collapse is not a neutron star, but something else: a *black hole*, an object that bends spacetime so much even light cannot escape.

If general relativity is correct, a black hole contains a singularity. Many physicists expect that general relativity breaks down inside a black hole, perhaps because of quantum effects that become important at strong gravitational fields. The singularity is considered a strong hint that this breakdown occurs. If so, the singularity may be a purely theoretical entity, not a real-world phenomenon. Nonetheless, everything we have observed about black holes matches what general relativity predicts.

The Tolman–Oppenheimer–Volkoff limit is not precisely known, because it depends on properties of nuclear matter that are not well understood. However, there are theorems that say singularities *must* occur in general relativity under certain conditions.

One of the first was proved by Raychaudhuri and Komar in the mid-1950s. It applies only to ‘dust’, and indeed it is a precise version of our verbal argument above. It introduced Raychaudhuri’s equation, which is the geometrical way of thinking about spacetime curvature as affecting the motion of a small ball of test particles. It shows that under suitable conditions, the energy density must approach infinity in a finite amount of time along the path traced out by a dust particle.

The first required condition is that the flow of dust be initially converging, not expanding. The second condition, not mentioned in our verbal argument, is that the dust be ‘irrotational’, not swirling around. The third condition is that the dust particles be affected only by gravity, so that they move along geodesics. Due to the last two conditions, the Raychaudhuri–Komar theorem does not apply to collapsing stars.

The more modern singularity theorems eliminate these conditions. But they do so at a price: they require a more subtle concept of singularity! There are various possible ways to define this concept. They’re all a bit tricky, because a singularity is not a point or region in spacetime.

For our present purposes, we can define a singularity to be an ‘incomplete timelike or null geodesic’. As already explained, a timelike geodesic is the kind of path traced out by a test particle moving slower than light. Similarly, a null geodesic is the kind of path traced out by a test particle moving at the speed of light. We say a geodesic is ‘incomplete’ if it ceases to be well-defined after a finite amount of time. For example, general relativity says a test particle falling into a black hole follows an incomplete geodesic. In a rough-and-ready way, people say the particle ‘hits the singularity’. But the singularity is not a place in spacetime. What we really mean is that the particle’s path becomes undefined after a finite amount of time.

The first modern singularity theorem was proved by Penrose in 1965. It says that if space is infinite in extent, and light becomes trapped inside some bounded region, and no exotic matter is present to save the day, either a singularity or something even more bizarre must occur. This theorem applies to collapsing stars. When a star of sufficient mass collapses, general relativity says that its gravity becomes so strong that light becomes trapped inside some bounded region. We can then use Penrose’s theorem to analyze the possibilities.

Here is Penrose’s story of how he discovered this:

At that time I was at Birkbeck College, and a friend of mine, Ivor Robinson, who’s an Englishman but he was working in Dallas, Texas at the time, and he was talking to me … I forget what it was … he was a very … he had a wonderful way with words and so he was talking to me, and we got to this crossroad and as we crossed the road he stopped talking as we were watching out for traffic. We got to the other side and then he started talking again. And then when he left I had this strange feeling of elation and I couldn’t quite work out why I was feeling like that. So I went through all the things that had happened to me during the day—you know, what I had for breakfast and goodness knows what—and finally it came to this point when I was crossing the street, and I realised that I had a certain idea, and this idea was the crucial characterisation of when a collapse had reached a point of no return, without assuming any symmetry or anything like that. So this is what I called a trapped surface. And this was the key thing, so I went back to my office and I sketched out a proof of the collapse theorem. The paper I wrote was not that long afterwards, which went to Physical Review Letters, and it was published in 1965 I think.

Shortly thereafter Hawking proved a second singularity theorem, which applies to the Big Bang. It says that if space is finite in extent, and no exotic matter is present, generically either a singularity or something even more bizarre must occur. The singularity here could be either a Big Bang in the past, a Big Crunch in the future, both—or possibly something else. Hawking also proved a version of his theorem that applies to certain Lorentzian manifolds where space is infinite in extent, as seems to be the case in our Universe. This version requires extra conditions.

There are some undefined phrases in my summary of the Penrose–Hawking singularity theorems, most notably these:

• ‘exotic matter’

• ‘something even more bizarre’.

In each case I mean something precise.

These singularity theorems precisely specify what is meant by ‘exotic matter’. All known forms of matter obey the ‘dominant energy condition’, which says that

$$\rho \ge |P_x|, \; |P_y|, \; |P_z|$$

at all points and in all locally Minkowskian coordinates. Exotic matter is anything that violates this condition.

The Penrose–Hawking singularity theorems also say what counts as ‘something even more bizarre’. An example would be a closed timelike curve. A particle following such a path would move slower than light yet eventually reach the same point where it started—and not just the same point in space, but the same point in *spacetime!* If you could do this, perhaps you could wait, see if it would rain tomorrow, and then go back and decide whether to buy an umbrella today. There are certainly solutions of Einstein’s equation with closed timelike curves. The first interesting one was found by Einstein’s friend Gödel in 1949, as part of an attempt to probe the nature of time. However, closed timelike curves are generally considered less plausible than singularities.

In the Penrose–Hawking singularity theorems, ‘something even more bizarre’ means precisely this: spacetime is not ‘globally hyperbolic’. To understand this, we need to think about when we can predict the future or past given initial data. When studying field equations like Maxwell’s theory of electromagnetism or Einstein’s theory of gravity, physicists like to specify initial data on space at a given moment of time. However, in general relativity there is considerable freedom in how we choose a slice of spacetime and call it ‘space’. What should we require? For starters, we want a 3-dimensional submanifold $S$ of spacetime that is ‘spacelike’: every vector $v$ tangent to $S$ should have $g(v,v) > 0$. However, we also want any maximally extended timelike or null curve to hit $S$ exactly once. A spacelike surface with this property is called a Cauchy surface, and a Lorentzian manifold containing a Cauchy surface is said to be globally hyperbolic. There are many theorems justifying the importance of this concept. Global hyperbolicity excludes closed timelike curves, but also other bizarre behavior.

By now the original singularity theorems have been greatly generalized and clarified. Hawking and Penrose gave a unified treatment of both theorems in 1970, which you can read here:

• Stephen William Hawking and Roger Penrose, The singularities of gravitational collapse and cosmology, *Proc. Royal Soc. London A* **314** (1970), 529–548.

The 1973 textbook by Hawking and Ellis gives a systematic introduction to this subject. A paper by Garfinkle and Senovilla reviews the subject and its history up to 2015. Also try the first two chapters of this wonderful book:

• Stephen Hawking and Roger Penrose, *The Nature of Space and Time*, Princeton U. Press, 1996.

You can find the first chapter, by Hawking, here: it describes the singularity theorems. The second, by Penrose, discusses the nature of singularities in general relativity.

I’m sure Penrose’s Nobel Lecture will also be worth watching. Three cheers to Roger Penrose!

To broaden the scope of Fisher’s fundamental theorem we need to do one of two things:

1) change the left side of the equation: talk about some quantity other than the rate of change of mean fitness.

2) change the right side of the equation: talk about some quantity other than the variance in fitness.

Or we could do both! People have spent a lot of time generalizing Fisher’s fundamental theorem. I don’t think there are, or should be, any hard rules on what counts as a generalization.

But today we’ll take alternative 1). We’ll show the square of something called the ‘Fisher speed’ *always* equals the variance in fitness. One nice thing about this result is that we can drop the restrictive condition I mentioned. Another nice thing is that the Fisher speed is a concept from information theory! It’s defined using the Fisher metric on the space of probability distributions.

And yes—that metric is named after the same guy who proved Fisher’s fundamental theorem! So, arguably, *Fisher* should have proved this generalization of Fisher’s fundamental theorem. But in fact it seems that I was the first to prove it, around February 1st, 2017. Some similar results were already known, and I will discuss those someday. But they’re a bit different.

A good way to think about the Fisher speed is that it’s ‘the rate at which information is being updated’. A population of replicators of different species gives a probability distribution. Like any probability distribution, this has information in it. As the populations of our replicators change, the Fisher speed measures the rate at which this information is being updated. So, in simple terms, we’ll show

The square of the rate at which information is updated is equal to the variance in fitness.

This is quite a change from Fisher’s original idea, namely:

The rate of increase of mean fitness is equal to the variance in fitness.

But it has the advantage of always being true… as long as the population dynamics are described by the general framework we introduced last time. So let me remind you of the general setup, and then prove the result!

We start out with population functions $P_i(t)$, one for each species of replicator $i = 1, \dots, n$, obeying the **Lotka–Volterra equation**

$$\frac{dP_i}{dt} = f_i(P_1, \dots, P_n)\, P_i$$

for some differentiable functions $f_i$ called **fitness functions**. The probability of a replicator being in the *i*th species is

$$p_i = \frac{P_i}{\sum_j P_j}.$$

Using the Lotka–Volterra equation we showed last time that these probabilities obey the **replicator equation**

$$\frac{dp_i}{dt} = \big( f_i(P) - \bar f(P) \big)\, p_i.$$

Here $P$ is short for the whole list of populations $(P_1, \dots, P_n)$, and

$$\bar f(P) = \sum_{i=1}^n f_i(P)\, p_i$$

is the **mean fitness**.

The space of probability distributions on the set $\{1, \dots, n\}$ is called the **(n-1)-simplex**

$$\Delta^{n-1} = \Big\{ (p_1, \dots, p_n) \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i=1}^n p_i = 1 \Big\}.$$

It’s called $\Delta^{n-1}$ because it’s (n-1)-dimensional. When $n = 3$ it looks like the letter $\Delta$.

The **Fisher metric** is a Riemannian metric on the interior of the (n-1)-simplex. That is, given a point $p$ in the interior of $\Delta^{n-1}$ and two tangent vectors $v, w$ at this point, the Fisher metric gives a number

$$g(v,w) = \sum_{i=1}^n \frac{v_i w_i}{p_i}.$$

Here we are describing the tangent vectors as vectors in $\mathbb{R}^n$ with the property that the sum of their components is zero: that’s what makes them tangent to the (n-1)-simplex. And we’re demanding that $p$ be in the interior of the simplex to avoid dividing by zero, since on the boundary of the simplex we have $p_i = 0$ for at least one choice of $i$.

If we have a probability distribution $p(t)$ moving around in the interior of the (n-1)-simplex as a function of time, its **Fisher speed** is

$$\left\| \frac{dp}{dt} \right\| = \sqrt{ \sum_{i=1}^n \frac{1}{p_i(t)} \left( \frac{dp_i(t)}{dt} \right)^2 }$$

if the derivative exists. This is the usual formula for the speed of a curve moving in a Riemannian manifold, specialized to the case at hand.

Now we’ve got all the formulas we’ll need to prove the result we want. But for those who don’t already know and love it, it’s worthwhile saying a bit more about the Fisher metric.

The factor of $1/p_i$ in the Fisher metric changes the geometry of the simplex so that it becomes *round*, like a portion of a sphere:

But the reason the Fisher metric is important, I think, is its connection to relative information. Given two probability distributions $p, q \in \Delta^{n-1}$, the **information of $q$ relative to $p$** is

$$I(q, p) = \sum_{i=1}^n q_i \ln\!\left( \frac{q_i}{p_i} \right).$$

You can show this is the expected amount of information gained if $p$ was your prior distribution and you receive information that causes you to update your prior to $q$. So, sometimes it’s called the **information gain**. It’s also called **relative entropy** or—my least favorite, since it sounds so mysterious—the **Kullback–Leibler divergence**.

Suppose $p(t)$ is a smooth curve in the interior of the (n-1)-simplex. We can ask the rate at which information is gained as time passes. Perhaps surprisingly, a calculation gives

$$\left. \frac{d}{dt} I(p(t), p(t_0)) \right|_{t = t_0} = 0.$$

That is, in some sense ‘to first order’ no information is being gained at any moment $t_0$. However, we have

$$\left. \frac{d^2}{dt^2} I(p(t), p(t_0)) \right|_{t = t_0} = \left\| \dot p(t_0) \right\|^2.$$

So, the square of the Fisher speed has a nice interpretation in terms of relative entropy!

For a derivation of these last two equations, see Part 7 of my posts on information geometry. For more on the meaning of relative entropy, see Part 6.

It’s now extremely easy to show what we want, but let me state it formally so all the assumptions are crystal clear.

**Theorem.** Suppose the functions $P_i(t)$ obey the Lotka–Volterra equations:

$$\frac{dP_i}{dt} = f_i(P_1, \dots, P_n)\, P_i$$

for some differentiable functions $f_i$ called fitness functions. Define probabilities $p_i$ and the mean fitness $\bar f(P)$ as above, and define the **variance of the fitness** by

$$\mathrm{Var}(f) = \sum_{i=1}^n \big( f_i(P) - \bar f(P) \big)^2 \, p_i.$$

Then, if none of the populations $P_i$ are zero, the square of the Fisher speed of the probability distribution $p(t) = (p_1(t), \dots, p_n(t))$ is the variance of the fitness:

$$\left\| \frac{dp}{dt} \right\|^2 = \mathrm{Var}(f).$$

**Proof.** The proof is near-instantaneous. We take the square of the Fisher speed:

$$\left\| \frac{dp}{dt} \right\|^2 = \sum_{i=1}^n \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2$$

and plug in the replicator equation:

$$\frac{dp_i}{dt} = \big( f_i(P) - \bar f(P) \big)\, p_i.$$

We obtain:

$$\left\| \frac{dp}{dt} \right\|^2 = \sum_{i=1}^n \big( f_i(P) - \bar f(P) \big)^2 \, p_i = \mathrm{Var}(f)$$

as desired. █
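If you want to see the theorem in action, here is a quick numerical check in Python (my own sketch: the fitness functions are arbitrary choices, and the derivative is approximated by one small Euler step):

```python
import numpy as np

def fitness(P):
    # Arbitrary differentiable fitness functions for 3 species.
    return np.array([1.0 - 0.1*P[1], 0.5 + 0.05*P[0], -0.2 + 0.02*P[2]])

P, dt = np.array([1.0, 2.0, 3.0]), 1e-6

f = fitness(P)
P_next = P + dt * f * P                  # one Euler step of dP_i/dt = f_i(P) P_i

p, p_next = P / P.sum(), P_next / P_next.sum()
p_dot = (p_next - p) / dt                # approximate dp_i/dt

fisher_speed_sq = np.sum(p_dot**2 / p)   # squared Fisher speed
fbar = np.dot(f, p)                      # mean fitness
variance = np.dot((f - fbar)**2, p)      # variance in fitness

print(fisher_speed_sq, variance)         # agree to about 6 decimal places
```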

It’s hard to imagine anything simpler than this. We see that given the Lotka–Volterra equation, what causes information to be updated is nothing more and nothing less than variance in fitness! But there are other variants of Fisher’s fundamental theorem worth discussing, so I’ll talk about those in future posts.

Joe Moeller will be talking about his work on ‘network models’ at the online category theory seminar at UNAM on Wednesday October 14th at 18:00 UTC (11 am Pacific Time):

Network Models

Abstract. Networks can be combined in various ways, such as overlaying one on top of another or setting two side by side. We introduce ‘network models’ to encode these ways of combining networks. Different network models describe different kinds of networks. We show that each network model gives rise to an operad, whose operations are ways of assembling a network of the given kind from smaller parts. Such operads, and their algebras, can serve as tools for designing networks. Technically, a network model is a lax symmetric monoidal functor from the free symmetric monoidal category on some set to Cat, and the construction of the corresponding operad proceeds via a symmetric monoidal version of the Grothendieck construction.

You can watch the talk here:

You can read more about network models here:

• Complex adaptive system design (part 6).

and here’s the original paper:

• John Baez, John Foley, Blake Pollard and Joseph Moeller, Network models, *Theory and Applications of Categories* **35** (2020), 700–744.

• NIMBioS Adaptive Management Webinar Series, October 26–29, 2020 (Monday–Thursday).

Adaptive management seeks to determine sound management strategies in the face of uncertainty concerning the behavior of the system being managed. Specifically, it attempts to find strategies for managing dynamic systems while learning the behavior of the system. These webinars review the key concept of a Markov Decision Process (MDP) and demonstrate how quantitative adaptive management strategies can be developed using MDPs. Additional conceptual, computational and application aspects will be discussed, including dynamic programming and Bayesian formalization of learning.

Here are the topics:

Session 1: Introduction to decision problems

Session 2: Introduction to Markov decision processes (MDPs)

Session 3: Solving Markov decision processes (MDPs)

Session 4: Modeling beliefs

Session 5: Conjugacy and discrete model adaptive management (AM)

Session 6: More on AM problems (Dirichlet/multinomial and Gaussian prior/likelihood)

Session 7: Partially observable Markov decision processes (POMDPs)

Session 8: Frontier topics (projection methods, approximate DP, communicating solutions)
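To make the MDP machinery of Sessions 2–3 concrete, here is a minimal value-iteration example in Python (a generic textbook sketch of dynamic programming for MDPs, not taken from the webinar materials): a two-state managed system where a costly ‘manage’ action improves the odds of reaching and keeping the healthy state.

```python
import numpy as np

# Toy MDP. States: 0 = degraded, 1 = healthy. Actions: 0 = wait, 1 = manage.
# P[a][s, s'] = transition probability; R[a][s] = immediate reward.
P = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),   # waiting: degradation tends to persist
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}   # managing: better odds, but costly
R = {0: np.array([0.0, 1.0]),
     1: np.array([-0.5, 0.5])}
gamma = 0.95                                  # discount factor

V = np.zeros(2)
for _ in range(500):                          # value iteration (dynamic programming)
    Q = np.array([R[a] + gamma * P[a] @ V for a in (0, 1)])
    V = Q.max(axis=0)

print("optimal values:", V.round(2), "| best action in each state:", Q.argmax(axis=0))
```

Adaptive management (Sessions 4–6) adds the further twist that the transition probabilities themselves are uncertain, so the decision-maker maintains beliefs about them and updates those beliefs in a Bayesian way while acting.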
