Enayat on Nonstandard Numbers

21 September, 2020

Michael Weiss and I have been carrying on a dialog on nonstandard models of arithmetic, and after a long break we’re continuing, here:

• Michael Weiss and John Baez, Non-standard models of arithmetic (Part 18).

In this part we reach a goal we’ve been heading toward for a long time! We’ve been reading this paper:

• Ali Enayat, Standard models of arithmetic.

and we’ve finally gone through the main theorems and explained what they say. We’ll talk about the proofs later.

The simplest one is this:

• Every ZF-standard model of PA that is not V-standard is recursively saturated.

What does this theorem mean, roughly? Let me be very sketchy here, to keep things simple and give just a flavor of what’s going on.

Peano arithmetic is a well-known axiomatic theory of the natural numbers. People study different models of Peano arithmetic in some universe of sets, say U. If we fix our universe U there is typically one ‘standard’ model of Peano arithmetic, built using the set

\omega = \{0,\{0\},\{0,\{0\}\}, \dots \}

or in other words

\omega = \{0,1,2, \dots \}

All models of Peano arithmetic not isomorphic to this one are called ‘nonstandard’. You can show that any model of Peano arithmetic contains an isomorphic copy of standard model as an initial segment. This uniquely characterizes the standard model.

But different axioms for set theory give different concepts of U, the universe of sets. So the uniqueness of the standard model of Peano arithmetic is relative to that choice!

Let’s fix a choice of axioms for set theory: the Zermelo–Fraenkel or ‘ZF’ axioms are a popular choice. For the sake of discussion I’ll assume these axioms are consistent. (If they turn out not to be, I’ll correct this post.)

We can say the universe U is just what ZF is talking about, and only theorems of ZF count as things we know about U. Or, we can take a different attitude. After all, there are a couple of senses in which the ZF axioms don’t completely pin down the universe of sets.

First, there are statements in set theory that are neither provable nor disprovable from the ZF axioms. For any of these statements we’re free to assume it holds, or it doesn’t hold. We can add either it or its negation to the ZF axioms, and still get a consistent theory.

Second, a closely related sense in which the ZF axioms don’t uniquely pin down U is this: there are many different models of the ZF axioms.

Here I’m talking about models in some universe of sets, say V. This may seem circular! But it’s not really: first we choose some way to deal with set theory, and then we study models of the ZF axioms in this context. It’s a useful thing to do.

So fix this picture in mind. We start with a universe of sets V. Then we look at different models of ZF in V, each of which gives a universe U. U is sitting inside V, but from inside it looks like ‘the universe of all sets’.

Now, for each of these universes U we can study models of Peano arithmetic in U. And as I already explained, inside each U there will be a standard model of Peano arithmetic. But of course this depends on U.

So, we get lots of standard models of Peano arithmetic, one for each choice of U. Enayat calls these ZF-standard models of Peano arithmetic.

But there is one very special model of ZF in V, namely V itself. In other words, one choice of U is to take U = V. There’s a standard model of Peano arithmetic in V itself. This is an example of a ZF-standard model, but this is a very special one. Let’s call any model of Peano arithmetic isomorphic to this one V-standard.

Enayat’s theorem is about ZF-standard models of Peano arithmetic that aren’t V-standard. He shows that any ZF-standard model that’s not V-standard is ‘recursively saturated’.

What does it mean for a model M of Peano arithmetic to be ‘recursively saturated’? The idea is very roughly that ‘anything that can happen in any model, happens in M’.

Let me be a bit more precise. It means that if you write any computer program that prints out an infinite list of properties of an n-tuple of natural numbers, and there’s some model of Peano arithmetic that has an n-tuple with all these properties, then there’s an n-tuple of natural numbers in the model M with all these properties.

For example, there are models of Peano arithmetic that have a number x such that

0 < x
1 < x
2 < x
3 < x

and so on, ad infinitum. These are the nonstandard models. So a recursively saturated model must have such a number x. So it must be nonstandard.

In short, Enayat has found that ZF-standard models of Peano arithmetic in the universe V come in two drastically different kinds. They are either ‘as standard as possible’, namely V-standard. Or, they are ‘extremely rich’, containing n-tuples with all possible lists of consistent properties that you can print out with a computer program: they are recursively saturated.

I am probably almost as confused about this as you are. But Michael and I will dig into this more in our series of posts.

In fact we’ve been at this a while already. Here is a description of the whole series of posts so far:

Posts 1–10 are available as pdf files, formatted for small and medium screens.

Non-standard Models of Arithmetic 1: John kicks off the series by asking about recursively saturated models, and Michael says a bit about n-types and the overspill lemma. He also mentions the arithmetical hierarchy.

Non-standard Models of Arithmetic 2: John mention some references, and sets a goal: to understand this paper:

• Ali Enayat, Standard models of arithmetic.

John describes his dream: to show that “the” standard model is a much more nebulous notion than many seem to believe. He says a bit about the overspill lemma, and Joel David Hamkins gives a quick overview of saturation.

Non-standard Models of Arithmetic 3: A few remarks on the ultrafinitists Alexander Yessenin-Volpin and Edward Nelson; also Michael’s grad-school friend who used to argue that 7 might be nonstandard.

Non-standard Models of Arithmetic 4: Some back-and-forth on Enayat’s term “standard model” (or “ZF-standard model”) for the omega of a model of ZF. Philosophy starts to rear its head.

Non-standard Models of Arithmetic 5: Hamlet and Polonius talk math, and Michael holds forth on his philosophies of mathematics.

Non-standard Models of Arithmetic 6: John weighs in with why he finds “the standard model of Peano arithmetic” a problematic phrase. The Busy Beaver function is mentioned.

Non-standard Models of Arithmetic 7: We start on Enayat’s paper in earnest. Some throat-clearing about Axiom SM, standard models of ZF, inaccessible cardinals, and absoluteness. “As above, so below”: how ZF makes its “gravitational field” felt in PA.

Non-standard Models of Arithmetic 8: A bit about the Paris-Harrington and Goodstein theorems. In preparation, the equivalence (of sorts) between PA and ZF¬∞. The universe Vω of hereditarily finite sets and its correspondence with \mathbb{N}. A bit about Ramsey’s theorem (needed for Paris-Harrington). Finally, we touch on the different ways theories can be “equivalent”, thanks to a comment by Jeffrey Ketland.

Non-standard Models of Arithmetic 9: Michael sketches the proof of the Paris-Harrington theorem.

Non-Standard Models of Arithmetic 10: Ordinal analysis, the function growth hierarchies, and some fragments of PA. Some questions that neither of us knows how to answer.

Non-standard Models of Arithmetic 11: Back to Enayat’s paper: his definition of PAT for a recursive extension T of ZF. This uses the translation of formulas of PA into formulas of ZF, \varphi\mapsto \varphi^\mathbb{N}. Craig’s trick and Rosser’s trick.

Non-standard Models of Arithmetic 12: The strength of PAT for various T‘s. PAZF is equivalent to PAZFC+GCH, but PAZFI is strictly stronger than PAZF. (ZFI = ZF + “there exists an inaccessible cardinal”.)

Non-standard Models of Arithmetic 13: Enayat’s “natural” axiomatization of PAT, and his proof that this works. A digression into Tarski’s theorem on the undefinability of truth, and how to work around it. For example, while truth is not definable, we can define truth for statements with at most a fixed number of quantifiers.

Non-standard Models of Arithmetic 14: The previous post showed that PAT implies ΦT, where ΦT is Enayat’s “natural” axiomatization of PAT. Here we show the converse. We also interpret ΦT as saying, “Trust T”.

Non-standard Models of Arithmetic 15: We start to look at Truth (aka Satisfaction). Tarski gave a definition of Truth, and showed that Truth is undefinable. Less enigmatically put, there is no formula True(x) in the language of Peano arithmetic (L(PA)) that holds exactly for the Gödel numbers of true sentences of Peano arithmetic. On the other hand, Truth for Peano arithmetic can be formalized in the language of set theory (L(ZF)), and there are other work-arounds. We give an analogy with the Cantor diagonal argument.

Non-standard Models of Arithmetic 16: We look at the nitty-gritty of Trued(x), the formula in L(PA) that expresses truth in PA for formulas with parse-tree depth at most d. We see how the quantifiers “bleed through”, and why this prevents us from combining the whole series of Trued(x)’s into a single formula True(x). We also look at the variant SatΣn(x,y).

Non-standard Models of Arithmetic 17: More about how Trued evades Tarski’s undefinability theorem. The difference between Trued and SatΣn, and how it doesn’t matter for us. How Trued captures Truth for models of arithmetic: PA ⊢ Trued(⌜φ⌝) ↔ φ, for any φ of parse-tree depth at most d. Sketch of why this holds.

Non-standard Models of Arithmetic 18: The heart of Enayat’s paper: characterizing countable nonstandard T-standard models of PA (Prop. 6, Thm. 7, Cor. 8). Refresher on types. Meaning of ‘recursively saturated’. Standard meaning of ‘nonstandard’; standard and nonstandard meanings of ‘standard’.

Non-standard Models of Arithmetic 19: We marvel a bit over Enayat’s Prop. 6, and especially Cor. 8. The triple-decker sandwich, aka three-layer cake: ωUUV. More about why the omegas of standard models of ZF are standard. Refresher on ΦT. The smug confidence of a ZF-standard model.

Non-standard Models of Arithmetic 20: We start to develop the proof of Enayat’s Prop. 6. We get as far as a related result: any nonstandard model of PA is recursively d-saturated. (‘Recursively d-saturated’ is our user-friendly version of the professional-grade concept: recursively Σn-saturated.)


The Brownian Map

19 September, 2020

\phantom{x}

Nina Holden won the 2021 Maryam Mirzakhani New Frontiers Prize for her work on random surfaces and the mathematics of quantum gravity. I’d like to tell you what she did… but I’m so far behind I’ll just explain a bit of the background.

Suppose you randomly choose a triangulation of the sphere with n triangles. This is a purely combinatorial thing, but you can think of it as a metric space if each of the triangles is equilateral with all sides of length 1.

This is a distorted picture of what you might get, drawn by Jérémie Bettinelli:


The triangles are not drawn as equilateral, so we can fit this shape into 3d space. Visit Bettinelli’s page for images that you can rotate:

• Jérémie Bettinelli, Computer simulations of random maps.

I’ve described how to build a random space out of n triangles. In the limit n \to \infty, if you rescale the resulting space by a factor of n^{-1/4} so it doesn’t get bigger and bigger, it converges to a ‘random metric space’ with fascinating properties. It’s called the Brownian map.

This random metric space is on average so wrinkly and crinkly that ‘almost surely’—that is, with probability 1—its Hausdorff dimension is not 2 but 4. And yet it is almost surely homeomorphic to a sphere!

Rigorously proving this is hard: a mix of combinatorics, probability theory and geometry.

Ideas from physics are also important here. There’s a theory called
Liouville quantum gravity’ that describes these random 2-dimensional surfaces. So, physicists have ways of—nonrigorously—figuring out answers to some questions faster than the mathematicians!

A key step in understanding the Brownian map was this paper from 2013:

• Jean-François Le Gall, Uniqueness and universality of the Brownian map, Annals of Probability 41 (2013), 2880–2960.



The Brownian map is to surfaces what Brownian motion is to curves. For example, the Hausdorff dimension of Brownian motion is almost surely 2: twice the dimension of a smooth curve. For the Brownian map it’s almost surely 4, twice the dimension of a smooth surface.

Let me just say one more technical thing. There’s a ‘space of all compact metric spaces’, and the Brownian map is actually a probability measure on this space! It’s called the Gromov-Hausdorff space, and it itself is a metric space… but not compact. (So no, we don’t have a set containing itself as an element.)


There’s a lot more to say about this… but I haven’t gotten very close to understanding Nina Holden’s work yet. She wrote a 7-paper series leading up to this one:

• Nina Holden and Xin Sun, Convergence of uniform triangulations under the Cardy embedding.

They show that random triangulations of a disk can be chosen to a random metric on the disk which can also be obtained from Liouville quantum gravity.

This is a much easier place to start learning this general subject:

• Ewain Gwynne, Random surfaces and Liouville quantum gravity.

One reason I find all this interesting is that when I worked on ‘spin foam models’ of quantum gravity, we were trying to develop combinatorial theories of spacetime that had nice limits as the number of discrete building blocks approached infinity. We were studying theories much more complicated than random 2-dimensional triangulations, and it quickly became clear to me how much work it would be to carefully analyze these. So it’s interesting to see how mathematicians have entered this subject—starting with a much more tractable class of theories, which are already quite hard.

While the theory I just described gives random metric spaces whose Hausdorff dimension is twice their topological dimension, Liouville quantum gravity actually contains an adjustable parameter that lets you force these metric spaces to become less wrinkled, with lower Hausdorff dimension. Taming the behavior of random triangulations gets harder in higher dimensions. Renate Loll, Jan Ambjørn and others have argued that we need to work with Lorentzian rather than Riemannian geometries to get physically reasonable behavior. This approach to quantum gravity is called causal dynamical triangulations.


Open Systems: A Double Categorical Perspective (Part 2)

16 September, 2020

Back to Kenny Courser’s thesis:

• Kenny Courser, Open Systems: A Double Categorical Perspective, Ph.D. thesis, U. C. Riverside, 2020.

One thing Kenny does here is explain the flaws in a well-known framework for studying open systems: decorated cospans. Decorated cospans were developed by my student Brendan Fong. Since I was Brendan’s advisor at the time, a hefty helping of blame for not noticing the problems belongs to me! But luckily, Kenny doesn’t just point out the problems: he shows how to fix them. As a result, everything we’ve done with decorated cospans can be saved.

The main theorem on decorated cospans is correct; it’s just less useful than we’d hoped! The idea is to cook up a category where the morphisms are open systems. The objects of this category could be something simple like sets, but morphisms from X to Y could be something more interesting, like ‘open graphs’:

Here X and Y are mapped into a third set in the middle, but this set in the middle is the set of nodes of a graph. We say the set in the middle has been ‘decorated’ with the structure of a graph.

Here’s how the original theory of decorated cospans seeks to make this precise.

Fong’s Theorem. Suppose C is a category with finite colimits, and make C into a symmetric monoidal category with its coproduct as the tensor product. Suppose F\colon (C,+) \to (\mathrm{Set},\times) is a lax symmetric monoidal functor. Define an F-decorated cospan to be a cospan

in C together with an element of F(N). Then there is a symmetric monoidal category with

• objects of C as objects,
• isomorphism classes of F-decorated cospans as morphisms.

I won’t go into many details, but let me say how to compose two decorated spans, and also how this ‘isomorphism class’ business works.

Given two decorated cospans we compose their underlying cospans in the usual way, via pushout:

We get a cospan from X to Z. To decorate this we need an element of F(M +_Y N). So, we take the decorations we have on the cospans being composed, which together give an element of F(N) \times F(M), and apply this composite map:

F(N) \times F(M)  \longrightarrow F(N+M) \longrightarrow F(N+_Y M)

Here the first map, called the laxator, comes from the fact that F is a lax monoidal functor, while the second comes from applying F to the canonical map N+M \to N+_Y M.

Since composing cospans involves a pushout, which is defined via a universal property, the composite is only well-defined up to isomorphism. So, to get an actual category, we take isomorphism classes of decorated cospans as our morphisms.

Here an isomorphism of cospans is a commuting diagram like this:

where h is an isomorphism. If the first cospan here has a decoration d \in F(N) and the second has a decoration d' \in F(N'), then we have an isomorphism of decorated cospans if F(h)(d) = d'.

So, that’s the idea. The theorem is true, and it works fine for some applications—but not so well for others, like the example of open graphs!

Why not? Well, let’s look at this example in more detail. Given a finite set N, let’s define a graph on N to be a finite set E together with two functions s, t \colon E \to N. We call N the set of nodes, E the set of edges, and the functions s and t map each edge to its source and target, respectively. So, a graph on N is a way of choosing a graph whose set of nodes is N.

We can try to apply the above theorem taking

C = \mathrm{FinSet}

and letting

F \colon \mathrm{FinSet} \to \mathrm{Set}

be the functor sending each finite set N to the set of all graphs on N.

The first problem, which Brendan and I noticed right away, is that there’s not really a set of graphs on N. There’s a proper class! E ranges over all possible finite sets, and there’s not a set of all finite sets.

This should have set alarm bells ringing right away. But we used a standard dodge. In fact there are two. One is to replace \mathrm{FinSet} with an equivalent small category, and define a graph just as before but taking N and E to be objects in this equivalent category. Another is to invoke the axiom of universes. Either way, we get a set of graphs on each N.

Then Fong’s theorem applies, and we get a decorated cospan category with:

• ‘finite sets’ as objects,
• isomorphism classes of open graphs as morphisms.

Here I’m putting ‘finite sets’ in quotes because of the trickery I just mentioned, but it’s really not a big deal so I’ll stop now. An open graph has a finite set N of nodes, a finite set E of edges, maps s,t \colon E \to N saying the source and target of each edge, and two maps f \colon X \to N and g \colon Y \to N.

These last two maps are what make it an open graph going from X to Y:

Isomorphism classes of open graphs from X to Y are the morphisms from X to Y in our decorated cospan category.

But later, Kenny and I noticed a second problem. The concept of ‘isomorphic decorated cospan’ doesn’t behave well in this example: the concept of isomorphism is too narrowly defined!

Suppose you and I have isomorphic decorated cospans:

In the example at hand, this means you have a graph on the finite set N and I have a graph on the finite set N'. Call yours d \in F(N) and mine d' \in F(N').

We also have a bijection h \colon N \to N' such that

F(h)(d) = d'

What does this mean? I haven’t specified the functor F in detail so you’ll have to take my word for it, but it should be believable. It means that my graph is exactly like yours except that we replace the nodes of your graph, which are elements of N, by the elements of N' that they correspond to. But this means the edges of my graph must be exactly the same as the edges of yours. It’s not good enough for our graphs to have isomorphic sets of edges: they need to be equal!.

For a more precise account of this, with pictures, read the introduction to Chapter 3 of Kenny’s thesis.

So, our decorated cospan category has ‘too many morphisms’. Two open graphs will necessarily define different morphisms if they have different sets of edges.

This set Kenny and I to work on a new formalism, structured cospans, that avoids this problem. Later Kenny and Christina Vasilakopoulou also figured out a way to fix the decorated cospan formalism. Kenny’s thesis explains all this, and also how structured cospans are related to the ‘new, improved’ decorated cospans.

But then something else happened! Christina Vasilakopoulou was a postdoc at U.C. Riverside while all this was going on. She and my grad student Joe Moeller wrote a paper on something called the monoidal Grothendieck construction, which plays a key role in relating structured cospans to the new decorated cospans. But the anonymous referee of their paper pointed out another problem with the old decorated cospans!

Briefly, the problem is that the functor

F \colon (\mathrm{FinSet},+) \to (\mathrm{Set},\times)

that sends each N to the set of graphs having N as their set of vertices cannot be made lax monoidal in the desired way. To make F lax monoidal, we must pick a natural transformation called the laxator:

\phi_{N,M} \colon F(N) \times F(M) \to F(N+M)

I used this map earlier when explaining how to compose decorated cospans.

The idea seems straightforward enough: given a graph on N and a graph on M we get a graph on their disjoint union N+M. This is true, and there is a natural transformation \phi_{N,M} that does this.

But the definition of lax monoidal functor demands that the laxator make a certain hexagon commute! And it does not!

I won’t draw this hexagon here; you can see it at the link or in the intro to Chapter 3 of Kenny’s thesis, where he explains this problem. The problem arises because when we have three sets of edges, say E,E',E'', we typically have

(E + E') + E'' \ne E + (E' + E'')

There is a sneaky way to get around this problem, which he also explains: define graphs using a category equivalent to \mathrm{FinSet} where + is strictly associative, not just up to isomorphism!

This is good enough to make F lax monoidal. But Kenny noticed yet another problem: F is still not lax symmetric monoidal. If you try to show it is, you wind up needing two graphs to be equal: one with E+E' as its set of edges, and another with E'+E as its set of edges. These aren’t equal! And at this point it seems we hit a dead end. While there’s a category equivalent to \mathrm{FinSet} where + is strictly associative, there’s no such category where + is also strictly commutative.

In the end, by going through all these contortions, we can use a watered-down version of Fong’s theorem to get a category with open graphs as morphisms, and we can make it monoidal—but not symmetric monoidal.

It’s clear that something bad is going on here. We are not following the tao of mathematics. We keep wanting things to be equal, that should only be isomorphic. The problem is that we’re treating a graph on a set as extra structure on that set, when it’s actually extra stuff.

Structured cospans, and the new decorated cospans, are the solution! For example, the new decorated cospans let us use a category of graphs with a given set of nodes, instead of a mere set. Now F(N) is a category rather than a set. This category lets us talk about isomorphic graphs with the same set of vertices, and all our problems evaporate.


Part 1: an overview of Courser’s thesis and related papers.

Part 2: problems with decorated cospans.


Open Systems: A Double Categorical Perspective (Part 1)

15 August, 2020

My student Kenny Courser’s thesis has hit the arXiv:

• Kenny Courser, Open Systems: A Double Categorical Perspective, Ph.D. thesis, U. C. Riverside, 2020.

He’s been the driving force behind a lot of work on open systems and networks at U. C. Riverside. By the way, he’s looking for a job, so if you think you know a position that’s good for someone who can teach all kinds of math and also strong on applied category theory, give him or me a shout.

But let me describe his thesis.

His thesis is big! It lays out a general approach to open systems—systems that can interact with their environment. In this approach, you can attach open systems in series to form larger open systems, so they act like morphisms in a category:

But you can also study 2-morphisms between open systems. These describe ways to include a little system in a bigger one, or simplify a big complicated system down to smaller one:

To handle all this, Courser uses ‘double categories’, which he explains.

His formalism also lets you set two open systems side by side ‘in parallel’ and get a new open system:

To handle this, he uses ‘symmetric monoidal double categories’. He explains what these are, and how to get them. And he illustrates his setup with examples:

open Petri nets
open electrical circuits
open chemical reaction networks
open Markov processes

At a more technical level, Courser explains the problems with Brendan Fong’s and my work on decorated cospans and shows how to fix in not just one but two ways: using structured cospans, and using a new improved version of decorated cospans. He also shows that these two approaches are equivalent under fairly general conditions.

His thesis unifies a number of papers:

• Kenny Courser, A bicategory of decorated cospans, Theory and Applications of Categories 32 (2017), 995–1027.

• John Baez and Kenny Courser, Coarse-graining open Markov processes, Theory and Applications of Categories 33 (2018), 1223–1268. (Blog article here.)

• John Baez and Kenny Courser, Structured cospans. (Blog article here.)

• John Baez, Kenny Courser and Christina Vasilakopoulou, Structured versus decorated cospans.

The last, still being written, introduces the new improved decorated cospans and proves their equivalence to structured cospans under some conditions. For now you’ll have to read Kenny’s thesis to see how this works!

Next time I’ll explain the problems with the original decorated cospan formalism. Another nice thing about Kenny’s thesis is that it goes over a bunch of papers that were afflicted by these problems, and shows how to fix them.


Part 1: an overview of Courser’s thesis and related papers.

Part 2: problems with decorated cospans.


Diary, 2003-2020

8 August, 2020

I keep putting off organizing my written material, but with coronavirus I’m feeling more mortal than usual, so I’d like get this out into the world now:

• John Baez, Diary, 2003–2020.

Go ahead and grab a copy!

It’s got all my best tweets and Google+ posts, mainly explaining math and physics, but also my travel notes and other things… starting in 2003 with my ruminations on economics and ecology. It’s too big to read all at once, but I think you can dip into it more or less anywhere and pull out something fun.

It goes up to July 2020. It’s 2184 pages long.

I fixed a few problems like missing pictures, but there are probably more. If you let me know about them, I’ll fix them (if it’s easy).


Open Systems in Classical Mechanics

5 August, 2020

I think we need a ‘compositional’ approach to classical mechanics. A classical system is typically built from parts, and we describe the whole system by describing its parts and then saying how they are put together. But this aspect of classical mechanics is typically left informal. You learn how it works in a physics class by doing lots of homework problems, but the rules are never completely spelled out, which is one reason physics is hard.

I want an approach that makes the compositionality of classical mechanics formal: a category (or categories) where the morphisms are open classical systems—that is, classical systems with the ability to interact with the outside world—and composing these morphisms describes putting together open systems to form larger open systems.

There are actually two main approaches to classical mechanics: the Lagrangian approach, which describes the state of a system in terms of position and velocity, and the Hamiltonian approach, which describes the state of a system in terms of position and momentum. There’s a way to go from the first approach to the second, called the Legendre transformation. So we should have a least two categories, one for Lagrangian open systems and one for Hamiltonian open systems, and a functor from the first to the second.

That’s what this paper provides:

• John C. Baez, David Weisbart and Adam Yassine, Open systems in classical mechanics.

The basic idea is by not new—but there are some twists! I like treating open systems as cospans with extra structure. But in this case it makes more sense to use spans, since the space of states of a classical system maps to the space of states of any subsystem. We’ll compose these spans using pullbacks.

For example, suppose you have a spring with rocks at both ends:

spring

If it’s in 1-dimensional space, and we only care about the position and momentum of the two rocks (not vibrations of the spring), we can say the phase space of this system is the cotangent bundle T^\ast \mathbb{R}^2.

But this system has some interesting subsystems: the rocks at the ends! So we get a span. We could draw it like this:

span.jpg

but what I really mean is that we have a span of phase spaces:

span_2.jpg

Here the left-hand arrow maps the state of the whole system to the state of the left-hand rock, and the right-hand arrow maps the state of the whole system to the state of the right-hand rock. These maps are smooth maps between manifolds, but they’re better than that! They are Poisson maps between symplectic manifolds: that’s where the physics comes in. They’re also surjective.

Now suppose we have two such open systems. We can compose them, or ‘glue them together’, by identifying the right-hand rock of one with the left-hand rock of the other. We can draw this as follows:

two_spans.jpg

Now we have a big three-rock system on top, whose states map to states of our original two-rock systems, and then down to states of the individual rocks. This picture really stands for the following commutative diagram:

two_spans_2.jpg

Here the phase space of the big three-rock system on top is obtained as a pullback: that’s how we formalize the process of gluing together two open systems! We can then discard some information and get a span:

composite_span

Bravo! We’ve managed to build a more complicated open system by gluing together two simpler ones! Or in mathematical terms: we’ve taken two spans of symplectic manifolds, where the maps involved in are surjective Poisson maps, and composed them to get another such span.

Since we can compose them, it shouldn’t be surprising that there’s a category whose morphisms are such spans—or more precisely, isomorphism classes of such spans. But we can go further! We can equip all the symplectic manifolds in this story with Hamiltonians, to describe dynamics. And we get a category whose morphisms are open Hamiltonian systems, which we call \mathsf{HamSy}. This is Theorem 4.2 of our paper.

But be careful: to describe one of these open Hamiltonian systems, we need to choose a Hamiltonian not only on the symplectic manifold at the apex of the span, but also on the two symplectic manifolds at the bottom—its ‘feet’. We need this to be able to compute the new Hamiltonian we get when we compose, or glue together, two open Hamiltonian systems. If we just added Hamiltonians for two subsystems, we’d ‘double-count’ the energy when we glued them together.

This takes us further from the decorated cospan or structured cospan frameworks I’ve been talking about repeatedly on this blog. Using spans instead of cospans is not a big deal: a span in some category is just a cospan in the opposite category. What’s a bigger deal is that we’re decorating not just the apex of our spans with extra data, but its feet—and when we compose our spans, we need this data on the feet to compute the data for the apex of the new composite span.

Furthermore, doing pullbacks is subtler in categories of manifolds than in the categories I’d been using for decorated or structured cospans. To handle this nicely, my coauthors wrote a whole separate paper!

• David Weisbart and Adam Yassine, Constructing span categories from categories without pullbacks.

Anyway, in our present paper we get not only a category \mathsf{HamSy} of open Hamiltonian systems, but also a category \mathsf{LagSy} of open Lagrangian systems. So we can do both Hamiltonian and Lagrangian mechanics with open systems.

Moreover, they’re compatible! In classical mechanics we use the Legendre transformation to turn Lagrangian systems into their Hamiltonian counterparts. Now this becomes a functor:

\mathcal{L} \colon \mathsf{LagSy} \to \mathsf{HamSy}

That’s Theorem 5.5.

So, classical mechanics is becoming ‘compositional’. We can convert the Lagrangian descriptions of a bunch of little open systems into their Hamiltonian descriptions and then glue the results together, and we get the same answer as if we did that conversion on the whole big system. Thus, we’re starting to formalize the way physicists think about physical systems ‘one piece at a time’.


Linear Logic and Petri Nets

28 July, 2020

Wow! Elena Di Lavore and Xiaoyan Li explained how to make a category of Petri nets that’s a model of linear logic! I consider myself a sort of expert on Petri nets, but I didn’t know this stuff:

• Elena Di Lavore and Xiaoyan Li, Linear logic flavoured composition of Petri nets, The n-Category Café, 27 July 2020.

It has great pictures, too. Let me summarize a tiny bit.

A Petri net is a very simple thing. Here’s a Petri net that shows how healthy white blood cells (H), individual viruses (V) and infected white blood cells (I) interact when someone gets AIDS:

The yellow boxes are different kinds of things, called species; the aqua boxes are processes, called transitions.

There are different ways to form categories using Petri nets. Jade Master and I have focused on two:

1) How to turn a Petri net into a category where the morphisms say what the Petri net can do.

2) How to make a category with ‘open’ Petri nets as morphisms. Composing these lets you build big Petri nets from smaller pieces.

open_petri_2

Di Lavore and Li instead explain:

3) How to make a category with elementary Petri nets as objects: a morphism from one elementary Petri net to another turns each transition of the first into a transition of the second.

An elementary Petri net is one where each transition uses each species at most once as an input and at most once as an output:

nets

The category they get has lots of interesting structure! Like products, shown here:

product

In fact it has products, coproducts, two other monoidal structures, and exponentials—all fitting together in a wonderful way, as described by intuitionistic linear logic! To prove this, the key is to use Valeria de Paiva’s work on “Dialectica categories”. They explain how.

This is not original research: Elena Di Lavore and Xiaoyan Li wrote this blog article for the ACT2020 Adjoint School, and they’re explaining Carolyn Brown and Doug Gurr’s paper “A categorical linear framework for Petri nets”.

It’s worth comparing this paper:

• Uffe Engberg and Glynn Winskel, Petri nets as models of linear logic, in Colloquium on Trees in Algebra and Programming, Springer, Berlin, 1990, pp. 147–161.

Engberg and Winskel get a model of linear logic—or to be precise, a ‘commutative quantale’—by taking the category you get from a single Petri net as in item 1) and massaging it a bit. I explained it here:

• John Baez, Quantales from Petri nets, Azimuth, 6 October 2019.


Open Markov Processes

4 July, 2020

I gave a talk on some work I did with Kenny Courser. You can see slides of the talk, and also a video and the papers it’s based on:

Coarse-graining open Markov processes.

Abstract. We illustrate some new paradigms in applied category theory with the example of coarse-graining open Markov processes. Coarse-graining is a standard method of extracting a simpler Markov process from a more complicated one by identifying states. Here we extend coarse-graining to ‘open’ Markov processes: that is, those where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up an ordinary Markov process from smaller open pieces in two basic ways: composition, where we identify the outputs of one open Markov process with the inputs of another, and tensoring, where we set two open Markov processes side by side. These constructions make open Markov processes into the morphisms of a symmetric monoidal category. But we can go further and construct a symmetric monoidal double category where the 2-morphisms include ways of coarse-graining open Markov processes. We can describe the behavior of open Markov processes using double functors out of this double category.

For more, look at these:

• John Baez, Brendan Fong and Blake Pollard, A compositional framework for Markov processes. (Blog article here.)

• John Baez and Kenny Courser, Coarse-graining open Markov processes. (Blog article here.)

• Kenny Courser, Open Systems: A Double Categorical Perspective.


Getting to the Bottom of Noether’s Theorem

29 June, 2020

Most of us have been staying holed up at home lately. I spent the last month holed up writing a paper that expands on my talk at a conference honoring the centennial of Noether’s 1918 paper on symmetries and conservation laws. This made my confinement a lot more bearable. It was good getting back to this sort of mathematical physics after a long time spent on applied category theory. It turns out I really missed it.

While everyone at the conference kept emphasizing that Noether’s 1918 paper had two big theorems in it, my paper is just about the easy one—the one physicists call Noether’s theorem:

Getting to the bottom of Noether’s theorem.

People often summarize this theorem by saying “symmetries give conservation laws”. And that’s right, but it’s only true under some assumptions: for example, that the equations of motion come from a Lagrangian.

This leads to some interesting questions. For which types of physical theories do symmetries give conservation laws? What are we assuming about the world, if we assume it is described by a theories of this type? It’s hard to get to the bottom of these questions, but it’s worth trying.

We can prove versions of Noether’s theorem relating symmetries to conserved quantities in many frameworks. While a differential geometric framework is truer to Noether’s original vision, my paper studies the theorem algebraically, without mentioning Lagrangians.

Now, Atiyah said:

…algebra is to the geometer what you might call the Faustian offer. As you know, Faust in Goethe’s story was offered whatever he wanted (in his case the love of a beautiful woman), by the devil, in return for selling his soul. Algebra is the offer made by the devil to the mathematician. The devil says: I will give you this powerful machine, it will answer any question you like. All you need to do is give me your soul: give up geometry and you will have this marvellous machine.

While this is sometimes true, algebra is more than a computational tool: it allows us to express concepts in a very clear and distilled way. Furthermore, the geometrical framework developed for classical mechanics is not sufficient for quantum mechanics. An algebraic approach emphasizes the similarity between classical and quantum mechanics, clarifying their differences.

In talking about Noether’s theorem I keep using an interlocking trio of important concepts used to describe physical systems: ‘states’, ‘observables’ and `generators’. A physical system has a convex set of states, where convex linear combinations let us describe probabilistic mixtures of states. An observable is a real-valued quantity whose value depends—perhaps with some randomness—on the state. More precisely: an observable maps each state to a probability measure on the real line. A generator, on the other hand, is something that gives rise to a one-parameter group of transformations of the set of states—or dually, of the set of observables.

It’s easy to mix up observables and generators, but I want to distinguish them. When we say ‘the energy of the system is 7 joules’, we are treating energy as an observable: something you can measure. When we say ‘the Hamiltonian generates time translations’, we are treating the Hamiltonian as a generator.

In both classical mechanics and ordinary complex quantum mechanics we usually say the Hamiltonian is the energy, because we have a way to identify them. But observables and generators play distinct roles—and in some theories, such as real or quaternionic quantum mechanics, they are truly different. In all the theories I consider in my paper the set of observables is a Jordan algebra, while the set of generators is a Lie algebra. (Don’t worry, I explain what those are.)

When we can identify observables with generators, we can state Noether’s theorem as the following equivalence:


The generator a generates transformations that leave the
observable b fixed.

\Updownarrow

The generator b generates transformations that leave the observable a fixed.

In this beautifully symmetrical statement, we switch from thinking of a as the generator and b as the observable in the first part to thinking of b as the generator and a as the observable in the second part. Of course, this statement is true only under some conditions, and the goal of my paper is to better understand these conditions. But the most fundamental condition, I claim, is the ability to identify observables with generators.

In classical mechanics we treat observables as being the same as generators, by treating them as elements of a Poisson algebra, which is both a Jordan algebra and a Lie algebra. In quantum mechanics observables are not quite the same as generators. They are both elements of something called a ∗-algebra. Observables are self-adjoint, obeying

a^* = a

while generators are skew-adjoint, obeying

a^* = -a

The self-adjoint elements form a Jordan algebra, while the skew-adjoint elements form a Lie algebra.

In ordinary complex quantum mechanics we use a complex ∗-algebra. This lets us turn any self-adjoint element into a skew-adjoint one by multiplying it by \sqrt{-1}. Thus, the complex numbers let us identify observables with generators! In real and quaternionic quantum mechanics this identification is impossible, so the appearance of complex numbers in quantum mechanics is closely connected to Noether’s theorem.

In short, classical mechanics and ordinary complex quantum mechanics fit together in this sort of picture:

classical_and_quantum_mechanics

To dig deeper, it’s good to examine generators on their own: that is, Lie algebras. Lie algebras arise very naturally from the concept of ‘symmetry’. Any Lie group gives rise to a Lie algebra, and any element of this Lie algebra then generates a one-parameter family of transformations of that very same Lie algebra. This lets us state a version of Noether’s theorem solely in terms of generators:


The generator a generates transformations that leave the generator b fixed.

\Updownarrow

The generator b generates transformations that leave the generator a fixed.

And when we translate these statements into equations, their equivalence follows directly from this elementary property of the Lie bracket:


[a,b] = 0

\Updownarrow

[b,a] = 0

Thus, Noether’s theorem is almost automatic if we forget about observables and work solely with generators. The only questions left are: why should symmetries be described by Lie groups, and what is the meaning of this property of the Lie bracket?

In my paper I tackle both these questions, and point out that the Lie algebra formulation of Noether’s theorem comes from a more primitive group formulation, which says that whenever you have two group elements g and h,


g commutes with h.

\Updownarrow

h commutes with g.

That is: whenever you’ve got two ways of transforming a physical system, the first transformation is ‘conserved’ by second if and only if the second is conserved by the first!

However, observables are crucial in physics. Working solely with generators in order to make Noether’s theorem a tautology would be another sort of Faustian bargain. So, to really get to the bottom of Noether’s theorem, we need to understand the map from observables to generators. In ordinary quantum mechanics this comes from multiplication by i. But this just pushes the mystery back a notch: why should we be using the complex numbers in quantum mechanics?

For this it’s good to spend some time examining observables on their own: that is, Jordan algebras. Those of greatest importance in physics are the unital JB-algebras, which are unfortunately named not after me, but Jordan and Banach. These allow a unified approach to real, complex and quaternionic quantum mechanics, along with some more exotic theories. So, they let us study how the role of complex numbers in quantum mechanics is connected to Noether’s theorem.

Any unital JB-algebra O has a partial ordering: that is, we can talk about one observable being greater than or equal to another. With the help of this we can define states on O, and prove that any observable maps each state to a probability measure on the real line.

More surprisingly, any JB-algebra also gives rise to two Lie algebras. The smaller of these, say L, has elements that generate transformations of O that preserve all the structure of this unital JB-algebra. They also act on the set of states. Thus, elements of L truly deserve to be considered ‘generators’.

In a unital JB-algebra there is not always a way to reinterpret observables as generators. However, Alfsen and Shultz have defined the notion of a ‘dynamical correspondence’ for such an algebra, which is a well-behaved map

\psi \colon O \to L

One of the two conditions they impose on this map implies a version of Noether’s theorem. They prove that any JB-algebra with a dynamical correspondence gives a complex ∗-algebra where the observables are self-adjoint elements, the generators are skew-adjoint, and we can convert observables into generators by multiplying them by i.

This result is important, because the definition of JB-algebra does not involve the complex numbers, nor does the concept of dynamical correspondence. Rather, the role of the complex numbers in quantum mechanics emerges from a map from observables to generators that obeys conditions including Noether’s theorem!

To be a bit more precise, Alfsen and Shultz’s first condition on the map \psi \colon O \to L says that every observable a \in O generates transformations that leave a itself fixed. I call this the self-conservation principle. It implies Noether’s theorem.

However, in their definition of dynamical correspondence, Alfsen and Shultz also impose a second, more mysterious condition on the map \psi. I claim that that this condition is best understood in terms of the larger Lie algebra associated to a unital JB-algebra. As a vector space this is the direct sum

A = O \oplus L

but it’s equipped with a Lie bracket such that

[-,-] \colon L \times L \to L    \qquad [-,-] \colon L \times O \to O

[-,-] \colon O \times L \to O    \qquad [-,-] \colon O \times O \to L

As I mentioned, elements of L generate transformations of O that preserve all the structure on this unital JB-algebra. Elements of O also generate transformations of O, but these only preserve its vector space structure and partial ordering.

What’s the meaning of these other transformations? I claim they’re connected to statistical mechanics.

For example, consider ordinary quantum mechanics and let O be the unital JB-algebra of all bounded self-adjoint operators on a complex Hilbert space. Then L is the Lie algebra of all bounded skew-adjoint operators on this Hilbert space. There is a dynamical correpondence sending any observable H \in O to the generator \psi(H) = iH \in L, which then generates a one-parameter group of transformations of O like this:

a \mapsto e^{itH/\hbar} \, a \, e^{-itH/\hbar}  \qquad \forall t \in \mathbb{R}, a \in O

where \hbar is Planck’s constant. If H is the Hamiltonian of some system, this is the usual formula for time evolution of observables in the Heisenberg picture. But H also generates a one-parameter group of transformations of O as follows:

a \mapsto  e^{-\beta H/2} \, a \, e^{-\beta H/2}  \qquad \forall \beta \in \mathbb{R}, a \in O

Writing \beta = 1/kT where T is temperature and k is Boltzmann’s constant, I claim that these are ‘thermal transformations’. Acting on a state in thermal equilibrium at some temperature, these transformations produce states in thermal equilibrium at other temperatures (up to normalization).

The analogy between it/\hbar and 1/kT is often summarized by saying “inverse temperature is imaginary time”. The second condition in Alfsen and Shultz’s definition of dynamical correspondence is a way of capturing this principle in a way that does not explicitly mention the complex numbers. Thus, we may very roughly say their result explains the role of complex numbers in quantum mechanics starting from three assumptions:

• observables form Jordan algebra of a nice sort (a unital JB-algebra)

• the self-conservation principle (and thus Noether’s theorem)

• the relation between time and inverse temperature.

I still want to understand all of this more deeply, but the way statistical mechanics entered the game was surprising to me, so I feel I made a little progress.

I hope the paper is half as fun to read as it was to write! There’s a lot more in it than described here.


ACT2020 Program

27 June, 2020

Boston2

Applied Category Theory 2020 is coming up soon! After the Tutorial Day on Sunday July 6th, there will be talks from Monday July 7th to Friday July 10th. All talks will be live on Zoom and on YouTube. Recorded versions will appear on YouTube later.

Here is the program—click on it to download a more readable version:


Here are the talks! They come in three kinds: keynotes, regular presentations and short industry presentations. Within each I’ve listed them in alphabetical order by speaker: I believe the first author is the speaker.

This is gonna be fun.

Keynote presentations (35 minutes)

• Henry Adams, Johnathan Bush and Joshua Mirth, Operations on metric thickenings.

• Nicolas Blanco and Noam Zeilberger: Bifibrations of polycategories and classical linear logic.

• Bryce Clarke, Derek Elkins, Jeremy Gibbons, Fosco Loregian, Bartosz Milewski, Emily Pillmore and Mario Román: Profunctor optics, a categorical update.

• Tobias Fritz, Tomáš Gonda, Paolo Perrone and Eigil Rischel: Distribution functors, second-order stochastic dominance and the Blackwell–Sherman–Stein Theorem in categorical probability.

• Micah Halter, Evan Patterson, Andrew Baas and James Fairbanks: Compositional scientific computing with Catlab and SemanticModels.

• Joachim Kock: Whole-grain Petri nets and processes.

• Andre Kornell, Bert Lindenhovius and Michael Mislove: Quantum CPOs.

• Martha Lewis: Towards logical negation in compositional distributional semantics.

• Jade Master and John Baez: Open Petri nets.

• Lachlan McPheat, Mehrnoosh Sadrzadeh, Hadi Wazni and Gijs Wijnholds, Categorical vector space semantics for Lambek calculus with a relevant modality.

• David Jaz Myers: Double categories of open dynamical systems.

• Toby St Clere Smithe, Cyber Kittens, or first steps towards categorical cybernetics.

Regular presentations (20 minutes)

• Robert Atkey, Bruno Gavranović, Neil Ghani, Clemens Kupke, Jeremy Ledent and Fredrik Nordvall Forsberg: Compositional game theory, compositionally.

• John Baez and Kenny Courser: Coarse-graining open Markov processes.

• Georgios Bakirtzis, Christina Vasilakopoulou and Cody Fleming, Compositional cyber-physical systems modeling.

• Marco Benini, Marco Perin, Alexander Alexander Schenkel and Lukas Woike: Categorification of algebraic quantum field theories.

• Daniel Cicala: Rewriting structured cospans.

• Bryce Clarke: A diagrammatic approach to symmetric lenses.

• Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi, Stefano Gogioso and Nicolo Chiappori: Quantum natural language processing.

• Geoffrey Cruttwell, Jonathan Gallagher and Dorette Pronk: Categorical semantics of a simple differential programming language.

• Swaraj Dash and Sam Staton: A monad for probabilistic point processes.

• Giovanni de Felice, Elena Di Lavore, Mario Román and Alexis Toumi: Functorial language games for question answering.

• Giovanni de Felice, Alexis Toumi and Bob Coecke: DisCoPy: monoidal categories in Python.

• Brendan Fong, David Jaz Myers and David I. Spivak: Behavioral mereology: a modal logic for passing constraints.

• Rocco Gangle, Gianluca Caterina and Fernando Tohme, A generic figures reconstruction of Peirce’s existential graphs (alpha).

• Jules Hedges and Philipp Zahn: Open games in practice.

• Jules Hedges: Non-compositionality in categorical systems theory.

• Michael Johnson and Robert Rosebrugh, The more legs the merrier: A new composition for symmetric (multi-)lenses.

• Joe Moeller, John Baez and John Foley: Petri nets with catalysts.

• John Nolan and Spencer Breiner, Symmetric monoidal categories with attributes.

• Joseph Razavi and Andrea Schalk: Gandy machines made easy via category theory.

• Callum Reader: Measures and enriched categories.

• Mario Román: Open diagrams via coend calculus.

• Luigi Santocanale, Dualizing sup-preserving endomaps of a complete lattice.

• Dan Shiebler: Categorical stochastic processes and likelihood.

• Richard Statman, Products in a category with only one object.

• David I. Spivak: Poly: An abundant categorical setting for mode-dependent dynamics.

• Christine Tasson and Martin Hyland, The linear-non-linear substitution 2-monad.

• Tarmo Uustalu, Niccolò Veltri and Noam Zeilberger: Proof theory of partially normal skew monoidal categories.

• Dmitry Vagner, David I. Spivak and Evan Patterson: Wiring diagrams as normal forms for computing in symmetric monoidal categories.

• Matthew Wilson, James Hefford, Guillaume Boisseau and Vincent Wang: The safari of update structures: visiting the lens and quantum enclosures.

• Paul Wilson and Fabio Zanasi: Reverse derivative ascent: a categorical approach to learning Boolean circuits.

• Vladimir Zamdzhiev: Computational adequacy for substructural lambda calculi.

• Gioele Zardini, David I. Spivak, Andrea Censi and Emilio Frazzoli: A compositional sheaf-theoretic framework for event-based systems.

Industry presentations (8 minutes)

• Arquimedes Canedo (Siemens Corporate Technology).

• Brendan Fong (Topos Institute).

• Jelle Herold (Statebox): Industrial strength CT.

• Steve Huntsman (BAE): Inhabiting the value proposition for category theory.

• Ilyas Khan (Cambridge Quantum Computing).

• Alan Ransil (Protocol Labs): Compositional data structures for the decentralized web.

• Alberto Speranzon (Honeywell).

• Ryan Wisnesky (Conexus): Categorical informatics at scale.