Open Systems in Classical Mechanics

5 August, 2020

I think we need a ‘compositional’ approach to classical mechanics. A classical system is typically built from parts, and we describe the whole system by describing its parts and then saying how they are put together. But this aspect of classical mechanics is typically left informal. You learn how it works in a physics class by doing lots of homework problems, but the rules are never completely spelled out, which is one reason physics is hard.

I want an approach that makes the compositionality of classical mechanics formal: a category (or categories) where the morphisms are open classical systems—that is, classical systems with the ability to interact with the outside world—and composing these morphisms describes putting together open systems to form larger open systems.

There are actually two main approaches to classical mechanics: the Lagrangian approach, which describes the state of a system in terms of position and velocity, and the Hamiltonian approach, which describes the state of a system in terms of position and momentum. There’s a way to go from the first approach to the second, called the Legendre transformation. So we should have at least two categories, one for Lagrangian open systems and one for Hamiltonian open systems, and a functor from the first to the second.

That’s what this paper provides:

• John C. Baez, David Weisbart and Adam Yassine, Open systems in classical mechanics.

The basic idea is by no means new, but there are some twists! I like treating open systems as cospans with extra structure. But in this case it makes more sense to use spans, since the space of states of a classical system maps to the space of states of any subsystem. We’ll compose these spans using pullbacks.

For example, suppose you have a spring with rocks at both ends:


If it’s in 1-dimensional space, and we only care about the position and momentum of the two rocks (not vibrations of the spring), we can say the phase space of this system is the cotangent bundle T^\ast \mathbb{R}^2.

But this system has some interesting subsystems: the rocks at the ends! So we get a span. We could draw it like this:


but what I really mean is that we have a span of phase spaces:


Here the left-hand arrow maps the state of the whole system to the state of the left-hand rock, and the right-hand arrow maps the state of the whole system to the state of the right-hand rock. These maps are smooth maps between manifolds, but they’re better than that! They are Poisson maps between symplectic manifolds: that’s where the physics comes in. They’re also surjective.

Now suppose we have two such open systems. We can compose them, or ‘glue them together’, by identifying the right-hand rock of one with the left-hand rock of the other. We can draw this as follows:


Now we have a big three-rock system on top, whose states map to states of our original two-rock systems, and then down to states of the individual rocks. This picture really stands for the following commutative diagram:


Here the phase space of the big three-rock system on top is obtained as a pullback: that’s how we formalize the process of gluing together two open systems! We can then discard some information and get a span:


Bravo! We’ve managed to build a more complicated open system by gluing together two simpler ones! Or in mathematical terms: we’ve taken two spans of symplectic manifolds, where the maps involved are surjective Poisson maps, and composed them to get another such span.
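For the rock-and-spring example, the spans and the pullback can be written out in coordinates. This is only a sketch: the coordinate names (q_i, p_i) and the projection names \pi_1, \pi_2 are my own labels, and I'm assuming each rock has phase space T^\ast \mathbb{R}.

```latex
% One open system: a span whose legs forget one of the two rocks.
T^\ast\mathbb{R} \xleftarrow{\;\pi_1\;} T^\ast\mathbb{R}^2 \xrightarrow{\;\pi_2\;} T^\ast\mathbb{R},
\qquad
\pi_1(q_1,p_1,q_2,p_2) = (q_1,p_1), \quad
\pi_2(q_1,p_1,q_2,p_2) = (q_2,p_2)

% Gluing two such systems along the shared rock: the pullback consists of
% pairs of states that agree on the middle rock.
T^\ast\mathbb{R}^2 \times_{T^\ast\mathbb{R}} T^\ast\mathbb{R}^2
  = \bigl\{ \bigl( (q_1,p_1,q_2,p_2),\, (q_2',p_2',q_3,p_3) \bigr) : (q_2,p_2) = (q_2',p_2') \bigr\}
  \;\cong\; T^\ast\mathbb{R}^3
```

Discarding the middle rock then leaves a span from T^\ast \mathbb{R}^3 to the phase spaces of the first and third rocks.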

Since we can compose them, it shouldn’t be surprising that there’s a category whose morphisms are such spans—or more precisely, isomorphism classes of such spans. But we can go further! We can equip all the symplectic manifolds in this story with Hamiltonians, to describe dynamics. And we get a category whose morphisms are open Hamiltonian systems, which we call \mathsf{HamSy}. This is Theorem 4.2 of our paper.

But be careful: to describe one of these open Hamiltonian systems, we need to choose a Hamiltonian not only on the symplectic manifold at the apex of the span, but also on the two symplectic manifolds at the bottom—its ‘feet’. We need this to be able to compute the new Hamiltonian we get when we compose, or glue together, two open Hamiltonian systems. If we just added Hamiltonians for two subsystems, we’d ‘double-count’ the energy when we glued them together.
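Here is a minimal sketch of the double-counting issue, using a toy model of my own: each rock's kinetic energy sits on a foot, and each two-rock system has a Hamiltonian made of two kinetic terms plus a spring term. The function names, masses and spring constants are all hypothetical, and the "add the two apex Hamiltonians, subtract the shared foot" rule is my reading of the idea described above, not a quotation of the paper's construction.

```python
# Toy model (hypothetical): rocks in 1d joined by springs.
# A state of a two-rock system is (q_a, p_a, q_b, p_b).

def rock_energy(p, m):
    """Hamiltonian on a foot: kinetic energy of one rock of mass m."""
    return p * p / (2.0 * m)

def two_rock_energy(qa, pa, qb, pb, ma, mb, k):
    """Hamiltonian on an apex: two rocks plus a spring of stiffness k."""
    return rock_energy(pa, ma) + rock_energy(pb, mb) + 0.5 * k * (qb - qa) ** 2

def glued_energy(state, masses, springs):
    """Hamiltonian of the three-rock composite.

    Add the two apex Hamiltonians, then subtract the Hamiltonian of the
    shared foot (rock 2) so its kinetic energy is counted only once.
    """
    q1, p1, q2, p2, q3, p3 = state
    m1, m2, m3 = masses
    k12, k23 = springs
    left = two_rock_energy(q1, p1, q2, p2, m1, m2, k12)
    right = two_rock_energy(q2, p2, q3, p3, m2, m3, k23)
    return left + right - rock_energy(p2, m2)

def direct_energy(state, masses, springs):
    """The same three-rock system written down in one go, for comparison."""
    q1, p1, q2, p2, q3, p3 = state
    m1, m2, m3 = masses
    k12, k23 = springs
    kinetic = rock_energy(p1, m1) + rock_energy(p2, m2) + rock_energy(p3, m3)
    potential = 0.5 * k12 * (q2 - q1) ** 2 + 0.5 * k23 * (q3 - q2) ** 2
    return kinetic + potential

state = (0.0, 1.0, 0.5, -2.0, 1.5, 0.3)
masses = (1.0, 2.0, 3.0)
springs = (4.0, 5.0)
print(glued_energy(state, masses, springs), direct_energy(state, masses, springs))
```

Without the subtraction, the sum of the two apex Hamiltonians would exceed the direct answer by exactly the kinetic energy of the shared rock.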

This takes us further from the decorated cospan or structured cospan frameworks I’ve been talking about repeatedly on this blog. Using spans instead of cospans is not a big deal: a span in some category is just a cospan in the opposite category. What’s a bigger deal is that we’re decorating not just the apex of our spans with extra data, but its feet—and when we compose our spans, we need this data on the feet to compute the data for the apex of the new composite span.

Furthermore, doing pullbacks is subtler in categories of manifolds than in the categories I’d been using for decorated or structured cospans. To handle this nicely, my coauthors wrote a whole separate paper!

• David Weisbart and Adam Yassine, Constructing span categories from categories without pullbacks.

Anyway, in our present paper we get not only a category \mathsf{HamSy} of open Hamiltonian systems, but also a category \mathsf{LagSy} of open Lagrangian systems. So we can do both Hamiltonian and Lagrangian mechanics with open systems.

Moreover, they’re compatible! In classical mechanics we use the Legendre transformation to turn Lagrangian systems into their Hamiltonian counterparts. Now this becomes a functor:

\mathcal{L} \colon \mathsf{LagSy} \to \mathsf{HamSy}

That’s Theorem 5.5.

So, classical mechanics is becoming ‘compositional’. We can convert the Lagrangian descriptions of a bunch of little open systems into their Hamiltonian descriptions and then glue the results together, and we get the same answer as if we did that conversion on the whole big system. Thus, we’re starting to formalize the way physicists think about physical systems ‘one piece at a time’.
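To make the Legendre transformation itself concrete, here is a small numerical sketch for a single closed system (it does not illustrate the functor, only the transformation it is built from). I'm assuming a Lagrangian that is strictly convex in the velocity, and the harmonic oscillator with the particular mass and spring constant below is just a hypothetical test case.

```python
def legendre_transform(lagrangian, q, p, v_lo=-1e3, v_hi=1e3, iters=200):
    """Return H(q, p) = max over v of (p*v - L(q, v)).

    Assumes L is strictly convex in v, so p*v - L(q, v) is concave in v
    and a ternary search on [v_lo, v_hi] finds its maximum.
    """
    def f(v):
        return p * v - lagrangian(q, v)

    lo, hi = v_lo, v_hi
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a   # the maximum lies to the right of a
        else:
            hi = b   # the maximum lies to the left of b
    return f(0.5 * (lo + hi))

def oscillator_lagrangian(q, v, m=2.0, k=3.0):
    """L = kinetic energy minus potential energy for a harmonic oscillator."""
    return 0.5 * m * v * v - 0.5 * k * q * q

def oscillator_hamiltonian(q, p, m=2.0, k=3.0):
    """The expected answer: H = kinetic energy plus potential energy."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

print(legendre_transform(oscillator_lagrangian, 0.7, 1.3))
print(oscillator_hamiltonian(0.7, 1.3))
```

The two printed values should agree to within the accuracy of the search.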

Getting to the Bottom of Noether’s Theorem

29 June, 2020

Most of us have been staying holed up at home lately. I spent the last month holed up writing a paper that expands on my talk at a conference honoring the centennial of Noether’s 1918 paper on symmetries and conservation laws. This made my confinement a lot more bearable. It was good getting back to this sort of mathematical physics after a long time spent on applied category theory. It turns out I really missed it.

While everyone at the conference kept emphasizing that Noether’s 1918 paper had two big theorems in it, my paper is just about the easy one—the one physicists call Noether’s theorem:

Getting to the bottom of Noether’s theorem.

People often summarize this theorem by saying “symmetries give conservation laws”. And that’s right, but it’s only true under some assumptions: for example, that the equations of motion come from a Lagrangian.

This leads to some interesting questions. For which types of physical theories do symmetries give conservation laws? What are we assuming about the world, if we assume it is described by a theory of this type? It’s hard to get to the bottom of these questions, but it’s worth trying.

We can prove versions of Noether’s theorem relating symmetries to conserved quantities in many frameworks. While a differential geometric framework is truer to Noether’s original vision, my paper studies the theorem algebraically, without mentioning Lagrangians.

Now, Atiyah said:

…algebra is to the geometer what you might call the Faustian offer. As you know, Faust in Goethe’s story was offered whatever he wanted (in his case the love of a beautiful woman), by the devil, in return for selling his soul. Algebra is the offer made by the devil to the mathematician. The devil says: I will give you this powerful machine, it will answer any question you like. All you need to do is give me your soul: give up geometry and you will have this marvellous machine.

While this is sometimes true, algebra is more than a computational tool: it allows us to express concepts in a very clear and distilled way. Furthermore, the geometrical framework developed for classical mechanics is not sufficient for quantum mechanics. An algebraic approach emphasizes the similarity between classical and quantum mechanics, clarifying their differences.

In talking about Noether’s theorem I keep using an interlocking trio of important concepts used to describe physical systems: ‘states’, ‘observables’ and ‘generators’. A physical system has a convex set of states, where convex linear combinations let us describe probabilistic mixtures of states. An observable is a real-valued quantity whose value depends (perhaps with some randomness) on the state. More precisely: an observable maps each state to a probability measure on the real line. A generator, on the other hand, is something that gives rise to a one-parameter group of transformations of the set of states or, dually, of the set of observables.

It’s easy to mix up observables and generators, but I want to distinguish them. When we say ‘the energy of the system is 7 joules’, we are treating energy as an observable: something you can measure. When we say ‘the Hamiltonian generates time translations’, we are treating the Hamiltonian as a generator.

In both classical mechanics and ordinary complex quantum mechanics we usually say the Hamiltonian is the energy, because we have a way to identify them. But observables and generators play distinct roles—and in some theories, such as real or quaternionic quantum mechanics, they are truly different. In all the theories I consider in my paper the set of observables is a Jordan algebra, while the set of generators is a Lie algebra. (Don’t worry, I explain what those are.)

When we can identify observables with generators, we can state Noether’s theorem as the following equivalence:

The generator a generates transformations that leave the observable b fixed.

if and only if

The generator b generates transformations that leave the observable a fixed.

In this beautifully symmetrical statement, we switch from thinking of a as the generator and b as the observable in the first part to thinking of b as the generator and a as the observable in the second part. Of course, this statement is true only under some conditions, and the goal of my paper is to better understand these conditions. But the most fundamental condition, I claim, is the ability to identify observables with generators.

In classical mechanics we treat observables as being the same as generators, by treating them as elements of a Poisson algebra, which is both a Jordan algebra and a Lie algebra. In quantum mechanics observables are not quite the same as generators. They are both elements of something called a ∗-algebra. Observables are self-adjoint, obeying

a^* = a

while generators are skew-adjoint, obeying

a^* = -a

The self-adjoint elements form a Jordan algebra, while the skew-adjoint elements form a Lie algebra.

In ordinary complex quantum mechanics we use a complex ∗-algebra. This lets us turn any self-adjoint element into a skew-adjoint one by multiplying it by \sqrt{-1}. Thus, the complex numbers let us identify observables with generators! In real and quaternionic quantum mechanics this identification is impossible, so the appearance of complex numbers in quantum mechanics is closely connected to Noether’s theorem.
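These claims are easy to check on small matrices. The sketch below uses 2×2 complex matrices as a stand-in for a complex ∗-algebra, with the conjugate transpose playing the role of ∗; the helper names and the two sample matrices are hypothetical, chosen just for illustration.

```python
# Plain-Python 2x2 complex matrices, stored as lists of rows.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose: the * operation."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def is_self_adjoint(a):
    return close(dagger(a), a)

def is_skew_adjoint(a):
    return close(dagger(a), scale(-1, a))

def jordan(a, b):
    """Jordan product (ab + ba)/2."""
    return scale(0.5, add(matmul(a, b), matmul(b, a)))

def commutator(a, b):
    """Lie bracket ab - ba."""
    return add(matmul(a, b), scale(-1, matmul(b, a)))

# Two self-adjoint 'observables'.
a = [[1 + 0j, 2 - 1j], [2 + 1j, -3 + 0j]]
b = [[0j, 1j], [-1j, 5 + 0j]]

# Multiplying by i turns them into skew-adjoint 'generators'.
ia, ib = scale(1j, a), scale(1j, b)

print(is_self_adjoint(jordan(a, b)))        # Jordan product stays self-adjoint
print(is_skew_adjoint(commutator(ia, ib)))  # bracket of generators stays skew-adjoint
print(is_self_adjoint(commutator(a, b)))    # but the bracket of observables does not
```

The last check is the point: self-adjoint elements are closed under the Jordan product but not under the commutator, which is why the factor of i is needed to pass between the two roles.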

In short, classical mechanics and ordinary complex quantum mechanics fit together in this sort of picture:


To dig deeper, it’s good to examine generators on their own: that is, Lie algebras. Lie algebras arise very naturally from the concept of ‘symmetry’. Any Lie group gives rise to a Lie algebra, and any element of this Lie algebra then generates a one-parameter family of transformations of that very same Lie algebra. This lets us state a version of Noether’s theorem solely in terms of generators:

The generator a generates transformations that leave the generator b fixed.

if and only if

The generator b generates transformations that leave the generator a fixed.

And when we translate these statements into equations, their equivalence follows directly from this elementary property of the Lie bracket:

[a,b] = 0

\iff

[b,a] = 0
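Spelled out, the property in question is just antisymmetry of the bracket. As a sketch, writing \mathrm{ad}_a for the map b \mapsto [a,b] (my notation, assuming a finite-dimensional Lie algebra so the exponential makes sense):

```latex
% The one-parameter family generated by a fixes b exactly when [a,b] = 0:
e^{t\,\mathrm{ad}_a}\, b = b \ \text{ for all } t \in \mathbb{R}
\quad\iff\quad [a,b] = 0

% Antisymmetry then swaps the roles of a and b:
[a,b] = -[b,a]
\quad\Longrightarrow\quad
\bigl( [a,b] = 0 \iff [b,a] = 0 \bigr)
```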

Thus, Noether’s theorem is almost automatic if we forget about observables and work solely with generators. The only questions left are: why should symmetries be described by Lie groups, and what is the meaning of this property of the Lie bracket?

In my paper I tackle both these questions, and point out that the Lie algebra formulation of Noether’s theorem comes from a more primitive group formulation, which says that whenever you have two group elements g and h,

g commutes with h.

if and only if

h commutes with g.

That is: whenever you’ve got two ways of transforming a physical system, the first transformation is ‘conserved’ by the second if and only if the second is conserved by the first!

However, observables are crucial in physics. Working solely with generators in order to make Noether’s theorem a tautology would be another sort of Faustian bargain. So, to really get to the bottom of Noether’s theorem, we need to understand the map from observables to generators. In ordinary quantum mechanics this comes from multiplication by i. But this just pushes the mystery back a notch: why should we be using the complex numbers in quantum mechanics?

For this it’s good to spend some time examining observables on their own: that is, Jordan algebras. Those of greatest importance in physics are the unital JB-algebras, which are unfortunately named not after me, but Jordan and Banach. These allow a unified approach to real, complex and quaternionic quantum mechanics, along with some more exotic theories. So, they let us study how the role of complex numbers in quantum mechanics is connected to Noether’s theorem.

Any unital JB-algebra O has a partial ordering: that is, we can talk about one observable being greater than or equal to another. With the help of this we can define states on O, and prove that any observable maps each state to a probability measure on the real line.

More surprisingly, any JB-algebra also gives rise to two Lie algebras. The smaller of these, say L, has elements that generate transformations of O that preserve all the structure of this unital JB-algebra. They also act on the set of states. Thus, elements of L truly deserve to be considered ‘generators’.

In a unital JB-algebra there is not always a way to reinterpret observables as generators. However, Alfsen and Shultz have defined the notion of a ‘dynamical correspondence’ for such an algebra, which is a well-behaved map

\psi \colon O \to L

One of the two conditions they impose on this map implies a version of Noether’s theorem. They prove that any JB-algebra with a dynamical correspondence gives a complex ∗-algebra where the observables are self-adjoint elements, the generators are skew-adjoint, and we can convert observables into generators by multiplying them by i.

This result is important, because the definition of JB-algebra does not involve the complex numbers, nor does the concept of dynamical correspondence. Rather, the role of the complex numbers in quantum mechanics emerges from a map from observables to generators that obeys conditions including Noether’s theorem!

To be a bit more precise, Alfsen and Shultz’s first condition on the map \psi \colon O \to L says that every observable a \in O generates transformations that leave a itself fixed. I call this the self-conservation principle. It implies Noether’s theorem.

However, in their definition of dynamical correspondence, Alfsen and Shultz also impose a second, more mysterious condition on the map \psi. I claim that this condition is best understood in terms of the larger Lie algebra associated to a unital JB-algebra. As a vector space this is the direct sum

A = O \oplus L

but it’s equipped with a Lie bracket such that

[-,-] \colon L \times L \to L    \qquad [-,-] \colon L \times O \to O

[-,-] \colon O \times L \to O    \qquad [-,-] \colon O \times O \to L

As I mentioned, elements of L generate transformations of O that preserve all the structure on this unital JB-algebra. Elements of O also generate transformations of O, but these only preserve its vector space structure and partial ordering.

What’s the meaning of these other transformations? I claim they’re connected to statistical mechanics.

For example, consider ordinary quantum mechanics and let O be the unital JB-algebra of all bounded self-adjoint operators on a complex Hilbert space. Then L is the Lie algebra of all bounded skew-adjoint operators on this Hilbert space. There is a dynamical correspondence sending any observable H \in O to the generator \psi(H) = iH \in L, which then generates a one-parameter group of transformations of O like this:

a \mapsto e^{itH/\hbar} \, a \, e^{-itH/\hbar}  \qquad \forall t \in \mathbb{R}, a \in O

where \hbar is Planck’s constant. If H is the Hamiltonian of some system, this is the usual formula for time evolution of observables in the Heisenberg picture. But H also generates a one-parameter group of transformations of O as follows:

a \mapsto  e^{-\beta H/2} \, a \, e^{-\beta H/2}  \qquad \forall \beta \in \mathbb{R}, a \in O

Writing \beta = 1/kT where T is temperature and k is Boltzmann’s constant, I claim that these are ‘thermal transformations’. Acting on a state in thermal equilibrium at some temperature, these transformations produce states in thermal equilibrium at other temperatures (up to normalization).
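To see why, here is the one-line computation, as a sketch. I'm assuming the equilibrium state at inverse temperature \beta_0 is given by a density matrix \rho_{\beta_0}, and that the partition function Z(\beta) = \mathrm{tr}(e^{-\beta H}) is finite for the values of \beta involved; the symbols \rho_{\beta_0} and Z are my notation.

```latex
\rho_{\beta_0} = \frac{e^{-\beta_0 H}}{Z(\beta_0)}
\quad\Longrightarrow\quad
e^{-\beta H/2} \, \rho_{\beta_0} \, e^{-\beta H/2}
  = \frac{e^{-(\beta_0 + \beta) H}}{Z(\beta_0)}
  = \frac{Z(\beta_0 + \beta)}{Z(\beta_0)} \, \rho_{\beta_0 + \beta}
```

So the transformation shifts the inverse temperature from \beta_0 to \beta_0 + \beta, and the ratio of partition functions is the normalization factor mentioned above.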

The analogy between it/\hbar and 1/kT is often summarized by saying “inverse temperature is imaginary time”. The second condition in Alfsen and Shultz’s definition of dynamical correspondence is a way of capturing this principle in a way that does not explicitly mention the complex numbers. Thus, we may very roughly say their result explains the role of complex numbers in quantum mechanics starting from three assumptions:

• observables form a Jordan algebra of a nice sort (a unital JB-algebra)

• the self-conservation principle (and thus Noether’s theorem)

• the relation between time and inverse temperature.

I still want to understand all of this more deeply, but the way statistical mechanics entered the game was surprising to me, so I feel I made a little progress.

I hope the paper is half as fun to read as it was to write! There’s a lot more in it than described here.


12 June, 2020

I’m wondering if people talk about this. Maybe you know?

Given a self-adjoint operator H that’s bounded below and a density matrix D on some Hilbert space, we can define for any \beta > 0 a new density matrix

\displaystyle{ D_\beta = \frac{e^{-\beta H/2} \, D \, e^{-\beta H/2}}{\mathrm{tr}(e^{-\beta H/2} \, D \, e^{-\beta H/2})} }

I would like to call this the thermalization of D when H is a Hamiltonian and \beta = 1/kT where T is the temperature and k is Boltzmann’s constant.

For example, in the finite-dimensional case we can take D to be the identity matrix, normalized to have trace 1. Then D_\beta is the Gibbs state at temperature T: that is, the state of thermal equilibrium at temperature T.

But I want to know if you’ve seen people do this thermalization trick starting from some other density matrix D.
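The Gibbs-state example can be checked numerically in a few lines. This is only a sketch under a simplifying assumption of mine: H is diagonal in the basis in which D is written, so e^{-\beta H/2} is diagonal too and no matrix exponential routine is needed. The energy levels, the value of \beta and the second test matrix are hypothetical.

```python
import math

def thermalize(D, energies, beta):
    """Return D_beta = W D W / tr(W D W) with W = exp(-beta*H/2).

    Assumes H = diag(energies) in the basis in which D is written,
    and that D has real entries.
    """
    n = len(energies)
    w = [math.exp(-beta * e / 2.0) for e in energies]
    M = [[w[i] * D[i][j] * w[j] for j in range(n)] for i in range(n)]
    Z = sum(M[i][i] for i in range(n))
    return [[x / Z for x in row] for row in M]

def gibbs_probabilities(energies, beta):
    """Diagonal entries of the Gibbs state exp(-beta*H) / tr(exp(-beta*H))."""
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)
    return [x / Z for x in weights]

energies = [0.0, 1.0, 3.0]
beta = 0.7

# Start from the normalized identity matrix: the result is the Gibbs state.
identity_state = [[1.0 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
D_beta = thermalize(identity_state, energies, beta)
print([D_beta[i][i] for i in range(3)])
print(gibbs_probabilities(energies, beta))

# Start from some other density matrix: a pure state with off-diagonal terms.
plus_state = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]
print(thermalize(plus_state, energies, beta))
```

For the second input the result still has trace 1, but its off-diagonal entries survive, so it is not a Gibbs state.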

Formal Concepts vs Eigenvectors of Density Operators

7 May, 2020

In the seventh talk of the ACT@UCR seminar, Tai-Danae Bradley told us about applications of categorical quantum mechanics to formal concept analysis.

She gave her talk on Wednesday May 13th. Afterwards we discussed her talk at the Category Theory Community Server. You can see those discussions here if you become a member:

You can see her slides here, or download a video here, or watch the video here:

• Tai-Danae Bradley: Formal concepts vs. eigenvectors of density operators.

Abstract. In this talk, I’ll show how any probability distribution on a product of finite sets gives rise to a pair of linear maps called density operators, whose eigenvectors capture “concepts” inherent in the original probability distribution. In some cases, the eigenvectors coincide with a simple construction from lattice theory known as a formal concept. In general, the operators recover marginal probabilities on their diagonals, and the information stored in their eigenvectors is akin to conditional probability. This is useful in an applied setting, where the eigenvectors and eigenvalues can be glued together to reconstruct joint probabilities. This naturally leads to a tensor network model of the original distribution. I’ll explain these ideas from the ground up, starting with an introduction to formal concepts. Time permitting, I’ll also share how the same ideas lead to a simple framework for modeling hierarchy in natural language. As an aside, it’s known that formal concepts arise as an enriched version of a generalization of the Isbell completion of a category. Oftentimes, the construction is motivated by drawing an analogy with elementary linear algebra. I like to think of this talk as an application of the linear algebraic side of that analogy.

Her talk is based on her thesis:

• Tai-Danae Bradley, At the Interface of Algebra and Statistics.
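Here is a small sketch of the first claim in the abstract, that the density operators recover marginal probabilities on their diagonals. I'm assuming the construction works by taking square roots of the joint probabilities as amplitudes and then forming the two reduced operators; the function name and the toy distribution are hypothetical.

```python
import math

def reduced_densities(pi):
    """Build the two reduced density operators of a joint distribution.

    pi maps pairs (x, y) to probabilities. The matrix M has entries
    M[x][y] = sqrt(pi(x, y)); the reduced operators are M M^T and M^T M.
    """
    xs = sorted({x for x, _ in pi})
    ys = sorted({y for _, y in pi})
    M = [[math.sqrt(pi.get((x, y), 0.0)) for y in ys] for x in xs]
    rho_X = [[sum(M[i][k] * M[j][k] for k in range(len(ys)))
              for j in range(len(xs))] for i in range(len(xs))]
    rho_Y = [[sum(M[k][i] * M[k][j] for k in range(len(xs)))
              for j in range(len(ys))] for i in range(len(ys))]
    return xs, ys, rho_X, rho_Y

# A toy joint distribution on colors x food types.
pi = {
    ("orange", "fruit"): 0.25,
    ("green", "fruit"): 0.25,
    ("green", "vegetable"): 0.5,
}

xs, ys, rho_X, rho_Y = reduced_densities(pi)
for i, x in enumerate(xs):
    marginal = sum(p for (a, _), p in pi.items() if a == x)
    print(x, rho_X[i][i], marginal)
```

The off-diagonal entries of rho_X are what carry the extra information: they are nonzero exactly when two values of x share some y with positive probability.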


Superfluid Quasicrystals

31 January, 2020

Condensed matter physics is so cool! Bounce 4 laser beams off mirrors to make an interference pattern with 8-fold symmetry. Put a Bose–Einstein condensate of potassium atoms into this “optical lattice” and you get a superfluid quasicrystal!

You see, no periodic pattern in the plane can have 8-fold symmetry, so the interference pattern of the light is ‘quasiperiodic’: it never repeats itself, though it comes arbitrarily close, sort of like this pattern drawn by Greg Egan:

In the Bose–Einstein condensate all the particles have the same wavefunction, and the wavefunction itself, influenced by the light, also becomes quasiperiodic.
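The claim that no periodic pattern in the plane can have 8-fold symmetry is the crystallographic restriction theorem, and the key step is easy to check by computer. A rotation that maps a lattice to itself is an integer matrix when written in a lattice basis, so its trace 2 cos(2π/n) must be an integer. The sketch below just tests that necessary condition; the function name is my own.

```python
import math

def passes_trace_test(n, tol=1e-9):
    """True if rotation by 2*pi/n has integer trace 2*cos(2*pi/n).

    This is a necessary condition for the rotation to preserve
    some lattice in the plane.
    """
    trace = 2.0 * math.cos(2.0 * math.pi / n)
    return abs(trace - round(trace)) < tol

allowed = [n for n in range(1, 13) if passes_trace_test(n)]
print(allowed)
print(2.0 * math.cos(2.0 * math.pi / 8))  # the square root of 2, not an integer
```

Only 1-, 2-, 3-, 4- and 6-fold rotations pass, so 8-fold symmetry forces the pattern to be quasiperiodic rather than periodic.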

But that’s not all! As you increase the intensity of the lasers, the Bose–Einstein condensate suddenly collapses from a quasicrystal to a ‘localized’ state where all the atoms sit in the same place!

Below, the gray curve is the potential V formed by the lasers, while the blue curve is the absolute value squared of the wavefunction of the Bose–Einstein condensate, |\psi_0|^2.

At top the lasers are off so V is zero and |\psi_0|^2 is constant. In the middle the lasers are on, but not too bright, so V and |\psi_0|^2 are quasiperiodic. At the bottom the lasers are brighter, so V is quasiperiodic and larger, and |\psi_0|^2 is localized.

It’s well known that when a crystal is sufficiently disordered, its electrons may localize: instead of having spread-out wavefunctions, they get trapped in specific regions as shown here:

This phenomenon is called ‘Anderson localization’, and it was discovered around 1958.

But when a Bose–Einstein condensate localizes, all the atoms get trapped in the same place, because they’re all in exactly the same state! This phenomenon was discovered experimentally at the University of Cambridge very recently:

• Matteo Sbroscia, Konrad Viebahn, Edward Carter, Jr-Chiun Yu, Alexander Gaunt and Ulrich Schneider, Observing localisation in a 2D quasicrystalline optical lattice.

The evidence for it is somewhat indirect, so I’m sure people will continue to study it. Localization of a Bose–Einstein condensate in a one-dimensional quasiperiodic potential was seen much earlier, in 2008:

• Giacomo Roati, Chiara D’Errico, Leonardo Fallani, Marco Fattori, Chiara Fort, Matteo Zaccanti, Giovanni Modugno, Michele Modugno and Massimo Inguscio, Anderson localization of a non-interacting Bose–Einstein condensate, Nature 453 (2008), 895–898.

The holy grail, a ‘Bose glass’, remains to be seen. It’s a Bose–Einstein condensate that’s also a glass: its wavefunction is disordered rather than periodic or quasiperiodic.

New forms of matter with strange properties—I love ’em!

For more popularizations of these ideas, see:

• Julia C. Keller, Researchers create new form of matter—supersolid is crystalline and superfluid at the same time, 3 March 2018.

• University of Texas at Dallas, Solid research leads physicists to propose new state of matter, 9 April 2018.

The latter says “The term ‘superfluid quasicrystal’ sounds like something a comic-book villain might use to carry out his dastardly plans.”

Schrödinger and Einstein

5 January, 2020


Schrödinger and Einstein helped invent quantum mechanics. But they didn’t really believe in its implications for the structure of reality, so in their later years they couldn’t get themselves to simply use it like most of their colleagues. Thus, they were largely sidelined. While others made rapid progress in atomic, nuclear and particle physics, they spent a lot of energy criticizing and analyzing quantum theory.

They also spent a lot of time on ‘unified field theories’: theories that sought to unify gravity and electromagnetism, without taking quantum mechanics into account.

After he finally found his equations describing gravity in November 1915, Einstein spent years working out their consequences. In 1917 he changed the equations, introducing the ‘cosmological constant’ Λ to keep the universe from expanding. Whoops.

In 1923, Einstein got excited about attempts to unify gravity and electromagnetism. He wrote to Niels Bohr:

I believe I have finally understood the connection between electricity and gravitation. Eddington has come closer to the truth than Weyl.

You see, Hermann Weyl and Arthur Eddington had both tried to develop unified field theories—theories that unified gravity and electromagnetism. Weyl had tried a gauge theory—indeed, he invented the term ‘gauge transformations’ at this time. In 1918 he asked Einstein to communicate a paper on it to the Berlin Academy. Einstein did, but pointed out a crushing physical objection to it in a footnote!

In 1921, Eddington tried a theory where the fundamental field was not the spacetime metric, but a torsion-free connection. He tried to show that both electromagnetism and gravity could be described by such a theory. But he didn’t even get as far as writing down field equations.

Einstein wrote three papers on Eddington’s ideas in 1923. He was so excited that he sent the first to the Berlin Academy from a ship sailing from Japan! He wrote down field equations and sought to connect them to Maxwell’s equations and general relativity. He was very optimistic at this time, concluding that

Eddington’s general idea in context with the Hamiltonian principle leads to a theory almost free of ambiguities; it does justice to our present knowledge about gravitation and electricity and unifies both kinds of fields in a truly accomplished manner.

Later he noticed the flaws in the theory. He had an elaborate approach to getting charged particles from singular solutions of the equation, though he wished they could be described by nonsingular solutions. He was stumped by the fact that the negatively and positively charged particles he knew—the electron and proton—had different masses. The same problem afflicted Dirac later, until the positron was discovered. But there were also problems even in getting Maxwell’s equations and general relativity from this framework, even approximately.

By 1925 his enthusiasm had faded. He wrote to his friend Besso:

Regrettably, I had to throw away my work in the spirit of Eddington. Anyway, I now am convinced that, unfortunately, nothing can be made with the complex of ideas by Weyl–Eddington.

So, he started work on another unified field theory. And another.

And another.

Einstein worked obsessively on unified field theories until his death in 1955. He lost touch with his colleagues’ discoveries in particle physics. He had an assistant, Valentine Bargmann, try to teach him quantum field theory—but he lost interest in a month. All he wanted was a geometrical explanation of gravity and electromagnetism. He never succeeded in this quest.

But there’s more to this story!

The other side of the story is Schrödinger. In the 1940s, he too became obsessed with unified field theories. He and Einstein became good friends—but also competitors in their quest to unify the forces of nature.

But let’s back up a bit. In June 1935, after the famous Einstein-Podolsky-Rosen paper arguing that quantum mechanics was incomplete, Schrödinger wrote to Einstein:

I am very happy that in the paper just published in P.R. you have evidently caught dogmatic q.m. by the coat-tails.

Einstein replied:

You are the only person with whom I am actually willing to come to terms.

They bonded over their philosophical opposition to the Bohr–Heisenberg attitude to quantum mechanics. In November 1935, Schrödinger wrote his paper on ‘Schrödinger’s cat’.

Schrödinger fled Austria after the Nazis took over. In 1940 he got a job at the brand-new Dublin Institute for Advanced Studies.

In 1943 he started writing about unified field theories, corresponding with Einstein. He worked on some theories very similar to those of Einstein and Straus, who were trying to unify gravity and electromagnetism in a theory involving a connection with torsion, whose Christoffel symbols were therefore non-symmetric. He wrote 8 papers on this subject.

Einstein even sent Schrödinger two of his unpublished papers on these ideas!

In late 1946, Schrödinger had a new insight. He was thrilled.

By 1947 Schrödinger thought he’d made a breakthrough. He presented a paper on January 27th at the Dublin Institute for Advanced Studies. He even called a press conference to announce his new theory!

He predicted that a rotating mass would generate a magnetic field.

The story of the great discovery was quickly telegraphed around the world, and the science editor of the New York Times interviewed Einstein to see what he thought.

Einstein was not impressed. In a carefully prepared statement he shot Schrödinger down:

Einstein was especially annoyed that Schrödinger had called a press conference to announce his new theory before there was any evidence supporting it.

Wise words. I wish people heeded them!

Schrödinger apologized in a letter to Einstein, claiming that he’d done the press conference just to get a pay raise. Einstein responded curtly, saying “your theory does not really differ from mine”.

They stopped writing to each other for 3 years.

I’d like to understand Schrödinger’s theory using the modern tools of differential geometry. I don’t think it’s promising. I just want to know what it actually says, and what it predicts! Go here for details:

Schrödinger’s unified field theory, The n-Category Café, December 26, 2019.

For more on Schrödinger’s theory, try his book:

• Erwin Schrödinger, Space-Time Structure, Cambridge U. Press, Cambridge, 1950. Chapter XII: Generalizations of Einstein’s theory.

and his first paper on the theory:

• Erwin Schrödinger, The final affine field laws I, Proceedings of the Royal Irish Academy A 51 (1945–1948), 163–171.

For a wonderfully detailed analysis of the history of unified field theories, including the work of Einstein and Schrödinger, read these:

• Hubert F. M. Goenner, On the history of unified field theories, Living Reviews in Relativity 7 (2004), article no. 2. On the history of unified field theories II (ca. 1930–ca. 1965), Living Reviews in Relativity 17 (2014), article no. 5.

especially Section 6 of the second paper. For more on the story of Einstein and Schrödinger, I recommend this wonderful book:

• Walter Moore, Schrödinger: Life and Thought, Cambridge U. Press, Cambridge, 1989.

This is where I got most of my quotes.

Foundations of Math and Physics One Century After Hilbert

10 October, 2019

I wrote a review of this book with chapters by Penrose, Witten, Connes, Atiyah, Smolin and others:

• John Baez, review of Foundations of Mathematics and Physics One Century After Hilbert: New Perspectives, edited by Joseph Kouneiher, Notices of the American Mathematical Society 66 no. 11 (November 2019), 1690–1692.

It gave me a chance to say a bit—just a tiny bit—about the current state of fundamental physics and the foundations of mathematics.

Quantum Physics and Logic 2019

4 June, 2019

There’s another conference involving applied category theory at Chapman University!

• Quantum Physics and Logic 2019, June 9-14, 2019, Chapman University, Beckman Hall 404. Organized by Matthew Leifer, Lorenzo Catani, Justin Dressel, and Drew Moshier.

The QPL series started out being about quantum programming languages, but it later broadened its scope while keeping the same acronym. This conference series now covers quite a range of topics, including the category-theoretic study of physical systems. My students Kenny Courser, Jade Master and Joe Moeller will be speaking there, and I’ll talk about Kenny’s new work on structured cospans as a tool for studying open systems.


The program is here.

Invited talks

• John Baez (UC Riverside), Structured cospans.

• Anna Pappa (University College London), Classical computing via quantum means.

• Joel Wallman (University of Waterloo), TBA.


• Ana Belen Sainz (Perimeter Institute), Bell nonlocality: correlations from principles.

• Quanlong Wang (University of Oxford) and KangFeng Ng (Radboud University), Completeness of the ZX calculus.

Hidden Symmetries of the Hydrogen Atom

4 April, 2019

Here’s the math colloquium talk I gave at Georgia Tech this week:

Hidden symmetries of the hydrogen atom.

Abstract. A classical particle moving in an inverse square central force, like a planet in the gravitational field of the Sun, moves in orbits that do not precess. This lack of precession, special to the inverse square force, indicates the presence of extra conserved quantities beyond the obvious ones. Thanks to Noether’s theorem, these indicate the presence of extra symmetries. It turns out that not only rotations in 3 dimensions, but also in 4 dimensions, act as symmetries of this system. These extra symmetries are also present in the quantum version of the problem, where they explain some surprising features of the hydrogen atom. The quest to fully understand these symmetries leads to some fascinating mathematical adventures.
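
To make the "extra conserved quantities" concrete, here is a minimal numerical sketch. It is not taken from the talk. It assumes units where the mass and the force constant are both 1, works in the plane, and uses a hand-rolled RK4 integrator; all function names are just illustrative. It follows a bound orbit and records how far the energy, the angular momentum and the Runge–Lenz vector A = p × L − q/|q| drift from their initial values.

```python
import math

def kepler_rhs(state):
    """Hamilton's equations for H = |p|^2/2 - 1/|q| in the plane (assumed units m = k = 1)."""
    x, y, px, py = state
    r3 = (x * x + y * y) ** 1.5
    return (px, py, -x / r3, -y / r3)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = kepler_rhs(state)
    k2 = kepler_rhs(shifted(k1, dt / 2))
    k3 = kepler_rhs(shifted(k2, dt / 2))
    k4 = kepler_rhs(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def conserved_quantities(state):
    """Return (energy, angular momentum, Runge-Lenz vector) for a planar state."""
    x, y, px, py = state
    r = math.hypot(x, y)
    energy = 0.5 * (px * px + py * py) - 1.0 / r
    ang_mom = x * py - y * px
    # A = p x L - q/|q|, with L pointing along the z axis.
    runge_lenz = (py * ang_mom - x / r, -px * ang_mom - y / r)
    return energy, ang_mom, runge_lenz

def max_drift(state, dt=1e-3, steps=20000):
    """Integrate the orbit and return the largest change seen in any conserved quantity."""
    e0, l0, a0 = conserved_quantities(state)
    drift = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt)
        e, l, a = conserved_quantities(state)
        drift = max(drift, abs(e - e0), abs(l - l0),
                    abs(a[0] - a0[0]), abs(a[1] - a0[1]))
    return drift

initial_state = (1.0, 0.0, 0.0, 1.2)  # a bound orbit with energy -0.28
print(max_drift(initial_state))
```

In these units the length of A is the eccentricity of the orbit (0.44 for this initial condition) and A points along the major axis, so a drift that stays tiny compared to 0.44 is the numerical shadow of the orbit not precessing.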

I left out a lot of calculations, but someday I want to write a paper where I put them all in. This material is all known, but I feel like explaining it my own way.

In the process of creating the slides and giving the talk, though, I realized there’s a lot I don’t understand yet. Some of it is embarrassingly basic! For example, I give Greg Egan’s nice intuitive argument for how you can get some ‘Runge–Lenz symmetries’ in the 2d Kepler problem. I might as well just quote his article:

• Greg Egan, The ellipse and the atom.

He says:

Now, one way to find orbits with the same energy is by applying a rotation that leaves the sun fixed but repositions the planet. Any ordinary three-dimensional rotation can be used in this way, yielding another orbit with exactly the same shape, but oriented differently.

But there is another transformation we can use to give us a new orbit without changing the total energy. If we grab hold of the planet at either of the points where it’s travelling parallel to the axis of the ellipse, and then swing it along a circular arc centred on the sun, we can reposition it without altering its distance from the sun. But rather than rotating its velocity in the same fashion (as we would do if we wanted to rotate the orbit as a whole) we leave its velocity vector unchanged: its direction, as well as its length, stays the same.

Since we haven’t changed the planet’s distance from the sun, its potential energy is unaltered, and since we haven’t changed its velocity, its kinetic energy is the same. What’s more, since the speed of a planet of a given mass when it’s moving parallel to the axis of its orbit depends only on its total energy, the planet will still be in that state with respect to its new orbit, and so the new orbit’s axis must be parallel to the axis of the original orbit.
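
Here is a small numerical check of this argument. It is only a sketch, assuming units where the mass and force constant are 1, and the particular orbit and angles are arbitrary choices for illustration. It puts the planet at an end of the minor axis, where its velocity is parallel to the major axis, moves it to another point at the same distance from the sun while keeping the momentum fixed, and compares the energies and the Runge–Lenz vectors of the old and new orbits.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def energy(q, p):
    """Kinetic plus potential energy for H = |p|^2/2 - 1/|q|."""
    return 0.5 * dot(p, p) - 1.0 / math.sqrt(dot(q, q))

def runge_lenz(q, p):
    """Runge-Lenz vector A = p x (q x p) - q/|q|; it points along the major axis."""
    r = math.sqrt(dot(q, q))
    p_cross_l = cross(p, cross(q, p))
    return tuple(c - qi / r for c, qi in zip(p_cross_l, q))

# Original orbit: semi-major axis a, eccentricity e, major axis along x.
a, e = 2.0, 0.5
b = a * math.sqrt(1.0 - e * e)
q_old = (-a * e, b, 0.0)             # end of the minor axis, at distance a from the sun
p = (-1.0 / math.sqrt(a), 0.0, 0.0)  # velocity parallel to the major axis

# Swing the planet to another point at the same distance, keeping p fixed.
theta, phi = 0.7, 1.1                # arbitrary angles, chosen for illustration
q_new = (a * math.sin(theta) * math.cos(phi),
         a * math.sin(theta) * math.sin(phi),
         a * math.cos(theta))

axis_old = runge_lenz(q_old, p)
axis_new = runge_lenz(q_new, p)
print(energy(q_old, p), energy(q_new, p))  # equal energies
print(axis_old, axis_new)                  # both along the x axis
print(cross(axis_old, axis_new))           # approximately (0, 0, 0)
```

The reason this works for any target point: when |q| = a and |p|^2 = 1/a, expanding the double cross product gives A = -(p \cdot q) p, so the new Runge–Lenz vector is automatically parallel to the unchanged velocity.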

Rotations together with these ‘Runge–Lenz transformations’ generate an SO(3) action on the space of elliptical orbits of any given energy. But what’s the most geometrically vivid description of this SO(3) action?

Someone at my talk noted that you could grab the planet at any point of its path, and move to anywhere the same distance from the Sun, while keeping its speed the same, and get a new orbit with the same energy. Are all the SO(3) transformations of this form?

I have a bunch more questions, but this one is the simplest!

Problems with the Standard Model Higgs

25 February, 2019

Here is a conversation I had with Scott Aaronson. It started on his blog, in a discussion about ‘fine-tuning’. Some say the Standard Model of particle physics can’t be the whole story, because in this theory you need to fine-tune the fundamental constants to keep the Higgs mass from becoming huge. Others say this argument is invalid.

I tried to push the conversation toward the calculations that actually underlie this argument. Then our conversation drifted into email and got more technical… and perhaps also more interesting, because it led us to contemplate the stability of the vacuum!

You see, if we screwed up royally on our fine-tuning and came up with a theory where the square of the Higgs mass was negative, the vacuum would be unstable. It would instantly decay into a vast explosion of Higgs bosons.

Another possibility, also weird, turns out to be slightly more plausible. This is that the Higgs mass is positive—as it clearly is—and yet the vacuum is ‘metastable’. In this scenario, the vacuum we see around us might last a long time, and yet eventually it could decay through quantum tunnelling to the ‘true’ vacuum, with a lower energy density:

Little bubbles of true vacuum would form, randomly, and then grow very rapidly. This would be the end of life as we know it.

Scott agreed that other people might like to see our conversation. So here it is. I’ll fix a few mistakes, to make me seem smarter than I actually am.

I’ll start with some stuff on his blog.

Scott wrote, in part:

If I said, “supersymmetry basically has to be there because it’s such a beautiful symmetry,” that would be an argument from beauty. But I didn’t say that, and I disagree with anyone who does say it. I made something weaker, what you might call an argument from the explanatory coherence of the world. It merely says that, when we find basic parameters of nature to be precariously balanced against each other, to one part in 10^{10} or whatever, there’s almost certainly some explanation. It doesn’t say the explanation will be beautiful, it doesn’t say it will be discoverable by an FCC or any other collider, and it doesn’t say it will have a form (like SUSY) that anyone has thought of yet.

John wrote:

Scott wrote:

It merely says that, when we find basic parameters of nature to be precariously balanced against each other, to one part in 10^{10} or whatever, there’s almost certainly some explanation.

Do you know examples of this sort of situation in particle physics, or is this just a hypothetical situation?

Scott wrote:

To answer a question with a question, do you disagree that that’s the current situation with (for example) the Higgs mass, not to mention the vacuum energy, if one considers everything that could naïvely contribute? A lot of people told me it was, but maybe they lied or I misunderstood them.

John wrote:

The basic rough story is this. We measure the Higgs mass. We can assume that the Standard Model is good up to some energy near the Planck energy, after which it fizzles out for some unspecified reason.

According to the Standard Model, each of the 25 fundamental constants appearing in the Standard Model is a “running coupling constant”. That is, it’s not really a constant, but a function of energy: roughly the energy of the process we use to measure that constant. Let’s call these “coupling constants measured at energy E”. Each of these 25 functions is determined by the value of all 25 functions at any fixed energy E – e.g. energy zero, or the Planck energy. This is called the “renormalization group flow”.

So, the Higgs mass we measure is actually the Higgs mass at some energy E quite low compared to the Planck energy.

And, it turns out that to get this measured value of the Higgs mass, the values of some fundamental constants measured at energies near the Planck energy need to almost cancel out. More precisely, some complicated function of them needs to almost but not quite obey some equation.

People summarize the story this way: to get the observed Higgs mass we need to “fine-tune” the fundamental constants’ values as measured near the Planck energy, if we assume the Standard Model is valid up to energies near the Planck energy.

A lot of particle physicists accept this reasoning and additionally assume that fine-tuning the values of fundamental constants as measured near the Planck energy is “bad”. They conclude that it would be “bad” for the Standard Model to be valid up to the Planck energy.

(In the previous paragraph you can replace “bad” with some other word—for example, “implausible”.)

Indeed you can use a refined version of the argument I’m sketching here to say “either the fundamental constants measured at energy E need to obey an identity up to precision ε or the Standard Model must break down before we reach energy E”, where ε gets smaller as E gets bigger.

Then, in theory, you can pick an ε and say “an ε smaller than that would make me very nervous.” Then you can conclude that “if the Standard Model is valid up to energy E, that will make me very nervous”.

(But I honestly don’t know anyone who has approximately computed ε as a function of E. Often people seem content to hand-wave.)

People like to argue about how small an ε should make us nervous, or even whether any value of ε should make us nervous.
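
For orientation only, here is the hand-waving version of ε as a function of E. It is not the computation asked for above. It assumes, purely by dimensional analysis, that the required precision scales like (m_h / E)^2 with a coefficient of order 1, and it uses rough values of 125 GeV for the Higgs mass and 1.22 × 10^19 GeV for the Planck energy. The function name is hypothetical.

```python
HIGGS_MASS_GEV = 125.0        # approximate measured Higgs mass
PLANCK_ENERGY_GEV = 1.22e19   # approximate Planck energy

def naive_tuning(energy_gev, higgs_mass_gev=HIGGS_MASS_GEV):
    """Naive estimate eps ~ (m_h / E)^2, assuming an order-1 coefficient (an assumption)."""
    return (higgs_mass_gev / energy_gev) ** 2

for energy in (1e3, 1e4, 1e10, 1e16, PLANCK_ENERGY_GEV):
    print(f"E = {energy:.2e} GeV   eps ~ {naive_tuning(energy):.1e}")
```

A real answer would come from the renormalization group flow itself, not from a power-counting guess like this one.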

But another assumption behind this whole line of reasoning is that the values of fundamental constants as measured at some energy near the Planck energy are “more important” than their values as measured near energy zero, so we should take near-cancellations of these high-energy values seriously—more seriously, I suppose, than near-cancellations at low energies.

Most particle physicists will defend this idea quite passionately. The philosophy seems to be that God designed high-energy physics and left his grad students to work out its consequences at low energies—so if you want to understand physics, you need to focus on high energies.

Scott wrote in email:

Do I remember correctly that it’s actually the square of the Higgs mass (or its value when probed at high energy?) that’s the sum of all these positive and negative high-energy contributions?

John wrote:

Sorry to take a while. I was trying to figure out if that’s a reasonable way to think of things. It’s true that the Higgs mass squared, not the Higgs mass, is what shows up in the Standard Model Lagrangian. This is how scalar fields work.

But I wouldn’t talk about a “sum of positive and negative high-energy contributions”. I’d rather think of all the coupling constants in the Standard Model—all 25 of them—obeying a coupled differential equation that says how they change as we change the energy scale. So, we’ve got a vector field on \mathbb{R}^{25} that says how these coupling constants “flow” as we change the energy scale.
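
Here is a toy version of this picture, cut down from \mathbb{R}^{25} to \mathbb{R}^2. The two beta functions below are made up for illustration and have nothing to do with the actual Standard Model; the names beta and flow are hypothetical. The point of the sketch is only that the couplings at one scale determine them at every other scale: flowing up in t = ln k and then back down recovers the starting values.

```python
def beta(couplings):
    """Hypothetical beta functions for two toy couplings (g, y). Not the Standard Model."""
    g, y = couplings
    return (-0.1 * g ** 3, y * (0.2 * y ** 2 - 0.3 * g ** 2))

def flow(couplings, t_start, t_end, steps=2000):
    """Integrate d(couplings)/dt = beta(couplings) with RK4, where t = ln(energy scale)."""
    h = (t_end - t_start) / steps
    c = tuple(couplings)
    for _ in range(steps):
        k1 = beta(c)
        k2 = beta(tuple(ci + h / 2 * ki for ci, ki in zip(c, k1)))
        k3 = beta(tuple(ci + h / 2 * ki for ci, ki in zip(c, k2)))
        k4 = beta(tuple(ci + h * ki for ci, ki in zip(c, k3)))
        c = tuple(ci + h / 6 * (a + 2 * b + 2 * d + e)
                  for ci, a, b, d, e in zip(c, k1, k2, k3, k4))
    return c

low_energy = (1.0, 0.8)                   # toy couplings at t = 0
high_energy = flow(low_energy, 0.0, 5.0)  # flow up in energy
recovered = flow(high_energy, 5.0, 0.0)   # flow back down
print(high_energy)
print(recovered)                          # close to low_energy
```

Because the second coupling's beta function depends on the first, changing either starting value changes where both end up, which is the sense in which all the couplings affect each other along the flow.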

Here’s an equation from a paper that looks at a simplified model:

Here m_h is the Higgs mass, m_t is the mass of the top quark, and both are being treated as functions of a momentum k (essentially the energy scale we’ve been talking about). v is just a number. You’ll note this equation simplifies if we work with the Higgs mass squared, since

m_h dm_h = \frac{1}{2} d(m_h^2)

This is one of a bunch of equations—in principle 25—that say how all the coupling constants change. So, they all affect each other in a complicated way as we change k.

By the way, there’s a lot of discussion of whether the Higgs mass squared goes negative at high energies in the Standard Model. Some calculations suggest it does; other people argue otherwise. If it does, this would generally be considered an inconsistency in the whole setup: particles with negative mass squared are tachyons!

I think one could make a lot of progress on these theoretical issues involving the Standard Model if people took them nearly as seriously as string theory or new colliders.

Scott wrote:

So OK, I was misled by the other things I read, and it’s more complicated than m_h^2 being a sum of mostly-canceling contributions (I was pretty sure m_h couldn’t be such a sum, since then a slight change to parameters could make it negative).

Rather, there’s a coupled system of 25 nonlinear differential equations, which one could imagine initializing with the “true” high-energy SM parameters, and then solving to find the measured low-energy values. And these coupled differential equations have the property that, if we slightly changed the high-energy input parameters, that could generically have a wild effect on the low-energy outputs, pushing them up to the Planck scale or whatever.

Philosophically, I suppose this doesn’t much change things compared to the naive picture: the question is still, how OK are you with high-energy parameters that need to be precariously tuned to reproduce the low-energy SM, and does that state of affairs demand a new principle to explain it? But it does lead to a different intuition: namely, isn’t this sort of chaos just the generic behavior for nonlinear differential equations? If we fix a solution to such equations at a time t_0, our solution will almost always appear “finely tuned” at a faraway time t_1—tuned to reproduce precisely the behavior at t_0 that we fixed previously! Why shouldn’t we imagine that God fixed the values of the SM constants for the low-energy world, and they are what they are at high energies simply because that’s what you get when you RG-flow to there?

I confess I’d never heard the speculation that m_h^2 could go negative at sufficiently high energies (!!). If I haven’t yet abused your generosity enough: what sort of energies are we talking about, compared to the Planck energy?

John wrote:

Scott wrote:

Rather, there’s a coupled system of 25 nonlinear differential equations, which one could imagine initializing with the “true” high-energy SM parameters, and then solving to find the measured low-energy values. And these coupled differential equations have the property that, if we slightly changed the high-energy input parameters, that could generically have a wild effect on the low-energy outputs, pushing them up to the Planck scale or whatever.


Philosophically, I suppose this doesn’t much change things compared to the naive picture: the question is still, how OK are you with high-energy parameters that need to be precariously tuned to reproduce the low-energy SM, and does that state of affairs demand a new principle to explain it? But it does lead to a different intuition: namely, isn’t this sort of chaos just the generic behavior for nonlinear differential equations?

Yes it is, generically.

Physicists are especially interested in theories that have “ultraviolet fixed points”—by which they usually mean values of the parameters that are fixed under the renormalization group flow and attractive as we keep increasing the energy scale. The idea is that these theories seem likely to make sense at arbitrarily high energy scales. For example, pure Yang-Mills fields are believed to be “asymptotically free”—the coupling constant measuring the strength of the force goes to zero as the energy scale gets higher.

But attractive ultraviolet fixed points are going to be repulsive as we reverse the direction of the flow and see what happens as we lower the energy scale.

So what gives? Are all ultraviolet fixed points giving theories that require “fine-tuning” to get the parameters we observe at low energies? Is this bad?

Well, they’re not all the same. For theories considered nice, the parameters change logarithmically as we change the energy scale. This is considered to be a mild change. The Standard Model with Higgs may not have an ultraviolet fixed point, but people usually worry about something else: the Higgs mass changes quadratically with the energy scale. This is related to the square of the Higgs mass being the really important parameter… if we used that, I’d say linearly.
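
To see how different “logarithmically” and “quadratically” are, here is a sketch with two toy flows in the variable t = ln(k / k_0). Both are schematic. The first, dg/dt = -b g^3 with a made-up coefficient b = 0.05, has the shape of a one-loop flow for an asymptotically free coupling. The second, dx/dt = 2x, is simply the simplest flow in which a quantity scales like the square of the energy scale. Neither is an actual Standard Model equation.

```python
import math

def log_running(g0, b, t):
    """Solution of the toy flow dg/dt = -b g^3 with g(0) = g0, where t = ln(k / k0)."""
    return g0 / math.sqrt(1.0 + 2.0 * b * g0 * g0 * t)

def quadratic_running(x0, t):
    """Solution of the toy flow dx/dt = 2 x with x(0) = x0, so x scales like (k / k0)^2."""
    return x0 * math.exp(2.0 * t)

decades = 17                      # roughly from collider energies to the Planck energy
t = decades * math.log(10.0)
print(log_running(1.0, 0.05, t))  # changes by a factor of order 1
print(quadratic_running(1.0, t))  # changes by a factor of about 1e34
```

Over 17 decades of energy the first quantity drifts toward zero by a modest factor, while the second changes by 34 orders of magnitude. That gap is what people have in mind when they call one kind of running mild and worry about the other.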

I think there’s a lot of mythology and intuitive reasoning surrounding this whole subject—probably the real experts could say a lot about it, but they are few, and a lot of people just repeat what they’ve been told, rather uncritically.

If we fix a solution to such equations at a time t_0, our solution will almost always appear “finely tuned” at a faraway time t_1—tuned to reproduce precisely the behavior at t_0 that we fixed previously! Why shouldn’t we imagine that God fixed the values of the SM constants for the low-energy world, and they are what they are at high energies simply because that’s what you get when you RG-flow to there?

This is something I can imagine Sabine Hossenfelder saying.

I confess I’d never heard the speculation that m_h^2 could go negative at sufficiently high energies (!!). If I haven’t yet abused your generosity enough: what sort of energies are we talking about, compared to the Planck energy?

The experts are still arguing about this; I don’t really know. To show how weird all this stuff is, there’s a review article from 2013 called “The top quark and Higgs boson masses and the stability of the electroweak vacuum”, which doesn’t look crackpotty to me, that argues that the vacuum state of the universe is stable if the Higgs mass and top quark are in the green region, but only metastable otherwise:

The big ellipse is where the parameters were expected to lie in 2012 when the paper was first written. The smaller ellipses only indicate the size of the uncertainty expected after later colliders made more progress. You shouldn’t take them too seriously: they could be centered in the stable region or the metastable region.

An appendix gives an update, which looks like this:

The paper says:

one sees that the central value of the top mass lies almost exactly on the boundary between vacuum stability and metastability. The uncertainty on the top quark mass is nevertheless presently too large to clearly discriminate between these two possibilities.

Then John wrote:

By the way, another paper analyzing problems with the Standard Model says:

It has been shown that higher dimension operators may change the lifetime of the metastable vacuum, \tau, from

\tau = 1.49 \times 10^{714} T_U

to

\tau = 5.45 \times 10^{-212} T_U

where T_U is the age of the Universe.

In other words, the calculations are not very reliable yet.

And then John wrote:

Sorry to keep spamming you, but since some of my last few comments didn’t make much sense, even to me, I did some more reading. It seems the best current conventional wisdom is this:

Assuming the Standard Model is valid up to the Planck energy, you can tune parameters near the Planck energy to get the observed parameters down here at low energies. So of course the Higgs mass down here is positive.

But, due to higher-order effects, the potential for the Higgs field no longer looks like the classic “Mexican hat” described by a polynomial of degree 4:

with the observed Higgs field sitting at one of the global minima.

Instead, it’s described by a more complicated function, like a polynomial of degree 6 or more. And this means that the minimum where the Higgs field is sitting may only be a local minimum:

In the left-hand scenario we’re at a global minimum and everything is fine. In the right-hand scenario we’re not and the vacuum we see is only metastable. The Higgs mass is still positive: that’s essentially the curvature of the potential near our local minimum. But the universe will eventually tunnel through the potential barrier and we’ll all die.
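
Here is a toy version of the metastable scenario in one field variable. The potential below is a hypothetical even polynomial of degree 8 with coefficients picked by hand, not anything derived from the Standard Model. It has a local minimum at \phi = 1, a barrier, and a deeper minimum at \phi = 2. The sketch locates the minima on a grid and checks that the curvature at the shallower one is still positive.

```python
def potential(phi):
    """Toy even potential of degree 8 in phi (hypothetical, hand-picked coefficients)."""
    u = phi * phi
    return u ** 4 / 8 - 7 * u ** 3 / 6 + 7 * u ** 2 / 2 - 4 * u

def local_minima(f, lo=0.0, hi=2.5, n=2500):
    """Return the grid points where f is strictly lower than at both neighbours."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    vals = [f(x) for x in xs]
    return [xs[i] for i in range(1, n)
            if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]

def curvature(f, x, h=1e-4):
    """Second derivative of f at x by a central finite difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

minima = local_minima(potential)
for phi in minima:
    print(phi, potential(phi), curvature(potential, phi))

our_vacuum = minima[0]                                # the minimum at smaller field value
true_vacuum = min(minima, key=potential)              # the lowest minimum found
print("metastable:", potential(true_vacuum) < potential(our_vacuum))
```

The curvature at \phi = 1 is positive, which is the toy analogue of the Higgs mass squared being positive, and yet the field sitting there is not in the lowest-energy state.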

Yes, that seems to be the conventional wisdom! Obviously they’re keeping it hush-hush to prevent panic.

This paper has tons of relevant references:

• Tommi Markkanen, Arttu Rajantie, Stephen Stopyra, Cosmological aspects of Higgs vacuum metastability.

Abstract. The current central experimental values of the parameters of the Standard Model give rise to a striking conclusion: metastability of the electroweak vacuum is favoured over absolute stability. A metastable vacuum for the Higgs boson implies that it is possible, and in fact inevitable, that a vacuum decay takes place with catastrophic consequences for the Universe. The metastability of the Higgs vacuum is especially significant for cosmology, because there are many mechanisms that could have triggered the decay of the electroweak vacuum in the early Universe. We present a comprehensive review of the implications from Higgs vacuum metastability for cosmology along with a pedagogical discussion of the related theoretical topics, including renormalization group improvement, quantum field theory in curved spacetime and vacuum decay in field theory.

Scott wrote:

Once again, thank you so much! This is enlightening.

If you’d like other people to benefit from it, I’m totally up for you making it into a post on Azimuth, quoting from my emails as much or as little as you want. Or you could post it on that comment thread on my blog (which is still open), or I’d be willing to make it into a guest post (though that might need to wait till next week).

I guess my one other question is: what happens to this RG flow when you go to the infrared extreme? Is it believed, or known, that the “low-energy” values of the 25 Standard Model parameters are simply fixed points in the IR? Or could any of them run to strange values there as well?

I don’t really know the answer to that, so I’ll stop here.

But in case you’re worrying now that it’s “possible, and in fact inevitable, that a vacuum decay takes place with catastrophic consequences for the Universe”, relax! These calculations are very hard to do correctly. All existing work uses a lot of approximations that I don’t completely trust. Furthermore, they are assuming that the Standard Model is valid up to very high energies without any corrections due to new, yet-unseen particles!

So, while I think it’s a great challenge to get these calculations right, and to measure the Standard Model parameters accurately enough to do them right, I am not very worried about the Universe being taken over by a rapidly expanding bubble of ‘true vacuum’.