Information Geometry (Part 18)

5 August, 2021

Last time I sketched how two related forms of geometry, symplectic and contact geometry, show up in thermodynamics. Today I want to explain how they show up in probability theory.

For some reason I haven’t seen much discussion of this! But people should have looked into this. After all, statistical mechanics explains thermodynamics in terms of probability theory, so if some mathematical structure shows up in thermodynamics it should appear in statistical mechanics… and thus ultimately in probability theory.

I just figured out how this works for symplectic and contact geometry.

Suppose a system has n possible states. We’ll call these microstates, following the tradition in statistical mechanics. If you don’t know what ‘microstate’ means, don’t worry about it! But the rough idea is that if you have a macroscopic system like a rock, the precise details of what its atoms are doing are described by a microstate, and many different microstates could be indistinguishable unless you look very carefully.

We’ll call the microstates 1, 2, \dots, n. So, if you don’t want to think about physics, when I say microstate I’ll just mean an integer from 1 to n.

Next, a probability distribution q assigns a real number q_i to each microstate, and these numbers must sum to 1 and be nonnegative. So, we have q \in \mathbb{R}^n, though not every vector in \mathbb{R}^n is a probability distribution.

I’m sure you’re wondering why I’m using q rather than p to stand for a probability distribution. Am I just trying to confuse you?

No: I’m trying to set up an analogy to physics!

Last time I introduced symplectic geometry using classical mechanics. The most important example of a symplectic manifold is the cotangent bundle T^\ast Q of a manifold Q. A point of T^\ast Q is a pair (q,p) consisting of a point q \in Q and a cotangent vector p \in T^\ast_q Q. In classical mechanics the point q describes the position of some physical system, while p describes its momentum.

So, I’m going to set up an analogy like this:

        Classical Mechanics    Probability Theory
  q     position               probability distribution
  p     momentum               ???

But what is to momentum as probability is to position?

A big clue is the appearance of symplectic geometry in thermodynamics, which I also outlined last time. We can use this to get some intuition about the analogue of momentum in probability theory.

In thermodynamics, a system has a manifold Q of states. (These are not the ‘microstates’ I mentioned before: we’ll see the relation later.) There is a function

f \colon Q \to \mathbb{R}

describing the entropy of the system as a function of its state. There is a law of thermodynamics saying that

p = (df)_q

This equation picks out a submanifold of T^\ast Q, namely

\Lambda = \{(q,p) \in T^\ast Q : \; p = (df)_q \}

Moreover this submanifold is Lagrangian: the symplectic structure \omega vanishes when restricted to it:

\displaystyle{ \omega |_\Lambda = 0 }

This is very beautiful, but it goes by so fast you might almost miss it! So let’s clutter it up a bit with coordinates. We often use local coordinates on Q and describe a point q \in Q using these coordinates, getting a point

(q_1, \dots, q_n) \in \mathbb{R}^n

They give rise to local coordinates q_1, \dots, q_n, p_1, \dots, p_n on the cotangent bundle T^\ast Q. The q_i are called extensive variables, because they are typically things that you can measure only by totalling up something over the whole system, like the energy or volume of a cylinder of gas. The p_i are called intensive variables, because they are typically things that you can measure locally at any point, like temperature or pressure.

In these local coordinates, the symplectic structure on T^\ast Q is the 2-form given by

\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n

The equation

p = (df)_q

serves as a law of physics that determines the intensive variables given the extensive ones when our system is in thermodynamic equilibrium. Written out using coordinates, this law says

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

It looks pretty bland here, but in fact it gives formulas for the temperature and pressure of a gas, and many other useful formulas in thermodynamics.

Now we are ready to see how all this plays out in probability theory! We’ll get an analogy like this, which goes hand-in-hand with our earlier one:

        Thermodynamics         Probability Theory
  q     extensive variables    probability distribution
  p     intensive variables    ???

This analogy is clearer than the last, because statistical mechanics reveals that the extensive variables in thermodynamics are really just summaries of probability distributions on microstates. Furthermore, both thermodynamics and probability theory have a concept of entropy.

So, let’s take our manifold Q to consist of probability distributions on the set of microstates I was talking about before: the set \{1, \dots, n\}. Actually, let’s use nowhere vanishing probability distributions:

\displaystyle{ Q = \{ q \in \mathbb{R}^n : \; q_i > 0, \; \sum_{i=1}^n q_i = 1 \} }

I’m requiring q_i > 0 to ensure Q is a manifold, and also to make sure f is differentiable: it ceases to be differentiable when one of the probabilities q_i hits zero.

Since Q is a manifold, its cotangent bundle is a symplectic manifold T^\ast Q. And here’s the good news: we have a god-given entropy function

f \colon Q \to \mathbb{R}

namely the Shannon entropy

\displaystyle{ f(q) = - \sum_{i = 1}^n q_i \ln q_i }

So, everything I just described about thermodynamics works in the setting of plain old probability theory! Starting from our manifold Q and the entropy function, we get all the rest, leading up to the Lagrangian submanifold

\Lambda = \{(q,p) \in T^\ast Q : \; p = (df)_q \}

that describes the relation between extensive and intensive variables.

For computations it helps to pick coordinates on Q. Since the probabilities q_1, \dots, q_n sum to 1, they aren’t independent coordinates on Q. So, we can either pick all but one of them as coordinates, or learn how to deal with non-independent coordinates, which are already completely standard in projective geometry. Let’s do the former, just to keep things simple.

These coordinates on Q give rise in the usual way to coordinates q_i and p_i on the cotangent bundle T^\ast Q. These play the role of extensive and intensive variables, respectively, and it should be very interesting to impose the equation

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

where f is the Shannon entropy. This picks out a Lagrangian submanifold \Lambda \subseteq T^\ast Q.

So, the question becomes: what does this mean? If this formula gives the analogue of momentum for probability theory, what does this analogue of momentum mean?

Here’s a preliminary answer: p_i says how fast entropy increases as we increase the probability q_i that our system is in the ith microstate. So if we think of nature as ‘wanting’ to maximize entropy, the quantity p_i says how eager it is to increase the probability q_i.

Indeed, you can think of p_i as a bit like pressure—one of the most famous intensive quantities in thermodynamics. A gas ‘wants’ to expand, and its pressure says precisely how eager it is to expand. Similarly, a probability distribution ‘wants’ to flatten out, to maximize entropy, and p_i says how eager it is to increase the probability q_i in order to do this.
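To make the 'eagerness' picture concrete, here is a minimal numerical sketch. It assumes we use the first n-1 probabilities as coordinates on Q, so the last one is q_n = 1 - (q_1 + \cdots + q_{n-1}). With that choice a short calculation, which is my own and not part of the argument above, gives p_i = \ln(q_n / q_i). The function names and the sample distribution are hypothetical, chosen only for illustration.

```python
import math

def shannon_entropy(q):
    """Shannon entropy -sum q_i ln q_i of a strictly positive distribution."""
    return -sum(x * math.log(x) for x in q)

def entropy_in_coords(x):
    """Entropy as a function of the independent coordinates q_1, ..., q_{n-1}."""
    last = 1.0 - sum(x)  # the dependent probability q_n
    return shannon_entropy(list(x) + [last])

def conjugate_variables(x, h=1e-6):
    """Central-difference estimate of p_i = (partial f / partial q_i)."""
    p = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        p.append((entropy_in_coords(up) - entropy_in_coords(down)) / (2 * h))
    return p

x = [0.1, 0.2, 0.3]            # q_1, q_2, q_3; the dependent q_4 is 0.4
q_last = 1.0 - sum(x)
numeric = conjugate_variables(x)
closed_form = [math.log(q_last / xi) for xi in x]  # assumed formula ln(q_n / q_i)
```

In this example every p_i is positive, because each q_i is smaller than q_4: increasing any of them at the expense of q_4 flattens the distribution and raises the entropy.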

But what can we do with this concept? And what does symplectic geometry do for probability theory?

I will start tackling these questions next time.

One thing I’ll show is that when we reduce thermodynamics to probability theory using the ideas of statistical mechanics, the appearance of symplectic geometry in thermodynamics follows from its appearance in probability theory.

Another thing I want to investigate is how other geometrical structures on the space of probability distributions, like the Fisher information metric, interact with the symplectic structure on its cotangent bundle. This will integrate symplectic geometry and information geometry.

I also want to bring contact geometry into the picture. It’s already easy to see from our work last time how this should go. We treat the entropy S as an independent variable, and replace T^\ast Q with a larger manifold T^\ast Q \times \mathbb{R} having S as an extra coordinate. This is a contact manifold with contact form

\alpha = -dS + p_1 dq_1 + \cdots + p_n dq_n

This contact manifold has a submanifold \Sigma where we remember that entropy is a function of the probability distribution q, and define p in terms of q as usual:

\Sigma = \{(q,p,S) \in T^\ast Q \times \mathbb{R} : \; S = f(q), \; p = (df)_q \}

And as we saw last time, \Sigma is a Legendrian submanifold, meaning

\displaystyle{ \alpha|_{\Sigma} = 0 }

But again, we want to understand what these ideas from contact geometry really do for probability theory!

For all my old posts on information geometry, go here:

Information geometry.

Information Geometry (Part 17)

27 July, 2021

I’m getting back into information geometry, which is the geometry of the space of probability distributions, studied using tools from information theory. I’ve written a bunch about it already, which you can see here:

Information geometry.

Now I’m fascinated by something new: how symplectic geometry and contact geometry show up in information geometry. But before I say anything about this, let me say a bit about how they show up in thermodynamics. This is more widely discussed, and it’s a good starting point.

Symplectic geometry was born as the geometry of phase space in classical mechanics: that is, the space of possible positions and momenta of a classical system. The simplest example of a symplectic manifold is the vector space \mathbb{R}^{2n}, with n position coordinates q_i and n momentum coordinates p_i.

It turns out that symplectic manifolds are always even-dimensional, because we can always cover them with coordinate charts that look like \mathbb{R}^{2n}. When we change coordinates, it turns out that the splitting of coordinates into positions and momenta is somewhat arbitrary. For example, the position of a rock on a spring now may determine its momentum a while later, and vice versa. What’s not arbitrary? It’s the so-called ‘symplectic structure’:

\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n

While far from obvious at first, we know by now that the symplectic structure is exactly what needs to be preserved under valid changes of coordinates in classical mechanics! In fact, we can develop the whole formalism of classical mechanics starting from a manifold with a symplectic structure.

Symplectic geometry also shows up in thermodynamics. In thermodynamics we can start with a system in equilibrium whose state is described by some variables q_1, \dots, q_n. Its entropy will be a function of these variables, say

S = f(q_1, \dots, q_n)

We can then take the partial derivatives of entropy and call them something:

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

These new variables p_i are said to be ‘conjugate’ to the q_i, and they turn out to be very interesting. For example, if q_i is energy then p_i is ‘coolness’: the reciprocal of temperature. The coolness of a system is its change in entropy per change in energy.

Often the variables q_i are ‘extensive’: that is, you can measure them only by looking at your whole system and totaling up some quantity. Examples are energy and volume. Then the new variables p_i are ‘intensive’: that is, you can measure them at any one location in your system. Examples are coolness and pressure.

Now for a twist: sometimes we do not know the function f ahead of time. Then we cannot define the p_i as above. We’re forced into a different approach where we treat them as independent quantities, at least until someone tells us what f is.

In this approach, we start with a space \mathbb{R}^{2n} having n coordinates called q_i and n coordinates called p_i. This is a symplectic manifold, with the symplectic structure \omega described earlier!

But what about the entropy? We don’t yet know what it is as a function of the q_i, but we may still want to talk about it. So, we build a space \mathbb{R}^{2n+1} having one extra coordinate S in addition to the q_i and p_i. This new coordinate stands for entropy. And this new space has an important 1-form on it:

\alpha = -dS + p_1 dq_1 + \cdots + p_n dq_n

This is called the ‘contact 1-form’.

This makes \mathbb{R}^{2n+1} into an example of a ‘contact manifold’. Contact geometry is the odd-dimensional partner of symplectic geometry. Just as symplectic manifolds are always even-dimensional, contact manifolds are always odd-dimensional.

What is the point of the contact 1-form? Well, suppose someone tells us the function f relating entropy to the coordinates q_i. Now we know that we want

S = f

and also

\displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

So, we can impose these equations, which pick out a subset of \mathbb{R}^{2n+1}. You can check that this subset, say \Sigma, is an n-dimensional submanifold. But even better, the contact 1-form vanishes when restricted to this submanifold:

\left.\alpha\right|_\Sigma = 0

Let’s see why! Suppose x \in \Sigma and suppose v \in T_x \Sigma is a vector tangent to \Sigma at this point x. It suffices to show

\alpha(v) = 0

Using the definition of \alpha this equation says

\displaystyle{ -dS(v) + \sum_i p_i dq_i(v) = 0 }

But on the surface \Sigma we have

S = f, \qquad  \displaystyle{ p_i = \frac{\partial f}{\partial q_i} }

So, the equation we’re trying to show can be written as

\displaystyle{ -df(v) + \sum_i \frac{\partial f}{\partial q_i} dq_i(v) = 0 }

But this follows from

\displaystyle{ df = \sum_i \frac{\partial f}{\partial q_i} dq_i }

which holds because f is a function only of the coordinates q_i.

So, any formula for entropy S = f(q_1, \dots, q_n) picks out a so-called ‘Legendrian submanifold’ of \mathbb{R}^{2n+1}: that is, an n-dimensional submanifold such that the contact 1-form vanishes when restricted to this submanifold. And the idea is that this submanifold tells you everything you need to know about a thermodynamic system.
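The argument above can be checked numerically. The following is only a sketch under stated assumptions: the entropy function f, the curve q(t), and all function names are hypothetical choices of mine. It lifts a curve of extensive variables to \Sigma and evaluates \alpha on the tangent vector of the lifted curve using finite differences.

```python
import math

def f(q):
    """Hypothetical entropy function of two extensive variables."""
    return math.log(q[0]) + 0.5 * math.log(q[1])

def grad_f(q):
    """Partial derivatives of f, which play the role of the p_i on Sigma."""
    return [1.0 / q[0], 0.5 / q[1]]

def point_on_sigma(q):
    """The point (q, p, S) of Sigma lying over q."""
    return list(q), grad_f(q), f(q)

def curve(t):
    """A hypothetical curve q(t) of extensive variables, kept positive."""
    return [2.0 + math.cos(t), 3.0 + t * t]

def alpha_on_tangent(t, h=1e-5):
    """Evaluate alpha = -dS + sum_i p_i dq_i on the tangent of the lifted curve."""
    q_plus, _, S_plus = point_on_sigma(curve(t + h))
    q_minus, _, S_minus = point_on_sigma(curve(t - h))
    _, p, _ = point_on_sigma(curve(t))
    dS = (S_plus - S_minus) / (2 * h)
    dq = [(a - b) / (2 * h) for a, b in zip(q_plus, q_minus)]
    return -dS + sum(pi * dqi for pi, dqi in zip(p, dq))

values = [alpha_on_tangent(t) for t in (0.0, 0.7, 1.5)]
```

Each entry of `values` should be zero up to finite-difference error, which is the chain rule for f in disguise.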

Indeed, V. I. Arnol’d says this was implicitly known to the great founder of statistical mechanics, Josiah Willard Gibbs. Arnol’d calls \mathbb{R}^5 with coordinates energy, entropy, temperature, pressure and volume the ‘Gibbs manifold’, and he proclaims:

Gibbs’ thesis: substances are Legendrian submanifolds of the Gibbs manifold.

This is from here:

• V. I. Arnol’d, Contact geometry: the geometrical method of Gibbs’ thermodynamics, Proceedings of the Gibbs Symposium (New Haven, CT, 1989), AMS, Providence, Rhode Island, 1990.

A bit more detail

Now I want to say everything again, with a bit of extra detail, assuming more familiarity with manifolds. Above I was using \mathbb{R}^n with coordinates q_1, \dots, q_n to describe the ‘extensive’ variables of a thermodynamic system. But let’s be a bit more general and use any smooth n-dimensional manifold Q. Even if Q is a vector space, this viewpoint is nice because it’s manifestly coordinate-independent!

So: starting from Q we build the cotangent bundle T^\ast Q. A point in the cotangent bundle describes both extensive variables, namely q \in Q, and ‘intensive’ variables, namely a cotangent vector p \in T^\ast_q Q.

The manifold T^\ast Q has a 1-form \theta on it called the tautological 1-form. We can describe it as follows. Given a tangent vector v \in T_{(q,p)} T^\ast Q we have to say what \theta(v) is. Using the projection

\pi \colon T^\ast Q \to Q

we can project v down to a tangent vector d\pi(v) at the point q. But the 1-form p eats tangent vectors at q and spits out numbers! So, we set

\theta(v) = p(d\pi(v))

This is sort of mind-boggling at first, but it’s worth pondering until it makes sense. It helps to work out what \theta looks like in local coordinates. Starting with any local coordinates q_i on an open set of Q, we get local coordinates q_i, p_i on the cotangent bundle of this open set in the usual way. On this open set you then get

\theta = p_1 dq_1 + \cdots + p_n dq_n

This is a standard calculation, which is really worth doing!
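For readers who want to compare notes, here is one way the calculation can go. The component names a_i and b_i are my own notation for the coefficients of a tangent vector v at (q,p):

```latex
\begin{aligned}
v &= \sum_i a_i \frac{\partial}{\partial q_i} + \sum_i b_i \frac{\partial}{\partial p_i}
  &&\Longrightarrow\quad d\pi(v) = \sum_i a_i \frac{\partial}{\partial q_i} \\
p &= \sum_i p_i \, (dq_i)_q
  &&\Longrightarrow\quad \theta(v) = p(d\pi(v)) = \sum_i p_i a_i = \sum_i p_i \, dq_i(v)
\end{aligned}
```

The projection \pi forgets the p_i, so d\pi kills the \partial/\partial p_i components, and what is left is paired with the cotangent vector p.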

It follows that we can define a symplectic structure \omega by

\omega = d \theta

and get this formula in local coordinates:

\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n

Now, suppose we choose a smooth function

f \colon Q \to \mathbb{R}

which describes the entropy. We get a 1-form df, which we can think of as a map

df \colon Q \to T^\ast Q

assigning to each choice q of extensive variables the pair (q,p) of extensive and intensive variables where

p = df_q

The image of the map df is a ‘Lagrangian submanifold’ of T^\ast Q: that is, an n-dimensional submanifold \Lambda such that

\left.\omega\right|_{\Lambda} = 0

Lagrangian submanifolds are to symplectic geometry as Legendrian submanifolds are to contact geometry! What we’re seeing here is that if Gibbs had preferred symplectic geometry, he could have described substances as Lagrangian submanifolds rather than Legendrian submanifolds. But this approach would only keep track of the derivatives of entropy, df, not the actual value of the entropy function f.
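Here is a small sketch of why \omega vanishes on \Lambda, in local coordinates. A tangent vector to \Lambda has the form (a, Ha), where a is a tangent vector to Q and H is the Hessian of f, and the vanishing of \omega comes down to the symmetry of H. The particular entropy function and the sample vectors below are hypothetical, chosen only for illustration.

```python
def hessian_f(q):
    """Hessian of the hypothetical entropy f(q) = ln(q_1) + ln(q_2)/2 + q_1 q_2 / 10."""
    return [[-1.0 / q[0] ** 2, 0.1],
            [0.1, -0.5 / q[1] ** 2]]

def lift(q, a):
    """Push a tangent vector a at q forward along df: (dq, dp) = (a, H a)."""
    H = hessian_f(q)
    dp = [sum(H[i][j] * a[j] for j in range(len(a))) for i in range(len(a))]
    return list(a), dp

def omega(u, v):
    """omega = sum_i dp_i ^ dq_i evaluated on two tangent vectors given as (dq, dp)."""
    (dq_u, dp_u), (dq_v, dp_v) = u, v
    return sum(dp_u[i] * dq_v[i] - dp_v[i] * dq_u[i] for i in range(len(dq_u)))

q = [2.0, 3.0]
u = lift(q, [1.0, 0.0])
v = lift(q, [0.3, -2.0])
on_lambda = omega(u, v)    # both vectors are tangent to Lambda

# A vector that is not tangent to Lambda: its dp part is not H applied to its dq part.
w = ([0.0, 1.0], [1.0, 0.0])
off_lambda = omega(u, w)
```

The value `on_lambda` is zero up to rounding, while `off_lambda` is not, so the vanishing really is a special property of vectors tangent to \Lambda.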

If we prefer to keep track of the actual value of f using contact geometry, we can do that. For this we add an extra dimension to T^\ast Q and form the manifold T^\ast Q \times \mathbb{R}. The extra dimension represents entropy, so we’ll use S as our name for the coordinate on \mathbb{R}.

We can make T^\ast Q \times \mathbb{R} into a contact manifold with contact 1-form

\alpha = -d S + \theta

In local coordinates we get

\alpha = -dS + p_1 dq_1 + \cdots + p_n dq_n

just as we had earlier. And just as before, if we choose a smooth function f \colon Q \to \mathbb{R} describing entropy, the subset

\Sigma = \{(q,p,S) \in T^\ast Q \times \mathbb{R} : \; S = f(q), \; p = df_q \}

is a Legendrian submanifold of T^\ast Q \times \mathbb{R}.

Okay, this concludes my lightning review of symplectic and contact geometry in thermodynamics! Next time I’ll talk about something a bit less well understood: how they show up in statistical mechanics.

Thermodynamics and Economic Equilibrium

18 July, 2021

I’m having another round of studying thermodynamics, and I’m running into more interesting leads than I can keep up with. Like this paper:

• Eric Smith and Duncan K. Foley, Classical thermodynamics and economic general equilibrium theory, Journal of Economic Dynamics and Control 32 (2008) 7–65.

I’ve always been curious about the connection between economics and thermodynamics, but I know too little about economics to make this easy to explore. There are people who work on subjects called thermoeconomics and econophysics, but classical economists consider them ‘heterodox’. While I don’t trust classical economists to be right about things, I should probably learn more classical economics before I jump into the fray.

Still, the introduction of this paper is intriguing:

The relation between economic and physical (particularly thermodynamic) concepts of equilibrium has been a topic of recurrent interest throughout the development of neoclassical economic theory. As systems for defining equilibria, proving their existence, and computing their properties, neoclassical economics (Mas-Collel et al., 1995; Varian, 1992) and classical thermodynamics (Fermi, 1956) undeniably have numerous formal and methodological similarities. Both fields seek to describe system phenomena in terms of solutions to constrained optimization problems. Both rely on dual representations of interacting subsystems: the state of each subsystem is represented by pairs of variables, one variable from each pair characterizing the subsystem’s content, and the other characterizing the way it interacts with other subsystems. In physics the content variables are quantities like a subsystem’s total energy or the volume in space it occupies; in economics they are amounts of various commodities held by agents. In physics the interaction variables are quantities like temperature and pressure that can be measured on the system boundaries; in economics they are prices that can be measured by an agent’s willingness to trade one commodity for another.

In thermodynamics these pairs are called conjugate variables. The ‘content variables’ are usually called extensive and the ‘interaction variables’ are usually called intensive. A vector space with conjugate pairs of variables as coordinates is a symplectic vector space, and I’ve written about how these show up in the category-theoretic approach to open systems:

• John Baez, A compositional framework for passive linear networks, Azimuth, 28 April 2015.

Continuing on:

The significance attached to these similarities has changed considerably, however, in the time from the first mathematical formulation of utility (Walras, 1909) to the full axiomatization of general equilibrium theory (Debreu, 1987). Léon Walras appears (Mirowski, 1989) to have conceptualized economic equilibrium as a balance of the gradients of utilities, more for the sake of similarity to the concept of force balance in mechanics, than to account for any observations about the outcomes of trade. Fisher (1892) (a student of J. Willard Gibbs) attempted to update Walrasian metaphors from mechanics to thermodynamics, but retained Walras’s program of seeking an explicit parallelism between physics and economics.

This Fisher is not the geneticist and statistician Ronald Fisher who came up with Fisher’s fundamental theorem. It’s the author of this thesis:

• Irving Fisher, Mathematical Investigations in the Theory of Value and Prices, Ph.D. thesis, Yale University, 1892.

Continuing on with Smith and Foley’s paper:

As mathematical economics has become more sophisticated (Debreu, 1987) the naive parallelism of Walras and Fisher has progressively been abandoned, and with it the sense that it matters whether neoclassical economics resembles any branch of physics. The cardinalization of utility that Walras thought of as a counterpart to energy has been discarded, apparently removing the possibility of comparing utility with any empirically measurable quantity. A long history of logically inconsistent (or simply unproductive) analogy making (see Section 7.2) has further caused the topic of parallels to fall out of favor. Samuelson (1960) summarizes well the current view among many economists, at the end of one of the few methodologically sound analyses of the parallel roles of dual representation in economics and physics:

The formal mathematical analogy between classical thermodynamics and mathematic economic systems has now been explored. This does not warrant the commonly met attempt to find more exact analogies of physical magnitudes—such as entropy or energy—in the economic realm. Why should there be laws like the first or second laws of thermodynamics holding in the economic realm? Why should ‘utility’ be literally identified with entropy, energy, or anything else? Why should a failure to make such a successful identification lead anyone to overlook or deny the mathematical isomorphism that does exist between minimum systems that arise in different disciplines?

The view that neoclassical economics is now mathematically mature, and that it is mere coincidence and no longer relevant whether it overlaps with any body of physical theory, is reflected in the complete omission of the topic of parallels from contemporary graduate texts (Mas-Collel et al., 1995). We argue here that, despite its long history of discussion, there are important insights still to be gleaned from considering the relation of neoclassical economics to classical thermodynamics. The new results concerning this relation we present here have significant implications, both for the interpretation of economic theory and for econometrics. The most important point of this paper (more important than the establishment of formal parallels between thermodynamics and utility economics) is that economics, because it does not recognize an equation of state or define prices intrinsically in terms of equilibrium, lacks the close relation between measurement and theory physical thermodynamics enjoys.

Luckily, the paper seems to be serious about explaining economics to those who know thermodynamics (and maybe vice versa). So, I will now read the rest of the paper—or at least skim it.

One interesting simple point seems to be this: there’s an analogy between entropy maximization and utility maximization, but it’s limited by the following difference.

In classical thermodynamics the total entropy of a closed system made of subsystems is the sum of the entropies of the parts. While the second law forbids the system from moving to a state of lower total entropy, the entropies of some parts can decrease.

By contrast, in classical economics the total utility of a collection of agents is an unimportant quantity: what matters is the utility of each individual agent. The reason is that we assume the agents will voluntarily move from one state to another only if the utility of each agent separately increases. Furthermore, if we believe we can reparametrize the utility of each agent without changing anything, it makes no sense to add utilities.

(On the other hand, some utilitarian ethicists seem to believe it makes sense to add utilities and try to maximize the total. I imagine that libertarians would consider this ‘totalitarian’ approach morally unacceptable. I’m even less eager to enter discussions of the foundations of ethics than of economics, but it’s interesting how the question of whether a quantity can or ‘should’ be totaled up and then maximized plays a role in this debate.)

The Ideal Monatomic Gas

15 July, 2021

Today at the Topos Institute, Sophie Libkind, Owen Lynch and I spent some time talking about thermodynamics, Carnot engines and the like. As a result, I want to work out for myself some basic facts about the ideal gas. This stuff is all well-known, but I’m having trouble finding exactly what I want—and no more, thank you—collected in one place.

Just for background, the Carnot cycle looks roughly like this:

This is actually a very inaccurate picture, but it gets the point across. We have a container of gas, and we make it execute a cyclic motion, so its pressure P and volume V trace out a loop in the plane. As you can see, this loop consists of four curves:

• In the first, from a to b, we put a container of gas in contact with a hot medium. Then we make it undergo isothermal expansion: that is, expansion at a constant temperature.

• In the second, from b to c, we insulate the container and let the gas undergo adiabatic reversible expansion: that is, expansion while no heat enters or leaves. The temperature drops, but merely because the container expands, not because heat leaves. It reaches a lower temperature. Then we remove the insulation.

• In the third, from c to d, we put the container in contact with a cold medium that matches its temperature. Then we make it undergo isothermal contraction: that is, contraction at a constant temperature.

• In the fourth, from d to a, we insulate the container and let the gas undergo adiabatic reversible contraction: that is, contraction while no heat enters or leaves. The temperature increases until it matches that of the hot medium. Then we remove the insulation.

The Carnot cycle is historically important because it’s an example of a heat engine that’s as efficient as possible: it gives you the most work possible for the given amount of heat transferred from the hot medium to the cold medium. But I don’t want to get into that. I just want to figure out formulas for everything that’s going on here, including formulas for the four curves in this picture!

To get specific formulas, I’ll consider an ideal monatomic gas, meaning a gas made of individual atoms, like helium. Some features of an ideal gas, like the formula for energy as a function of temperature, depend on whether it’s monatomic.

As a quirky added bonus, I’d like to highlight how certain properties of the ideal monatomic gas depend on the dimension of space. There’s a certain chunk of the theory that doesn’t depend on the dimension of space, as long as you interpret ‘volume’ to mean the n-dimensional analogue of volume. But the number 3 shows up in the formula for the energy of the ideal monatomic gas. And this is because space is 3-dimensional! So just for fun, I’ll do the whole analysis in n dimensions.

There are four basic formulas we need to know.

First, we have the ideal gas law:

PV = NkT

where

P is the pressure.
V is the n-dimensional volume.
N is the number of molecules in a container of gas.
k is a constant called Boltzmann’s constant.
T is the temperature.

Second, we have a formula for the energy, or more precisely the internal energy, of a monatomic ideal gas:

U = \frac{n}{2} NkT

where

U is the internal energy.
n is the dimension of space.

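The factor of n/2 shows up thanks to the equipartition theorem: classically, each quadratic term in the energy has expected value \frac{1}{2} kT at temperature T. So a harmonic oscillator has expected energy equal to kT times its number of degrees of freedom, since each degree of freedom has both kinetic and potential energy, while a free atom has only kinetic energy and gets half that. Very roughly, the point is that in n dimensions there are n different directions in which an atom can move around.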

Third, we have a relation between internal energy, work and heat:

dU = \delta W + \delta Q

where

dU is the differential of internal energy.
\delta W is the infinitesimal work done to the gas.
\delta Q is the infinitesimal heat transferred to the gas.

The intuition is simple: to increase the energy of some gas you can do work to it or transfer heat to it. But the math may seem a bit murky, so let me explain.

I emphasize ‘to’ because it affects the sign: for example, the work done by the gas is minus the work done to the gas. Work done to the gas increases its internal energy, while work done by it reduces its internal energy. Similarly for heat.

But what is this ‘infinitesimal’ stuff, and these weird \delta symbols?

In a minute I’m going to express everything in terms of P and V. So, T and U will be functions on the plane with coordinates P and V. dU will be a 1-form on this plane: it’s the differential of the function U.

But \delta W and \delta Q are not differentials of functions W and Q. There are no functions on the plane called W and Q. You cannot take a box of gas and measure its work, or heat! There are just 1-forms called \delta W and \delta Q describing the infinitesimal work done to the gas and the infinitesimal heat transferred to it. These are not exact 1-forms: that is, they’re not differentials of functions.

Fourth and finally:

\delta W = - P dV

This should be intuitive. The work done by the gas on the outside world by changing its volume a little equals the pressure times the change in volume. So, the work done to the gas is minus the pressure times the change in volume.

One nice feature of the 1-form \delta W = -P d V is this: as we integrate it around a simple closed curve going counterclockwise, we get the area enclosed by that curve. So, the area of this region:

is the work done by our container of gas during the Carnot cycle. (There are a lot of minus signs to worry about here, but don’t worry, I’ve got them under control. Our curve is going clockwise, so the work done to our container of gas is negative, and it’s minus the area in the region.)
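If you want to see the signs come out right without trusting me, here is a quick numerical sketch. It integrates -P dV around a closed loop in the plane with V horizontal and P vertical. The loop is a hypothetical ellipse, not one of the curves of the Carnot cycle, and the function names are mine.

```python
import math

def work_done_to_gas(path, steps=20000):
    """Trapezoid-rule estimate of the integral of -P dV around a closed path.

    `path` maps t in [0, 1] to a point (V, P), with path(0) equal to path(1).
    """
    total = 0.0
    for k in range(steps):
        V0, P0 = path(k / steps)
        V1, P1 = path((k + 1) / steps)
        total += -0.5 * (P0 + P1) * (V1 - V0)
    return total

def ellipse(t, clockwise=False):
    """Hypothetical loop: an ellipse with semi-axes 1 (in V) and 0.5 (in P)."""
    angle = 2 * math.pi * t * (-1.0 if clockwise else 1.0)
    return 3.0 + 1.0 * math.cos(angle), 2.0 + 0.5 * math.sin(angle)

area = math.pi * 1.0 * 0.5
counterclockwise = work_done_to_gas(ellipse)
clockwise = work_done_to_gas(lambda t: ellipse(t, clockwise=True))
```

Going counterclockwise gives plus the enclosed area, and going clockwise, as in the Carnot cycle, gives minus the area for the work done to the gas.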

Okay, now that we have our four basic equations, we can play with them and derive consequences. Let’s suppose the number N of atoms in our container of gas is fixed—a constant. Then we think of everything as a function of two variables: P and V.

First, since PV = NkT we have

\displaystyle{ T = \frac{PV}{Nk} }

So temperature is proportional to pressure times volume.

Second, since PV = NkT and U = \frac{n}{2}NkT we have

U = \frac{n}{2} P V

So, like the temperature, the internal energy of the gas is proportional to pressure times volume—but it depends on the dimension of space!

From this we get

dU = \frac{n}{2} d(PV) = \frac{n}{2}( V dP + P dV)

From this and our formulas dU = \delta W + \delta Q, \delta W = -PdV we get

\begin{array}{ccl}  \delta Q &=& dU - \delta W \\  \\  &=& \frac{n}{2}( V dP + P dV) + P dV \\ \\  &=& \frac{n}{2} V dP + \frac{n+2}{2} P dV   \end{array}

That’s basically it!

But now we know how to figure out everything about the Carnot cycle. I won’t do it all here, but I’ll work out formulas for the curves in this cycle:

The isothermal curves are easy, since we’ve seen temperature is proportional to pressure times volume:

\displaystyle{ T = \frac{PV}{Nk} }

So, an isothermal curve is any curve with

P \propto V^{-1}

The adiabatic reversible curves, or ‘adiabats’ for short, are a lot more interesting. A curve C in the PV plane is an adiabat if when the container of gas changes pressure and volume while moving along this curve, no heat gets transferred to or from the gas. That is:

\delta Q \Big|_C = 0

where the funny symbol means I’m restricting a 1-form to the curve and getting a 1-form on that curve (which happens to be zero).

Let’s figure out what an adiabat looks like! By our formula for \delta Q we have

(\frac{n}{2} V dP + \frac{n+2}{2} P dV) \Big|_C = 0

so

\frac{n}{2} V dP \Big|_C = -\frac{n+2}{2} P dV \Big|_C

and dividing both sides by \frac{n}{2} P V gives

\frac{dP}{P} \Big|_C = - \frac{n+2}{n} \frac{dV}{V}\Big|_C

Now, we can integrate both sides along a portion of the curve C and get

\ln P = - \frac{n+2}{n} \ln V + \mathrm{constant}

or, exponentiating both sides,

P \propto V^{-(n+2)/n}

So in 3-dimensional space, as you let a gas expand adiabatically—say by putting it in an insulated cylinder so heat can’t get in or out—its pressure drops as its volume increases. But for a monatomic gas it drops in this peculiar specific way: the pressure goes like the volume to the -5/3 power.
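Here is a small Python sketch that checks the power law by integrating the differential equation dP/dV = -((n+2)/n) P/V directly. The function name, the choice of a Runge-Kutta integrator, and the starting values P = 1, V = 1 (in arbitrary units) are all assumptions made for illustration.

```python
def adiabat_pressure(p0, v0, v1, n=3, steps=1000):
    """Integrate dP/dV = -((n + 2)/n) P/V from v0 to v1 with classical RK4."""
    exponent = (n + 2) / n

    def slope(v, p):
        return -exponent * p / v

    h = (v1 - v0) / steps
    v, p = v0, p0
    for _ in range(steps):
        k1 = slope(v, p)
        k2 = slope(v + h / 2, p + h * k1 / 2)
        k3 = slope(v + h / 2, p + h * k2 / 2)
        k4 = slope(v + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        v += h
    return p

# Double the volume of a monatomic gas in 3 dimensions.
p0, v0, v1 = 1.0, 1.0, 2.0
numeric = adiabat_pressure(p0, v0, v1, n=3)
closed_form = p0 * (v1 / v0) ** (-5 / 3)   # adiabat: P goes like V^(-5/3)
isothermal = p0 * (v1 / v0) ** (-1)        # isotherm: P goes like V^(-1)
```

The numerical answer should agree with the closed form, and both should lie below the pressure you would get by doubling the volume at constant temperature.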

In any dimension, the pressure of the monatomic gas drops more steeply when the container expands adiabatically than when it expands at constant temperature. Why? Because V^{-(n+2)/n} drops more rapidly than V^{-1} since

\frac{n+2}{n} > 1

But as n \to \infty,

\frac{n+2}{n} \to 1

so the adiabats become closer and closer to the isothermal curves in high dimensions. This is not important for understanding the conceptually significant features of the Carnot cycle! But it’s curious, and I’d like to improve my understanding by thinking about it until it seems obvious. It doesn’t yet.

Nonequilibrium Thermodynamics in Biology (Part 2)

16 June, 2021

Larry Li, Bill Cannon and I ran a session on non-equilibrium thermodynamics in biology at SMB2021, the annual meeting of the Society for Mathematical Biology. You can see talk slides here!

Here’s the basic idea:

Since Lotka, physical scientists have argued that living things belong to a class of complex and orderly systems that exist not despite the second law of thermodynamics, but because of it. Life and evolution, through natural selection of dissipative structures, are based on non-equilibrium thermodynamics. The challenge is to develop an understanding of what the respective physical laws can tell us about flows of energy and matter in living systems, and about growth, death and selection. This session addresses current challenges including understanding emergence, regulation and control across scales, and entropy production, from metabolism in microbes to evolving ecosystems.

Click on the links to see slides for most of the talks:

Persistence, permanence, and global stability in reaction network models: some results inspired by thermodynamic principles
Gheorghe Craciun, University of Wisconsin–Madison

The standard mathematical model for the dynamics of concentrations in biochemical networks is called mass-action kinetics. We describe mass-action kinetics and discuss the connection between special classes of mass-action systems (such as detailed balanced and complex balanced systems) and the Boltzmann equation. We also discuss the connection between the ‘global attractor conjecture’ for complex balanced mass-action systems and Boltzmann’s H-theorem. We also describe some implications for biochemical mechanisms that implement noise filtering and cellular homeostasis.

The principle of maximum caliber of nonequilibria
Ken Dill, Stony Brook University

Maximum Caliber is a principle for inferring pathways and rate distributions of kinetic processes. The structure and foundations of MaxCal are much like those of Maximum Entropy for static distributions. We have explored how MaxCal may serve as a general variational principle for nonequilibrium statistical physics, giving well-known results, such as the Green-Kubo relations, Onsager’s reciprocal relations and Prigogine’s Minimum Entropy Production principle near equilibrium, but is also applicable far from equilibrium. I will also discuss some applications, such as finding reaction coordinates in molecular simulations, non-linear dynamics in gene circuits, power-law-tail distributions in ‘social-physics’ networks, and others.

Nonequilibrium biomolecular information processes
Pierre Gaspard, Université libre de Bruxelles

Nearly 70 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of genetic information processing in DNA replication, transcription, and translation remain poorly understood. These template-directed copolymerization processes are running away from equilibrium, being powered by extracellular energy sources. Recent advances show that their kinetic equations can be exactly solved in terms of so-called iterated function systems. Remarkably, iterated function systems can determine the effects of genome sequence on replication errors, up to a million times faster than kinetic Monte Carlo algorithms. With these new methods, fundamental links can be established between molecular information processing and the second law of thermodynamics, shedding a new light on genetic drift, mutations, and evolution.

Nonequilibrium dynamics of disturbed ecosystems
John Harte, University of California, Berkeley

The Maximum Entropy Theory of Ecology (METE) predicts the shapes of macroecological metrics in relatively static ecosystems, across spatial scales, taxonomic categories, and habitats, using constraints imposed by static state variables. In disturbed ecosystems, however, with time-varying state variables, its predictions often fail. We extend macroecological theory from static to dynamic, by combining the MaxEnt inference procedure with explicit mechanisms governing disturbance. In the static limit, the resulting theory, DynaMETE, reduces to METE but also predicts a new scaling relationship among static state variables. Under disturbances, expressed as shifts in demographic, ontogenic growth, or migration rates, DynaMETE predicts the time trajectories of the state variables as well as the time-varying shapes of macroecological metrics such as the species abundance distribution and the distribution of metabolic rates over individuals. An iterative procedure for solving the dynamic theory is presented. Characteristic signatures of the deviation from static predictions of macroecological patterns are shown to result from different kinds of disturbance. By combining MaxEnt inference with explicit dynamical mechanisms of disturbance, DynaMETE is a candidate theory of macroecology for ecosystems responding to anthropogenic or natural disturbances.

Stochastic chemical reaction networks
Supriya Krishnamurthy, Stockholm University

The study of chemical reaction networks (CRNs) is a very active field. Earlier well-known results (Feinberg Chem. Eng. Sci. 42 2229 (1987), Anderson et al Bull. Math. Biol. 72 1947 (2010)) identify a topological quantity called deficiency, easy to compute for CRNs of any size, which, when exactly equal to zero, leads to a unique factorized (non-equilibrium) steady-state for these networks. No general results exist however for the steady states of non-zero-deficiency networks. In recent work, we show how to write the full moment-hierarchy for any non-zero-deficiency CRN obeying mass-action kinetics, in terms of equations for the factorial moments. Using these, we can recursively predict values for lower moments from higher moments, reversing the procedure usually used to solve moment hierarchies. We show, for non-trivial examples, that in this manner we can predict any moment of interest, for CRNs with non-zero deficiency and non-factorizable steady states. It is however an open question how scalable these techniques are for large networks.

Heat flows adjust local ion concentrations in favor of prebiotic chemistry
Christof Mast, Ludwig-Maximilians-Universität München

Prebiotic reactions often require certain initial concentrations of ions. For example, the activity of RNA enzymes requires a lot of divalent magnesium salt, whereas too much monovalent sodium salt leads to a reduction in enzyme function. However, it is known from leaching experiments that prebiotically relevant geomaterial such as basalt releases mainly a lot of sodium and only little magnesium. A natural solution to this problem is heat fluxes through thin rock fractures, through which magnesium is actively enriched and sodium is depleted by thermogravitational convection and thermophoresis. This process establishes suitable conditions for ribozyme function from a basaltic leach. It can take place in a spatially distributed system of rock cracks and is therefore particularly stable to natural fluctuations and disturbances.

Deficiency of chemical reaction networks and thermodynamics
Matteo Polettini, University of Luxembourg

Deficiency is a topological property of a Chemical Reaction Network linked to important dynamical features, in particular of deterministic fixed points and of stochastic stationary states. Here we link it to thermodynamics: in particular we discuss the validity of a strong vs. weak zeroth law, the existence of time-reversed mass-action kinetics, and the possibility to formulate marginal fluctuation relations. Finally we illustrate some subtleties of the Python module we created for MCMC stochastic simulation of CRNs, soon to be made public.

Large deviations theory and emergent landscapes in biological dynamics
Hong Qian, University of Washington

The mathematical theory of large deviations provides a nonequilibrium thermodynamic description of complex biological systems that consist of heterogeneous individuals. In terms of the notions of stochastic elementary reactions and pure kinetic species, the continuous-time, integer-valued Markov process dictates a thermodynamic structure that generalizes (i) Gibbs’ microscopic chemical thermodynamics of equilibrium matters to nonequilibrium small systems such as living cells and tissues; and (ii) Gibbs’ potential function to the landscapes for biological dynamics, such as that of C. H. Waddington and S. Wright.

Using the maximum entropy production principle to understand and predict microbial biogeochemistry
Joseph Vallino, Marine Biological Laboratory, Woods Hole

Natural microbial communities contain billions of individuals per liter and can exceed a trillion cells per liter in sediments, as well as harbor thousands of species in the same volume. The high species diversity contributes to extensive metabolic functional capabilities to extract chemical energy from the environment, such as methanogenesis, sulfate reduction, anaerobic photosynthesis, chemoautotrophy, and many others, most of which are only expressed by bacteria and archaea. Reductionist modeling of natural communities is problematic, as we lack knowledge on growth kinetics for most organisms and have even less understanding on the mechanisms governing predation, viral lysis, and predator avoidance in these systems. As a result, existing models that describe microbial communities contain dozens to hundreds of parameters, and state variables are extensively aggregated. Overall, the models are little more than non-linear parameter fitting exercises that have limited, to no, extrapolation potential, as there are few principles governing organization and function of complex self-assembling systems. Over the last decade, we have been developing a systems approach that models microbial communities as a distributed metabolic network that focuses on metabolic function rather than describing individuals or species. We use an optimization approach to determine which metabolic functions in the network should be up regulated versus those that should be down regulated based on the non-equilibrium thermodynamics principle of maximum entropy production (MEP). Derived from statistical mechanics, MEP proposes that steady state systems will likely organize to maximize free energy dissipation rate. We have extended this conjecture to apply to non-steady state systems and have proposed that living systems maximize entropy production integrated over time and space, while non-living systems maximize instantaneous entropy production. 
Our presentation will provide a brief overview of the theory and approach, as well as present several examples of applying MEP to describe the biogeochemistry of microbial systems in laboratory experiments and natural ecosystems.

Reduction and the quasi-steady state approximation
Carsten Wiuf, University of Copenhagen

Chemical reactions often occur at different time-scales. In applications of chemical reaction network theory it is often desirable to reduce a reaction network to a smaller reaction network by elimination of fast species or fast reactions. There exist various techniques for doing so, e.g. the Quasi-Steady-State Approximation or the Rapid Equilibrium Approximation. However, these methods are not always mathematically justifiable. Here, a method is presented for which (so-called) non-interacting species are eliminated by means of QSSA. It is argued that this method is mathematically sound. Various examples are given (Michaelis-Menten mechanism, two-substrate mechanism, …) and older related techniques from the 50s and 60s are briefly discussed.

Electrostatics and the Gauss–Lucas Theorem

24 May, 2021

Say you know the roots of a polynomial P and you want to know the roots of its derivative. You can do it using physics! Namely, electrostatics in 2d space, viewed as the complex plane.

To keep things simple, let us assume P does not have repeated roots. Then the procedure works as follows.

Put equal point charges at each root of P, then see where the resulting electric field vanishes. Those are the roots of P’.

I’ll explain why this is true a bit later. But first, we use this trick to see something cool.

There’s no way the electric field can vanish outside the convex hull of your set of point charges. After all, if all the charges are positive, the electric field must point out of that region. So, the roots of P’ must lie in the convex hull of the roots of P!

This cool fact is called the Gauss–Lucas theorem. It always seemed mysterious to me. Now, thanks to this ‘physics proof’, it seems completely obvious!

Of course, it relies on my first claim: that if we put equal point charges at the roots of P, the electric field they generate will vanish at the roots of P’. Why is this true?

By multiplying by a constant if necessary, we can assume

\displaystyle{   P(z) = \prod_{i = 1}^n  (z - a_i) }

Thus

\displaystyle{  \ln |P(z)| = \sum_{i = 1}^n \ln|z - a_i| }

This function is the electric potential created by equal point charges at the points a_i in the complex plane. The corresponding electric field is minus the gradient of the potential, so it vanishes at the critical points of this function. Equivalently, it vanishes at the critical points of the exponential of this function, namely |P|. Apart from one possible exception, these points are the same as the critical points of P, namely the roots of P’. So, we’re almost done!

The exception occurs when P has a critical point where P vanishes. |P| is not smooth where P vanishes, so in this case we cannot say the critical point of P is a critical point of |P|.

However, when P has a critical point where P vanishes, then this point is a repeated root of P, and I already said I’m assuming P has no repeated roots. So, we’re done—given this assumption.
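Here is a quick numerical check of this claim, as a Python sketch using NumPy. The four roots are a made-up example, and the field is computed only up to an overall constant factor (whose sign depends on conventions), since all we care about is where it vanishes.

```python
import numpy as np

# Hypothetical example: a quartic with four distinct roots.
roots = np.array([0 + 0j, 2 + 0j, 1 + 2j, -1 + 1j])

def field(z, charges):
    """Field at z from equal point charges, up to a constant factor.

    Each charge at a contributes the vector (z - a) / |z - a|^2, written
    here as a complex number.
    """
    d = z - charges
    return np.sum(d / np.abs(d) ** 2)

coeffs = np.poly(roots)                          # coefficients of P
critical_points = np.roots(np.polyder(coeffs))   # roots of P'
field_strengths = np.array([abs(field(z, roots)) for z in critical_points])
```

The entries of `field_strengths` should all be zero up to rounding error. The reason is that the sum in `field` is the complex conjugate of P'(z)/P(z).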

Everything gets a bit more complicated when our polynomial has repeated roots. Greg Egan explored this, and also the case where its derivative has repeated roots.

However, the Gauss–Lucas theorem still applies to polynomials with repeated roots, and this proof explains why:

• Wikipedia, Gauss–Lucas theorem.

Alternatively, it should be possible to handle the case of a polynomial with repeated roots by thinking of it as a limit of polynomials without repeated roots.

By the way, in my physics proof of the Gauss–Lucas theorem I said the electric field generated by a bunch of positive point charges cannot vanish outside the convex hull of these point charges because the field ‘points out’ of this region. Let me clarify that.

It’s true even if the positive point charges aren’t all equal; they just need to have the same sign. The rough idea is that each charge creates an electric field that points radially outward, so these electric fields can’t cancel at a point that’s not ‘between’ several charges—in other words, at a point that’s not in the convex hull of the charges.

But let’s turn this idea into a rigorous argument.

Suppose z is some point outside the convex hull of the points a_i. Then, by the hyperplane separation theorem, we can draw a line with z on one side and all the points a_i on the other side. Let v be a vector normal to this line and pointing toward the z side. Then

v \cdot (z - a_i) > 0

for all i. Since the electric field created by the ith point charge is a positive multiple of z - a_i at the point z, the total electric field at z has a positive dot product with v. So, it can’t be zero!
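Written out, the last step is a one-line computation. Here the numbers c_i > 0 are names for the strengths of the charges, introduced only for this sketch, and E(z) is the total electric field at z up to a positive constant:

```latex
E(z) = \sum_{i=1}^n c_i \, \frac{z - a_i}{|z - a_i|^2}
\quad \Longrightarrow \quad
v \cdot E(z) = \sum_{i=1}^n c_i \, \frac{v \cdot (z - a_i)}{|z - a_i|^2} > 0
```

Every term in the second sum is positive, so the sum cannot vanish.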


The picture of a convex hull is due to Robert Laurini.

Parallel Line Masses and Marden’s Theorem

22 May, 2021

Here’s an idea I got from Albert Chern on Twitter. He did all the hard work, and I think he also drew the picture I’m going to use. I’ll just express the idea in a different way.

Here’s a strange fact about Newtonian gravity.

Consider three parallel ‘line masses’ that have a constant mass per length, the same constant for each one. Choose a plane orthogonal to these lines. There will typically be two points on this plane, say a and b, where a mass can sit in equilibrium, with the gravitational pull from all three line masses cancelling out. This will be an unstable equilibrium.

Put a mass at point a. Remove the three line masses—but keep in mind the triangle they formed where they pierced your plane!

You can now orbit a test particle in an elliptical orbit around the mass at a in such a way that:

• one focus of this ellipse is a,
• the other focus is b, and
• the ellipse fits inside the triangle, just touching the midpoint of each side of the triangle.

Even better, this ellipse has the largest possible area of any ellipse contained in the triangle!

Here is Chern’s picture:


The triangle’s corners are the three points where the line masses pierce your chosen plane. These line masses create a gravitational potential, and the contour lines are level curves of this potential.

You can see that the points a and b are at saddle points of the potential. Thus, a mass placed at either a or b will be in an unstable equilibrium.

You can see the ellipse with a and b as its foci, snugly fitting into the triangle.

You can sort of see that the ellipse touches the midpoints of the triangle’s edges.

What you can’t see is that this ellipse has the largest possible area for any ellipse fitting into the triangle!

Now let me explain the math. While the gravitational potential of a point mass in 3d space is proportional to 1/r, the gravitational potential of a line mass in 3d space is proportional to \log r, which is also the gravitational potential of a point mass in 2d space.

So, if we have three equal line masses, which are parallel and pierce an orthogonal plane at points p_1, p_2 and p_3, then their gravitational potential, as a function on this plane, will be proportional to

\phi(z) = \log|z - p_1| + \log|z - p_2| + \log|z - p_3|

Here I’m using z as our name for an arbitrary point on this plane, because the next trick is to think of this plane as the complex plane!

Where are the critical points (in fact saddle points) of this potential? They are just points where the gradient of \phi vanishes. To find these points, we can just take the exponential of \phi and see where the gradient of that vanishes. This is a nice idea because

e^{\phi(z)} = |(z-p_1)(z-p_2)(z-p_3)|

The gradient of this function will vanish whenever

P'(z) = 0

where

P(z) = (z-p_1)(z-p_2)(z-p_3)

Since P is a cubic polynomial, P' is a quadratic, hence proportional to

(z - a)(z - b)

for some a and b. Now we use

Marden’s theorem. Suppose the zeros p_1, p_2, p_3 of a cubic polynomial P are non-collinear. Then there is a unique ellipse inscribed in the triangle with vertices p_1, p_2, p_3 and tangent to the sides at their midpoints. The foci of this ellipse are the zeroes of the derivative of P.

For a short proof of this theorem go here:

Carlson’s proof of Marden’s theorem.

This ellipse is called the Steiner inellipse of the triangle:

• Wikipedia, Steiner inellipse.
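Marden’s theorem is easy to test numerically. The Python sketch below picks a made-up triangle, finds the roots a and b of P', and checks two things: that the three midpoints lie on a single ellipse with foci a and b, and that each side is tangent to that ellipse at its midpoint. The triangle and all the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical triangle, given as three non-collinear complex numbers.
p1, p2, p3 = 0 + 0j, 4 + 0j, 1 + 3j

# Foci: the roots of P'(z), where P(z) = (z - p1)(z - p2)(z - p3).
a, b = np.roots(np.polyder(np.poly([p1, p2, p3])))

def dot(x, y):
    """Euclidean dot product of complex numbers viewed as plane vectors."""
    return (x * np.conj(y)).real

sides = [(p1, p2), (p2, p3), (p3, p1)]
midpoints = [(s + t) / 2 for s, t in sides]

# An ellipse with foci a and b is a level set of f(z) = |z - a| + |z - b|,
# so the three midpoints lie on one such ellipse if f agrees on them.
distance_sums = [abs(m - a) + abs(m - b) for m in midpoints]

# Tangency: the gradient of f at m, namely
# (m - a)/|m - a| + (m - b)/|m - b|, should be orthogonal to the
# direction t - s of the side through m.
tangency_defects = [
    dot((m - a) / abs(m - a) + (m - b) / abs(m - b), t - s)
    for m, (s, t) in zip(midpoints, sides)
]
```

The three entries of `distance_sums` should agree, and the entries of `tangency_defects` should vanish, both up to rounding error.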

The proof that it has the largest area of any ellipse inscribed in the triangle goes like this. Using a linear transformation of the plane you can map any triangle to an equilateral triangle. It’s obvious that there’s a circle inscribed in any equilateral triangle, touching each of the triangle’s midpoints. It’s at least very plausible that this circle is the ellipse of largest area contained in the triangle. If we can prove this we’re done.

Why? Because linear transformations map circles to ellipses, and map midpoints of line segments to midpoints of line segments, and simply rescale areas by a constant factor. So applying the inverse linear transformation to the circle inscribed in the equilateral triangle, we get an ellipse inscribed in our original triangle, which will touch this triangle’s midpoints, and have the maximum possible area of any ellipse contained in this triangle!
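As a worked example of this rescaling argument, take an equilateral triangle with side length s (a symbol introduced just for this sketch). Its inscribed circle has radius s/(2\sqrt{3}), so

```latex
\frac{\text{area of inscribed circle}}{\text{area of triangle}}
= \frac{\pi s^2 / 12}{\sqrt{3} \, s^2 / 4}
= \frac{\pi}{3 \sqrt{3}} \approx 0.6046
```

Since a linear transformation rescales all areas by the same factor, the Steiner inellipse of any triangle fills this same fraction of the triangle’s area.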

The Koide Formula

4 April, 2021

There are three charged leptons: the electron, the muon and the tau. Let m_e, m_\mu and m_\tau be their masses. Then the Koide formula says

\displaystyle{ \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} = \frac{2}{3} }

There’s no known reason for this formula to be true! But if you plug in the experimentally measured values of the electron, muon and tau masses, it’s accurate within the current experimental error bars:

\displaystyle{ \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} = 0.666661 \pm 0.000007 }

Is this significant or just a coincidence? Will it fall apart when we measure the masses more accurately? Nobody knows.

Here’s something fun, though:

Puzzle. Show that no matter what the electron, muon and tau masses might be—that is, any positive numbers whatsoever—we must have

\displaystyle{ \frac{1}{3} \le \frac{m_e + m_\mu + m_\tau}{\big(\sqrt{m_e} + \sqrt{m_\mu} + \sqrt{m_\tau}\big)^2} \le 1}

For some reason this ratio turns out to be almost exactly halfway between the lower bound and upper bound!
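Here is a small Python sketch for playing with this ratio. The lepton masses below are approximate measured values in MeV/c^2 that I am assuming for illustration; they are not taken from the text above. The random test is only a spot check of the bounds in the puzzle, not a proof!

```python
import math
import random

def koide_ratio(m1, m2, m3):
    """(m1 + m2 + m3) / (sqrt(m1) + sqrt(m2) + sqrt(m3))^2 for positive masses."""
    return (m1 + m2 + m3) / (math.sqrt(m1) + math.sqrt(m2) + math.sqrt(m3)) ** 2

# Approximate lepton masses in MeV/c^2. These particular values are an
# assumption of this sketch, not taken from the text.
m_e, m_mu, m_tau = 0.51099895, 105.6583755, 1776.86

lepton_ratio = koide_ratio(m_e, m_mu, m_tau)

# Spot-check the bounds in the puzzle on random positive triples.
rng = random.Random(0)
samples = [
    koide_ratio(*(rng.uniform(1e-3, 1e3) for _ in range(3)))
    for _ in range(10_000)
]
```

With these masses `lepton_ratio` should land very close to 2/3, while the random samples spread out over the range between 1/3 and 1.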

Koide came up with his formula in 1982 before the tau’s mass was measured very accurately.  At the time, using the observed electron and muon masses, his formula predicted the tau’s mass was

m_\tau = 1776.97 \; \mathrm{MeV}/c^2

while the observed mass was

m_\tau = 1784.2 \pm 3.2 \; \mathrm{MeV}/c^2

Not very good.

In 1992 the tau’s mass was measured much more accurately and found to be

m_\tau = 1776.99 \pm 0.28 \; \mathrm{MeV}/c^2

Much better!
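To see where a prediction like this comes from, you can solve the Koide formula for the tau mass given the other two masses. The Python sketch below does this. The function name is hypothetical, and the electron and muon masses are approximate modern values that I am assuming, not the ones Koide had available.

```python
import math

def koide_tau_mass(m_e, m_mu):
    """Solve the Koide formula for the third mass, given the other two.

    With x = sqrt(m_tau), s = sqrt(m_e) + sqrt(m_mu) and t = m_e + m_mu,
    the formula 3 (t + x^2) = 2 (s + x)^2 becomes the quadratic
    x^2 - 4 s x + (3 t - 2 s^2) = 0. We take the larger root, since the
    tau is the heaviest of the three.
    """
    s = math.sqrt(m_e) + math.sqrt(m_mu)
    t = m_e + m_mu
    x = 2 * s + math.sqrt(6 * s * s - 3 * t)
    return x * x

# Assumed approximate electron and muon masses in MeV/c^2.
predicted_tau_mass = koide_tau_mass(0.51099895, 105.6583755)
```

With these inputs the prediction should come out close to the 1776.97 MeV/c^2 quoted above.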

Koide has some more recent thoughts about his formula:

• Yoshio Koide, What physics does the charged lepton mass relation tell us?, 2018.

He points out how difficult it is to explain a formula like this, given how masses depend on an energy scale in quantum field theory.

Vincenzo Galilei

3 April, 2021

I’ve been reading about early music. I ran into Vincenzo Galilei, an Italian lute player, composer, and music theorist who lived during the late Renaissance and helped start the Baroque era. Of course anyone interested in physics will know Galileo Galilei. And it turns out Vincenzo was Galileo’s dad!

The really interesting part is that Vincenzo did a lot of experiments—and he got Galileo interested in the experimental method!

Vincenzo started out as a lutenist, but in 1563 he met Gioseffo Zarlino, the most important music theorist of the sixteenth century, and began studying with him. Vincenzo became interested in tuning and keys, and in 1584 he anticipated Bach’s Well-Tempered Clavier by composing 24 groups of dances, one for each of the 12 major and 12 minor keys.

He also studied acoustics, especially vibrating strings and columns of air. He discovered that while the frequency of sound produced by a vibrating string varies inversely with the length of string, it’s also proportional to the square root of the tension applied. For example, weights suspended from strings of equal length need to be in a ratio of 9:4 to produce a perfect fifth, which is the frequency ratio 3:2.
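In symbols, writing f for the frequency, T for the tension and L for the length of the string (names introduced just for this sketch), the example works out like this:

```latex
f \propto \frac{\sqrt{T}}{L}
\quad \Longrightarrow \quad
\frac{f_1}{f_2} = \sqrt{\frac{T_1}{T_2}} = \sqrt{\frac{9}{4}} = \frac{3}{2}
\quad \text{when } L_1 = L_2
```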

Galileo later told a biographer that Vincenzo introduced him to the idea of systematic testing and measurement. The basement of their house was strung with lengths of lute string materials, each of different lengths, with different weights attached. Some say this drew Galileo’s attention away from pure mathematics to physics!

You can see books by Vincenzo Galilei here:

• Internet Archive, Vincenzo Galilei, c. 1520 – 2 July 1591.

Unfortunately for me they’re in Italian, but the title of his Dialogo della Musica Antica et Della Moderna reminds me of his son’s Dialogo sopra i Due Massimi Sistemi del Mondo (Dialog Concerning the Two Chief World Systems).

Speaking of dialogs, here’s a nice lute duet by Vincenzo Galilei, played by Evangelina Mascardi and Frédéric Zigante:

It’s from his book Fronimo Dialogo, an instruction manual for the lute which includes many compositions, including the 24 dances illustrating the 24 keys. “Fronimo” was an imaginary expert in the lute (in ancient Greek, phronimo means sage), and the book apparently consists of dialogs between Fronimo and a student Eumazio (meaning “he who learns well”).

So, I now suspect that Galileo got his fondness for dialogs from his dad, too! Or maybe everyone was writing them back then?

Can We Understand the Standard Model Using Octonions?

31 March, 2021

I gave two talks in Latham Boyle and Kirill Krasnov’s Perimeter Institute workshop Octonions and the Standard Model.

The first talk was on Monday April 5th at noon Eastern Time. The second was exactly one week later, on Monday April 12th at noon Eastern Time.

Here they are:

Can we understand the Standard Model? (video, slides)

Abstract. 40 years trying to go beyond the Standard Model hasn’t yet led to any clear success. As an alternative, we could try to understand why the Standard Model is the way it is. In this talk we review some lessons from grand unified theories and also from recent work using the octonions. The gauge group of the Standard Model and its representation on one generation of fermions arises naturally from a process that involves splitting 10d Euclidean space into 4+6 dimensions, but also from a process that involves splitting 10d Minkowski spacetime into 4d Minkowski space and 6 spacelike dimensions. We explain both these approaches, and how to reconcile them.

The second was on Monday April 12th at noon Eastern Time:

Can we understand the Standard Model using octonions? (video, slides)

Abstract. Dubois-Violette and Todorov have shown that the Standard Model gauge group can be constructed using the exceptional Jordan algebra, consisting of 3×3 self-adjoint matrices of octonions. After an introduction to the physics of Jordan algebras, we ponder the meaning of their construction. For example, it implies that the Standard Model gauge group consists of the symmetries of an octonionic qutrit that restrict to symmetries of an octonionic qubit and preserve all the structure arising from a choice of unit imaginary octonion. It also sheds light on why the Standard Model gauge group acts on 10d Euclidean space, or Minkowski spacetime, while preserving a 4+6 splitting.

You can see all the slides and videos and also some articles with more details here.