The Kuramoto–Sivashinsky Equation (Part 2)

22 October, 2021

I love the Kuramoto–Sivashinsky equation, beautifully depicted here by Thien An, because it’s one of the simplest partial differential equations that displays both chaos and a visible ‘arrow of time’. Last time I made some conjectures about it. Most notably, I conjecture that the ‘stripes’ you see above can appear or merge as time passes, but never disappear or split.

But I was quite confused for a while, because the Kuramoto–Sivashinsky equation comes in two forms. The integral form, which I spent most of my time discussing, is this:

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

where h(t,x) is a real function of two real variables, ‘time’ t and ‘space’ x. The derivative form is this:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

where u(t,x) is again a real function of two real variables. If h is a solution of the integral form,

u = h_x

is a solution of the derivative form. This is easy to see: just take the x derivative of the equation with h in it and see what you get.

By the way, beware of this: Thien An’s wonderful movie above shows the integral form of the equation, but she calls the function u.

Note that if h has a local maximum as a function of x, then u will go positive and then negative. This is important because it affects the definition of ‘stripe’. Here’s what stripes look in the derivative form of the Kuramoto–Sivashinsky equation, as computed by Steve Huntsman:

As you can see, the function u goes negative and then positive as we increase x moving through a stripe. This means that h would have a local minimum.

Another thing puzzling me was that Steve Huntsman found solutions of the derivative form where the stripes in u mostly move in the same direction as time passes:

This suggests that there’s some way to take a solution where the stripes aren’t moving much, and switch to a moving frame of reference to get a solution with moving stripes. And indeed I’ll show that’s true! But I’ll also show it’s not true for the integral form of the Kuramoto–Sivashinsky equation—at least not with periodic boundary conditions, which is what we’re using here.

Galilean transformations

Galileo was the guy who first emphasized that the laws of physics look the same in a moving frame of reference, but later Maxwell explained the idea very poetically:

Our whole progress up to this point may be described as a gradual development of the doctrine of relativity of all physical phenomena. Position we must evidently acknowledge to be relative, for we cannot describe the position of a body in any terms which do not express relation… There are no landmarks in space; one portion of space is exactly like every other portion, so that we cannot tell where we are. We are, as it were, on an unruffled sea, without stars, compass, sounding, wind or tide, and we cannot tell in which direction we are going. We have no log which we can case out to take a dead reckoning by; we may compute our rate of motion with respect to the neighboring bodies, but we do not know how these bodies may be moving in space.

Before Einstein came along, a transformation into a moving frame of reference worked like this:

t \mapsto t
x \mapsto x + vt

where v is a constant, the velocity.

These are called ‘Galilean transformations’ And here’s something cool: the derivative form of the Kuramoto–Sivashinsky equation has Galilean transformations as symmetries! If u(t,x) is a solution, so is the boosted function

u'(t,x) = u(t,x+vt) - v

The prime here does not mean derivative: u' is the name of our new boosted solution. To get this boosted solution, we do the obvious coordinate transformation into a moving frame of reference, but then a bit more: we must subtract the constant v.

Let’s see how this works! Suppose u is a solution of the derivative form of the Kuramoto–Sivashinsky equation:

u_t + u_{xx} + u_{xxxx} + u u_x = 0

Then defining

u'(t,x) = u(t, x+vt) - v

we see the t derivative of u' has an extra vu_x term:

u'_t(t,x) = u_t(t,x+vt) + vu_x(t,x+vt)

while its x derivatives are simple:

u'_x(t,x) = u_x(t,x+vt)

and so on for the higher x derivatives.

This lets us check that the boosted function u' is also a solution:

\begin{array}{ccl}  u'_t + u'_{xx} + u'_{xxxx} + u' u'_x &=& u_t(t,x+vt) + vu_x(t,x+vt)  \\ \\  &&  + u_{xx}(t,x+vt) + u_{xxxx}(t,x+tv)  \\ \\  && + (u(t,x+vt) - v) u_x(t,x+vt) \\ \\  &=& 0  \end{array}

Note how subtracting the constant v when we transform u exactly cancels the extra term vu_x term we get when we transform the time derivative of u.

I got this idea from here:

• Uriel Frisch, Zhensu She and Olivier Thual, Viscoelastic behaviour of cellular solutions to the Kuramoto–Sivashinsky model, Journal of Fluid Mechanics 168 (1986), 221–240.

thanks to someone on Twitter.

Does the integral form of the Kuramoto-Sivashinsky equation also have Galilean transformations as symmetries? Yes, but there’s a certain problem with them. Let’s see how it goes.

If h(t,x) is a solution, then so is this boosted function:

h'(t,x) = h(t,x+vt) - vx - \frac{1}{2}t v^2

But now the ‘fudge factors’ in the boosted function are more sneaky! We don’t just subtract a constant, as we did with the derivative form. We subtract a function that depends on both the space and time coordinates.

Let’s check. The first time derivative of h' works like this:

h'_t(t,x) = h_t(t,x+vt) + vh_t(t,x+vt) - \frac{1}{2} v^2

The first space derivative works like this:

h'_x(t,x) = h_x(t,x+vt) - v

The second space derivative is simpler:

h'_{xx}(t,x) = h_{xx}(t,x+vt)

and the higher space derivatives work the same way. Now suppose h was a solution of the integral form of the Kuramoto-Sivashinsky equation:

h_t + h_{xx} + h_{xxxx} + \frac{1}{2} (h_x)^2 = 0

We can use our formulas to check that h' is a solution:

\begin{array}{ccl}  h'_t \! +\! h'_{xx}\! +\! h'_{xxxx} \!+ \! \frac{1}{2} (h'_x)^2 \!  &=& h_t(t,x+vt) + vh_t(t,x+vt) \! - \!\frac{1}{2} v^2 \\ \\  && + h_{xx}(t,x+vt)   \\ \\  && + h_{xxxx}(t,x+vt) \\ \\  && + \frac{1}{2}(h_x(t,x+vt) - v)^2 \\ \\  &=& h_x(t,x+vt) + h_{xx}(t,x+vt)  \\ \\  && + h_{xxxx}(t,x+vt) + \frac{1}{2} h_x(t,x+vt)^2  \\ \\  &=& 0  \end{array}

The cancellations in the second step rely crucially on those sneaky fudge factors in the definition of the boosted solution h'. I chose those fudge factors precisely to make these cancellations happen.

Periodic boundary conditions

A lot of really interesting results about the Kuramoto-Sivashinsky equation involve looking at solutions that are periodic in space. For the derivative form this means

u(t,x+L) = u(t,x)

It’s easy to see that if u obeys this equation, so does its boosted version:

u'(t,x) = u(t,x+vt) - v

After all, a quick calculation shows

\begin{array}{ccl}  u'(t,x+L) &=& u(t,x+L+vt) - v \\ \\  &=& u(t,x+vt) - v \\ \\  &=& u'(t,x)   \end{array}

So for any spatially periodic solution with some stripes that are basically standing still, there’s a spatially periodic boosted version where they’re moving at velocity v. You can think of a spatially periodic solution as a function on the cylinder. In the boosted version, the stripes spiral around this cylinder like the stripes on a barber pole!

But for the integral form this doesn’t work! Suppose h is a solution of the integral form that is periodic in space:

h(t,x+L) = h(t,x)

Now its boosted version is defined by

h'(t,x) = h(t,x+vt) - vx - \frac{1}{2}t v^2

and this is not periodic in space:

\begin{array}{ccl}  h'(t,x+L) &=& h(t,x+L+vt) - v(x+L) - \frac{1}{2}t v^2  \\ \\  &=& h(t,x+vt) - v(x+L) - \frac{1}{2}t v^2 \\ \\  &\ne& h(t,x+vt) = h'(t,x)  \end{array}

A conserved quantity

The derivative form of the Kuramoto–Shivashinsky equation has an unexpected symmetry under Galilean transformations, or boosts, even for periodic solutions. It also has an interesting conserved quantity, namely

\displaystyle{ \int_0^L u(t,x) \, dx }

To see this, note that

\begin{array}{ccl}  \displaystyle{ \frac{d}{dt} \int_0^L u(t,x) \, dx } &=& \displaystyle{ \int_0^L u_t(t,x) \, dx }  \\ \\  &=& \displaystyle{ - \int_0^L \left( u_{xx} + u_{xxxx} + u u_x \right) \, dx } \\ \\  &=& \displaystyle{ \left. - \left( u_x + u_{xxx} + \frac{1}{2} u^2 \right)   \right|_0^L  } \\ \\  &=& 0  \end{array}

Note that if we boost a solution, replacing u(t,x) by u(t,x+vt) - v, we also subtract a constant from this conserved quantity, namely v L. So, it seems that the conserved quantity

\displaystyle{ \int_0^L u(t,x) \, dx }

is a way of measuring ‘how much the stripes are moving, on average’.

I’ll venture a guess that when

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

the stripes are stationary, on average. One reason I’m willing to guess this is that this equation has another meaning, too.

I mentioned that we can get solutions u of the derivative form from solutions h of the integral form by differentiating them, like this:

u = h_x

Does every solution of the derivative form arise this way from a solution of the integral form? Given u we can define h by

\displaystyle{ h(t,x) = \int_0^x u(t,y) \, dy }

and we can check it obeys the integral form of the Kuramoto–Sivashinsky equation. I’ve been showing you lots of calculations and we’re probably both getting tired of them, so I won’t actually go through this check. But here’s the interesting part. Suppose we restrict attention to spatially periodic solutions! If u is spatially periodic, h will be so iff

\displaystyle{ \int_0^L u(t,x) \, dx = 0}

since we have

\begin{array}{ccl}  h(t,x+L) &=& \displaystyle{ \int_0^{x+L} u(t,y) \, dy } \\ \\  &=&  \displaystyle{ \int_0^x u(t,y) \, dy + \int_x^{x+L} u(t,y) \, dy } \\ \\  &=& h(t,x)  \end{array}

where in the last step, the first integral is h(t,x) by definition and the second is zero because

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

and u is periodic with period L.


Limiting ourselves to spatially periodic solutions, we see:

• The derivative form of the Kuramoto–Sivashinsky equation has symmetry under boosts.

• The integral form does not.

• Solutions of the integral form correspond precisely to solutions of the derivative form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

• Boosting a solution of the derivative form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

gives a solution where this quantity is not zero.

Given all this, I think the good theorems about spatially periodic solutions of the Kuramoto–Sivashinsky equation will focus on solutions of the integral form, or equivalently solutions of the differential form with

\displaystyle{ \int_0^L u(t,x) \, dx = 0 }

Epilogue: symmetries of the heat equation

At first I was shocked that the Kuramoto–Sivashinsky equation had Galilean transformations as symmetries, because it’s a relative of the heat equation, and the heat equation obviously doesn’t have Galilean transformations as symmetries: you can’t get a solution of the heat equation where a wave of heat moves along at constant velocity as it spreads out.

But then I remembered that the heat equation

u_t = u_{xx}

is very similar to Schrödinger’s equation:

u_t = i u_{xx}

And Schrödinger’s equation does have Galilean transformations as symmetries, since it describes a free particle! Physicists know how they work. You don’t just replace u(t,x) with u(t,x+vt), you also multiply it by a suitable complex phase that depends on t and x. That is, you multiply u(t,x+vt) by the exponential of some imaginary-valued function of t and x.

So then I realized that the heat equation does have Galilean symmetries! They work just as for Schrödinger’s equation, but now you have to multiply u(t,x+vt) by the exponential of some real-valued function of t and x.

This grows exponentially as you move in one direction in space, so people thinking about heat often don’t think about such solutions. Yes, you can get a ‘wave of heat that moves along at constant velocity as it spreads out’, but it’s like a huge tsunami wave rolling in from infinity!

I was very proud of myself for having discovered this weird Galilean invariance of the heat equation, and I posed it as a puzzle on Twitter. But then Tom Price pointed out a paper about it:

• U. Niederer, Schrödinger invariant generalized heat equations, Helvetica Physica Acta 51 (1978), 220–239.

It turns out the Galilean invariance of the heat equation, and much more, was discovered by Sophus Lie in 1882!

The so-called ‘Schrödinger group’ also includes certain fractional linear transformations of the plane. So this raises the question of whether the Kuramoto–Sivashinsky equation has even more symmetries than those in the Galilei group (translations, spatial reflections and Galilean transformations).

But it seems this too has been studied:

• Mehdi Nadjafikhah and Fatemeh Ahangari, Lie symmetry analysis of the two-dimensional generalized Kuramoto–Sivashinsky equation, Mathematical Sciences 6 (2012).

This discusses the analogue of the Kuramoto–Sivashinsky equation in two dimensions of space and one of time. It uses some techniques to compute its symmetry group, but I don’t see any sign of invariance under a big group like the ‘Schrödinger group’.

The Kuramoto–Sivashinsky Equation (Part 1)

17 October, 2021

I love this movie showing a solution of the Kuramoto–Sivashinsky equation, made by Thien An. If you haven’t seen her great math images on Twitter, check them out!

I hadn’t known about this equation, and it looked completely crazy to me at first. But it turns out to be important, because it’s one of the simplest \partial differential equations that exhibits chaotic behavior.

As the image scrolls to the left, you’re seeing how a real-valued function h(t,x) of two real variables changes with the passage of time. The vertical direction is ‘space’, x, while the horizontal direction is time, t.

Near the end of this post I’ll make some conjectures about the Kuramoto–Sivashinsky equation. The first one is very simple: as time passes, stripes appear and merge, but they never disappear or split.

The behavior of these stripes makes the Kuramoto–Sivashinsky equation an excellent playground for thinking about how differential equations can describe ‘things’ with some individuality, even though their solutions are just smooth functions. But to test my conjectures, I could really use help from people who are good at numerical computation or creating mathematical images!

First let me review some known stuff. You can skip this and go straight to the conjectures if you want, but some terms might not make sense.


For starters, note that these stripes seem to appear out of nowhere. That’s because this system is chaotic: small ripples get amplified. This is especially true of ripples with a certain wavelength: roughly 2 \sqrt{2} \pi, as we’ll see later.

And yet while solutions of the Kuramoto–Sivanshinsky equation are chaotic, they have a certain repetitive character. That is, they don’t do completely novel things; they seem to keep doing the same general sort of thing. The world this equation describes has an arrow of time, but it’s ultimately rather boring compared to ours.

The reason is that all smooth solutions of the Kuramoto–Sivanshinsky equation quickly approach a certain finite-dimensional manifold of solutions, called an ‘inertial manifold’. The dynamics on the inertial manifold is chaotic. And sitting inside it is a set called an ‘attractor’, which all solutions approach. This attractor is probably a fractal. This attractor describes the complete repertoire of what you’ll see solutions do if you wait a long time.

Some mathematicians have put a lot of work into proving these things, but let’s see how much we can understand without doing anything too hard.

Written out with a bit less jargon, the Kuramoto–Sivashinky equation says

\displaystyle{ \frac{\partial h}{\partial t} = - \frac{\partial^2 h}{\partial x^2}  - \frac{\partial^4 h}{\partial x^4}  - \frac{1}{2}\left( \frac{\partial h}{\partial x}\right)^2 }

or in more compressed notation,

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

To understand it, first remember the heat equation:

h_t = h_{xx}

This describes how heat spreads out. That is: if h(t,x) is the temperature of an iron rod at position x at time t, the heat equation describes how this temperature function flattens out as time passes and heat spreads.

But the Kuramoto–Sivashinsky equation more closely resembles the time-reversed heat equation

h_t = -h_{xx}

This equation describes how, running a movie of a hot iron rod backward, heat tends to bunch up rather than smear out! Small regions of different temperature, either hotter or colder than their surroundings, will tend to amplify.

This accounts for the chaotic behavior of the Kuramoto–Sivashinsky equation: small stripes emerge as if out of thin air and then grow larger. But what keeps these stripes from growing uncontrollably?

The next term in the equation helps. If we have

h_t = -h_{xx} - h_{xxxx}

then very sharp spikes in h(t,x) tend to get damped out exponentially.

To see this, it helps to bring in a bit more muscle: Fourier series. We can easily solve the heat equation if our iron rod is the interval [0,2\pi] and we demand that its temperature is the same at both ends:

h(t,0) = h(t,2\pi)

This lets us write the temperature function h(t,x) in terms of the functions e^{ikx} like this:

\displaystyle{  h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(t) e^{ikx} }

for some functions \hat{h}_k(t). Then the heat equation gives

\displaystyle{   \frac{d}{d t} \hat{h}_k(t) = -k^2 \hat{h}_k(t) }

and we can easily solve these equations and get

\displaystyle{  \hat{h}_k(t) = e^{-k^2 t} \hat{h}_k(0) }

and thus

\displaystyle{   h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{-k^2 t} e^{ikx} }

So, each function \hat{h}_k(t) decays exponentially as time goes on, and the so-called ‘high-frequency modes’, \hat{h}_k(t) with |k| big, get damped really fast due to that e^{-k^2 t} factor. This is why heat smears out as time goes on.

If we solve the time-reversed heat equation the same way we get

\displaystyle{  h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{k^2 t} e^{ikx} }

so now high-frequency modes get exponentially amplified. The time-reversed heat equation is a very unstable: if you change the initial data a little bit by adding a small amount of some high-frequency function, it will make an enormous difference as time goes by.

What keeps things from going completely out of control? The next term in the equation helps:

h_t = -h_{xx} - h_{xxxx}

This is still linear so we can still solve it using Fourier series. Now we get

\displaystyle{   h(t,x) = \sum_{k = -\infty}^\infty \hat{h}_k(0) e^{(k^2-k^4) t} e^{ikx} }

Since k^2 - k^4 \le 0, none of the modes \hat{h}_k(t) grows exponentially. In fact, all the modes decay exponentially except for three: k = -1,0,1. These will be constant in time. So, any solution will approach a constant as time goes on!

We can make the story more interesting if we don’t require our rod to have length 2\pi. Say it has length L. We can write periodic functions on the interval [0,L] as linear combinations of functions e^{ikx} where now the frequencies k aren’t integers: instead

k = 2\pi n/L

for integers n. The longer our rod, the lower these frequencies k can be. The rest of the math works almost the same: we get

\displaystyle{ h(t,x) = \sum_{n = -\infty}^\infty \hat{h}_k(0) e^{(k^2-k^4) t} e^{ikx} }

but we have to remember k = 2\pi n/L. The modes with k^2 - k^4 > 0 will grow exponentially, while the rest will decay exponentially or stay constant. Note that k^2 - k^4 > 0 only for 0 <|k| < 1. So, modes with these frequencies grow exponentially. Modes with |k| > 1 decay exponentially.

If L < 2\pi, all the frequencies k are integers times 2\pi/L, which is bigger than 1, so no modes grow exponentially—and indeed all solutions approach a constant! But as you look at longer and longer rods, you get more and more modes that grow exponentially. The number of these will be roughly proportional to L, though they will ‘pop into existence’ at certain specific values of L.

Which exponentially growing modes grow the fastest? These are the ones that make k^2 - k^4 as large as possible, so they happen near where

\displaystyle{ \frac{d}{dk} (k^2 - k^4) = 0 }

namely k = 1/\sqrt{2}. The wavelength of a mode is 2\pi/k, so these fastest-growing modes have wavelength close to 2\sqrt{2} \pi.

In short, our equation has a certain length scale where the instability is most pronounced: temperature waves with about this wavelength grow fastest.

All this is very easy to work out in as much detail as we want, because our equation so far is linear. The full-fledged Kuramoto–Sivashinsky equation

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

is a lot harder. And yet some features of the linear version remain, which is why it was worth spending time on that version.

For example, I believe the stripes we see in the movie above have width roughly 2 \sqrt{2} \pi. Stripes of roughly this width tend to get amplified. Why don’t they keep on growing taller forever? Apparently the nonlinear term -(h_x)^2 prevents it. But this is not obvious. Indeed, it’s conjectured that if you solve the Kuramoto–Sivashinsky equation starting with a bounded smooth function h(0,x), the solution will remain bounded by a constant. But this has not been proved—or at least it was not proved as of 2000, when this very nice article was written:

• Encyclopedia of Mathematics, Kuramoto–Sivashinsky equation.

The inertial manifold

The most fascinating fact about the Kuramoto–Sivashinsky equation is that for any fixed length L, it has a finite-dimensional manifold M of solutions such that every solution approaches one of these, exponentially fast! So, while this equation really describes an infinite-dimensional dynamical system, as t \to \infty its solutions move closer and closer to the solutions of some finite-dimensional dynamical system. This finite-dimensional system contains all the information about the patterns we’re seeing in Thien An’s movie.

As I mentioned, the manifold M is called an ‘inertial manifold’. This is a general concept in dynamical systems theory:

• Wikipedia, Inertial manifold.

To make these ideas precise we need to choose a notion of distance between two solutions at a given time. A good choice uses the L^2 norm for periodic functions on [0,L]:

\displaystyle{  \|f\| = \sqrt{\int_0^L |f(x)|^2 \, dx} }

Functions on [0,L] with finite L^2 norm form a Hilbert space called L^2[0,L]. If we start with any function h(0,-) in this Hilbert space we get a solution h(t,x) of the Kuramoto–Sivashinsky equation such that the function h(t,-) is in this Hilbert space at all later times t. Furthermore, this function is smooth, even analytic, for all later times:

• P. Collet, J.-P. Eckmann, H. Epstein and J. Stubbe, Analyticity for the Kuramoto–Sivashinsky equation, Physica D 67 (1993), 321–326.

This smoothing property is well-known for the heat equation, but it’s much less obvious here!

This work also shows that the Kuramoto–Sivashinsky equation defines a dynamical system on the Hilbert space L^2[0,L]. And based on earlier work by other mathematicians, Temam and Wang have heroically shown that this Hilbert space contains an inertial manifold of dimension bounded by some constant times L^{1.64} (\ln L)^{0.2}.

• Roger Temam and Xiao Ming Wang, Estimates on the lowest dimension of inertial manifolds for the Kuramoto-Sivashinsky equation in the general case, Differential and Integral Equations 7 (1994), 1095–1108.

I conjecture that in reality its dimension grows roughly linearly with L. Why? We’ve just seen this is true for the linearized version of the Kuramoto–Sivashinsky equation: all modes with frequency |k| > 1 get damped exponentially, but since there’s one mode for each integer n, and k = 2\pi n/L, these modes correspond to integers n with |n| \le L /2\pi. So, there are \lfloor L/\pi \rfloor of these modes. In short, for the linearized Kuramoto–Sivashinsky equation the inertial manifold has dimension about L/\pi.

This evidence is rather weak, since it completely ignores the nonlinearity of the Kuramoto–Sivashinsky equation. I would not be shocked if the dimension of the inertial manifold grew at some other rate than linearly with L.

Sitting inside the inertial manifold is an attractor, the smallest set that all solutions approach. This is probably a fractal, since that’s true of many chaotic systems. So besides trying to estimate the dimension of the inertial manifold, which is an integer we should try to estimate the dimension of this attractor, which may not be an integer!

There have been some nice numerical experiments studying solutions of the Kuramoto–Sivashinsky equation for various values of L, seeing how they get more complicated as L increases. For small L, every solution approaches a constant, just as in the linearized version. For larger L we get periodic solutions, and as L continues to increase we get period doubling and finally chaos—a typical behavior for dynamical systems. But that’s just the start. For details, read this:

• Demetrios T. Papageorgiou and Yiorgos S. Smyrlis, The route to chaos for the Kuramoto–Sivashinsky equation, Theoretical and Computational Fluid Dynamics 3 (1991), 15–42.

I’ll warn you that they use a slightly different formalism. Instead of changing the length L, they keep it equal to 2\pi and change the equation, like this:

h_t = -h_{xx} - v h_{xxxx} - \frac{1}{2} (h_x)^2

for some number v they call the ‘viscosity’. It’s just a different way of describing the same business, so if I had more energy I could figure out the relation between L and v and tell you at which length L chaos first kicks in. But I won’t now. Instead, I want to make some conjectures.


There should be some fairly well-defined notion of a ‘stripe’ for the Kuramoto–Sivashinsky equations: you can see the stripes form and merge here, and if we can define them, we can count them and say precisely when they’re born and when they merge:

For now I will define a ‘stripe’ as follows. At any time, a solution of the Kuramoto–Sivashinsky gives a periodic function h on the interval [0,L]. We can think of this as a function on the circle. A stripe will be a maximal closed subinterval of the circle on which h \ge c. This definition depends on a constant c > 0, and it’s up to you to pick a value of the constant that makes my conjectures true—or at least, almost true!

So, here are the conjectures:

First, I conjecture that if L is large enough, almost every non-negative solution in the inertial manifold has a finite number of stripes at any time t, and that while they can appear and merge as we increase t, they can never split or disappear.

(Here ‘almost every’ is in the usual sense of measure theory. There are certainly solutions of the Kuramoto–Sivashinsky equation that don’t have stripes that appear and merge, like constant solutions. These solutions may lie on the inertial manifold, but I’m claiming they are rare.)

I also conjecture that the time-averaged number of stripes is asymptotically proportional to L as L \to \infty for almost every nonnegative solution on the inertial manifold. The constant of proportionality shouldn’t depend on the solution we pick, except for solutions in some set of measure zero. It will, however, depend on our precise definition of ‘stripe’, e.g. our choice of the constant c.

I also conjecture that there’s a well-defined time average of the rate at which new stripes form, which is also asymptotically proportional to L and independent of which solution we pick, except for solutions in a set of measure zero.

I also conjecture that this rate equals the time-averaged rate at which stripes merge, while the time-averaged rate at which stripes disappear or split is zero.

These conjectures are rather bold, but of course there are various fallback positions if they fail.

How can we test these conjectures? It’s hard to explicitly describe solutions that are actually on the inertial manifold, but by definition, any solution keeps getting closer to the inertial manifold at an exponential rate. Thus, it should behave similarly to solutions that are on the inertial manifold, after we wait long enough. So, I’ll conjecture that the above properties hold not only for almost every solution on the inertial manifold, but for typical solutions that start near the inertial manifold… as long as we wait long enough when doing our time averages.

If you feel like working on this, here are some things I’d really like:

• Images like Thien An’s but with various choices of L. To create these, maybe start with

h(0,x) = \sin \frac{x}{2L} + \textrm{a small amount of noise}

and run it for long enough to ‘settle down’—that is, get near the inertial manifold.

• A time-averaged count of the average number of stripes for various choices of L. I’m conjecturing that this is asymptotically proportional to L for large L.

• Time-averaged counts of the rates at which stripes are born, merge, split, and die—again for various choices of L. I’m conjecturing that the first two are asymptotically proportional to L for large L and that they’re equal. I’m conjecturing that the last two are zero, or tiny.

If someone gets into this, maybe we could submit a short paper to Experimental Mathematics. I’ve been browsing papers on the Kuramoto–Sivashinsky equations, and I haven’t yet seen anything that gets into as much detail on what solutions look like as I’m trying to do here.

The arrow of time

One more thing. I forgot to emphasize that the dynamical system on the Hilbert space L^2[0,L] is not reversible: we can evolve a solution forwards in time and it will stay in this Hilbert space, but not backwards in time. This is very well-known for the heat equation; the point is that solutions get smoother as we run them forward, but when we run them backward they typically get more wild and eventually their L^2 norm blows up.

What makes this especially interesting is that the dynamical system on the inertial manifold probably is reversible. As long as this manifold is compact, it must be: any smooth vector field on a compact manifold M generates a ‘flow’ that you can run forward or backward in time.

And yet, even if this flow is reversible, as I suspect it is, it doesn’t resemble its time-reversed version! It has an ‘arrow of time’ built in, since bumps are born and merge much more often than they merge and split.

So, if my guesses are right, the inertial manifold for the Kuramoto–Sivashinsky equation describes a deterministic universe where time evolution is reversible—and yet the future doesn’t look like the past, because the dynamics carry the imprint of the irreversible dynamics of the Kuramoto–Sivashinsky equation on the larger Hilbert space of all solutions.

A warning

If you want to help me, the following may be useful. I believe the stripes are ‘bumps’, that is, regions where u > c for some positive constant c. That would make them easy to define. I was shocked when Steve Huntsman did some calculations and produced a picture showing a solution where L = 128:

Here the stripes are not mere bumps: they are regions where, as we increase x, the solution first becomes large and negative and then becomes large and positive!

After massive confusion I realized that Steve was using some MATLAB code adapted from this website:

• Mathab Lak, Test case for PDEs: Kuramoto–Sivashinksy, Crank-Nicolson/Adams-Bashforth (CNAB2) timestepping.

and this code solves a different version of the Kuramoto–Sivashinksy equations, the so-called derivative form:

u_t = -u_{xx} - u_{xxxx} - uu_x

If h satisfies the integral form of the Kuramoto–Sivashinksy equation, which is the one I’ve been talking about all along:

h_t = -h_{xx} - h_{xxxx} - \frac{1}{2} (h_x)^2

then its derivative

u = h_x

satisfies the derivative form.

So, the two equations are related, but you have to be careful because some of their properties are quite different! For the integral form, the cross-section of a typical stripe looks very roughly like this:

but for the derivative form it looks more like this:

You can grab Steve Huntsman’s MATLAB code here, but beware: this program solves the derivative form! Many of Steve’s pictures in comments to this blog—indeed, all of them so far—show solutions of the derivative form. In particular, the phenomenon he discovered of stripes tending to move at a constant nonzero velocity seems to be special to the derivative form of the Kuramoto–Sivashinsky equation. I’ll say more about this next time.

Stirling’s Formula in Words

8 October, 2021


I recently sketched a quick proof of Stirling’s asymptotic formula for the factorial. But what does this formula really mean?

Above is my favorite explanation. You can’t see the number 2\pi in the words here, but it’s hiding in the formula for a Gaussian probability distribution. But what about the factorial, and the rest?

My description in words was informal. I’m really talking about a
Poisson distribution. If raindrops land at an average rate r, this says that after time t the probability of k having landed is

\displaystyle{  \frac{(rt)^k e^{-rt}}{k!} }

This is where the factorial comes from, and also the number e.

Since the average rate of rainfall is r, at time t the expected number of drops that have landed will be rt. Since I said “wait until the expected number of drops that have landed is n”, we want rt = n. Then the probability of k having landed is

\displaystyle{  \frac{n^k e^{-n}}{k!} }

Next, what’s the formula for a Gaussian with mean n and standard deviation \sqrt{n}? Written as a function of k it’s

\displaystyle{    \frac{e^{-(k-n)^2/2n}}{\sqrt{2 \pi n}} }

If this matches the Poisson distribution above in the limit of large n, the two functions must match at the point k = n, at least asymptotically, so

\displaystyle{  \frac{n^n e^{-n}}{n!}  \sim  \frac{1}{\sqrt{2 \pi n}} }

And this becomes Stirling’s formula after a tiny bit of algebra!

I learned about this on Twitter: Ilya Razenshtyn showed how to prove Stirling’s formula starting from probability theory this way. But it’s much easier to use his ideas to check that my paragraph in words implies Stirling’s formula, as I’ve just done.

Stirling’s Formula

3 October, 2021

Stirling’s formula says

\displaystyle{ n! \sim \sqrt{2 \pi n}\, \left(\frac{n}{e}\right)^n }

where \sim means that the ratio of the two quantities goes to 1 as n \to \infty.

Where does this formula come from? In particular, how does the number 2\pi get involved? Where is the circle here?

To understand these things, I think a nonrigorous argument that can be made rigorous is more useful than a rigorous proof with all the ε’s dotted and the δ’s crossed. It’s important, I think, to keep the argument short. So let me do that.

The punchline will be that the 2\pi comes from this formula:

\displaystyle{  \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} }

And this, I hope you know, comes from squaring both sides and converting the left side into a double integral that you can do in polar coordinates, pulling out a factor of 2 \pi because the thing you’re integrating only depends on r, not \theta.

Okay, here goes. We start with

\displaystyle{ \int_0^\infty x^n e^{-x} \, dx = n! }

This is easy to show using repeated integration by parts.

Next, we do this:

\begin{array}{ccl}  n! &=& \displaystyle{ \int_0^\infty x^n e^{-x} \, dx } \\ \\  &=& \displaystyle{ \int_0^\infty e^{n \ln x  -x} \, dx } \\ \\  &=& \displaystyle{  n\int_0^\infty e^{n \ln (ny) -n y} \, dy } \\ \\  \end{array}

In first step we’re writing x^n as e^{n \ln x}. In the second we’re changing variables: x = n y.

Next we use \ln(ny) = \ln n + \ln y to bust things up:

\displaystyle{ n! =  n e^{n \ln n} \int_0^\infty e^{n \ln y -n y} \, dy }

All the hard work will come in showing this:

\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }

Given this, we get

\displaystyle{ n! \sim  n e^{n \ln n} \sqrt{\frac{2 \pi}{n}} \; e^{-n} }

and simplifying we get Stirling’s formulas:

\displaystyle{ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n}

Laplace’s method

So to prove Stirling’s formula, the big question is: how do we get

\displaystyle{ \int_0^\infty e^{n \ln y -n y} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} } \; ?

Let’s write it like this:

\displaystyle{ \int_0^\infty e^{-n (y - \ln y)} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} }

The trick is to note that as n gets big, the integral will become dominated by the point where y - \ln y is as small as possible. We can then approximate the integral by a Gaussian peaked at that point!

Notice that

\displaystyle{ \frac{d}{dy} (y - \ln y) = 1 - y^{-1} }

\displaystyle{ \frac{d^2}{dy^2} (y - \ln y) = y^{-2} }

so the function y - \ln y has a critical point at y = 1 and its second derivative is 1 there, so it’s a local minimum. Indeed this point is the unique minimum of our function on the whole interval (0,\infty).

Then we use this:

Laplace’s Method. Suppose f \colon [a,b] \to \mathbb{R} has a unique minimum at some point x_0 \in (a,b) and f''(x_0) > 0. Then

\displaystyle{ \int_a^b e^{-n f(x)} dx \sim \sqrt{\frac{2 \pi}{n f''(x_0)}} \; e^{-n f(x_0)}    }

as n \to \infty.

This says that asymptotically, the integral equals what we’d get if we replaced f by the quadratic function whose value, first derivative and second derivative all match that of f at the point x_0. With this quadratic replacing f, you can do the integral by hand—it’s the integral of a Gaussian—and you get the right hand side.

Applying this formula to the problem at hand we get

\displaystyle{ \int_a^b e^{-n (y - \ln y)} dy \sim \sqrt{\frac{2 \pi}{n f''(y_0)}} \; e^{-n f(y_0)}    }

where f(y) = y - \ln y, y_0 = 1, f(y_0) = 1 and f''(y_0) = 1. So we get

\displaystyle{ \int_a^b e^{-n (y - \ln y)} dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n}    }

and then letting a = 0, b \to +\infty we get what we want.

So, from this viewpoint—and there are others—the key to Stirling’s formula is Laplace’s method of approximating an integral like

\displaystyle{ \int_a^b e^{-n f(x)} dx }

with a Gaussian integral. And in the end, the crucial calculation is where we do that Gaussian integral, using

\displaystyle{  \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} }

You can see the whole proof of Laplace’s method here:

• Wikipedia, Laplace’s method.

Physicists who have done quantum field theory will know that when push comes to shove it’s largely about Gaussian integrals. The limit n \to \infty we’re seeing here is like a ‘classical limit’ where \hbar \to 0. So they will be familiar with this idea.

There should be some deeper moral here, about how n! is related to a Gaussian process of some sort, but I don’t know it—even though I know how binomial coefficients approximate a Gaussian distribution. Do you know some deeper explanation, maybe in terms of probability theory and combinatorics, of why n! winds up being asymptotically described by an integral of a Gaussian?

For a very nice account of some cruder versions of Stirling’s formula, try this blog article:

• Michael Weiss, Stirling’s formula: Ahlfors’ derivation, Diagonal Argument, 17 July 2019.

His ‘note’, which you can find there, will give you more intuition for why something like Stirling’s formula should be true. But I think the above argument explains the \sqrt{2 \pi} better than Ahlfors’ argument.

But in math, there are always mysteries within mysteries. Gaussians show up in probability theory when we add up lots of independent and identically distributed random variables. Could that be going on here somehow?

Yes! See this:

• Aditya Ghosh, A probabilistic proof of Stirling’s formula, Blog on Mathematics and Statistics, September 7, 2020.

Folks at the n-Category Café noticed more mysteries. n!/n^n is the probability that a randomly chosen function from an n-element set to itself is a permutation. Stirling’s formula is a cool estimate of this probability! Can we use this to prove Stirling’s formula? I don’t know!


So I don’t think we’ve gotten to the bottom of Stirling’s formula! Comments at the n-Category Café contain other guesses about what it might ‘really mean’. You can read them here. Also check out Section 3 here, which discusses many different articles on Stirling’s formula in the American Mathematical Monthly:

• Jonathan M. Borwein and Robert M. Corless, Gamma and factorial in the Monthly.

Judging by the number of articles in the Monthly on the subject, Stirling’s formula approximating n! for large n is by far the most popular aspect of the \Gamma function. There are “some remarks”, “notes”, more “remarks”; there are “simple proofs”, “direct proofs”, “new proofs”, “corrections”, “short proofs”, “very short proofs”, “elementary” proofs, “probabilistic” proofs, “new derivations”, and (our favourite title) “The (n+1)th proof”.

That should be “(n+1)st”.

Classical Mechanics versus Thermodynamics (Part 4)

26 September, 2021

Thanks to the precise mathematical analogy between classical mechanics and thermodynamics that we saw last time, we can define Poisson brackets in thermodynamics just as in classical mechanics. That is, we can define Poisson brackets of quantities like entropy, temperature, volume and pressure. But what does this mean?

Furthermore, we can get quantum mechanics from classical mechanics by replacing Poisson brackets by commutators. That’s a brief summary of a long story, but by now this story is pretty well understood. So, we can do the same maneuver in thermodynamics and get some new theory where thermodynamic quantities like entropy, temperature, volume and pressure no longer commute! But what does this mean physically?

The obvious guess is that we get some version of thermodynamics that takes quantum mechanics into account, but I believe this is wrong. I want to explain why, and what I think is really going on. Briefly, I think what we get is more like statistical mechanics—but with a somewhat new outlook on this subject.

In getting to this understanding, we’ll bump into a few thorny questions. They’re interesting in themselves, but I don’t want them to hold us back, because we don’t absolutely need to answer them here. So, I’ll leave them as ‘puzzles’. Maybe you can help me with them.

In the classical mechanics of a point particle on a line, I considered a 4-dimensional vector space (q,p,t,H) where a point describes the position q, momentum p, time t and energy H of a particle. This is usually called the extended phase space, since it’s bigger than the usual phase space where a point (q,p) only keeps track of position and momentum. We saw that the extended phase space naturally acquires a symplectic structure

\omega = dp \wedge dq - dH \wedge dt

We can define Poisson brackets of smooth functions on any symplectic manifold. Instead of reviewing the whole well-known procedure I’ll just turn the crank and tell you what it gives for the coordinate functions on the extended phase space. The Poisson brackets of position and momentum are nonzero, and similarly for time and energy. All the rest vanish. We have

\{q,p\} = 1, \qquad \{t,H\} = -1

so we call these pairs of functions conjugate variables.

Puzzle 1. The first equation above says that momentum generates translations of position. The second says that energy generates time translations, but in reverse, thanks to the minus sign. Why the difference? Is this a problem?

To go to quantum mechanics, we can build a noncommutative algebra generated by elements I’ll call \hat{q}, \hat{p}, \hat{t} and \hat{H}. We require that these commute except for two, which have commutators that mimic the Poisson brackets above:

[\hat{q},\hat{p}] = i \hbar, \qquad [\hat{t},\hat{H}] = i \hbar

Here \hbar is Planck’s constant, or more precisely the reduced Planck constant. This has dimensions of action, making the above equations dimensionally consistent: momentum times position has dimensions of action, and so does energy multiplied by time.

Puzzle 2. Does this mean the well-known Poisson bracket equations \{p,q\} = 1 is not dimensionally consistent? Does this cause problems in classical mechanics? If not, why not?

We can work with the algebra having these generators and relations; this is called a Weyl algebra. But physicists usually prefer to treat this algebra as consisting of operators on some Hilbert space. The Stone–von Neumann theorem says there’s a unique ‘really nice’ way to do this. I don’t want to explain what ‘really nice’ means—it’s too much of a digression here. In fact I wrote a whole book about it with Irving Segal and Zhengfang Zhou. But this article is better if you just want the basic idea:

• Wikipedia, Stone–von Neumann theorem.

The main thing I need to say is that in the ‘really nice’ situation, \hat{q}, \hat{p}, \hat{t} and \hat{H} are unbounded self-adjoint operators, and the spectrum of each one is the whole real line. That’s not all there is to it: more must be true. But a problem immediately shows up. This is fine for the usual position and momentum operators in quantum mechanics, but the spectrum of the Hamiltonian \hat{H} is usually not the whole real line! In fact we usually want its spectrum to be bounded below: that is, we don’t want to allow arbitrarily large negative energies.

As a result, in quantum mechanics we usually give up on trying to find a ‘time operator’ \hat{t} that obeys the relation

[\hat{t},\hat{H}] = i \hbar

There is more to say about this:

• John Baez, The time-energy uncertainty relation.

But this is not what I want to talk about today; I mention it only because we’ll see a similar problem in thermodynamics, where volume is bounded below.

In any event, there’s a standard ‘really nice’ way to make \hat{q} and \hat{p} into operators in quantum mechanics: we take the Hilbert space L^2(\mathbb{R}) of square-integrable functions on the line, and define

\displaystyle{ (\hat{q} \psi)(x) = x \psi(x), \qquad (\hat{p} \psi)(x) = -i \hbar \frac{\partial}{\partial x} \psi(x) }

We may try to copy this in thermodynamics.

So, let’s see how it works! There are some twists.

In my explanations of classical mechanics versus thermodynamics so far, I’ve mainly been using the energy scheme, where we start with the internal energy U as a function of entropy S and volume V, and then define temperature T and pressure P as derivatives, so that

dU = T dS - P dV

As we’ve seen, this makes it very interesting to consider a 4-dimensional space—I’ll again call it the extended phase space—where S,T,V and P are treated as independent coordinates. This vector space has a symplectic structure

\omega = dT \wedge dS - dP \wedge dV

and the physically allowed points in this vector space form a ‘Lagrangian submanifold’—that is, a 2-dimensional surface on which the symplectic structure vanishes:

\displaystyle{ \left\{ (S,T,V,P): \;  T = \left.\phantom{\Big|} \frac{\partial U}{\partial S}\right|_V  , \; P = -\left. \phantom{\Big|}\frac{\partial U}{\partial V} \right|_S \right\} \; \subset \; \mathbb{R}^4 }

The symplectic structure lets us define Poisson brackets of smooth functions on the extended phase space. Again I’ll just turn the crank and tell you the Poisson brackets of the coordinate functions. The math is just the same, only the names have been changed, so you should not be surprised that two of their Poisson brackets are nonzero:

\{S,T\} = 1, \qquad \{V,P\} = -1

and all the rest vanish.

Puzzle 3. What is the physical meaning of these Poisson brackets?

Again we may worry that these equations are dimensionally inconsistent—but let’s go straight to the ‘quantized’ version and do our worrying there!

So, we’ll build a noncommutative algebra generated by elements I’ll call \hat{S}, \hat{T}, \hat{V} and \hat{P}. We’ll require that these commute except for two, namely the pair S, T and the pair V, P. What relation should these obey?

A first try might be

[\hat{S},\hat{T}] = i \hbar, \qquad [\hat{V},\hat{P}] = i \hbar

However, these equations are not dimensionally consistent! In both cases the left side has units of energy. Entropy times temperature and volume times pressure both have units of energy, since the first is related to ‘heat’ and the second is related to ‘work’. But \hbar has units of action.

I declare this to be intolerable. We are groping around trying to understand the physical meaning of the mathematics here; dimensional analysis is a powerful guide, and if we don’t have that going for us we’ll be in serious trouble.

So, we could try

[\hat{S},\hat{T}] = i\alpha, \qquad [\hat{V},\hat{P}] = i\alpha

where \alpha is some constant with units of energy.

Unfortunately there is no fundamental constant with units of energy that plays an important role throughout thermodynamics. We could for example use the mass of the electron times the speed of light squared, but why in the world should thermodynamics place some special importance on this?

One important quantity with units of energy in thermodynamics is kT, where T is temperature and k is Boltzmann’s constant. Boltzmann’s constant is fundamental in thermodynamics, or more precisely statistical mechanics: it has units of energy per temperature. So we might try

[\hat{S},\hat{T}] = ikT, \qquad [\hat{V},\hat{P}] = ikT

but unfortunately the temperature T is not a constant: it’s one of the variables we’re trying to ‘quantize’! It would make more sense to try

[\hat{S},\hat{T}] = ik\hat{T}, \qquad [\hat{V},\hat{P}] = ik\hat{T}

but unfortunately these commutation relations are more complicated than the ones we had in quantum mechanics.

Luckily, the fundamental constant we want is sitting right in front of us: it’s Boltzmann’s constant. This has units of energy per temperature—or in other words, units of entropy. This suggests quitting the ‘energy scheme’ and working with the second most popular formulation of thermodynamics, the ‘entropy scheme’.

Here we start by writing the entropy S of our system as a function of its internal energy U and volume V. A simple calculation starting from dU = TdS - PdV then shows

\displaystyle{ dS = \frac{1}{T} dU + \frac{P}{T} dV }

It will help to make up short names for the quantities here, so I’ll define

\displaystyle{ C = \frac{1}{T} }

and call C the coldness, for the obvious reason. I’ll also define

\displaystyle{ B = \frac{P}{T} }

and call B the boldness, for no particularly good reason—I just need a name. In terms of these variables we get

dS = C dU + B dV

This is the so-called entropy scheme. This equation implies that C times U has dimensions of entropy, and so does
B times V.

If we go through the procedure we used last time, we get a 4-dimensional vector space with coordinates U,C,V,B and symplectic structure

\omega = dC \wedge dU + dB \wedge dV

The physically allowed points in this vector space form a Lagrangian submanifold

\displaystyle{  \left\{ (U,C,B,V): \;  C = \left.\phantom{\Big|} \frac{\partial S}{\partial U}\right|_V  , \; B = \left. \phantom{\Big|}\frac{\partial S}{\partial V} \right|_U \right\} \; \subset \; \mathbb{R}^4 }

which is really just the submanifold we had before, now described using new coordinates.

But let’s get to the point! We’ll build a noncommutative algebra generated by elements I’ll call \hat{U}, \hat{S}, \hat{V} and \hat{P}. We’ll require that these commute except for two, namely the pair U, C and the pair V, B. And what commutation relations should these obey?

We could try

[\hat{U},\hat{C}] = ik, \qquad [\hat{V},\hat{B}] = ik

This is now dimensionally consistent, since in each equation both sides have dimensions of entropy!

However, there’s another twist. Quantum mechanics is about amplitudes, while statistical mechanics is about probabilities, which are real. So I actually think I want

[\hat{U},\hat{C}] = k, \qquad [\hat{V},\hat{B}] = k

Let me give some evidence for this!

I will ignore the second equation for now and focus on the first. Suppose we’re doing statistical mechanics and we have a system that has probability p(E) of having internal energy E. We would like to define an internal energy operator \hat{U} and coldness operator \hat{C}. Say we set

\displaystyle{  (\hat{U} p)(E) = E p(E), \qquad (\hat{C} p)(E) = -k \frac{\partial}{\partial E} p(E) }

Then it’s easy to check that they obey the commutation relation

[\hat{U},\hat{C}] = k

Moreover—and this is the exciting part—they make physical sense! If the system definitely has internal energy E_0, then p(E) vanishes except at E = E_0. It follows that p is an eigvenvector of the internal energy operator with eigenvalue E_0:

(\hat{U} p)(E) = E p(E) = E_0 p(E)


\hat{U} p = E_0 p

On the other hand, suppose the system definitely has temperature T_0, and thus coldness C_0 = 1/T_0. Then statistical mechanics tells us that p(E) is given by the Boltzmann distribution

\displaystyle{ p(E) = \frac{\exp(-E/kT_0)}{Z}  = \frac{\exp(-C_0 E/k) }{Z} }

where Z is a normalizing constant called the partition function. It follows that p is an eigenvector of the coldness operator with eigenvalue C_0:

\begin{array}{ccl}  (\hat{C} p)(E) &=&  \displaystyle{ -k \frac{\partial}{\partial E} \frac{\exp(-C_0 E/k)}{Z} }\\ \\  &=&  \displaystyle{  C_0 \frac{\exp(-C_0 E/k)}{Z} } \\ \\  &=& C_0 p(E)  \end{array}


\hat{C} p = C_0 p

This makes me very happy. We are seeing a nice analogy:

• Internal energy eigenstates in statistical mechanics are like position eigenstates in quantum mechanics.

• Coldness eigenstates in statistical mechanics are like momentum eigenstates in quantum mechanics—except for a missing factor of i in the exponential, which makes the coldness eigenstates decrease exponentially instead of oscillate.

We can also define a temperature operator

\displaystyle{ \hat{T} = \frac{1}{\hat{C}} }

at least if we ignore the eigenvectors of the coldness operator with eigenvalue zero. But the coldness operator is more fundamental in this approach.

A lot of further questions and problems appear at this point, but I think I’ll stop here and tackle them later.

Part 1: Hamilton’s equations versus the Maxwell relations.

Part 2: the role of symplectic geometry.

Part 3: a detailed analogy between classical mechanics and thermodynamics.

Part 4: what is the analogue of quantization for thermodynamics?

Classical Mechanics versus Thermodynamics (Part 3)

23 September, 2021

There’s a fascinating analogy between classical mechanics and thermodynamics, which I last talked about in 2012:

Classical mechanics versus thermodynamics (part 1).
Classical mechanics versus thermodynamics (part 2).

I’ve figured out more about it, and today I’m giving a talk about it in the physics colloquium at the University of British Columbia. It’s a colloquium talk that’s supposed to be accessible for upper-level undergraduates, so I’ll spend a lot of time reviewing the basics… which is good, I think.

You can see my slides here, and I’ll base this blog article on them. You can also watch a video of my talk:

Hamilton’s equations versus the Maxwell relations

Why do Hamilton’s equations in classical mechanics:

\begin{array}{ccr}  \displaystyle{  \frac{d p}{d t} }  &=&  \displaystyle{- \frac{\partial H}{\partial q} } \\  \\  \displaystyle{  \frac{d q}{d t} } &=&  \displaystyle{ \frac{\partial H}{\partial p} }  \end{array}

look so much like the Maxwell relations in thermodynamics?

\begin{array}{ccr}  \displaystyle{ \left. \frac{\partial T}{\partial V}\right|_S }  &=&  \displaystyle{ - \left. \frac{\partial P}{\partial S}\right|_V } \\   \\  \displaystyle{ \left. \frac{\partial S}{\partial  V}\right|_T  }  &=&  \displaystyle{ \left. \frac{\partial P}{\partial T} \right|_V }  \end{array}

William Rowan Hamilton discovered his equations describing classical mechanics in terms of energy around 1827. By 1834 he had also introduced Hamilton’s principal function, which I’ll explain later.

James Clerk Maxwell is most famous for his equations describing electromagnetism, perfected in 1865. But he also worked on thermodynamics, and discovered the ‘Maxwell relations’ in 1871.

Hamilton’s equations describe how the position q and momentum p of a particle on a line change with time t if we know the energy or Hamiltonian H(q,p):

\begin{array}{ccr}  \displaystyle{  \frac{d p}{d t} }  &=&  \displaystyle{- \frac{\partial H}{\partial q} } \\  \\  \displaystyle{  \frac{d q}{d t} } &=&  \displaystyle{ \frac{\partial H}{\partial p} }  \end{array}

Two of the Maxwell relations connect the volume V, entropy S, pressure P and temperature T of a system in thermodynamic equilibrium:

\begin{array}{ccr}  \displaystyle{ \left. \frac{\partial T}{\partial V}\right|_S }  &=&  \displaystyle{ - \left. \frac{\partial P}{\partial S}\right|_V } \\   \\  \displaystyle{ \left. \frac{\partial S}{\partial  V}\right|_T  }  &=&  \displaystyle{ \left. \frac{\partial P}{\partial T} \right|_V }  \end{array}
Using this change of variables:

q \to S \qquad p \to T
t \to V \qquad H \to P

Hamilton’s equations:

\begin{array}{ccr}  \displaystyle{  \frac{d p}{d t} }  &=&  \displaystyle{- \frac{\partial H}{\partial q} } \\   \\  \displaystyle{  \frac{d q}{d t} } &=&  \displaystyle{ \frac{\partial H}{\partial p} }  \end{array}

become these relations:

\begin{array}{ccr}  \displaystyle{  \frac{d T}{d V} }  &=&  \displaystyle{- \frac{\partial P}{\partial S} } \\  \\  \displaystyle{  \frac{d S}{d V} } &=&  \displaystyle{ \frac{\partial P}{\partial T} }  \end{array}

These are almost like two of the Maxwell relations! But in thermodynamics we always use partial derivatives:

\begin{array}{ccr}  \displaystyle{  \frac{\partial T}{\partial V} }  &=&  \displaystyle{ - \frac{\partial P}{\partial S} } \\   \\  \displaystyle{ \frac{\partial S}{\partial  V}   }  &=&  \displaystyle{  \frac{\partial P}{\partial T} }  \end{array}

and we say which variables are held constant:

\begin{array}{ccr}  \displaystyle{ \left. \frac{\partial T}{\partial V}\right|_S }  &=&  \displaystyle{ - \left. \frac{\partial P}{\partial S}\right|_V } \\   \\  \displaystyle{ \left. \frac{\partial S}{\partial  V}\right|_T  }  &=&  \displaystyle{ \left. \frac{\partial P}{\partial T} \right|_V }  \end{array}

If we write Hamilton’s equations in the same style as the Maxwell relations, they look funny:

\begin{array}{ccr}  \displaystyle{ \left. \frac{\partial p}{\partial t}\right|_q }  &=&  \displaystyle{ - \left. \frac{\partial H}{\partial q}\right|_t } \\   \\  \displaystyle{ \left. \frac{\partial q}{\partial  t}\right|_p  }  &=&  \displaystyle{ \left. \frac{\partial H}{\partial p} \right|_t }  \end{array}

Can this possibly be right?

Yes! When we work out the analogy between classical mechanics and thermodynamics we’ll see why.

We can get Maxwell’s relations starting from this: the internal energy U of a system in equilibrium depends on its entropy S and volume V.

Temperature and pressure are derivatives of U:

\displaystyle{  T =  \left.\frac{\partial U}{\partial S} \right|_V  \qquad  P = - \left. \frac{\partial U}{\partial V} \right|_S }

Maxwell’s relations follow from the fact that mixed partial derivatives commute! For example:

\displaystyle{    \left. \frac{\partial T}{\partial V} \right|_S \; = \;  \left. \frac{\partial}{\partial V} \right|_S \left. \frac{\partial }{\partial S} \right|_V U \; = \;  \left. \frac{\partial }{\partial S} \right|_V \left. \frac{\partial}{\partial V} \right|_S  U \; = \;  - \left. \frac{\partial P}{\partial S} \right|_V }

To get Hamilton’s equations the same way, we need a function W of the particle’s position q and time t such that

\displaystyle{       p = \left. \frac{\partial W}{\partial q} \right|_t   \qquad       H = -\left. \frac{\partial W}{\partial t} \right|_q  }

Then we’ll get Hamilton’s equations from the fact that mixed partial derivatives commute!

The trick is to let W be ‘Hamilton’s principal function’. So let’s define that. First, the action of a particle’s path is

\displaystyle{  \int_{t_0}^{t_1} L(q(t),\dot{q}(t)) \, dt  }

where L is the Lagrangian:

L(q,\dot{q}) = p \dot q - H(q,p)

The particle always takes a path from (q_0, t_0) to (q_1, t_1) that’s a critical point of the action. We can derive Hamilton’s equations from this fact.

Let’s assume this critical point is a minimum. Then the least action for any path from (q_0,t_0) to (q_1,t_1) is called Hamilton’s principal function

W(q_0,t_0,q_1,t_1) = \min_{q \; \mathrm{with} \; q(t_0) = q_0, q(t_1) = q_1}  \int_{t_0}^{t_1} L(q(t),\dot{q}(t)) \, dt

A beautiful fact: if we differentiate Hamilton’s principal function, we get back the energy H and momentum p:

\begin{array}{ccc}  \displaystyle{  \frac{\partial}{\partial q_0} W(q_0,t_0,q_1,t_1) = -p(t_0) } &&  \displaystyle{    \frac{\partial}{\partial t_0} W(q_0,t_0,q_1,t_1) = H(t_0) } \\  \\  \displaystyle{   \frac{\partial}{\partial q_1} W(q_0,t_0,q_1,t_1) =  p(t_1) } &&  \displaystyle{   \frac{\partial}{\partial t_1}W(q_0,t_0,q_1,t_1)  = -H(t_1) }  \end{array}

You can prove these equations using

L  = p\dot{q} - H

which implies that

\displaystyle{ W(q_0,t_0,q_1,t_1) =  \int_{q_0}^{q_1} p \, dq \; - \; \int_{t_0}^{t_1} H \, dt }

where we integrate along the minimizing path. (It’s not as trivial as it may look, but you can do it.)

Now let’s fix a starting-point (q_0,t_0) for our particle, and say its path ends at any old point (q,t). Think of Hamilton’s principal function as a function of just (q,t):

W(q,t) = W(q_0,t_0,q,t)

Then the particle’s momentum and energy when it reaches (q,t) are:

\displaystyle{   p = \left. \frac{\partial W}{\partial q} \right|_t   \qquad       H = -\left. \frac{\partial W}{\partial t} \right|_q }

This is just what we wanted. Hamilton’s equations now follow from the fact that mixed partial derivatives commute!

So, we have this analogy between classical mechanics and thermodynamics:

Classical mechanics Thermodynamics
action: W(q,t) internal energy: U(V,S)
position: q entropy: S
momentum: p = \frac{\partial W}{\partial q} temperature: T = \frac{\partial U}{\partial S}
time: t volume: V
energy: H= -\frac{\partial W}{\partial t} pressure: P = - \frac{\partial U}{\partial V}
dW = pdq - Hdt dU = TdS - PdV

What’s really going on in this analogy? It’s not really the match-up of variables that matters most—it’s something a bit more abstract. Let’s dig deeper.

I said we could get Maxwell’s relations from the fact that mixed partials commute, and gave one example:

\displaystyle{   \left. \frac{\partial T}{\partial V} \right|_S \; = \;  \left. \frac{\partial}{\partial V} \right|_S \left. \frac{\partial }{\partial S} \right|_V U \; = \;  \left. \frac{\partial }{\partial S} \right|_V \left. \frac{\partial}{\partial V} \right|_S  U \; = \;  - \left. \frac{\partial P}{\partial S} \right|_V }

But to get the other Maxwell relations we need to differentiate other functions—and there are four of them!

U: internal energy
U - TS: Helmholtz free energy
U + PV: enthalpy
U + PV - TS: Gibbs free energy

They’re important, but memorizing all the facts about them has annoyed students of thermodynamics for over a century. Is there some other way to get the Maxwell relations? Yes!

In 1958 David Ritchie explained how we can get all four Maxwell relations from one equation! Jaynes also explained how in some unpublished notes for a book. Here’s how it works.

Start here:

dU = T d S - P d V

Integrate around a loop \gamma:

\displaystyle{    \oint_\gamma T d S - P d V  = \oint_\gamma d U = 0 }


\displaystyle{  \oint_\gamma T d S = \oint_\gamma P dV }

This says the heat added to a system equals the work it does in this cycle

Green’s theorem implies that if a loop \gamma encloses a region R,

\displaystyle{   \oint_\gamma T d S = \int_R dT \, dS }


\displaystyle{ \oint_\gamma P d V  = \int_R dP \, dV }

But we know these are equal!

So, we get

\displaystyle{ \int_R dT \, dS  =   \int_R dP \, dV  }

for any region R enclosed by a loop. And this in turn implies

d T\, dS = dP \, dV

In fact, all of Maxwell’s relations are hidden in this one equation!

Mathematicians call something like dT \, dS a 2-form and write it as dT \wedge dS. It’s an ‘oriented area element’, so

dT \, dS = -dS \, dT

Now, starting from

d T\, dS = dP \, dV

We can choose any coordinates X,Y and get

\displaystyle{   \frac{dT \, dS}{dX \, dY} = \frac{dP \, dV}{dX \, dY}  }

(Yes, this is mathematically allowed!)

If we take X = V, Y = S we get

\displaystyle{    \frac{dT \, dS}{dV \, dS} = \frac{dP \, dV}{dV \, dS}  }

and thus

\displaystyle{   \frac{dT \, dS}{dV \, dS} = - \frac{dV \, dP}{dV \, dS}  }

We can actually cancel some factors and get one of the Maxwell relations:

\displaystyle{  \left.   \frac{\partial T}{\partial V}\right|_S = - \left. \frac{\partial P}{\partial S}\right|_V  }

(Yes, this is mathematically justified!)

Let’s try another one. If we take X = T, Y = V we get

\displaystyle{   \frac{dT \, dS}{dT \, dV} = \frac{dP \, dV}{dT \, dV} }

Cancelling some factors here we get another of the Maxwell relations:

\displaystyle{ \left.   \frac{\partial S}{\partial V} \right|_T = \left. \frac{\partial P}{\partial T}  \right|_V }

Other choices of X,Y give the other two Maxwell relations.

In short, Maxwell’s relations all follow from one simple equation:

d T\, dS = dP \, dV

Similarly, Hamilton’s equations follow from this equation:

d p\, dq = dH \, dt

All calculations work in exactly the same way!

By the way, we can get these equations efficiently using the identity d^2 = 0 and the product rule for d:

\begin{array}{ccl} dU = TdS - PdV & \implies & d^2 U = d(TdS - P dV) \\ \\  &\implies& 0 = dT\, dS - dP \, dV  \\ \\  &\implies & dT\, dS = dP \, dV  \end{array}

Now let’s change viewpoint slightly and temporarily treat P and V as independent from S and T. So, let’s start with \mathbb{R}^4 with coordinates (S,T,V,P). Then this 2-form on \mathbb{R}^4:

\omega = dT \, dS - dP \, dV

is called a symplectic structure.

Choosing the internal energy function U(S,V), we get this 2-dimensional surface of equilibrium states:

\displaystyle{   \Lambda = \left\{ (S,T,V,P): \; \textstyle{ T = \left.\phantom{\Big|} \frac{\partial U}{\partial S}\right|_V  , \; P = -\left. \phantom{\Big|}\frac{\partial U}{\partial V} \right|_S} \right\} \; \subset \; \mathbb{R}^4 }


\omega = dT \, dS - dP \, dV

we know

\displaystyle{  \int_R \omega = 0 }

for any region in the surface \Lambda, since on this surface dU = TdS - PdV and our old argument applies.

This fact encodes the Maxwell relations! Physically it says: for any cycle on the surface of equilibrium states, the heat flow in equals the work done.

Similarly, in classical mechanics we can start with \mathbb{R}^4 with coordinates (q,p,t,H), treating p and H as independent from q and t . This 2-form on \mathbb{R}^4:

\omega = dH \, dt - dp \, dq

is a symplectic structure. Hamilton’s principal function W(q,t) defines a 2d surface

\displaystyle{   \Lambda = \left\{ (q,p,t,H): \; \textstyle{ p = \left.\phantom{\Big|} \frac{\partial W}{\partial q}\right|_t  , \;  H = -\left.\phantom{\Big|} \frac{\partial W}{\partial t} \right|_q} \right\}  \subset \mathbb{R}^4 }

We have \int_R \omega = 0 for any region R in this surface \Lambda. And this fact encodes Hamilton’s equations!


In thermodynamics, any 2d region R in the surface \Lambda of equilibrium states has

\displaystyle{  \int_R \omega = 0 }

This is equivalent to the Maxwell relations.

In classical mechanics, any 2d region R in the surface \Lambda of allowed (q,p,t,H) 4-tuples for particle trajectories through a single point (q_0,t_0) has

\displaystyle{   \int_R \omega = 0 }

This is equivalent to Hamilton’s equations.

These facts generalize when we add extra degrees of freedom, e.g. the particle number N in thermodynamics:

\omega = dT \, dS - dP \, dV + d\mu \, dN

or more dimensions of space in classical mechanics:

\omega =  dp_1 \, dq_1 + \cdots + dp_{n-1} dq_{n-1} - dH \, dt

We get a vector space \mathbb{R}^{2n} with a 2-form \omega on it, and a Lagrangian submanifold \Lambda \subset \mathbb{R}^{2n}: that is, a n-dimensional submanifold such that

\int_R \omega = 0

for any 2d region R \subset \Lambda.

This is more evidence for Alan Weinstein’s “symplectic creed”:


As a spinoff, we get two extra Hamilton’s equations for a point particle on a line! They look weird, but I’m sure they’re correct for trajectories that go through a specific arbitrary spacetime point (q_0,t_0).

\begin{array}{ccrcccr}  \displaystyle{ \left. \frac{\partial T}{\partial V}\right|_S }  &=&  \displaystyle{ - \left. \frac{\partial P}{\partial S}\right|_V } & \qquad &  \displaystyle{ \left. \frac{\partial p}{\partial t}\right|_q }  &=&  \displaystyle{ - \left. \frac{\partial H}{\partial q}\right|_t }  \\   \\  \displaystyle{ \left. \frac{\partial S}{\partial  V}\right|_T  }  &=&  \displaystyle{ \left. \frac{\partial P}{\partial T} \right|_V } & &  \displaystyle{ \left. \frac{\partial q}{\partial  t}\right|_p  }  &=&  \displaystyle{ \left. \frac{\partial H}{\partial p} \right|_t }  \\ \\  \displaystyle{ \left. \frac{\partial V}{\partial T} \right|_P }  &=&  \displaystyle{ - \left. \frac{\partial S}{\partial P} \right|_T } & &  \displaystyle{ \left. \frac{\partial t}{\partial p} \right|_H }  &=&  \displaystyle{ - \left. \frac{\partial q}{\partial H} \right|_p }  \\ \\  \displaystyle{ \left. \frac{\partial V}{\partial S} \right|_P } &=&  \displaystyle{ \left. \frac{\partial T}{\partial P} \right|_S }  & &  \displaystyle{ \left. \frac{\partial p}{\partial H} \right|_q } &=&  \displaystyle{ \left. \frac{\partial t}{\partial q} \right|_H }  \end{array}

Part 1: Hamilton’s equations versus the Maxwell relations.

Part 2: the role of symplectic geometry.

Part 3: a detailed analogy between classical mechanics and thermodynamics.

Part 4: what is the analogue of quantization for thermodynamics?

Maxwell’s Relations (Part 3)

22 September, 2021

In Part 2 we saw a very efficient formulation of Maxwell’s relations, from which we can easily derive their usual form. Now let’s talk more about the meaning of the Maxwell relations—both their physical meaning and their mathematical meaning. For the physical meaning, I’ll draw again from Ritchie’s paper:

• David J. Ritchie, A simple method for deriving Maxwell’s relations, American Journal of Physics 36 (1958), 760–760.

but Emily Roach pointed out that much of this can also be found in the third chapter of Jaynes’ unpublished book:

• E. T. Jaynes, Thermodynamics, Chapter 3: Plausible reasoning.

First I’ll do the case of 2 dimensions, and then the case of n dimensions. In the 2d case I’ll talk like a physicist and use notation from thermodynamics. We’ll learn the Maxwell relations have these meanings, apart from their obvious ones:

• In any thermodynamic cycle, the heat absorbed by a system equals the work it does.

• In any thermodynamic cycle, energy is conserved.

• Any region in the surface of equilibrium states has the same area in pressure-volume coordinates as it does in temperature-entropy coordinates.

In the n-dimensional case I’ll use notation that mathematicians will like better, and also introduce the language of symplectic geometry. This will give a general, abstract statement of the Maxwell relations:

• The manifold of equilibrium states is a Lagrangian submanifold of a symplectic manifold.

Don’t worry—I’ll explain it!

Maxwell’s relations in 2 dimensions

Suppose we have a physical system whose internal energy U is a smooth function of its entropy S and volume V. So, we have a smooth function on the plane:

U \colon \mathbb{R}^2 \to \mathbb{R}

and we call the coordinates on this plane (S,V).

Next we introduce two more variables called temperature T and pressure P. So, we’ll give \mathbb{R}^4 coordinates (S,T,V,P). But, there’s a 2-dimensional surface where these extra variables are given by the usual formulas in thermodynamics:

\displaystyle{ \Lambda = \left\{(S,T,V,P) : \; T = \left.\frac{\partial U}{\partial S} \right|_V, \;   P = - \left. \frac{\partial U}{\partial V} \right|_S \right\}   \; \subset \mathbb{R}^4 }

We call this the surface of equilibrium states.

By the equations that T and P obey on the surface of equilibrium states, the following equation holds on this surface:

dU = T d S - P d V

So, if \gamma is any loop on this surface, we have

\displaystyle{ \oint_\gamma ( T d S - P d V) \; = \; \oint_\gamma \; dU \; = 0    }

where the total change in U is zero because the loop ends where it starts! We thus have

\displaystyle{ \oint_\gamma  T d S  = \oint_\gamma P d V }

In thermodynamics this has a nice meaning: the left side is the heat absorbed by a system as its state moves around the loop \gamma, while the right side is the work done by this system. So, the equation says the heat absorbed by a system as it carries out a cycle is equal to the work it does.

We’ll soon see that this equation contains, hidden within it, all four Maxwell relations. And it’s a statement of conservation of energy! By the way, it’s perfectly obvious that energy is conserved in a cycle: our loop takes us from a point where U has some value back to the same point, where it has the same value.

So how do we get Maxwell’s relations out of this blithering triviality? There’s a slow way and a fast way. Since the slow way provides extra insight I’ll do that first.

Suppose the loop \gamma bounds some 2-dimensional region R in the surface \Lambda. Then by Green’s theorem we have

\displaystyle{ \oint_\gamma  T d S  = \int_R dT \wedge dS }

That is, the heat absorbed in the cycle is just the area of the region R in (T,S) coordinates. But Green’s theorem also says

\displaystyle{ \oint_\gamma  P dV  = \int_R dP \wedge dV }

That is, the work done in the cycle, which is minus the left hand side, is minus the area of the region R in (P,V) coordinates!

That’s nice to know. But since we’ve seen these are equal,

\displaystyle{ \int_R dT \wedge dS =  \int_R dP \wedge dV  }

for every region R in the surface \Lambda.

Well, at least I’ve shown it for every region bounded by a loop! But every region can be chopped up into small regions that are bounded by loops, so the equation is really true for any region R in the surface \Lambda.

We express this fact by saying that dT \wedge dS equals dP \wedge dV when these 2-forms are restricted to the surface \Lambda. We write it like this:

\displaystyle{ \left. dT \wedge dS \right|_{\Lambda} = \left. dP \wedge dV \right|_{\Lambda} }

Now, last time we saw how to quickly get from this equation to all four Maxwell relations!

(Back then I didn’t write the |_{\Lambda} symbol because I was implicitly working on this surface \Lambda without telling you. More precisely, I was working on the plane \mathbb{R}^2, but we can identify the surface \Lambda with that plane using the (S,V) coordinates.)

So, given what we did last time, we are done! The equation

\displaystyle{ \left. dT \wedge dS \right|_{\Lambda} = \left. dP \wedge dV \right|_{\Lambda}}

expresses both conservation of energy and all four Maxwell equations—in a very compressed, beautiful form!

Maxwell’s relations in n dimensions

Now we can generalize everything to n dimensions. Suppose we have a smooth function

U \colon \mathbb{R}^n \to \mathbb{R}

Write the coordinates on \mathbb{R}^n as

(q^1, \dots, q^n)

and write the corresponding coordinates on the cotangent bundle T^\ast \mathbb{R}^n as

(q^1, \dots, q^n, p_1, \dots, p_n)

There is a submanifold of T^\ast \mathbb{R}^n where p_i equals the partial derivative of U with respect to q^i:

\displaystyle{ p_i = \frac{\partial U}{\partial q^i} }

Let’s call this submanifold \Lambda since it’s the same one we’ve seen before (except for that annoying little minus sign in the definition of pressure):

\displaystyle{ \Lambda = \left\{(q,p) \in T^\ast \mathbb{R}^n : \;  p_i = \frac{\partial U}{\partial q^i} \right\} }

In applications to thermodynamics, this is the manifold of equilibrium states. But we’re doing math here, and this math has many applications. It’s this generality that makes the subject especially interesting to me.

Now, U started out life as a function on \mathbb{R}^n, but it lifts to a function on T^\ast \mathbb{R}^n, and we have

\displaystyle{ dU  = \sum_i \frac{\partial U}{\partial q^i} dq^i }

By the definition of \Lambda we have

\displaystyle{  \left. \frac{\partial U}{\partial q^i} \right|_\Lambda =  \left. p_i \right|_\Lambda }


\displaystyle{  \left. dU \right|_\Lambda  = \left. \sum_i p_i dq^i \right|_\Lambda }

The 1-form on the right-hand side here:

\displaystyle{ \theta = \sum_i p_i \, dq^i }

is called the tautological 1-form on T^\ast \mathbb{R}^n. Its exterior derivative

\displaystyle{ \omega = d\theta = \sum_i dp_i \wedge dq^i }

is called the symplectic structure on this cotangent bundle. Both of these are a big deal in classical mechanics, but here we are seeing them in thermodynamics! And the point of all this stuff is that we’ve seen

\displaystyle{ \left. dU \right|_\Lambda  = \left. \theta \right|_\Lambda }

Taking d of both sides and using d^2 = 0, we get

\displaystyle{ \left. d\theta \right|_\Lambda = 0 }

or in other words

\displaystyle{ \left. \omega \right|_\Lambda = 0 }

And this is a very distilled statement of Maxwell’s relations!

Why? Well, in the 2d case we discussed earlier, the tautological 1-form is

\theta = T dS - P dV

thanks to the annoying minus sign in the definition of pressure. Thus, the symplectic structure is

\omega = dT \wedge dS - dP \wedge dV

and the fact that the symplectic structure \omega vanishes when restricted to \Lambda is just our old friend

\displaystyle{ \left. dT \wedge dS \right|_{\Lambda} = \left. dP \wedge dV \right|_{\Lambda} }

As we’ve seen, this equation contains all of Maxwell’s relations.

In the n-dimensional case, \Lambda is an n-dimensional submanifold of the 2n-dimensional symplectic manifold T^\ast M. In general, if an n-dimensional submanifold S of a 2n-dimensional symplectic manifold has the property that the symplectic structure vanishes when restricted to S, we call S a Lagrangian submanifold.

So, one efficient abstract statement of Maxwell’s relations is:

The manifold of equilibrium states is a Lagrangian submanifold.

It takes a while to see the value of this statement, and I won’t try to explain it here. Instead, read Weinstein’s introduction to symplectic geometry:

• Alan Weinstein, Symplectic geometry, Bulletin of the American Mathematical Society 5 (1981), 1–13.

You’ll see here an introduction to Lagrangian submanifolds and an explanation of the “symplectic creed”:

— Alan Weinstein

The restatement of Maxwell’s relations in terms of Lagrangian submanifolds is just another piece of evidence for this!

Part 1: a proof of Maxwell’s relations using commuting partial derivatives.

Part 2: a proof of Maxwell’s relations using 2-forms.

Part 3: the physical meaning of Maxwell’s relations, and their formulation in terms of symplectic geometry.

For how Maxwell’s relations are connected to Hamilton’s equations, see this post:

Classical mechanics versus thermodynamics.

Maxwell’s Relations (Part 2)

18 September, 2021

The Maxwell relations are some very general identities about the partial derivatives of functions of several variables. You don’t need to understand anything about thermodyamics to understand them, but they’re used a lot in that subject, so discussions of them tend to use notation from that subject.

Last time I went through a standard way to derive these relations for a function of two variables. Now I want to give a better derivation, which I found here:

• David J. Ritchie, A simple method for deriving Maxwell’s relations, American Journal of Physics 36 (1958), 760–760.

This paper is just one page long, and I can’t really improve on it, but I’ll work through the ideas in more detail. It again covers only the case of a function of two variables, and I won’t try to go beyond that case now—maybe later.

So, remember the setup. We have a smooth function on the plane

U \colon \mathbb{R}^2 \to \mathbb{R}

We call the coordinates on the plane S and V, and we give the partial derivatives of U funny names:

\displaystyle{ T =  \left.\frac{\partial U}{\partial S} \right|_V  \qquad    P = - \left. \frac{\partial U}{\partial V} \right|_S }

None of these funny names or the minus sign has any effect on the actual math involved; they’re just traditional in thermodynamics. So, mathematicians, please forgive me! If I ever generalize to the n-variable case, I’ll switch to more reasonable notation.

We instantly get this:

dU = T dS - P dV

and since d^2 = 0 we get

0 = d^2 U = dT \wedge dS - dP \wedge dV


dT \wedge dS = dP \wedge dV

Believe it or not, this simple relation contains all four of Maxwell’s relations within it!

To see this, note that both sides are smooth 2-forms on the plane. Now, the space of 2-forms at any one point of the plane is a 1-dimensional vector space. So, we can divide any 2-form at a point by any nonzero 2-form at that point and get a real number.

In particular, suppose X and Y are functions on the plane such that dX \wedge dY \ne 0 at some point. Then we can divide both sides of the above equation by dX \wedge dY and get

\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }

at this point. We can now get the four Maxwell relations simply by making different choices of X and Y. We’ll choose them to be either T,S,V or P. The argument will only work if dX \wedge dY \ne 0, so I’ll always assume that. The argument works the same way each time so I’ll go faster after the first time.

The first relation

Take X = V and Y = S and substitute them into the above equation. We get

\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; \frac{dP \wedge dV}{dV \wedge dS} }


\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; - \frac{dP \wedge dV}{dS \wedge dV} }

Next we use a little fact about differential forms and partial derivatives to simplify both sides:

\displaystyle{ \frac{dT \wedge dS}{dV \wedge dS} \; = \; \left.\frac{\partial T}{\partial V} \right|_S }

and similarly

\displaystyle{  \frac{dP \wedge dV}{dS \wedge dV} \; = \; \left. \frac{\partial P}{\partial S}\right|_V }

If you were scarred as a youth when plausible-looking manipulations with partial derivatives turned out to be unjustified, you might be worried about this—and rightly so! Later I’ll show how to justify the kind of ‘cancellation’ we’re doing here. But anyway, it instantly gives us the first Maxwell relation:

\boxed{ \displaystyle{  \left. \frac{\partial T}{\partial V} \right|_S \; = \; - \left. \frac{\partial P}{\partial S} \right|_V } }

The second relation

This time we take X = T, Y = V. Substituting this into our general formula

\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }

we get

\displaystyle{ \frac{dT \wedge dS}{dT \wedge dV} \; = \; \frac{dP \wedge dV}{dT \wedge dV} }

and doing the same sort of ‘cancellation’ as last time, this gives the second Maxwell relation:

\boxed{ \displaystyle{ \left. \frac{\partial S}{\partial V} \right|_T \; = \;  \left. \frac{\partial P}{\partial T} \right|_V } }

The third relation

This time we take X = P, Y = S. Substituting this into our general formula

\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }

we get

\displaystyle{ \frac{dT \wedge dS}{dP \wedge dS} \; = \; \frac{dP \wedge dV}{dP \wedge dS} }

which gives the third Maxwell relation:

\boxed{ \displaystyle{ \left. \frac{\partial T}{\partial P} \right|_S  \; = \;  \left. \frac{\partial V}{\partial S} \right|_P }}

The fourth relation

This time we take X = P, Y = T. Substituting this into our general formula

\displaystyle{ \frac{dT \wedge dS}{dX \wedge dY} \; = \; \frac{dP \wedge dV}{dX \wedge dY} }

we get

\displaystyle{ \frac{dT \wedge dS}{dP \wedge dT} \; = \; \frac{dP \wedge dV}{dP \wedge dT} }


\displaystyle{ -\frac{dS \wedge dT}{dP \wedge dT} \; = \; \frac{dP \wedge dV}{dP \wedge dT} }

giving the fourth Maxwell relation:

\boxed{ \displaystyle{ \left. \frac{\partial V}{\partial T} \right|_P \; = \; - \left. \frac{\partial S}{\partial P} \right|_T }}

You can check that other choices of X and Y don’t give additional relations of the same form.


So, we’ve see that all four Maxwell relations follow quickly from the equation

dT \wedge dS = dP \wedge dV

if we can do ‘cancellations’ in expressions like this:

\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} }

when one of the functions A,B \colon \mathbb{R}^2 \to \mathbb{R} equals one of the functions X,Y \colon \mathbb{R}^2 \to \mathbb{R}. This works whenever dX \wedge dY \ne 0. Let’s see why!

First of all, by the inverse function theorem, if dX \wedge dY \ne 0 at some point in the plane, the functions X and Y serve as coordinates in some neighborhood of that point. In this case we have

\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} = \det \left(  \begin{array}{cc}  \displaystyle{ \frac{\partial A}{\partial X} } & \displaystyle{ \frac{\partial A}{\partial Y} } \\ \\  \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} }  \end{array}  \right) }

Yes: the ratio of 2-forms is just the Jacobian of the map sending (X,Y) to (A,B). This is clear if you know that 2-forms are ‘area elements’ and the Jacobian is a ratio of area elements. But you can also prove it by a quick calculation:

\displaystyle{ dA = \frac{\partial A}{\partial X} dX +  \frac{\partial A}{\partial Y} dY }

\displaystyle{ dB = \frac{\partial B}{\partial X} dX +  \frac{\partial B}{\partial Y} dY }

and thus

\displaystyle{  dA \wedge dB = \left(\frac{\partial A}{\partial X} \frac{\partial B}{\partial Y} - \frac{\partial A}{\partial Y} \frac{\partial B}{\partial X} \right) dX \wedge dY}

so the ratio dA \wedge dB/dX \wedge dY is the desired determinant.

How does this help? Well, take

\displaystyle{ \frac{dA \wedge dB}{dX \wedge dY} }

and now suppose that either A or B equals either X or Y. For example, suppose A = X. Then we can do a ‘cancellation’ like this:

\begin{array}{ccl}  \displaystyle{ \frac{dX \wedge dB}{dX \wedge dY} } &=& \det \left(  \begin{array}{cc}  \displaystyle{ \frac{\partial X}{\partial X} } & \displaystyle{ \frac{\partial X}{\partial Y} } \\ \\  \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} }  \end{array} \right) \\  \\  &=& \det \left(  \begin{array}{cc}  \displaystyle{ 1 } & \displaystyle{ 0 } \\ \\  \displaystyle{ \frac{\partial B}{\partial X} } & \displaystyle{\frac{\partial B}{\partial Y} }  \end{array} \right) \\  \\  &=& \displaystyle{\frac{\partial B}{\partial Y} }  \end{array}

or to make it clear that the partial derivatives are being done in the X,Y coordinate system:

\displaystyle{ \frac{dX \wedge dB}{dX \wedge dY} = \left.\frac{\partial B}{\partial Y} \right|_X }

This justifies all our calculations earlier.


So, we’ve seen that all four Maxwell relations are unified in a much simpler equation:

dT \wedge dS = dP \wedge dV

which follows instantly from

dU = T dS - P dV

This is a big step forward compared to the proof I gave last time, which, at least as I presented it, required cleverly guessing a bunch of auxiliary functions—even though these auxiliary functions turn out to be incredibly important in their own right.

So, we should not stop here: we should think hard about the physical and mathematical meaning of the equation

dT \wedge dS = dP \wedge dV

And Ritchie does this in his paper! But I will talk about that next time.

Part 1: a proof of Maxwell’s relations using commuting partial derivatives.

Part 2: a proof of Maxwell’s relations using 2-forms.

Part 3: the physical meaning of Maxwell’s relations, and their formulation in terms of symplectic geometry.

For how Maxwell’s relations are connected to Hamilton’s equations, see this post:

Classical mechanics versus thermodynamics.

Maxwell’s Relations (Part 1)

17 September, 2021

The Maxwell relations are equations that show up whenever we have a smooth convex function

U \colon \mathbb{R}^n \to \mathbb{R}

They say that the mixed partial derivatives of U commute, but also that the mixed partial derivatives of various functions constructed from U commute. So in a sense they are completely trivial except for the way we construct these various functions! Nonetheless they are very important in physics, because they give generally valid relations between the derivatives of thermodynamically interesting quantities.

For example, here is one of the Maxwell relations that people often use when studying a thermodynamic system like a cylinder of gas or a beaker of liquid:

\displaystyle{  \left. \frac{\partial T}{\partial V} \right|_S \; = \; - \left. \frac{\partial P}{\partial S} \right|_V }


S is the entropy of the system
T is its temperature
V is its volume
P is its pressure

There are also three more Maxwell relations involving these quantities. They all have the same general look, though half contain minus signs and half don’t. So, they’re quite tiresome to memorize. It’s more interesting to figure out what’s really going on here!

I already gave one story about what’s going on: we start with a function U, which happens in this case to be a function of S and V called the internal energy of our system. If we say its mixed partial derivatives commute:

\displaystyle{ \frac{\partial^2 U}{\partial V \partial S} = \frac{\partial^2 U}{\partial S \partial V}   }

we get the Maxwell relation I wrote above, simply by using the standard definitions of T and P as partial derivatives of U. We can then build a bunch of other functions from U, S, T, V and P, and write down other equations saying that the mixed partial derivatives of these functions commute. This gives us all the Maxwell relations.

Let me show you in detail how this works, but without explaining why I’m choosing the particular ‘bunch of other functions’ that I’ll use. There is a way to explain that using the concept of Legendre transform. But next time I’ll give a very different approach to the Maxwell relations, based on this paper:

• David J. Ritchie, A simple method for deriving Maxwell’s relations, American Journal of Physics 36 (1958), 760–760.

This sidesteps the whole business of choosing a ‘bunch of other functions’, and I think it’s very nice.

What follows is a bit mind-numbing, so let me tell you what to pay attention to. I’ll do the same general sort of calculation four times, first starting with any convex smooth function U \colon \mathbb{R}^2 \to \mathbb{R} and then with three other functions built from that one. The only clever part is how we choose those other functions.

The first relation

We start with any smooth convex function U of two real variables S, V and write its differential as follows:

dU = T dS - P d V

This says

\displaystyle{ T =  \left.\frac{\partial U}{\partial S} \right|_V  \qquad    P = - \left. \frac{\partial U}{\partial V} \right|_S }

These are the usual definitions of the temperature T and pressure P in terms of the internal energy U, but you don’t need to know anything about those concepts to follow all the calculations to come. In particular, from the pure mathematical viewpoint the minus sign is merely a stupid convention.

The commuting of mixed partials implies

\displaystyle{  \left. \frac{\partial T}{\partial V} \right|_S \; = \;  \left. \frac{\partial}{\partial V} \right|_S \left. \frac{\partial }{\partial S} \right|_V U \; = \;  \left. \frac{\partial }{\partial S} \right|_V \left. \frac{\partial}{\partial V} \right|_S  U \; = \;  - \left. \frac{\partial P}{\partial S} \right|_V }

giving the first of the Maxwell relations:

\boxed{ \displaystyle{  \left. \frac{\partial T}{\partial V} \right|_S \; = \; - \left. \frac{\partial P}{\partial S} \right|_V } }

The second relation

Next, let’s define the Helmholtz free energy

F = U - TS

Taking its differential we get

\begin{array}{ccl}  dF &=& dU - d(TS) \\ \\ &=& (T dS - P dV) - (SdT + TdS) \\ \\  &=& - S dT - P d V  \end{array}

so if we think of F as a function of T, V we get

\displaystyle{ S =  - \left.\frac{\partial F}{\partial T} \right|_V  \qquad    P = - \left. \frac{\partial F}{\partial V} \right|_T }

The commuting of mixed partials implies

\displaystyle{  \left. - \frac{\partial S}{\partial V} \right|_T \; = \;  \left. \frac{\partial}{\partial V} \right|_T \left. \frac{\partial }{\partial T} \right|_V F \; = \;  \left. \frac{\partial }{\partial T} \right|_V \left. \frac{\partial}{\partial V} \right|_T  F \; = \;  - \left. \frac{\partial P}{\partial T} \right|_V }

giving the second of the Maxwell relations:

\boxed{ \displaystyle{ \left. \frac{\partial S}{\partial V} \right|_T \; = \;  \left. \frac{\partial P}{\partial T} \right|_V } }

The third relation

Copying what we did to get the second relation, let’s define the enthalpy

H = U + P V

Taking its differential we get

\begin{array}{ccl}  dH &=& dU + d(PV) \\ \\ &=& (T dS - P dV) + (VdP + P dV) \\ \\  &=& T dS + V dP  \end{array}

so if we think of H as a function of S, P we get

\displaystyle{ T =  \left.\frac{\partial H}{\partial S} \right|_P  \qquad  V = \left. \frac{\partial H}{\partial P} \right|_S }

The commuting of mixed partials implies

\displaystyle{  \left. \frac{\partial T}{\partial P} \right|_S \; = \;  \left. \frac{\partial}{\partial P} \right|_S \left. \frac{\partial }{\partial S} \right|_P H \; = \;  \left. \frac{\partial }{\partial S} \right|_P \left. \frac{\partial}{\partial P} \right|_S  H \; = \;  \left. \frac{\partial V}{\partial S} \right|_P }

giving the third of the Maxwell relations:

\boxed{ \displaystyle{ \left. \frac{\partial T}{\partial P} \right|_S  \; = \;  \left. \frac{\partial V}{\partial S} \right|_P }}

The fourth relation

Combining the last two tricks, let’s define the Gibbs free energy

G = U + PV - TS

Taking its differential we get

\begin{array}{ccl}  dG &=& dU + d(PV) - d(TS) \\ \\ &=& (T dS - P dV) + (VdP + P dV) - (SdT + TdS) \\ \\  &=& V d P - S dT  \end{array}

so if we think of G as a function of P, T we get

\displaystyle{ V =  \left.\frac{\partial G}{\partial P} \right|_T  \qquad S = -\left. \frac{\partial G}{\partial T} \right|_P }

The commuting of mixed partials implies

\displaystyle{  \left. \frac{\partial V}{\partial T} \right|_P \; = \;  \left. \frac{\partial}{\partial T} \right|_P \left. \frac{\partial }{\partial P} \right|_T G \; = \;  \left. \frac{\partial }{\partial P} \right|_T \left. \frac{\partial}{\partial T} \right|_P  G \; = \;  - \left. \frac{\partial S}{\partial P} \right|_T }

giving the fourth and final Maxwell relation:

\boxed{ \displaystyle{ \left. \frac{\partial V}{\partial T} \right|_P \; = \; - \left. \frac{\partial S}{\partial P} \right|_T }}


I hope you’re a bit unsatisfied, for two main reasons.

The first question you should have is this: why did we chose these four functions to work with:

U, \; U - ST, \; U + PV, \; U - ST + PV

The pattern of signs is not really significant here: if we hadn’t followed tradition and stuck a minus sign in our definition of P here:

\displaystyle{ T =  \left.\frac{\partial U}{\partial S} \right|_V  \qquad    P = - \left. \frac{\partial U}{\partial V} \right|_S }

everything would look more systematic, and we’d use these four functions:

U, \; U - ST, \; U - PV, \; U - ST - PV

This would make it easier to guess how everything works if instead of a function U \colon \mathbb{R}^2 \to \mathbb{R} we started with a function U \colon \mathbb{R}^n \to \mathbb{R}. We could write this as

U(q^1, \dots, q^n)

and define a bunch of functions

\displaystyle{ p_i = \frac{\partial U}{\partial q^i} }

called ‘conjugate quantities’.

Then, we could get 2^n different functions by starting with U and subtracting off products of the form p_i q^i where i ranges over some subset of \{1, \dots, n\}.

Then we could take mixed partial derivatives of these functions, note that the mixed partial derivatives commute, and get a bunch of Maxwell relations — maybe

\displaystyle{ 2^n \frac{n(n-1)}{2} }

of them! (Sanity check: when n is 2 this is indeed 4.)

The second question you should have is: how did I sneakily switch from thinking of U as a function of S and V to thinking of F as a function of T and V, and so on? How could I get away with this? I believe the answer to this involves the concept of Legendre transform, which works well when U is convex.

Answering these questions well might get us into a bit of contact geometry. That would be nice. But instead of answering these questions next time, I’ll talk about Ritchie’s approach to deriving Maxwell’s equations, which seems to sidestep the two questions I just raised!

Just for fun

Finally: students of thermodynamics are often forced to memorize the Maxwell relations. They sometimes use the thermodynamic square:

where for some idiotic reason the p for pressure is written lower-case—perhaps so you can mix it up with momentum!

If you like puzzles, maybe you can figure out how the thermodynamic square works if you stare at it along with the four Maxwell equations I derived:

\displaystyle{  \left. \frac{\partial T}{\partial V} \right|_S \; \phantom{-} = \; - \left. \frac{\partial P}{\partial S} \right|_V }

\displaystyle{ \left. \frac{\partial S}{\partial V} \right|_T \; \phantom{-} = \; \phantom{-} \left. \frac{\partial P}{\partial T} \right|_V }

\displaystyle{ \left. \frac{\partial T}{\partial P} \right|_S  \; \phantom{-} = \; \phantom{-} \left. \frac{\partial V}{\partial S} \right|_P }

\displaystyle{ \left. \frac{\partial V}{\partial T} \right|_P \; \phantom{-} = \; - \left. \frac{\partial S}{\partial P} \right|_T }

Again, it probably be easier if we hadn’t stuck a minus sign into the definition of pressure. If you get stuck, click on the link.

Unfortunately the thermodynamic square is itself so hard to memorize that students resort to a mnemonic for that! Sometimes they say “Good Physicists Have Studied Under Very Fine Teachers” which gives the letters “GPHSUVFT” that you see as you go around the square.

I will never try to remember any of these mnemonics. But I wonder if there’s something deeper going on here. So:

Puzzle. How does the thermodynamic square generalize if U is a function of 3 variables instead of 2? How about more variables?

Part 1: a proof of Maxwell’s relations using commuting partial derivatives.

Part 2: a proof of Maxwell’s relations using 2-forms.

Part 3: the physical meaning of Maxwell’s relations, and their formulation in terms of symplectic geometry.

For how Maxwell’s relations are connected to Hamilton’s equations, see this post:

Classical mechanics versus thermodynamics.

The Cyclic Identity for Partial Derivatives

13 September, 2021


As an undergrad I learned a lot about partial derivatives in physics classes. But they told us rules as needed, without proving them. This rule completely freaked me out. If derivatives are kinda like fractions, shouldn’t this equal 1?

Let me show you why it’s -1.

First, consider an example:

This example shows that the identity is not crazy. But in fact it
holds the key to the general proof! Since (u,v) is a coordinate system we can assume without loss of generality that u = x, v = y. At any point we can approximate w to first order as ax + by + c for some constants a,b,c. But for derivatives the constant c doesn’t matter, so we can assume it’s zero.

Then just compute!

There’s also a proof using differential forms that you might like
better. You can see it here, along with an application to

But this still leaves us yearning for more intuition — and for me, at least, a more symmetrical, conceptual proof. Over on Twitter, someone named
Postdoc/cake provided some intuition using the same example from thermodynamics:

Using physics intuition to get the minus sign:

  1. increasing temperature at const volume = more pressure (gas pushes out more)
  2. increasing temperature at const pressure = increasing volume (ditto)
  3. increasing pressure at const temperature = decreasing volume (you push in more)

Jules Jacobs gave the symmetrical, conceptual proof that I was dreaming of:

This proof is more sophisticated than my simple argument, but it’s very pretty, and it generalizes to higher dimensions in ways that’d be hard to guess otherwise.

He uses some tricks that deserve explanation. As I’d hoped, the minus signs come from the anticommutativity of the wedge product of 1-forms, e.g.

du \wedge dv = - dv \wedge du

But what lets us divide quantities like this? Remember, u, v, w are all functions on the plane, so du, dv, dw are 1-forms on the plane. And since the space of 2-forms at a point in the plane is 1-dimensional, we can divide them! After all, given two vectors in a 1d vector space, the first is always some number times the second, as long as the second is nonzero. So we can define their ratio to be that number.

For Jacobs’ argument, we also need that these ratios obey rules like

\displaystyle{ \frac{\alpha}{\beta} \cdot \frac{\gamma}{\delta} = \frac{\alpha}{\delta} \cdot \frac{\gamma}{\beta}  }

But this is easy to check: whenever \alpha, \beta, \gamma, \delta are vectors in a 1-dimensional vector space, division obeys the above rule. To put it in fancy language, nonzero vectors in any 1-dimensional real vector space form a ‘torsor’ for the multiplicative group of nonzero real numbers:

• John Baez, Torsors made easy.

Finally, Jules used this sort of fact:

\displaystyle{ \frac{du \wedge dw}{dv \wedge dw} = \left. \frac{\partial u}{\partial v} \right|_w }

Actually he forgot to write down this particular equation at the top of his short proof—but he wrote down three others of the same form, and they all work the same way. Why are they true?

I claim this equation is true at some point of the plane whenever u, v, w are smooth functions on the plane and dv \wedge dw doesn’t vanish at that point. Let’s see why!

First of all, by the inverse function theorem, if dv \wedge dw \ne 0 at some point in the plane, the functions v and w serve as coordinates in some neighborhood of that point. In this case we can work out du in terms of these coordinates, and we get

\displaystyle{ du = \frac{\partial u}{\partial v} dv + \frac{\partial u}{\partial w} dw }

or more precisely

\displaystyle{ du = \left.\frac{\partial u}{\partial v}\right|_w dv +   \left. \frac{\partial u}{\partial w} \right|_v dw }


\begin{array}{ccl}  du \wedge dw &=&  \displaystyle{ \left(\left.\frac{\partial u}{\partial v}\right|_w dv +   \left. \frac{\partial u}{\partial w} \right|_v dw\right) \wedge dw } \\ \\  &=& \displaystyle{ \left.\frac{\partial u}{\partial v}\right|_w dv \wedge dw }   \end{array}


\displaystyle{ \frac{du \wedge dw}{dv \wedge dw} = \left. \frac{\partial u}{\partial v} \right|_w }

as desired!