Last time I sketched how two related forms of geometry, symplectic and contact geometry, show up in thermodynamics. Today I want to explain how they show up in probability theory.
For some reason I haven’t seen much discussion of this! But surely people have looked into it. After all, statistical mechanics explains thermodynamics in terms of probability theory, so if some mathematical structure shows up in thermodynamics, it should appear in statistical mechanics… and thus, ultimately, in probability theory.
I just figured out how this works for symplectic and contact geometry.
Suppose a system has $n$ possible states. We’ll call these microstates, following the tradition in statistical mechanics. If you don’t know what ‘microstate’ means, don’t worry about it! But the rough idea is that if you have a macroscopic system like a rock, the precise details of what its atoms are doing are described by a microstate, and many different microstates could be indistinguishable unless you look very carefully.
We’ll call the microstates $1, \dots, n$. So, if you don’t want to think about physics, when I say ‘microstate’ I’ll just mean an integer from 1 to $n$.
Next, a probability distribution $q$ assigns a real number $q_i$ to each microstate $i$, and these numbers must sum to 1 and be nonnegative. So, we have $q \in \mathbb{R}^n$, though not every vector in $\mathbb{R}^n$ is a probability distribution.
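As a quick computational sketch of this definition (mine, not from the post), here is a probability distribution represented as a vector in $\mathbb{R}^n$ together with a check of the two defining conditions:

```python
import numpy as np

# A probability distribution on n = 3 microstates: a vector in R^n
# with nonnegative entries summing to 1.
q = np.array([0.5, 0.3, 0.2])

def is_probability_distribution(q, tol=1e-9):
    """Check the two defining conditions: nonnegativity and total probability 1."""
    q = np.asarray(q, dtype=float)
    return bool(np.all(q >= 0.0) and abs(q.sum() - 1.0) < tol)

print(is_probability_distribution(q))           # True
print(is_probability_distribution([0.6, 0.6]))  # False: sums to 1.2
```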
I’m sure you’re wondering why I’m using $q$ rather than $p$ to stand for a probability distribution. Am I just trying to confuse you?
No: I’m trying to set up an analogy to physics!
Last time I introduced symplectic geometry using classical mechanics. The most important example of a symplectic manifold is the cotangent bundle $T^* Q$ of a manifold $Q$. A point of $T^* Q$ is a pair $(q, p)$ consisting of a point $q \in Q$ and a cotangent vector $p \in T_q^* Q$. In classical mechanics the point $q$ describes the position of some physical system, while $p$ describes its momentum.
So, I’m going to set up an analogy like this:
| Classical Mechanics | Probability Theory |
|---|---|
| position $q$ | probability distribution $q$ |
| momentum $p$ | ??? |
But what is to a probability distribution as momentum is to position?
A big clue is the appearance of symplectic geometry in thermodynamics, which I also outlined last time. We can use this to get some intuition about the analogue of momentum in probability theory.
In thermodynamics, a system has a manifold $Q$ of states. (These are not the ‘microstates’ I mentioned before: we’ll see the relation later.) There is a function

$$S \colon Q \to \mathbb{R}$$

describing the entropy of the system as a function of its state. There is a law of thermodynamics saying that

$$p = (dS)_q$$

This equation picks out a submanifold of $T^* Q$, namely

$$\Lambda = \{ (q, p) \in T^* Q : \; p = (dS)_q \}$$

Moreover this submanifold is Lagrangian: the symplectic structure $\omega$ vanishes when restricted to it:

$$\omega |_\Lambda = 0$$
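For readers who want the one-line reason (standard, and implicit in the post): in local coordinates where $\omega = \sum_i dp_i \wedge dq_i$, restricting to $\Lambda$ means substituting $p_i = \partial S / \partial q_i$, so

```latex
\omega\big|_\Lambda
  = \sum_i dp_i \wedge dq_i \Big|_{p_i = \partial S/\partial q_i}
  = \sum_{i,j} \frac{\partial^2 S}{\partial q_j \, \partial q_i}\, dq_j \wedge dq_i
  = 0,
```

which vanishes because the Hessian of $S$ is symmetric in $i$ and $j$ while $dq_j \wedge dq_i$ is antisymmetric.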
This is very beautiful, but it goes by so fast you might almost miss it! So let’s clutter it up a bit with coordinates. We often use local coordinates on $Q$ and describe a point $q \in Q$ using these coordinates, getting a point

$$(q_1, \dots, q_n) \in \mathbb{R}^n$$
They give rise to local coordinates $q_1, \dots, q_n, p_1, \dots, p_n$ on the cotangent bundle $T^* Q$. The $q_i$ are called extensive variables, because they are typically things that you can measure only by totalling up something over the whole system, like the energy or volume of a cylinder of gas. The $p_i$ are called intensive variables, because they are typically things that you can measure locally at any point, like temperature or pressure.
In these local coordinates, the symplectic structure on $T^* Q$ is the 2-form given by

$$\omega = dp_1 \wedge dq_1 + \cdots + dp_n \wedge dq_n$$
The equation

$$p = (dS)_q$$

serves as a law of physics that determines the intensive variables given the extensive ones when our system is in thermodynamic equilibrium. Written out using coordinates, this law says

$$p_i = \frac{\partial S}{\partial q_i}$$
It looks pretty bland here, but in fact it gives formulas for the temperature and pressure of a gas, and many other useful formulas in thermodynamics.
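As a concrete instance (a standard thermodynamics example, not spelled out here): for a cylinder of gas with extensive variables energy $U$ and volume $V$, this law reproduces the familiar relations

```latex
\frac{\partial S}{\partial U} = \frac{1}{T},
\qquad
\frac{\partial S}{\partial V} = \frac{P}{T},
```

where $T$ is the temperature and $P$ the pressure, so the intensive variables conjugate to $U$ and $V$ are $1/T$ and $P/T$.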
Now we are ready to see how all this plays out in probability theory! We’ll get an analogy like this, which goes hand-in-hand with our earlier one:
| Thermodynamics | Probability Theory |
|---|---|
| extensive variables | probability distribution |
| intensive variables | ??? |
This analogy is clearer than the last, because statistical mechanics reveals that the extensive variables in thermodynamics are really just summaries of probability distributions on microstates. Furthermore, both thermodynamics and probability theory have a concept of entropy.
So, let’s take our manifold $Q$ to consist of probability distributions on the set of microstates I was talking about before: the set $\{1, \dots, n\}$. Actually, let’s use nowhere vanishing probability distributions:

$$Q = \left\{ q \in \mathbb{R}^n : \; q_i > 0, \; \sum_{i=1}^n q_i = 1 \right\}$$
I’m requiring $q_i > 0$ to ensure $Q$ is a manifold, and also to make sure the entropy is differentiable: it ceases to be differentiable when one of the probabilities $q_i$ hits zero.
Since $Q$ is a manifold, its cotangent bundle $T^* Q$ is a symplectic manifold. And here’s the good news: we have a god-given entropy function

$$S \colon Q \to \mathbb{R}$$

namely the Shannon entropy

$$S(q) = - \sum_{i=1}^n q_i \ln q_i$$
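Here is a quick numeric sketch of this entropy function (my illustration, not from the post):

```python
import numpy as np

def shannon_entropy(q):
    """Shannon entropy S(q) = -sum_i q_i ln(q_i), in nats.

    Assumes q is a nowhere-vanishing probability distribution,
    i.e. a point of the manifold Q above.
    """
    q = np.asarray(q, dtype=float)
    return float(-np.sum(q * np.log(q)))

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # ln 4 ≈ 1.3863, maximal for n = 4
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ≈ 0.1677, nearly deterministic
```

As expected, the uniform distribution maximizes entropy, and sharply peaked distributions have entropy near zero.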
So, everything I just described about thermodynamics works in the setting of plain old probability theory! Starting from our manifold $Q$ and the entropy function, we get all the rest, leading up to the Lagrangian submanifold

$$\Lambda = \{ (q, p) \in T^* Q : \; p = (dS)_q \}$$
that describes the relation between extensive and intensive variables.
For computations it helps to pick coordinates on $Q$. Since the probabilities $q_i$ sum to 1, they aren’t independent coordinates on $Q$. So, we can either pick all but one of them, say $q_1, \dots, q_{n-1}$, as coordinates, or learn how to deal with non-independent coordinates, which are already completely standard in projective geometry. Let’s do the former, just to keep things simple.
These coordinates on $Q$ give rise in the usual way to coordinates $q_1, \dots, q_{n-1}$ and $p_1, \dots, p_{n-1}$ on the cotangent bundle $T^* Q$. These play the role of extensive and intensive variables, respectively, and it should be very interesting to impose the equation

$$p_i = \frac{\partial S}{\partial q_i}$$

where $S$ is the Shannon entropy. This picks out a Lagrangian submanifold $\Lambda \subseteq T^* Q$.
So, the question becomes: what does this mean? If this formula gives the analogue of momentum for probability theory, what does this analogue of momentum mean?
Here’s a preliminary answer: $p_i$ says how fast entropy increases as we increase the probability $q_i$ that our system is in the $i$th microstate. So if we think of nature as ‘wanting’ to maximize entropy, the quantity $p_i$ says how eager it is to increase the probability $q_i$.
Indeed, you can think of $p_i$ as a bit like pressure, one of the most famous intensive quantities in thermodynamics. A gas ‘wants’ to expand, and its pressure says precisely how eager it is to expand. Similarly, a probability distribution ‘wants’ to flatten out, to maximize entropy, and $p_i$ says how eager it is to increase the probability $q_i$ in order to do this.
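This ‘eagerness to flatten out’ is easy to see numerically. In this sketch (my illustration, under the entropy definition above), we move a little probability from the most likely microstate to the least likely one, keeping the total equal to 1, and watch the entropy rise:

```python
import numpy as np

def shannon_entropy(q):
    """Shannon entropy S(q) = -sum_i q_i ln(q_i), in nats."""
    q = np.asarray(q, dtype=float)
    return float(-np.sum(q * np.log(q)))

q = np.array([0.7, 0.2, 0.1])
eps = 1e-6

# Transfer a little probability from microstate 1 (most likely)
# to microstate 3 (least likely); the distribution flattens out.
flatter = q + np.array([-eps, 0.0, eps])

print(shannon_entropy(flatter) > shannon_entropy(q))  # True
```

To first order the entropy gain is $\varepsilon \ln(q_1 / q_3)$, which is positive exactly because microstate 3 is the rarer one.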
But what can we do with this concept? And what does symplectic geometry do for probability theory?
I will start tackling these questions next time.
One thing I’ll show is that when we reduce thermodynamics to probability theory using the ideas of statistical mechanics, the appearance of symplectic geometry in thermodynamics follows from its appearance in probability theory.
Another thing I want to investigate is how other geometrical structures on the space of probability distributions, like the Fisher information metric, interact with the symplectic structure on its cotangent bundle. This will integrate symplectic geometry and information geometry.
I also want to bring contact geometry into the picture. It’s already easy to see from our work last time how this should go. We treat the entropy $S$ as an independent variable, and replace $T^* Q$ with a larger manifold $T^* Q \times \mathbb{R}$ having $S$ as an extra coordinate. This is a contact manifold with contact form

$$\alpha = -dS + \sum_i p_i \, dq_i$$
This contact manifold has a submanifold $\Sigma$ where we remember that entropy is a function of the probability distribution, and define $p$ in terms of $q$ as usual:

$$\Sigma = \left\{ (q, p, S) : \; S = S(q), \; p = (dS)_q \right\}$$
And as we saw last time, $\Sigma$ is a Legendrian submanifold, meaning

$$\alpha |_\Sigma = 0$$
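For completeness, here is the one-line check (standard, using the contact form $\alpha = -dS + \sum_i p_i \, dq_i$ from above): on $\Sigma$ we have $S = S(q)$, so $dS = \sum_i \frac{\partial S}{\partial q_i} dq_i$, and also $p_i = \frac{\partial S}{\partial q_i}$; hence

```latex
\alpha\big|_\Sigma
  = -\sum_i \frac{\partial S}{\partial q_i}\, dq_i
    + \sum_i \frac{\partial S}{\partial q_i}\, dq_i
  = 0.
```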
But again, we want to understand what these ideas from contact geometry really do for probability theory!
For all my old posts on information geometry, go here:
It seems to me that if we want to know what to call $p_i$, then we should calculate it:

$$p_i = \frac{\partial S}{\partial q_i} = \frac{\partial}{\partial q_i} \left( - \sum_j q_j \ln q_j \right) = -\ln q_i - 1$$
Now, $-\ln q_i$ is often called the surprisal; it tells you how surprised you should be if an event of probability $q_i$ occurs (from no surprise if the event is certain to infinite surprise if the event is impossible). For example, the entropy is the expected surprisal. And so $p_i$ is basically the surprisal of microstate $i$, only we subtract 1 (the surprisal associated with a probability of $1/e$) for some reason.
But actually, there’s a flaw in my calculation, because I forgot that there are only $n - 1$ independent variables, so I need to add on $\frac{\partial S}{\partial q_n} \frac{\partial q_n}{\partial q_i}$, where $\frac{\partial q_n}{\partial q_i} = -1$, so that

$$p_i = (-\ln q_i - 1) - (-\ln q_n - 1) = \ln q_n - \ln q_i$$
Therefore, the correct value of $p_i$ is $\ln q_n - \ln q_i$, the relative surprisal of microstate $i$ relative to microstate $n$ (the state whose probability we arbitrarily chose not to include as an independent variable). At least the mysterious $-1$s cancelled.
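This formula is easy to sanity-check numerically. The sketch below (my illustration, not part of the comment) treats $q_1, \dots, q_{n-1}$ as the independent coordinates, with $q_n$ determined by the constraint, and compares a finite-difference derivative of the Shannon entropy against $\ln(q_n / q_i)$:

```python
import numpy as np

def entropy_reduced(q_free):
    """Shannon entropy as a function of the n-1 independent coordinates
    q_1, ..., q_{n-1}, with q_n = 1 - (q_1 + ... + q_{n-1})."""
    q_free = np.asarray(q_free, dtype=float)
    q = np.append(q_free, 1.0 - q_free.sum())
    return float(-np.sum(q * np.log(q)))

q_free = np.array([0.5, 0.3])  # n = 3, so q_3 = 0.2
h = 1e-6

for i in range(len(q_free)):
    dq = np.zeros_like(q_free)
    dq[i] = h
    # central finite difference approximating p_i = dS/dq_i
    p_i = (entropy_reduced(q_free + dq) - entropy_reduced(q_free - dq)) / (2 * h)
    exact = np.log((1.0 - q_free.sum()) / q_free[i])  # ln(q_n / q_i)
    print(abs(p_i - exact) < 1e-8)  # True for each i
```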
Great post and a fascinating topic! I did my thesis on symplectic integrators (the same type that power Hamiltonian Monte Carlo), and the connection with information geometry is really intriguing. My advisor at UCSD had one related paper that attempted to connect symplectic and information geometry in a discrete setting by connecting divergence functions and a generating function.
https://www.mdpi.com/1099-4300/19/10/518