Last time we worked out an analogy between classical mechanics, thermodynamics and probability theory. The latter two look suspiciously similar:
|Classical Mechanics|Thermodynamics|Probability Theory|
|---|---|---|
This is no coincidence. After all, in the subject of statistical mechanics we explain classical thermodynamics using probability theory—and entropy is revealed to be Shannon entropy (or its quantum analogue).
Now I want to make this precise.
To connect classical thermodynamics to probability theory, I'll start by discussing 'statistical manifolds'. I introduced the idea of a statistical manifold in Part 7: it's a manifold equipped with a map sending each point to a probability distribution on some measure space. Now I'll say how these fit into the second column of the above chart.
Then I’ll talk about statistical manifolds of a special sort used in thermodynamics, which I’ll call ‘Gibbsian’, since they really go back to Josiah Willard Gibbs.
In a Gibbsian statistical manifold, for each point $x$ the probability distribution $p_x$ is a 'Gibbs distribution'. Physically, these Gibbs distributions describe thermodynamic equilibria. For example, if you specify the volume, energy and number of particles in a box of gas, there will be a Gibbs distribution describing what the particles do in thermodynamic equilibrium under these conditions. Mathematically, Gibbs distributions maximize entropy subject to some constraints specified by the point $x$.
More precisely: in a Gibbsian statistical manifold we have a list of observables $A_1, \dots, A_n$ whose expected values serve as coordinates for points $x \in M$, and $p_x$ is the probability distribution that maximizes entropy subject to the constraint that the expected value of $A_i$ is $x^i$. We can derive most of the interesting formulas of thermodynamics starting from this!
Let's fix a measure space $\Omega$ with measure $\mu$. A statistical manifold is then a manifold $M$ equipped with a smooth map assigning to each point $x \in M$ a probability distribution on $\Omega$, which I'll call $p_x$. So, $p_x$ is a function on $\Omega$ with

$$\int_\Omega p_x \, d\mu = 1, \qquad p_x \ge 0.$$
The idea here is that the space of all probability distributions on $\Omega$ may be too huge to understand in as much detail as we'd like, so instead we describe some of these probability distributions—a family parametrized by points of some manifold $M$—using the map $x \mapsto p_x$. This is the basic idea behind parametric statistics.
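To make this concrete, here is a small numerical sketch I'm adding (the Gaussian family is my example, not one from this post): the points $(\mu, \sigma)$ of a 2-dimensional manifold parametrize Gaussian probability densities on the real line, and each density really does integrate to 1.

```python
import numpy as np

# A 2-dimensional statistical manifold: points x = (mu, sigma) parametrize
# Gaussian probability densities p_x on the measure space Omega = R.
def p(mu, sigma, omega):
    """The density p_x evaluated at omega, for the point x = (mu, sigma)."""
    return np.exp(-((omega - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Check numerically that p_x integrates to 1 over Omega, on a fine grid.
omega, dx = np.linspace(-50.0, 50.0, 200001, retstep=True)
total = np.sum(p(1.3, 2.0, omega)) * dx
print(total)  # very close to 1
```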
Information geometry is the geometry of statistical manifolds. Any statistical manifold comes with a bunch of interesting geometrical structures. One is the ‘Fisher information metric’, a Riemannian metric I explained in Part 7. Another is a 1-parameter family of connections on the tangent bundle which is important in Amari’s approach to information geometry. You can read about this here:
• Hiroshi Matsuzoe, Statistical manifolds and affine differential geometry, in Advanced Studies in Pure Mathematics 57, pp. 303–321.
I don’t want to talk about it now—I just wanted to reassure you that I’m not completely ignorant of it!
I want to focus on the story I've been telling, which is about entropy. Our statistical manifold comes with a smooth entropy function

$$S \colon M \to \mathbb{R}, \qquad S(x) = -\int_\Omega p_x \ln p_x \, d\mu.$$
We can use this entropy function to do many of the things we usually do in thermodynamics! For example, at any point where this function is differentiable, its differential gives a cotangent vector

$$(dS)_x \in T^\ast_x M$$

which has an important physical meaning. In coordinates we have

$$dS = \sum_{i=1}^n \lambda_i \, dx^i, \qquad \lambda_i = \frac{\partial S}{\partial x^i},$$

and we call $\lambda_i$ the intensive variable conjugate to $x^i$. For example, if $x^i$ is energy, $\lambda_i$ will be 'coolness': the reciprocal of temperature.
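Here is a toy example I'm adding to illustrate this (it's my own, not from the post): a two-state system with energies 0 and 1, where specifying the expected energy $E$ already fixes the distribution, so entropy becomes an explicit function of $E$ and the conjugate variable $\partial S/\partial E$, the coolness, can be checked numerically.

```python
import numpy as np

# Toy system: Omega = {0, 1} with energy observable A = (0, 1).
# Specifying <A> = E fixes the distribution: p = (1 - E, E).
def entropy(E):
    p = np.array([1 - E, E])
    return -np.sum(p * np.log(p))

# Conjugate intensive variable: lambda = dS/dE, the "coolness" 1/T.
E = 0.2
h = 1e-6
coolness = (entropy(E + h) - entropy(E - h)) / (2 * h)  # numerical dS/dE
print(coolness)
print(np.log((1 - E) / E))  # exact answer for this system: ln((1-E)/E)
```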
Defining $\lambda_i$ this way gives a Lagrangian submanifold

$$\Lambda = \left\{ (x, \lambda) \in T^\ast M \; : \; \lambda_i = \frac{\partial S}{\partial x^i} \right\}$$

of the cotangent bundle $T^\ast M$. We can also get contact geometry into the game by defining a contact manifold $T^\ast M \times \mathbb{R}$ and a Legendrian submanifold

$$\Sigma = \left\{ (x, \lambda, s) \; : \; \lambda_i = \frac{\partial S}{\partial x^i}, \; s = S(x) \right\}.$$
But I’ve been talking about these ideas for the last three episodes, so I won’t say more just now! Instead, I want to throw a new idea into the pot.
Gibbsian statistical manifolds
Thermodynamics and statistical mechanics spend a lot of time dealing with statistical manifolds of a special sort I'll call 'Gibbsian'. In these, each probability distribution $p_x$ is a 'Gibbs distribution', meaning that it maximizes entropy subject to certain constraints specified by the point $x$.
How does this work? For starters, an integrable function

$$A \colon \Omega \to \mathbb{R}$$

is called a random variable, or in physics an observable. The expected value of an observable $A$ is a smooth real-valued function on our statistical manifold:

$$\langle A \rangle \colon M \to \mathbb{R}, \qquad \langle A \rangle(x) = \int_\Omega A \, p_x \, d\mu.$$

In other words, $\langle A \rangle$ is a function whose value at any point $x \in M$ is the expected value of $A$ with respect to the probability distribution $p_x$.
Now, suppose our statistical manifold $M$ is $n$-dimensional and we have $n$ observables $A_1, \dots, A_n$. Their expected values $\langle A_1 \rangle, \dots, \langle A_n \rangle$ will be smooth functions on our manifold—and sometimes these functions will be a coordinate system!
This may sound rather unlikely, but it's really not so outlandish. Indeed, if there's a point such that the differentials of the functions $\langle A_i \rangle$ are linearly independent at this point, these functions will be a coordinate system in some neighborhood of this point, by the inverse function theorem. So, we can take this neighborhood, use it as our statistical manifold, and the functions $\langle A_i \rangle$ will be coordinates.
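As a sketch of this linear-independence check (my example, using the closed-form moments of the Gaussian family rather than integrals): take the 2-dimensional Gaussian family with observables $A_1(\omega) = \omega$ and $A_2(\omega) = \omega^2$, whose expected values are $\mu$ and $\mu^2 + \sigma^2$. The Jacobian of these expected values with respect to the parameters is invertible whenever $\sigma > 0$, so they give local coordinates.

```python
import numpy as np

# For the Gaussian family p_(mu, sigma), take observables A1 = omega and
# A2 = omega^2.  Their expected values are <A1> = mu and <A2> = mu^2 + sigma^2.
def expected_values(mu, sigma):
    return np.array([mu, mu**2 + sigma**2])

# Numerical Jacobian of (<A1>, <A2>) with respect to (mu, sigma).
mu, sigma, h = 1.0, 0.5, 1e-6
J = np.column_stack([
    (expected_values(mu + h, sigma) - expected_values(mu - h, sigma)) / (2 * h),
    (expected_values(mu, sigma + h) - expected_values(mu, sigma - h)) / (2 * h),
])
det = np.linalg.det(J)  # exactly 2*sigma in closed form; nonzero here
print(det)
```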
So, let's assume the expected values of our observables give a coordinate system on $M$. Let's call these coordinates $x^1, \dots, x^n$, so that

$$\langle A_i \rangle = x^i.$$
Now for the kicker: we say our statistical manifold is Gibbsian if for each point $x \in M$, $p_x$ is the probability distribution that maximizes entropy subject to the above condition!
Which condition? The condition saying that

$$\int_\Omega A_i \, p_x \, d\mu = x^i$$

for all $i$. This is just the previous equation spelled out so that you can see it's a condition on $p_x$.
This assumption of the entropy-maximizing nature of $p_x$ is very powerful, because it implies a useful and nontrivial formula for $p_x$. It's called the Gibbs distribution:

$$p_x = \frac{1}{Z} e^{-\sum_{i=1}^n \lambda_i A_i}$$

Here $\lambda_i$ is the intensive variable conjugate to $x^i$, while $Z$ is the partition function: the thing we must divide by to make sure $p_x$ integrates to 1. In other words:

$$Z = \int_\Omega e^{-\sum_{i=1}^n \lambda_i A_i} \, d\mu.$$
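Here is a small numerical sketch I'm adding (a finite measure space with counting measure and made-up observable values, not an example from the post): given a conjugate variable $\lambda$, we build the Gibbs distribution and check that perturbing it while keeping the normalization and the expected value $\langle A \rangle$ fixed can only lower the entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite measure space Omega = {0,1,2,3,4} with counting measure, and one
# observable A (hypothetical values).  Given lambda, the Gibbs distribution
# is exp(-lambda * A) / Z.
A = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lam = 0.7

Z = np.sum(np.exp(-lam * A))  # partition function
p = np.exp(-lam * A) / Z      # Gibbs distribution; sums to 1
x = np.sum(A * p)             # the coordinate x = <A> at this point

def entropy(q):
    return -np.sum(q * np.log(q))

# Perturb p within the constraint set (same total mass, same <A>) and
# check the entropy drops: the Gibbs distribution maximizes it there.
B = np.vstack([np.ones_like(A), A])  # gradients of the two constraints
for _ in range(100):
    v = rng.normal(size=A.size)
    v -= B.T @ np.linalg.lstsq(B.T, v, rcond=None)[0]  # kill constrained part
    v /= np.linalg.norm(v)
    assert entropy(p + 1e-3 * v) < entropy(p)
print(x, entropy(p))
```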
By the way, this formula may look confusing at first, since the left side depends on the point $x$ in our statistical manifold, while there's no visible $x$ on the right side! Do you see what's going on?
I'll tell you: the conjugate variable $\lambda_i$ sitting on the right side of the above formula depends on $x$. Remember, we got it by taking the partial derivative of the entropy in the direction $x^i$:

$$\lambda_i = \frac{\partial S}{\partial x^i}$$

and then evaluating this derivative at the point $x$.
But wait a minute! $S$ here is the entropy—but the entropy of what?

The entropy of $p_x$, of course!

So there's something circular about our formula for $p_x$. To know $p_x$, you need to know the conjugate variables $\lambda_i$—but to compute these, you need to know the entropy of $p_x$.
This is actually okay. While circular, the formula for $p_x$ is still true. It's harder to work with than you might hope, but it's still extremely useful.
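In practice, one way around the circularity is to treat the coordinate $x = \langle A \rangle$ as the given data and solve numerically for the $\lambda$ that produces it. Here's a sketch I'm adding (finite $\Omega$, made-up observable values) using bisection, which works because $\langle A \rangle$ is strictly decreasing in $\lambda$:

```python
import numpy as np

# Hypothetical observable on a 4-point measure space with counting measure.
A = np.array([0.0, 1.0, 2.0, 3.0])

def mean_A(lam):
    """Expected value <A> under the Gibbs distribution exp(-lam*A)/Z."""
    w = np.exp(-lam * A)
    return np.sum(A * w) / np.sum(w)

target = 1.0  # the desired coordinate x = <A>

# <A> decreases strictly in lambda (its derivative is -Var(A)), so bisect.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_A(mid) > target:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
print(lam, mean_A(lam))  # mean_A(lam) matches the target
```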
Next time I'll prove that this formula for $p_x$ is true, and do a few things with it. All this material was discovered by Gibbs in the late 1800s, and it's lurking in any good book on statistical mechanics—but not phrased in the language of statistical manifolds. The physics textbooks usually consider special cases, like a box of gas where:
• $A_1$ is energy, and $\lambda_1$ is 1/temperature.
• $A_2$ is volume, and $\lambda_2$ is pressure/temperature.
• $A_3$ is the number of particles, and $\lambda_3$ is –chemical potential/temperature.
While these special cases are important and interesting, I’d rather be general!
I said “Any statistical manifold comes with a bunch of interesting geometrical structures”, but in fact some conditions are required. For example, the Fisher information metric is only well-defined and nondegenerate under some conditions on the map $x \mapsto p_x$. In particular, if this map sends every point of $M$ to the same probability distribution, the Fisher information metric will vanish.
Similarly, the entropy function $S$ is only smooth under some conditions on the map $x \mapsto p_x$.
Furthermore, the integral

$$Z = \int_\Omega e^{-\sum_{i=1}^n \lambda_i A_i} \, d\mu$$

may not converge for all values of the numbers $\lambda_1, \dots, \lambda_n$. But in my discussion of Gibbsian statistical manifolds, I was assuming that an entropy-maximizing probability distribution $p_x$ with

$$\int_\Omega A_i \, p_x \, d\mu = x^i$$

actually exists. In this case the probability distribution $p_x$ is also unique (almost everywhere).
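For a concrete instance of this convergence issue (my example, not from the post): take $\Omega = [0, \infty)$ with Lebesgue measure and a single observable $A(\omega) = \omega$. Then

```latex
Z(\lambda) \;=\; \int_0^\infty e^{-\lambda \omega}\, d\omega \;=\;
\begin{cases}
  1/\lambda, & \lambda > 0,\\
  +\infty,   & \lambda \le 0,
\end{cases}
\qquad
p(\omega) \;=\; \lambda\, e^{-\lambda \omega} \quad (\lambda > 0).
```

so the Gibbs distribution exists only for positive values of the conjugate variable $\lambda$, and the coordinate $x = \langle A \rangle = 1/\lambda$ ranges over $(0, \infty)$.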
For all my old posts on information geometry, go here: