## Information Geometry (Part 5)

I’m trying to understand the Fisher information metric and how it’s related to Öttinger’s formalism for ‘dissipative mechanics’ — that is, mechanics including friction. They involve similar physics, and they involve similar math, but it’s not quite clear how they fit together.

I think it will help to do an example. The harmonic oscillator is a trusty workhorse throughout physics, so let’s do that.

So: suppose you have a rock hanging on a spring, and it can bounce up and down. Suppose it’s in thermal equilibrium with its environment. It will wiggle up and down ever so slightly, thanks to thermal fluctuations. The hotter it is, the more it wiggles. These vibrations are random, so its position and momentum at any given moment can be treated as random variables.

If we take quantum mechanics into account, there’s an extra source of randomness: quantum fluctuations. Now there will be fluctuations even at zero temperature. Ultimately this is due to the uncertainty principle. Indeed, if you know the position for sure, you can’t know the momentum at all!

Let’s see how the position, momentum and energy of our rock will fluctuate given that we know all three of these quantities on average. The fluctuations will form a little fuzzy blob, roughly ellipsoidal in shape, in the 3-dimensional space whose coordinates are position, momentum and energy:

Yeah, I know you’re sick of this picture, but this time it’s for real: I want to calculate what this ellipsoid actually looks like! I’m not promising I’ll do it — I may get stuck, or bored — but at least I’ll try.

Before I start the calculation, let’s guess the answer. A harmonic oscillator has a position $q$ and momentum $p$, and its energy is

$H = \frac{1}{2}(q^2 + p^2)$

Here I’m working in units where lots of things equal 1, to keep things simple.

You’ll notice that this energy has rotational symmetry in the position-momentum plane. This is ultimately what makes the harmonic oscillator such a beloved physical system. So, we might naively guess that our little ellipsoid will have rotational symmetry as well, like this:

or this:

Here I’m using the $x$ and $y$ coordinates for position and momentum, while the $z$ coordinate stands for energy. So in these examples the position and momentum fluctuations are the same size, while the energy fluctuations, drawn in the vertical direction, might be bigger or smaller.

Unfortunately, this guess really is naive. After all, there are lots of these ellipsoids, one centered at each point in position-momentum-energy space. Remember the rules of the game! You give me any point in this space. I take the coordinates of this point as the mean values of position, momentum and energy, and I find the maximum-entropy state with these mean values. Then I work out the fluctuations in this state, and draw them as an ellipsoid.

If you pick a point where position and momentum have mean value zero, you haven’t broken the rotational symmetry of the problem. So, my ellipsoid must be rotationally symmetric. But if you pick some other mean value for position and momentum, all bets are off!

Fortunately, this naive guess is actually right: all the ellipsoids are rotationally symmetric — even the ones centered at nonzero values of position and momentum! We’ll see why soon. And if you’ve been following this series of posts, you’ll know what this implies: the “Fisher information metric” $g$ on position-momentum-energy space has rotational symmetry about any vertical axis. (Again, I’m using the vertical direction for energy.) So, if we slice this space with any horizontal plane, the metric on this plane must be the plane’s usual metric times a constant:

$g = \mathrm{constant} \, (dq^2 + dp^2)$

Why? Because only the usual metric on the plane, or any multiple of it, has ordinary rotations around every point as symmetries.

So, roughly speaking, we’re recovering the ‘obvious’ geometry of the position-momentum plane from the Fisher information metric. We’re recovering ‘ordinary’ geometry from information geometry!

But this should not be terribly surprising, since we used the harmonic oscillator Hamiltonian

$H = \frac{1}{2}(q^2 + p^2)$

as an input to our game. It’s mainly just a confirmation that things are working as we’d hope.

There’s more, though. Last time I realized that because observables in quantum mechanics don’t commute, the Fisher information metric has a curious skew-symmetric partner called $\omega$. So, we should also study this in our example. And when we do, we’ll see that restricted to any horizontal plane in position-momentum-energy space, we get

$\omega = \mathrm{constant} \, (dq \, dp - dp \, dq)$

This looks like a mutant version of the Fisher information metric

$g = \mathrm{constant} \, (dq^2 + dp^2)$

and if you know your geometry, you’ll know it’s the usual ‘symplectic structure’ on the position-momentum plane, at least up to a constant factor.

All this is very reminiscent of Öttinger’s work on dissipative mechanics. But we’ll also see something else: while the constant in $g$ depends on the energy — that is, on which horizontal plane we take — the constant in $\omega$ does not!

Why? It’s perfectly sensible. The metric $g$ on our horizontal plane keeps track of fluctuations in position and momentum. Thermal fluctuations get bigger when it’s hotter — and to boost the average energy of our oscillator, we must heat it up. So, as we increase the energy, moving our horizontal plane further up in position-momentum-energy space, the metric on the plane gets bigger! In other words, our ellipsoids get a fat cross-section at high energies.

On the other hand, the symplectic structure $\omega$ arises from the fact that position $q$ and momentum $p$ don’t commute in quantum mechanics. They obey Heisenberg’s ‘canonical commutation relation’:

$q p - p q = i$

This relation doesn’t involve energy, so $\omega$ will be the same on every horizontal plane. And it turns out this relation implies

$\omega = \mathrm{constant} \, (dq \, dp - dp \, dq)$

for some constant we’ll compute later.

Okay, that’s the basic idea. Now let’s actually do some computations. For starters, let’s see why all our ellipsoids have rotational symmetry!

To do this, we need to understand a bit about the mixed state $\rho$ that maximizes entropy given certain mean values of position, momentum and energy. So, let’s choose the numbers we want for these mean values (also known as ‘expected values’ or ‘expectation values’):

$\langle H \rangle = E$

$\langle q \rangle = q_0$

$\langle p \rangle = p_0$

I hope this isn’t too confusing: $H, p, q$ are our observables which are operators, while $E, p_0, q_0$ are the mean values we have chosen for them. The state $\rho$ depends on $E, p_0$ and $q_0$.

We’re doing quantum mechanics, so position $q$ and momentum $p$ are both self-adjoint operators on the Hilbert space $L^2(\mathbb{R})$:

$(q\psi)(x) = x \psi(x)$

$(p\psi)(x) = - i \frac{d \psi}{dx}(x)$

Indeed all our observables, including the Hamiltonian

$H = \frac{1}{2} (p^2 + q^2)$

are self-adjoint operators on this Hilbert space, and the state $\rho$ is a density matrix on this space, meaning a positive self-adjoint operator with trace 1.

Now: how do we compute $\rho$? It’s a Lagrange multiplier problem: maximizing some function given some constraints. And it’s well-known that when you solve this problem, you get

$\rho = \frac{1}{Z} e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)}$

where $\lambda^1, \lambda^2, \lambda^3$ are three numbers we have yet to find, and $Z$ is a normalizing factor called the partition function:

$Z = \mathrm{tr} (e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)} )$
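In case the ‘well-known’ step is unfamiliar, here is a formal sketch of where the exponential form comes from. The multiplier $\mu$ and the symbol $S$ are my own notation for this sketch, and I’m ignoring domain issues for unbounded operators. We maximize the entropy

$S = -\mathrm{tr}(\rho \ln \rho)$

subject to $\mathrm{tr}(\rho) = 1$ and fixed values of $\mathrm{tr}(\rho q), \mathrm{tr}(\rho p), \mathrm{tr}(\rho H)$. With multipliers $\lambda^1, \lambda^2, \lambda^3$ for the three mean values and $\mu$ for the normalization, stationarity under a small variation $\delta\rho$ reads

$\mathrm{tr}\left( \delta\rho \, ( \ln \rho + 1 + \mu + \lambda^1 q + \lambda^2 p + \lambda^3 H ) \right) = 0$

for every $\delta\rho$, so

$\ln \rho = -(\lambda^1 q + \lambda^2 p + \lambda^3 H) - (1 + \mu)$

Exponentiating, and fixing $\mu$ by the condition $\mathrm{tr}(\rho) = 1$, gives the formula for $\rho$ above with $Z = e^{1 + \mu}$.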

Now let’s look at a special case. If we choose $\lambda^1 = \lambda^2 = 0$, we’re back to a simpler and more famous problem, namely maximizing entropy subject to a constraint only on energy! The solution is then

$\rho = \frac{1}{Z} e^{-\beta H} , \qquad Z = \mathrm{tr} (e^{- \beta H} )$

Here I’m using the letter $\beta$ instead of $\lambda^3$ because this is traditional. This quantity has an important physical meaning! It’s the reciprocal of temperature in units where Boltzmann’s constant is 1.

Anyway, back to our special case! In this special case it’s easy to explicitly calculate $\rho$ and $Z$. Indeed, people have known how ever since Planck put the ‘quantum’ in quantum mechanics! He figured out how black-body radiation works. A box of hot radiation is just a big bunch of harmonic oscillators in thermal equilibrium. You can work out its partition function by multiplying the partition function of each one.

So, it would be great to reduce our general problem to this special case. To do this, let’s rewrite

$\rho = \frac{1}{Z} e^{-(\lambda^1 q + \lambda^2 p + \lambda^3 H)}$

in terms of some new variables, like this:

$\rho = \frac{1}{Z} e^{-\beta(H - f q - g p)}$

where now

$Z = \mathrm{tr} (e^{-\beta(H - f q - g p)} )$

Think about it! Now our problem is just like an oscillator with a modified Hamiltonian

$H' = H - f q - g p$

What does this mean, physically? Well, if you push on something with a force $f$, its potential energy will pick up a term $- f q$. So, the first two terms are just the Hamiltonian for a harmonic oscillator with an extra force pushing on it!

I don’t know a nice interpretation for the $- g p$ term. We could say that besides the extra force equal to $f$, we also have an extra ‘gorce’ equal to $g$. I don’t know what that means. Luckily, I don’t need to! Mathematically, our whole problem is invariant under rotations in the position-momentum plane, so whatever works for $q$ must also work for $p$.

Now here’s the cool part. We can complete the square:

\begin{aligned} H' & = \frac{1}{2} (q^2 + p^2) - f q - g p \\ &= \frac{1}{2}(q^2 - 2 q f + f^2) + \frac{1}{2}(p^2 - 2 p g + g^2) - \frac{1}{2}(g^2 + f^2) \\ &= \frac{1}{2}((q - f)^2 + (p - g)^2) - \frac{1}{2}(g^2 + f^2) \end{aligned}

so if we define ‘translated’ position and momentum operators:

$q' = q - f, \qquad p' = p - g$

we have

$H' = \frac{1}{2}({q'}^2 + {p'}^2) - \frac{1}{2}(g^2 + f^2)$

So: apart from a constant, $H'$ is just the harmonic oscillator Hamiltonian in terms of ‘translated’ position and momentum operators!

In other words: we’re studying a strange variant of the harmonic oscillator, where we are pushing on it with an extra force and also an extra ‘gorce’. But this strange variant is exactly the same as the usual harmonic oscillator, except that we’re working in translated coordinates on position-momentum space, and subtracting a constant from the Hamiltonian.

These are pretty minor differences. So, we’ve succeeded in reducing our problem to the problem of a harmonic oscillator in thermal equilibrium at some temperature!

This makes it easy to calculate

$Z = \mathrm{tr} (e^{-\beta(H - f q - g p)} ) = \mathrm{tr}(e^{-\beta H'})$

By our formula for $H'$, this is just

$Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \mathrm{tr} (e^{-\frac{\beta}{2}({q'}^2 + {p'}^2)})$

And the second factor here equals the partition function for the good old harmonic oscillator, since $q'$ and $p'$ obey the same commutation relation as $q$ and $p$:

$Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \mathrm{tr} (e^{-\beta H})$

So now we’re back to a textbook problem. The eigenvalues of the harmonic oscillator Hamiltonian are

$n + \frac{1}{2}$

where

$n = 0,1,2,3, \dots$

So, the eigenvalues of $e^{-\beta H}$ are just

$e^{-\beta(n + \frac{1}{2})}$

and to take the trace of this operator, we sum up these eigenvalues:

$\mathrm{tr}(e^{-\beta H}) = \sum_{n = 0}^\infty e^{-\beta (n + \frac{1}{2})} = \frac{e^{-\beta/2}}{1 - e^{-\beta}}$

So:

$Z = e^{\frac{\beta}{2}(g^2 + f^2)} \; \frac{e^{-\beta/2}}{1 - e^{-\beta}}$
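The geometric series used for $\mathrm{tr}(e^{-\beta H})$ is easy to sanity-check numerically. Here is a minimal Python sketch; the function names and the cutoff `n_terms` are my own choices, not anything canonical.

```python
import math

def partition_by_sum(beta, n_terms=2000):
    """Truncated sum of e^{-beta (n + 1/2)} over n = 0, 1, ..., n_terms - 1."""
    return sum(math.exp(-beta * (n + 0.5)) for n in range(n_terms))

def partition_closed_form(beta):
    """Closed form e^{-beta/2} / (1 - e^{-beta}) obtained from the geometric series."""
    return math.exp(-beta / 2) / (1.0 - math.exp(-beta))

# Compare the two at a few values of the coolness beta.
for beta in (0.5, 1.0, 2.0):
    print(beta, partition_by_sum(beta), partition_closed_form(beta))
```

The two columns should agree to many decimal places as long as `beta * n_terms` is large, since the neglected tail of the series is of order $e^{-\beta \, n_{\mathrm{terms}}}$.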

We can now compute the Fisher information metric using this formula:

$g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

if we remember how our new variables are related to the $\lambda^i$:

$\lambda^1 = -\beta f , \qquad \lambda^2 = -\beta g, \qquad \lambda^3 = \beta$

It’s just calculus! But I’m feeling a bit tired, so I’ll leave this pleasure to you.

For now, I’d rather go back to our basic intuition about how the Fisher information metric describes fluctuations of observables. Mathematically, this means it’s the real part of the covariance matrix

$g_{ij} = \mathrm{Re} \langle \, (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle) \, \rangle$

where for us

$X_1 = q, \qquad X_2 = p, \qquad X_3 = H$

Here we are taking expected values using the mixed state $\rho$. We’ve seen this mixed state is just like the maximum-entropy state of a harmonic oscillator at fixed temperature — except for two caveats: we’re working in translated coordinates on position-momentum space, and subtracting a constant from the Hamiltonian. But neither of these two caveats affects the fluctuations $(X_i - \langle X_i \rangle)$ or the covariance matrix.
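To make the ‘translated coordinates’ statement concrete, here is how I’d relate the mean values to the variables $f, g, \beta$. This is only a sketch built from the formulas above. In the state $\rho$ the translated operators have mean zero, $\langle q' \rangle = \langle p' \rangle = 0$, because $\rho$ is an ordinary thermal state when written in terms of $q'$ and $p'$. So

$q_0 = \langle q \rangle = f, \qquad p_0 = \langle p \rangle = g$

Writing $H = \frac{1}{2}((q' + f)^2 + (p' + g)^2)$ and using the textbook thermal energy $-\frac{d}{d\beta} \ln \mathrm{tr}(e^{-\beta H}) = \frac{1}{2} + \frac{1}{e^\beta - 1}$, we get

$E = \langle H \rangle = \frac{1}{2} + \frac{1}{e^{\beta} - 1} + \frac{1}{2}(f^2 + g^2)$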

So, as indeed we’ve already seen, $g_{ij}$ has rotational symmetry in the 1-2 plane. Thus, we’ll completely know it once we know $g_{11} = g_{22}$ and $g_{33}$; the other components are zero for symmetry reasons. $g_{11}$ will equal the variance of position for a harmonic oscillator at a given temperature, while $g_{33}$ will equal the variance of its energy. We can work these out or look them up.

I won’t do that now: I’m after insight, not formulas. For physical reasons, it’s obvious that $g_{11}$ must diminish with diminishing energy — but not go to zero. Why? Well, as the temperature approaches zero, a harmonic oscillator in thermal equilibrium approaches its state of least energy: the so-called ‘ground state’. In its ground state, the standard deviations of position and momentum are as small as allowed by the Heisenberg uncertainty principle:

$\Delta p \Delta q \ge \frac{1}{2}$

and they’re equal, so

$g_{11} = (\Delta q)^2 = \frac{1}{2}$.
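For anyone who would rather compute than look things up, here is a minimal Python sketch of the two variances at $q_0 = p_0 = 0$. It assumes the partition function $\mathrm{tr}(e^{-\beta H}) = e^{-\beta/2}/(1 - e^{-\beta})$ from above, and it uses the rotational symmetry to say $\langle q^2 \rangle = \langle p^2 \rangle = \langle H \rangle$ in the thermal state. The function names are mine.

```python
import math

def mean_energy(beta):
    """<H> = -d(ln Z)/d(beta) = 1/2 + 1/(e^beta - 1) for the thermal oscillator."""
    return 0.5 + 1.0 / math.expm1(beta)

def position_variance(beta):
    """g_11 at q_0 = p_0 = 0: by symmetry <q^2> = <p^2>, and H = (q^2 + p^2)/2, so <q^2> = <H>."""
    return mean_energy(beta)

def energy_variance(beta):
    """g_33 at q_0 = p_0 = 0: the second derivative of ln Z with respect to beta."""
    return math.exp(beta) / math.expm1(beta) ** 2

# Hot (small beta) to cold (large beta).
for beta in (0.01, 1.0, 10.0, 50.0):
    print(beta, position_variance(beta), energy_variance(beta))
```

As the coolness $\beta$ grows, the position variance falls towards $\frac{1}{2}$ rather than zero, while the energy variance does go to zero, since the ground state is an eigenstate of $H$.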

That’s enough about the metric. Now, what about the metric’s skew-symmetric partner? This is:

$\omega_{ij} = \mathrm{Im} \langle \, (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle) \, \rangle$

Last time we saw that $\omega$ is all about expected values of commutators:

$\omega_{ij} = \frac{1}{2i} \langle [X_i, X_j] \rangle$

and this makes it easy to compute. For example,

$[X_1, X_2] = q p - p q = i$

so

$\omega_{12} = \frac{1}{2}$

Of course

$\omega_{11} = \omega_{22} = 0$

by skew-symmetry, so we know the restriction of $\omega$ to any horizontal plane. We can also work out other components, like $\omega_{13}$, but I don’t want to. I’d rather just state this:
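If you’d like to see $\omega_{12} = \frac{1}{2}$ come out of an actual thermal state, here is a rough numerical sketch in plain Python. It represents $q$ and $p$ as truncated matrices in the energy eigenbasis and computes $\langle q p \rangle$ at zero mean position and momentum. The truncation size 80 and the function names are my own assumptions, and the truncation is only trustworthy when the Boltzmann weight of the highest level is negligible.

```python
import math

def position_momentum_matrices(n):
    """Truncated n x n matrices for q and p in the energy eigenbasis (units with hbar = 1)."""
    a = [[0j] * n for _ in range(n)]          # lowering operator
    for k in range(1, n):
        a[k - 1][k] = complex(math.sqrt(k))   # <k-1| a |k> = sqrt(k)
    s = math.sqrt(2.0)
    # q = (a + a^dagger)/sqrt(2), p = (a - a^dagger)/(i sqrt(2))
    q = [[(a[i][j] + a[j][i].conjugate()) / s for j in range(n)] for i in range(n)]
    p = [[(a[i][j] - a[j][i].conjugate()) / (1j * s) for j in range(n)] for i in range(n)]
    return q, p

def thermal_mean_of_product(x, y, beta):
    """tr(rho x y) for rho proportional to e^{-beta H}, with H diagonal in this basis."""
    n = len(x)
    weights = [math.exp(-beta * k) for k in range(n)]   # the common factor e^{-beta/2} cancels
    total = sum(weights)
    mean = 0j
    for i in range(n):
        diagonal_entry = sum(x[i][k] * y[k][i] for k in range(n))
        mean += weights[i] / total * diagonal_entry
    return mean

q, p = position_momentum_matrices(80)
for beta in (0.5, 1.0, 3.0):
    qp = thermal_mean_of_product(q, p, beta)
    print(beta, qp.real, qp.imag)   # real part is g_12, imaginary part is omega_12
```

The imaginary part should stay at $\frac{1}{2}$ for every $\beta$, while the real part, which is $g_{12}$, should vanish up to rounding.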

Summary: Restricted to any horizontal plane in the position-momentum-energy space, the Fisher information metric for the harmonic oscillator is

$g = \mathrm{constant} (dq_0^2 + dp_0^2)$

with a constant depending on the temperature, equalling $\frac{1}{2}$ in the zero-temperature limit, and increasing as the temperature rises. Restricted to the same plane, the Fisher information metric’s skew-symmetric partner is

$\omega = \frac{1}{2} dq_0 \wedge dp_0$

(Remember, the mean values $q_0, p_0, E$ are the coordinates on position-momentum-energy space. We could also use coordinates $f, g, \beta$ or $f, g$ and temperature. In the chatty intro to this article you saw formulas like those above but without the subscripts; that’s before I got serious about using $q$ and $p$ to mean operators.)

And now for the moral. Actually I have two: a physics moral and a math moral.

First, what is the physical meaning of $g$ or $\omega$ when restricted to a plane of constant $E$, or if you prefer, a plane of constant temperature?

Physics Moral: Restricted to a constant-temperature plane, $g$ is the covariance matrix for our observables. It is temperature-dependent. In the zero-temperature limit, the thermal fluctuations go away and $g$ depends only on quantum fluctuations in the ground state. On the other hand, $\omega$ restricted to a constant-temperature plane describes Heisenberg uncertainty relations for noncommuting observables. In our example, it is temperature-independent.

Second, what does this have to do with Kähler geometry? Remember, the complex plane has a complex-valued metric on it, called a Kähler structure. Its real part is a Riemannian metric, and its imaginary part is a symplectic structure. We can think of the complex plane as the position-momentum plane for a point particle. Then the symplectic structure is the basic ingredient needed for Hamiltonian mechanics, while the Riemannian structure is the basic ingredient needed for the harmonic oscillator Hamiltonian.

Math Moral: In the example we considered, $\omega$ restricted to a constant-temperature plane is equal to $\frac{1}{2}$ the usual symplectic structure on the complex plane. On the other hand, $g$ restricted to a constant-temperature plane is a multiple of the usual Riemannian metric on the complex plane — but this multiple is $\frac{1}{2}$ only when the temperature is zero! So, only at temperature zero are $g$ and $\omega$ the real and imaginary parts of a Kähler structure.

It will be interesting to see how much of this stuff is true more generally. The harmonic oscillator is much nicer than your average physical system, so it can be misleading, but I think some of the morals we’ve seen here can be generalized.

Some other time I may say more about how all this is related to Öttinger’s formalism, but the quick point is that he too has mixed states, and a symmetric $g$, and a skew-symmetric $\omega$. So it’s nice to see if they match up in an example.

Finally, two footnotes on terminology:

β: In fact, this quantity $\beta = 1/kT$ is so important it deserves a better name than ‘reciprocal of temperature’. How about ‘coolness’? An important lesson from statistical mechanics is that coolness is more fundamental than temperature. This makes some facts more plausible. For example, if you say “you can never reach absolute zero,” it sounds very odd, since you can get as close as you like, and it’s even possible to get negative temperatures — but temperature zero remains tantalizingly out of reach. But “you can never attain infinite coolness” — now that makes sense.

Gorce: I apologize to Richard Feynman for stealing the word ‘gorce’ and using it a different way. Does anyone have a good intuition for what’s going on when you apply my sort of ‘gorce’ to a point particle? You need to think about velocity-dependent potentials, of that I’m sure. In the presence of a velocity-dependent potential, momentum is not just mass times velocity. Which is good: if it were, we could never have a system where the mean value of both $q$ and $p$ stayed constant over time!

### 53 Responses to Information Geometry (Part 5)

1. Peter Morgan says:

Self-advertising, I’m afraid, but I still like enough of my “A succinct presentation of the quantized Klein–Gordon field, and a similar quantum presentation of the classical Klein–Gordon random field” – quant-ph/0411156, published as Phys. Lett. A 338, 8-12(2005), http://dx.doi.org/10.1016/j.physleta.2005.02.019 , to suggest it here, despite its limitations. Section 4, particularly, where I more-or-less characterize the similarities and differences between thermal and quantum fluctuations, albeit only for free quantum fields.

I like “coolness” for beta, in which case we can perhaps also talk about “quantum coolness” for quantum fluctuations.

• John Baez says:

Your paper looks interesting, but I’m afraid I’ll probably have forgotten about it by the time I go to the library next – I don’t know how to bust through the pay-wall and view it online at Phys. Lett. A.

Oh: it’s free on the arXiv. Much better.

A couple of questions:

1) Could you try studying quantum fields at nonzero temperature in your formalism, and would that perhaps allow you to interpolate between the quantum field and the thermal classical field? You could treat both Planck’s constant and temperature as adjustable parameters, and get a 2-parameter family of theories.

2) Why does your thermal classical field have Galilean invariance? That seems very odd to me, since the Klein-Gordon equation is Lorentz-invariant.

• Peter Morgan says:

1) Could you try studying quantum fields at nonzero temperature in your formalism, and would that perhaps allow you to interpolate between the quantum field and the thermal classical field?

Equation (9) in the paper I mentioned does just that. Also,

$\mathrm{Tr}[\exp(-\beta\hat{H}-\lambda\hat{\Xi})],$

where $\hat{\Xi}$ is a Lorentz-invariant analogue of the Hamiltonian, equation (15), generates quantum states that have “extra” quantum fluctuations, without changing the commutation relations. The tricky moment is that one cannot “reduce” quantum fluctuations without changing the commutation relations to use a smaller Planck’s constant.

I’ve thought for some time that this means that there ought to be a “quantum entropy” thermodynamic dual to quantum fluctuations, but I’ve never been able to make it go. I’ve also thought that Verlinde’s ideas on gravity make slightly more sense if it’s variations of quantum fluctuations rather than of thermal fluctuations that “cause” gravity, largely because GR is founded on local Lorentz invariance, but making that go is another game. I want to understand interacting QFTs in Minkowski space at zero temperature first.

2) Why does your thermal classical field have Galilean invariance? That seems very odd to me, since the Klein-Gordon equation is Lorentz-invariant.

The classical thermal state is determined by our choice of a Hamiltonian, which is the 00-component of a tensor, so a choice of time-like direction has to be made to determine a particular thermal state. Same for thermal states of a quantum field, except the Hamiltonian is the time-like component of a 4-vector.

• John Baez says:

John wrote:

1) Could you try studying quantum fields at nonzero temperature in your formalism, and would that perhaps allow you to interpolate between the quantum field and the thermal classical field?

Peter wrote:

Equation (9) in the paper I mentioned does just that.

John wrote:

2) Why does your thermal classical field have Galilean invariance? That seems very odd to me, since the Klein-Gordon equation is Lorentz-invariant.

Peter wrote:

The classical thermal state is determined by our choice of a Hamiltonian, which is the 00-component of a tensor, so a choice of time-like direction has to be made to determine a particular thermal state.

Right. But to me “Galilean invariance” means invariance under the Galilean group, which includes rotations, spacetime translations and most notably Galilean transformations of the form

$x \mapsto x + v t$
$t \mapsto t$

as familiar from physics before special relativity. I was shocked to hear that a thermal Klein-Gordon field was invariant under Galilean transformations! But I guess you didn’t mean that.

• Peter Morgan says:

Right. I got myself in a twist at the time with how to say this. It’s a 3D Euclidean subgroup of the Poincaré group plus time translations, the subgroup of the Poincaré group that leaves a time-like 4-vector invariant. No single name that I know of?

• John Baez says:

Peter wrote:

It’s a 3D Euclidean subgroup of the Poincaré group plus time translations, the subgroup of the Poincaré group that leaves a time-like 4-vector invariant.

It’s an important group, because it’s the intersection of the Poincaré group and the Galilei group. So, it’s the group consisting of all spacetime symmetries that Einstein and Galileo would agree on!

No single name that I know of?

It’s usually called the Aristotle group. That’s a cute name, because Aristotle believed in a concept of absolute rest. But I don’t think it’s historically accurate, because I think he believed the Earth was located at the center of the universe, with gravity pulling from everywhere towards the Earth’s center. So his symmetry group lacked spatial translations: just

$\mathbb{R} \times SO(3)$

for time translations and spatial rotations, I guess.

2. Aron says:

I guess one simple way to write the “gorce” would be with a vector potential, like in electromagnetism. Since it would modify the momentum on the Hamiltonian from p → p – qA, for a particle with charge q.

It is also interesting to think that in good units if we put an electric field E and a vector potential A both can be identified with f and g in the same manner as the proportionality constant is q for the force and for the substitution of p.

But probably you were looking for something else, more generic perhaps.

• John Baez says:

Actually this vector potential idea is exactly what I’m looking for. For some reason I was mixed up and thought it wouldn’t quite work.

So you’re saying that my Hamiltonian $H'$ can be thought of as the Hamiltonian for a charged particle in a harmonic oscillator potential together with a constant $E$ field and a constant $A$ field?

Very interesting, especially since a constant $A$ field gives no $B$ field (and for that matter, there is no $B$ field in 1-dimensional space: the $B$ field has $n(n-1)/2$ components when space has dimension $n$).

3. Thank you for yet another wonderful post. I wonder if there is some sort of prescription in either information geometry or the Ottinger formalism to some sort of “principle of least action” in dissipative settings. The challenge there, I imagine, would be to define the dissipative Lagrangian. Can one get it by applying a Legendre transform to the modified Hamiltonian? And what does it all mean physically?

• John Baez says:

I’m glad you enjoyed this post, Manoj. I spent a lot of time on it, and it was fun to write: at first it involved a lot of ugly algebra, but then I figured out how to replace most of that with words. By the end, though, I was afraid its length would merely scare everyone away.

I am very confused about people’s attempts to generalize the principle of least action to dissipative settings. Two famous attempts are Prigogine’s ‘Principle of Minimum Entropy Generation’ and Jaynes’ ‘Principle of Maximum Entropy Generation’. Perhaps that gives you a taste of why I am very confused.

However, right now I am very happy because I’m beginning to see how information geometry is related to Öttinger’s formalism, and some other formalisms for dissipative mechanics that involve both a skew-symmetric Poisson bracket and a symmetric ‘dissipative bracket’. I think the story will be quite beautiful.

• Manoj says:

• John Baez says:

I hadn’t guessed you were seriously interested in chasing down leads on generalizing the principle of least action to dissipative settings. In case you are, here is my complete collection of clues.

First I heard about these three papers:

• L.M. Martyusheva and V.D. Seleznev, Maximum entropy production principle in physics, chemistry and biology, Physics Reports 426 (April 2006), 1-45.

• R. C. Dewar, Maximum entropy production and the fluctuation theorem. (Available only with subscription.)

I never got around to reading Dewar’s paper… and I was very confused, because Ilya Prigogine has a quite successful principle of least entropy production that applies to certain linear systems. But Martyusheva and Seleznev write:

1.2.6. The relation of Ziegler’s maximum entropy production principle and Prigogine’s minimum entropy production principle

If one casts a glance at the heading, he may think that the two principles are absolutely contradictory. This is not the case. It follows from the above discussion that both linear and nonlinear thermodynamics can be constructed deductively using Ziegler’s principle. This principle yields, as a particular case (Section 1.2.3), Onsager’s variational principle, which holds only for linear nonequilibrium thermodynamics. Prigogine’s minimum entropy production principle (see Section 1.1) follows already from Onsager–Gyarmati’s principle as a particular statement, which is valid for stationary processes in the presence of free forces. Thus, applicability of Prigogine’s principle is much narrower than applicability of Ziegler’s principle.

Then David Corfield got me really excited by noting that Dewar’s paper relies on some work by the great E. T. Jaynes, where he proposes something called the ‘Maximum Caliber Principle’:

• E. T. Jaynes, Macroscopic prediction, in H. Haken (ed.) Complex systems – operational approaches in neurobiology, Springer-Verlag, Berlin, 1985, pp. 254–269.

And I read this paper and got really excited… but then I got distracted by other things.

But then, here on this blog, John F tried to convince me that Jaynes’ ‘Maximum Entropy Method’ for statistical reasoning is not distinct from his Maximum Caliber Principle. In pondering that, I bumped into this:

Abstract: Jaynes’ maximum entropy (MaxEnt) principle was recently used to give a conditional, local derivation of the “maximum entropy production” (MEP) principle, which states that a flow system with fixed flow(s) or gradient(s) will converge to a steady state of maximum production of thermodynamic entropy (R.K. Niven, Phys. Rev. E, in press). The analysis provides a steady state analog of the MaxEnt formulation of equilibrium thermodynamics, applicable to many complex flow systems at steady state. The present study examines the classification of physical systems, with emphasis on the choice of constraints in MaxEnt. The discussion clarifies the distinction between equilibrium, fluid flow, source/sink, flow/reactive and other systems, leading into an appraisal of the application of MaxEnt to steady state flow and reactive systems.

… which even cites some papers applying these ideas to climate change!

So, that’s where I am now. I’ve spent far too little time actually thinking about these issues. I’m sure if I did, it would all start to make sense. But right now it’s information geometry that’s mainly on my mind. It’s probably all part of the same story, but it’s a somewhat different portion of that story…

If you find more clues, or figure something out, please let me know!

• Thanks for sharing those references! I suppose the point is that these are all variational formulations, as is the principle of least action. Hmm…

• John Baez says:

By the way, I see you have an interesting-looking paper on chemical reaction network theory. Unfortunately I don’t know anything about that subject, so I don’t know what a “siphon” is, and I don’t know anything about the theories of “normal networks by Gnacadja, atomic event-systems by Adleman et al. and constructive networks by Shinar et al.” But I love systems that can be described using graphs. What’s a nice easy way to start learning this stuff?

I see you posted your paper on an arXiv group called “quantitative biology: molecular networks”. There are some pretty futuristic-sounding papers here, like:

• Steve T. Piantadosi and James P. Crutchfield, How the dimension of space affects the products of pre-biotic evolution: the spatial population dynamics of structural complexity and the emergence of membranes.

Abstract: We show that autocatalytic networks of epsilon-machines and their population dynamics differ substantially between spatial (geographically distributed) and nonspatial (panmixia) populations. Generally, regions of spacetime-invariant autocatalytic networks—or domains—emerge in geographically distributed populations. These are separated by functional membranes of complementary epsilon-machines that actively translate between the domains and are responsible for their growth and stability. We analyze both spatial and nonspatial populations, determining the algebraic properties of the autocatalytic networks that allow for space to affect the dynamics and so generate autocatalytic domains and membranes. In addition, we analyze populations of intermediate spatial architecture, delineating the thresholds at which spatial memory (information storage) begins to determine the character of the emergent auto-catalytic organization.

Zounds!

• Thanks, John. I have recently come to realise that the catalysis results in this paper are much more general than mass action kinetics. They are actually network-level results which apply to a large class of dynamics, both deterministic and stochastic. I am working on a revision that will reflect this point of view, and contain more expository material.

John: What’s a nice easy way to start learning this stuff?

You can start with Gunawardena 2003. The toric geometry aspects of the subject are stressed in Toric Dynamical Systems by Craciun, Dickenstein, Shiu and Sturmfels. Our paper On the mathematics of the law of mass action may be of interest if you want to see the mathematics worked out in full gory detail for undirected graphs. For the notion of siphon, see Angeli, De Leenheer, Sontag, 2007.

Finally the classic references are General Mass Action Kinetics by Horn and Jackson and Lectures on Reaction Networks by Martin Feinberg.

• John Baez says:

Thanks a lot for the references! If you work at the level of ‘complexes’ rather than individual molecular species, I believe chemical reaction networks may be an example of the ‘box models’ discussed by Nathan Urban in “week304”. This may not be a helpful connection, but there must be some out there. So I’m very glad you’re trying to take techniques from mass action kinetics and present them in a way that people from other fields might more easily understand.

I actually spent a bunch of time studying toric geometry, but I never got around to explaining it on This Week’s Finds, so now I’ve forgotten a bunch. It’s wonderful stuff, though!

• I can not find the “reply” button below your last comment. How curious!

I took a quick look at box models, and I’m afraid I could not figure out how they are different from Markov chains. Nevertheless, I believe your comment is “morally” correct. If all complexes were species, then mass action kinetics too would reduce to Markov chains.

Just a word of caution that this is not true in general. It is true that the flow between complexes depends linearly on the complexes, and this flow contributes stoichiometrically (and linearly) to the changes in different species. However, there is a hidden non-linearity in the manner in which changes in species concentrations feed back to cause changes in the complexes.

• John Baez says:

Manoj wrote:

However, there is a hidden non-linearity in the manner in which changes in species concentrations feed back to cause changes in the complexes.

I think I understand what you mean now, if “complex” means something like “collection of molecules”. For example, if we have a reaction

A + A → B

then the reaction rate scales as the square of the concentration of A. Is that the kind of nonlinearity you meant?

• John wrote:

I think I understand what you mean now, if “complex” means something like “collection of molecules”. For example, if we have a reaction

A + A → B

then the reaction rate scales as the square of the concentration of A. Is that the kind of nonlinearity you meant?

Yes, that’s what I meant. Sorry for making it sound so mysterious. :-)
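To make the nonlinearity concrete, here is a minimal Python sketch of a mass-action rate law. The function name `mass_action_rate`, the rate constant and the concentrations are all hypothetical, chosen only for illustration; nothing here comes from the paper under discussion.

```python
def mass_action_rate(k, concentrations, reactant_counts):
    """Rate of a single reaction under mass-action kinetics.

    k                -- rate constant (illustrative value, an assumption)
    concentrations   -- dict mapping species name to its concentration
    reactant_counts  -- dict mapping species name to the number of copies
                        of that species on the left side of the reaction
    """
    rate = k
    for species, count in reactant_counts.items():
        # Each copy of a reactant contributes one factor of its concentration.
        rate *= concentrations[species] ** count
    return rate


# The reaction A + A -> B has two copies of A on the left side.
reaction = {"A": 2}

slow = mass_action_rate(1.0, {"A": 0.5, "B": 0.0}, reaction)
fast = mass_action_rate(1.0, {"A": 1.0, "B": 0.0}, reaction)

# Doubling the concentration of A multiplies the rate by four, not two.
print(fast / slow)
```

If every complex were a single species, each exponent would be 1 and the rate would be linear in the concentrations, which is the Markov chain case mentioned above.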

4. phorgyphynance says:

Hi,

I’ve started a new job on Monday, so am a bit tied up lately, but after reading this, please have a look at this:

Black-Scholes and Schrodinger

Note the canonical commutation relation at the bottom, where standard deviation plays the role of Planck’s constant.

There is something deep here that I still haven’t put my hands on…

More later…

• phorgyphynance says:

Note also that there is a dissipative term. What is the financial analog? :)

5. John F says:

Neat stuff! As always, your presentations have clarity sadly lacking in my legal pads riddled with sign mistakes. I certainly don’t have any insight into the math moral, although I do wonder if it depends on the exponential commutator trick, where the operators commute with their commutator.

In a previous post, I seem to remember you were musing on other factors of 2 or 1/2. I think a lot of these are possibly related to half-on-one-side half-on-the-other transformations like the Wigner transformation

http://en.wikipedia.org/wiki/Wigner_quasi-probability_distribution

• John F says:

Oh yeah, the exponential commutator trick is a special case of Baker Campbell Hausdorff

http://en.wikipedia.org/wiki/Baker%E2%80%93Campbell%E2%80%93Hausdorff_formula

Anyway, according to Liu “Gibbs States and the Consistency of Local Density Matrices”, http://arxiv.org/PS_cache/quant-ph/pdf/0603/0603012v1.pdf
the feasibility of Gibbs states does not depend upon maximizing entropy. But it does depend on the Golden-Thompson result

http://en.wikipedia.org/wiki/Golden%E2%80%93Thompson_inequality

It would be interesting to see how much mileage you can get out of just Golden-Thompson.

• John F says:

John B,
just wondering, a couple of things. 1) What is the categorification of an inequality? 2) Does it help anything to relate a categorification of an equality as categorification of two linked inequalities?

I think the Golden-Thompson inequality leads to an Atiyah–Singer inequality, btw.

• John Baez says:

John F wrote:

What is the categorification of an inequality?

I don’t really want to talk about categorification here; that’s what the n-Category Café is for. Over here, my ilk began trying to categorify the Cauchy-Schwarz inequality. And over here, I joined the fray and managed to categorify a similar-looking but much less interesting inequality, the ‘TARDIS inequality’. David Tweed and others joined in…

So, how about asking over at one of those places? I’m trying to keep my split personality split: there’s no telling what would happen if Jekyll met Hyde.

Over there, I’m sure someone would be happy to answer your questions.

6. Stuart says:

I like “coolness” as a more descriptive name for Beta. Presumably the symbol for its SI unit is Fz? :)

7. Justin says:

Since a metric => Christoffel symbols => Ricci tensor/curvature, what do you make of the consequence(*) that several common likelihood functions imply a Ricci (information) tensor proportional to the (information) metric, i.e. these are in many cases Einstein manifolds?

(*)You should check this yourself.

• John Baez says:

Thanks for the suggestion/question. I’d need to check this myself before I could say anything really interesting. But lately I’ve been learning more about so-called symmetric spaces and their relation to Jordan algebras, for some work on the foundations of quantum theory… and in the process I realized, much to my shock, that this stuff was related to information geometry! And since some symmetric spaces are Einstein manifolds, I suspect your comment is somehow related. But maybe you could help me out by pointing me to the examples you’re thinking of.

• Justin says:

As examples, here are four (two-parameter, univariate) distributions which I claim yield Ric = (R/n)*g, indicating that the real part of the almost-Kahler structure is Einsteinian. (n is the dimensionality of the metric, therefore it is equal to the number of parameters — in each case below, n = 2)
Gaussian: R = -1
Cauchy: R = -4
Levy: R = -7
Gamma: R = 0

The first three examples are notable for belonging to the class of stable distributions. The last case turns out to be Ricci-flat, which is a surprising result.
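The Gaussian entry can be checked by hand. This is only a sketch, and it assumes the normal family is parametrized by its mean $\mu$ and standard deviation $\sigma$, which is not specified above (the scalar curvature itself does not depend on the choice of coordinates). In these coordinates the Fisher metric is

$ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2}$

Setting $\mu = \sqrt{2}\, x$ gives

$ds^2 = 2\, \frac{dx^2 + d\sigma^2}{\sigma^2}$

which is twice the metric of the hyperbolic upper half-plane. That metric has Gaussian curvature $-1$, and multiplying a metric by a constant $c$ divides the curvature by $c$, so here $K = -1/2$. In two dimensions $\mathrm{Ric} = K g$ and $R = 2K$, so

$R = -1, \qquad \mathrm{Ric} = \frac{R}{2}\, g$

in agreement with the first line of the list.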

8. streamfortyseven says:

Don’t know if this is of interest to anyone here, but it sounds like it’s in the ballpark…

PIRSA:10110080
Title: Physics as Information: Quantum Theory meets Relativity
Speaker(s): Giacomo Mauro D’Ariano – Universita degli Studi di Pavia
Abstract: I will review some recent advances on the line of deriving quantum field theory from pure quantum information processing. The general idea is that there is only Quantum Theory (without quantization rules), and the whole Physics—including space-time and relativity—is emergent from the processing. And, since Quantum Theory itself is made with purely informational principles, the whole Physics must be reformulated in information-theoretical terms. Here’s the TOC of the talk: a) Very short review of the informational axiomatization of Quantum Theory; b) How space-time and relativistic covariance emerge from the quantum computation; c) Special relativity without space: other ideas; d) Dirac equation derived as information flow (without the need of Lorentz covariance); e) Information-theoretical meaning of inertial mass and Planck constant; f) Observable consequences (at the Planck scale?); h) What about Gravity? Three alternatives as a start for a brainstorming.
Date: 30/11/2010 – 4:00 pm
Series: Quantum Foundations
Location: 301
URL: http://pirsa.org/10110080/

• John Baez says:

Thanks! Here are some papers by Giacomo Mauro D’Ariano. I’ll look them over sometime. Right now I’m busy trying to finish a paper myself! When I’m done I plan to continue this series of posts on information geometry — I’m just getting started.

• John Baez says:

Here is a paper that more explicitly seeks to relate information geometry to general relativity:

• Ariel Caticha, The information geometry of space and time.

Personally I’m quite suspicious when people make grand claims about how they’ve succeeded in reducing Big Concept #1 to Big Concept #2 — at least until I’ve checked the details. I looked briefly at the above paper and noticed that it does not make a connection to Einstein’s equations or the Einstein-Hilbert Lagrangian, although it does seem to get some Lorentzian geometry coming out of information geometry.

I find Justin’s remarks here to be easier to swallow: no grand claims, but very specific claims that certain important Fisher information metrics obey Einstein’s equation. I haven’t checked these claims, but the modesty with which they were made improves my a priori estimate of their truth value.

9. John Baez wrote: In fact, this quantity $\beta = 1/kT$ is so important it deserves a better name than ‘reciprocal of temperature’. How about ‘coolness’?

I use coldness in Section 6.1 of

http://de.arxiv.org/pdf/0810.1019v1

John Baez wrote:

I’m beginning to see how information geometry is related to Öttinger’s formalism, and some other formalisms for dissipative mechanics that involve both a skew-symmetric Poisson bracket and a symmetric ‘dissipative bracket’.

But you forget that in Oettinger’s treatment only the conservative (Poisson bracket) part is sort of canonical and that the dissipative part has far more freedom and incorporates material properties (in agreement with the physics of complex fluids), while your omega is also canonical. Thus the connection you glimpse is spurious.

By the way, in the n-category cafe one could write comments in a separate window while here one must scroll up and down between the bottom and the text one is quoting. How do I reply to a reply or to the main text from a separate window?

• DavidTweed says:

For your last question, the approach I take is to right-click on “Reply” and choose open in new tab so I can flick back-and-forth (or window if you prefer).

• John Baez says:

Arnold wrote:

Thus the connection you glimpse is spurious.

Time will tell. I didn’t forget what you thought I forgot. I have a plan, but it requires more calculations, and it will take a while.

• I wrote:

the dissipative part has far more freedom and incorporates material properties (in agreement with the physics of complex fluids), while your omega is also canonical. Thus the connection you glimpse is spurious.

John wrote:

Time will tell. I didn’t forget what you thought I forgot. I have a plan, but it requires more calculations, and it will take a while.

A canonical antibracket could perhaps take the role of the noninformative prior in a Bayesian treatment of max entropy, but is surely not enough to correctly handle phenomenology.

Math has more cross-correspondences; in each case where the math fits Nature, only some of these are exploited by Nature, though in many cases the full gamut of uses of particular mathematical theories together makes use of all important structural features.

So the presence of mathematical structure in a particular physics context may be misleading, since its presence is there for another physics context.

While reviewing the large amount of material on the foundations of quantum mechanics I stumbled upon a lot of useful structure that helped in understanding the physics. But I also stumbled upon as much or even more apparent structure that didn’t hold water when I tried to make it serve the exploration of properties of physical systems. Separating chaff from wheat in this area is a long and arduous process.

The relations between fundamental physics and information were thoroughly explored by Frieden a number of years ago:

• B. Roy Frieden, Physics from Fisher Information: a Unification 1998 (and a similar book 2004).

For an assessment, see the book by R.F. Streater, Lost Causes In and Beyond Physics, 2007; also

http://www.mth.kcl.ac.uk/~dlavis/papers/Lavis&Streater2002.pdf

and, from a very different perspective,

I didn’t find his point of view convincing, and it didn’t help me to understand more than I already understood before. So I count it as a spurious connection.

Your approach resembles Frieden’s but you add ingredients from Jordan algebras and Kaehler manifolds.

But these do not describe the dissipative structure that you seem to hope it would.

Rather they play an important role in the description of conservative dynamics via coherent states in systems with a large dynamical group, which leads to a description in terms of symmetric spaces and hence, for certain of these, in terms of symmetric cones and Jordan algebras.

The way the metric and the symmetric space emerge (as a coset space of a Lie group) is described – though not fully enough to capture the complete web of connections – in:

• Zhang et al., Coherent states: theory and some applications, Rev. Mod. Phys. 62, 867–927 (1990).

This connection has strong implications for the study of symmetric and nearly symmetric systems, and necessitates the purely mathematical bits that you see and try to organize. And indeed, this is fascinating stuff, worth organizing in a way that makes it more accessible to others.

But this structure also generates accidental, misleading hints that do not lead anywhere. For example, given the metric and the general setting of statistical mechanics, one automatically has an information interpretation, despite the lack of an underlying link to actual information in the statistical sense. (There are papers on this, but I don’t recall which ones.)

• John Baez says:

Arnold wrote:

The relations between fundamental physics and information were thoroughly explored by Frieden a number of years ago…

I wouldn’t say that. I’ve been reading his book, and while it has a few nice things in it, I find it pretty thin gruel.

Thanks for all the references! I’m not at all persuaded that my plan (which I have not revealed) is a futile one. But the only way for me to tell is to think a lot more.

If you never hear anything more from me about the relation between dissipative mechanics and information geometry, you may conclude that I have nothing interesting to say.

• Justin says:

Amari’s monograph “Methods of Information Geometry” seems to provide, among other things, insight into the non-symmetric part of a particular class of information theoretic metrics in terms of non-vanishing torsion tensor; although, I have just begun to pore through the details. I will speculate outside my expertise in saying this might be related to loop quantum gravity.

In what seems to generalize and support my previous argument for R = const, Amari makes the following claim, “In particular, every 2-dimensional statistical model which admits a group structure, a location-scale model for instance, turns out to be a space of constant curvature with respect to… the Fisher metric.” I have yet to verify this myself. It is interesting to observe that, if all 2-parameter models which admit a group structure exhibit negative constant Ricci scalar curvature, they are (broadly speaking) conformally equivalent to the unit disk, based on the uniformization theorem.

• Robert Smart says:

Which Frieden book are you reading? I just picked up the new (2004) book [in my doomed effort to prop up physical bookshops]. It claims to be significantly improved.

I also just discovered videolectures.net. It has a lecture titled “Information Geometry” by Sanjoy Dasgupta. It also has a David MacKay “Information Theory” lecture. I haven’t had a chance to see these yet. I was disappointed not to find any John Baez lectures.

• John Baez says:

Robert wrote:

Which Frieden book are you reading? I just picked up the new (2004) book [in my doomed effort to prop up physical bookshops]. It claims to be significantly improved.

Frieden has a 1998 book called Physics from Fisher Information: a Unification and—perhaps responding to accusations that he was being insufficiently ambitious—a 2004 book called Science from Fisher Information: a Unification. Most of the first book seems to be included almost unchanged in the second, but the second has some extra stuff.

It’s possible that I’m simply missing the point, but what I understood did not impress me greatly. Ray Streater, who is a well-known physicist and quite an expert on Fisher information, seems to agree:

• Raymond Streater, Lost causes in theoretical physics: Physics from Fisher information.

And I must say that it doesn’t exactly boost my confidence when a book includes portraits, hand-drawn by the author, of famous scientists including the author himself.

• streamfortyseven says:

Boltzmann’s constant, k, has the value of 1.38065 x 10**-23 J/K, so beta = 1/(k*T) has units of reciprocal energy, 1/J, rather than reciprocal temperature. Put another way,
using k = 0.069504 cm**1/T
beta = 1/0.069504cm**-1

• streamfortyseven says:

both of the cm units above should be 1/cm …

• Justin says:

It is perfectly acceptable, in fact customary in the theoretical discussion of $\beta$, to use a non-SI system of units where Boltzmann’s k = 1. In this case, Energy and Temperature share the same unit. Also, I believe you meant 0.69504 [1/(K cm)] in your choice of units.
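The unit bookkeeping in this exchange can be sketched in a few lines of Python. The constants are the exact SI values fixed in 2019; the function names `coolness_per_joule` and `coolness_cm` and the 300 K example temperature are assumptions introduced only for illustration.

```python
# Exact SI values of the constants (2019 redefinition of the SI).
K_BOLTZMANN = 1.380649e-23   # J / K
H_PLANCK = 6.62607015e-34    # J s
C_LIGHT_CM = 2.99792458e10   # cm / s

# Boltzmann's constant expressed as a wavenumber per kelvin: k / (h c).
K_WAVENUMBER = K_BOLTZMANN / (H_PLANCK * C_LIGHT_CM)   # 1 / (K cm)


def coolness_per_joule(temperature_kelvin):
    """beta = 1/(kT) in SI units, that is, in 1/J."""
    return 1.0 / (K_BOLTZMANN * temperature_kelvin)


def coolness_cm(temperature_kelvin):
    """beta = 1/(kT) with kT measured in 1/cm, so beta comes out in cm."""
    return 1.0 / (K_WAVENUMBER * temperature_kelvin)


print(K_WAVENUMBER)               # close to 0.695 per (K cm)
print(coolness_per_joule(300.0))  # beta at 300 K, in 1/J
print(coolness_cm(300.0))         # beta at 300 K, in cm
```

In units where $k = 1$ both functions reduce to $1/T$, which is the convention used in the post.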

• John Baez says:

As Justin suggests, I’m following the usual practice of theoretical physicists, and working in units where

$\hbar = c = k = 1$

Then temperature has units of inverse time — or in other words, energy. People who aren’t theoretical physicists may find this appalling, but it’s actually very illuminating.

10. To read Part 6, go here!

11. I don’t know if it is too late to ask, but I just cannot get $g_{13}=0$, by differentiating the expression for $\ln Z$ with respect to $\lambda$.

• John Baez says:

Darko wrote:

I don’t know if it is too late to ask…

It’s never too late to continue discussions on this blog… at least as long as I’m alive!

I didn’t do the calculation. I believed that we need $g_{13}= 0$ thanks to the fact that the metric has rotational symmetry in the 12 plane, namely the $pq$ plane. But now this argument seems valid only at the points fixed by these rotations, namely those points where $p = q = 0$.

So, I think I was wrong. Thanks! If you’ve calculated $g_{13},$ I’d like to see it.

• In terms of the $\lambda$’s I get

$g_{13}=-2\lambda_1/\beta^3$

from first derivative of $\ln Z =-\lambda_1/\beta^2$

so $g_{13}=2/\beta.$

Expression for average energy in terms of $\beta$ can not be inverted.

• John Baez says:

Thanks! Someday I plan to pursue this example a bit further…

• The average $p$ term disappeared, and there is no edit button. Sorry for that. Here are the equations again:
$g_{13}=-2\lambda_1/\beta^3$
$\langle p\rangle=-\lambda_1/\beta^2$
$g_{13}=2\langle p\rangle/\beta$
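
Spelled out, the third equation follows from the first two by eliminating $\lambda_1$. This is just the substitution step, taking the two expressions above as given rather than rederiving them:

$g_{13} = -\frac{2\lambda_1}{\beta^3} = \frac{2}{\beta}\left(-\frac{\lambda_1}{\beta^2}\right) = \frac{2\langle p\rangle}{\beta}$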

12. Patrick O'Neill says:

I take the coordinates of this point as the mean values of position, momentum and energy, and I find the least-entropy state with these mean values.

I don’t mean to pick nits, but should that be maximum, rather?

• John Baez says:

Thanks! I’ve fixed this here and on the website version of this series. I appreciate all corrections, no matter how small.