Information Geometry (Part 4)

Before moving on, I’d like to clear up a mistake I’d been making in all my previous posts on this subject.

(By now I’ve tried to fix those posts, because people often get information from the web in a hasty way, and I don’t want my mistake to spread. But you’ll still see traces of my mistake infecting the comments on those posts.)

So what’s the mistake? It’s embarrassingly simple, but also simple to fix. A Riemannian metric must be symmetric:

g_{ij} = g_{ji}

Now, I had defined the Fisher information metric to be the so-called ‘covariance matrix’:

g_{ij} = \langle (X_i - \langle X_i \rangle) \;(X_j- \langle X_j \rangle)\rangle

where X_i are some observable-valued functions on a manifold M, and the angle brackets mean “expectation value”, computed using a mixed state \rho that also depends on the point in M.

The covariance matrix is symmetric in classical mechanics, since then observables commute, so:

\langle AB \rangle = \langle BA \rangle

But it’s not symmetric in quantum mechanics! After all, suppose q is the position operator for a particle, and p is the momentum operator. Then according to Heisenberg

qp = pq + i

in units where Planck’s constant is 1. Taking expectation values, we get:

\langle qp \rangle = \langle pq \rangle + i

and in particular:

\langle qp \rangle \ne \langle pq \rangle

We can use this to get examples where g_{ij} is not symmetric.
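Here’s a quick numerical illustration. Since q and p require an infinite-dimensional Hilbert space, this sketch uses Pauli matrices as the noncommuting observables, together with a density matrix I made up:

```python
import numpy as np

# Two self-adjoint observables that don't commute: the Pauli matrices
# sigma_x and sigma_y stand in for q and p, which would need an
# infinite-dimensional Hilbert space.
X1 = np.array([[0, 1], [1, 0]], dtype=complex)
X2 = np.array([[0, -1j], [1j, 0]])

# An arbitrary full-rank density matrix (self-adjoint, positive, trace 1).
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

def ev(A):
    """Expectation value <A> = tr(rho A)."""
    return np.trace(rho @ A)

def cov(A, B):
    """The 'covariance' <(A - <A>)(B - <B>)>."""
    dA = A - ev(A) * np.eye(2)
    dB = B - ev(B) * np.eye(2)
    return ev(dA @ dB)

g12, g21 = cov(X1, X2), cov(X2, X1)
print(g12, g21)                        # complex, and not equal to each other
print(np.isclose(g12.real, g21.real))  # True: the real parts agree
```

The two “covariances” come out as complex conjugates of each other: equal real parts, opposite imaginary parts.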

However, it turns out that the real part of the covariance matrix is symmetric, even in quantum mechanics — and that’s what we should use as our Fisher information metric.

Why is the real part of the covariance matrix symmetric, even in quantum mechanics? Well, suppose \rho is any density matrix, and A and B are any observables. Then by definition

\langle AB \rangle = \mathrm{tr} (\rho AB)

so taking the complex conjugate of both sides

\langle AB\rangle^*  = \mathrm{tr}(\rho AB)^* = \mathrm{tr}((\rho A B)^*) = \mathrm{tr}(B^* A^* \rho^*)

where I’m using an asterisk both for the complex conjugate of a number and the adjoint of an operator. But our observables are self-adjoint, and so is our density matrix, so we get

\mathrm{tr}(B^* A^* \rho^*) = \mathrm{tr}(B A \rho) = \mathrm{tr}(\rho B A) = \langle B A \rangle

where in the second step we used the cyclic property of the trace. In short:

\langle AB\rangle^* = \langle BA \rangle

If we take real parts, we get something symmetric:

\mathrm{Re} \langle AB\rangle =  \mathrm{Re} \langle BA \rangle

So, if we redefine the Fisher information metric to be the real part of the covariance matrix:

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle) \; (X_j- \langle X_j \rangle)\rangle

then it’s symmetric, as it should be.

Last time I mentioned a general setup using von Neumann algebras that handles the classical and quantum situations simultaneously. That applies here! Taking the real part has no effect in classical mechanics, so we don’t need it there — but it doesn’t hurt, either.

Taking the real part never has any effect when i = j, either, since the expected value of the square of an observable is a nonnegative number:

\langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0

This has two nice consequences.

First, we get

g_{ii} = \langle (X_i - \langle X_i \rangle)^2 \rangle  \ge 0

and since this is true in any coordinate system, our would-be metric g is indeed nonnegative. It’ll be an honest Riemannian metric whenever it’s positive definite.

Second, suppose we’re working in the special case discussed in Part 2, where our manifold is an open subset of \mathbb{R}^n, and \rho at the point x \in \mathbb{R}^n is the Gibbs state with \langle X_i \rangle = x_i. Then all the usual rules of statistical mechanics apply. So, we can compute the variance of the observable X_i using the partition function Z:

\langle (X_i - \langle X_i \rangle)^2 \rangle = \frac{\partial^2}{\partial \lambda_i^2} \ln Z

In other words,

g_{ii} =  \frac{\partial^2}{\partial \lambda_i^2} \ln Z

But since this is true in any coordinate system, we must have

g_{ij} =  \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z

(Here I’m using a little math trick: two symmetric bilinear forms whose diagonal entries agree in any basis must be equal. We’ve already seen that the left side is symmetric, and the right side is symmetric by a famous fact about mixed partial derivatives.)

However, I’m pretty sure this cute formula

g_{ij} =  \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z

only holds in the special case I’m talking about now, where points in \mathbb{R}^n are parametrizing Gibbs states in the obvious way. In general we must use

g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j- \langle X_j \rangle)\rangle

or equivalently,

g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})
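In the classical Crooks setting the cute formula is easy to check numerically. Here’s a sketch with a made-up four-state system: the Hessian of \ln Z, computed by finite differences, matches the covariance matrix of the Gibbs state.

```python
import numpy as np

# A toy classical system with 4 microstates and two observables
# (the numbers are arbitrary choices for illustration).
X = np.array([[1.0, 2.0, 0.5, 3.0],    # X_1(omega) for each microstate
              [0.3, 1.5, 2.5, 0.7]])   # X_2(omega)

def lnZ(lam):
    """Log partition function of the Gibbs state p ~ exp(-lam . X)."""
    return np.log(np.sum(np.exp(-lam @ X)))

def gibbs_cov(lam):
    """Covariance matrix of X_1, X_2 in the Gibbs state."""
    p = np.exp(-lam @ X)
    p /= p.sum()
    mean = X @ p
    dX = X - mean[:, None]
    return (dX * p) @ dX.T

lam = np.array([0.2, -0.1])
h = 1e-4
hess = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = h * np.eye(2)[i], h * np.eye(2)[j]
        # central second difference for the mixed partial derivative
        hess[i, j] = (lnZ(lam + ei + ej) - lnZ(lam + ei - ej)
                      - lnZ(lam - ei + ej) + lnZ(lam - ei - ej)) / (4 * h * h)

print(np.allclose(hess, gibbs_cov(lam), atol=1e-6))  # True: the matrices agree
```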

Okay. So much for cleaning up Last Week’s Mess. Here’s something new. We’ve seen that whenever A and B are observables (that is, self-adjoint),

\langle AB\rangle^* = \langle BA \rangle

We got something symmetric by taking the real part:

\mathrm{Re} \langle AB\rangle =  \mathrm{Re} \langle BA \rangle

and indeed,

\mathrm{Re} \langle AB \rangle = \frac{1}{2} \langle AB + BA \rangle

But by the same reasoning, we get something antisymmetric by taking the imaginary part:

\mathrm{Im} \langle AB\rangle =  - \mathrm{Im} \langle BA \rangle

and indeed,

\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle AB - BA \rangle

Commutators like AB-BA are important in quantum mechanics, so maybe we shouldn’t just throw out the imaginary part of the covariance matrix in our desperate search for a Riemannian metric! Besides the symmetric tensor on our manifold M:

g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})

we can also define a skew-symmetric tensor:

\omega_{ij} = \mathrm{Im} \,  \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})

This will vanish in the classical case, but not in the quantum case!
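A numerical sketch of both tensors at a single point, using random matrices (so the “manifold” is suppressed; we’re just checking the algebraic properties of g and \omega):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_herm(dim):
    A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return (A + A.conj().T) / 2

def rand_rho(dim):
    B = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    R = B @ B.conj().T
    return R / np.trace(R)

def g_and_omega(rho, Xs):
    """Split the covariance matrix into its real (g) and imaginary (omega) parts."""
    dim, n = rho.shape[0], len(Xs)
    g, om = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dXi = Xs[i] - np.trace(rho @ Xs[i]).real * np.eye(dim)
            dXj = Xs[j] - np.trace(rho @ Xs[j]).real * np.eye(dim)
            c = np.trace(rho @ dXi @ dXj)
            g[i, j], om[i, j] = c.real, c.imag
    return g, om

# Quantum case: random noncommuting observables give a nonzero omega.
g_q, om_q = g_and_omega(rand_rho(3), [rand_herm(3), rand_herm(3)])

# "Classical" case: diagonal density matrix and diagonal (hence commuting)
# observables, mimicking a probability distribution on 3 outcomes.
rho_c = np.diag([0.5, 0.3, 0.2]).astype(complex)
Xs_c = [np.diag([1.0, 2.0, 3.0]).astype(complex),
        np.diag([0.5, -1.0, 2.0]).astype(complex)]
g_c, om_c = g_and_omega(rho_c, Xs_c)

print(np.allclose(g_q, g_q.T), np.allclose(om_q, -om_q.T))  # True True
print(np.allclose(om_c, 0))   # True: omega vanishes in the classical case
```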

If you’ve studied enough geometry, you should now be reminded of things like ‘Kähler manifolds’ and ‘almost Kähler manifolds’. A Kähler manifold is a manifold that’s equipped with a symmetric tensor g and a skew-symmetric tensor \omega which fit together in the best possible way. An almost Kähler manifold is something similar, but not quite as nice. We should probably see examples of these arising in information geometry! And that could be pretty interesting.

But in general, if we start with any old manifold M together with a function \rho taking values in mixed states, we seem to be making M into something even less nice. It gets a symmetric bilinear form g on each tangent space, and a skew-symmetric bilinear form \omega, and they vary smoothly from point to point… but they might be degenerate, and I don’t see any reason for them to ‘fit together’ in the nice way we need for a Kähler or almost Kähler manifold.

However, I still think something interesting might be going on here. For one thing, there are other situations in physics where a space of states is equipped with a symmetric g and a skew-symmetric \omega. They show up in ‘dissipative mechanics’ — the study of systems whose entropy increases.

To conclude, let me remind you of some things I said in week295 of This Week’s Finds. This is a huge digression from information geometry, but I’d like to lay out the puzzle pieces in public view, in case it helps anyone get some good ideas.

I wrote:

• Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.

I thank Arnold Neumaier for pointing out this book! It considers a fascinating generalization of Hamiltonian mechanics that applies to systems with dissipation: for example, electrical circuits with resistors, or mechanical systems with friction.

In ordinary Hamiltonian mechanics the space of states is a manifold and time evolution is a flow on this manifold determined by a smooth function called the Hamiltonian, which describes the energy of any state. In this generalization the space of states is still a manifold, but now time evolution is determined by two smooth functions: the energy and the entropy! In ordinary Hamiltonian mechanics, energy is automatically conserved. In this generalization that’s also true, but energy can go into the form of heat… and entropy automatically increases!

Mathematically, the idea goes like this. We start with a Poisson manifold, but in addition to the skew-symmetric Poisson bracket {F,G} of smooth functions on some manifold, we also have a symmetric bilinear bracket [F,G] obeying the Leibniz law

[F,GH] = [F,G]H + G[F,H]

and this positivity condition:

[F,F] ≥ 0

The time evolution of any function is given by a generalization of Hamilton’s equations:

dF/dt = {H,F} + [S,F]

where H is a function called the "energy" or "Hamiltonian", and S is a function called the "entropy". The first term on the right is the usual one. The new second term describes dissipation: as we shall see, it pushes the state towards increasing entropy.

If we require that

[H,F] = {S,F} = 0

for every function F, then we get conservation of energy, as usual in Hamiltonian mechanics:

dH/dt = {H,H} + [S,H] = 0

But we also get the second law of thermodynamics:

dS/dt = {H,S} + [S,S] ≥ 0

Entropy always increases!
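Here’s a minimal numerical sketch of this idea. The model is a toy I’m constructing for illustration, not one from Öttinger’s book: a damped oscillator (q, p) with an extra internal-energy variable e, with H = q²/2 + p²/2 + e and the crude choice S = e. In matrix form the brackets give the flow dx/dt = L∇H + M∇S, where L is skew-symmetric and M is symmetric positive-semidefinite, chosen so that M∇H = 0 and L∇S = 0:

```python
import numpy as np

gamma = 0.5  # friction coefficient (my choice for this toy model)

def grad_H(x):
    q, p, e = x
    return np.array([q, p, 1.0])        # H = q^2/2 + p^2/2 + e

def grad_S(x):
    return np.array([0.0, 0.0, 1.0])    # S = e (crude stand-in for entropy)

def L_mat(x):
    # Skew-symmetric (Poisson) part: the usual symplectic structure on (q, p).
    return np.array([[0.0, 1.0, 0.0],
                     [-1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])

def M_mat(x):
    # Symmetric positive-semidefinite (dissipative) part: gamma * v v^T with
    # v = (0, 1, -p), rigged so that v . grad(H) = 0, hence M grad(H) = 0.
    q, p, e = x
    v = np.array([0.0, 1.0, -p])
    return gamma * np.outer(v, v)

def flow(x):
    # GENERIC evolution: dx/dt = L grad(H) + M grad(S)
    return L_mat(x) @ grad_H(x) + M_mat(x) @ grad_S(x)

H = lambda x: 0.5 * x[0]**2 + 0.5 * x[1]**2 + x[2]
S = lambda x: x[2]

# Integrate with 4th-order Runge-Kutta.
x = np.array([1.0, 0.0, 0.0])
dt = 0.001
H0, traj_S = H(x), [S(x)]
for _ in range(20000):
    k1 = flow(x); k2 = flow(x + dt/2 * k1)
    k3 = flow(x + dt/2 * k2); k4 = flow(x + dt * k3)
    x = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
    traj_S.append(S(x))

print(abs(H(x) - H0))                  # energy conserved, up to integration error
print(all(np.diff(traj_S) >= -1e-12))  # entropy never decreases
```

The resulting equations are dq/dt = p, dp/dt = −q − γp, de/dt = γp²: a damped oscillator whose mechanical energy flows into the heat variable e, with dH/dt = 0 and dS/dt = γp² ≥ 0 exactly as the formalism promises.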

Öttinger calls this framework “GENERIC” – an annoying acronym for “General Equation for the NonEquilibrium Reversible-Irreversible Coupling”. There are lots of papers about it. But I’m wondering if any geometers have looked into it!

If we didn’t need the equations [H,F] = {S,F} = 0, we could easily get the necessary brackets starting with a Kähler manifold. The imaginary part of the Kähler structure is a symplectic structure, say ω, so we can define

{F,G} = ω(dF,dG)

as usual to get Poisson brackets. The real part of the Kähler structure is a Riemannian structure, say g, so we can define

[F,G] = g(dF,dG)

This satisfies

[F,GH] = [F,G]H + G[F,H]

and the positivity condition

[F,F] ≥ 0

Don’t be fooled: this stuff is not rocket science. In particular, the inequality above has a simple meaning: when we move in the direction of the gradient of F, the function F increases. So adding the second term to Hamilton’s equations has the effect of pushing the system towards increasing entropy.

Note that I’m being a tad unorthodox by letting ω and g eat cotangent vectors instead of tangent vectors – but that’s no big deal. The big deal is this: if we start with a Kähler manifold and define brackets this way, we don’t get [H,F] = 0 or {S,F} = 0 for all functions F unless H and S are constant! That’s no good for applications to physics. To get around this problem, we would need to consider some sort of degenerate Kähler structure – one where ω and g are degenerate bilinear forms on the cotangent space.

Has anyone thought about such things? They remind me a little of "Dirac structures" and "generalized complex geometry" – but I don’t know enough about those subjects to know if they’re relevant here.

This GENERIC framework suggests that energy and entropy should be viewed as two parts of a single entity – maybe even its real and imaginary parts! And that in turn reminds me of other strange things, like the idea of using complex-valued Hamiltonians to describe dissipative systems, or the idea of “inverse temperature as imaginary time”. I can’t tell yet if there’s a big idea lurking here, or just a mess….

36 Responses to Information Geometry (Part 4)

  1. Lee Brown Jr. says:

    If the expectation of X is a value obtained by integrating over the manifold, then it is merely a number. Similarly, if the metric is obtained by integrating over the manifold, then it is only a number. This differs from a traditional metric in the sense that a traditional metric varies from point to point.

    So, in one case, you have a matrix of numbers; in the other, you have a matrix of functions. Right?

    • John Baez says:

      It sounds like you’re getting a bit mixed up between two spaces that show up in classical information geometry. It’s easy to do.

      1) First we have a classical phase space \Omega. A point in here is a pure state of some physical system: for example, the position and momentum of a particle. In practice \Omega is often a manifold, but I’m not assuming this: I’m just assuming it’s a measure space with some measure that I call d \omega. A probability distribution

      p: \Omega \to [0,\infty)

      is called a mixed state of the physical system, and a real-valued measurable function

      A : \Omega \to \mathbb{R}

      is called an observable. The expectation value of an observable in a mixed state is defined by

      \langle A \rangle = \int_\Omega A(\omega) p(\omega) d\omega

      2) Next we have some space parametrizing mixed states of our physical system. Mathematically speaking, this is a statistical manifold, meaning a smooth manifold M equipped with a smooth function

      p: M \to \{\mathrm{mixed \; states}\}

      In other words: each point x \in M is assigned a probability distribution p_x on \Omega. But in my posts I never write the subscript x… and usually I write \rho instead of p, since I’m trying to discuss the classical and quantum cases together, and \rho is the usual notation for a mixed state in quantum mechanics.

      But everything in this comment is purely classical.

      … if the metric is obtained by integrating over the manifold, then it is only a number. This differs from a traditional metric in the sense that a traditional metric varies from point to point.

      The Fisher information metric is a metric on M, and it’s a ‘traditional metric’: it varies from point to point. It’s defined by doing an integral over \Omega.

      I’ve given lots of formulas for it, but here are a few more, all equivalent. I’m using local coordinates \lambda^i on the manifold M.

      For starters,

      g_{ij} = \langle \frac{\partial \ln p}{\partial \lambda^i} \frac{\partial \ln p}{\partial \lambda^j} \rangle

      but since the expectation value is defined as an integral over \Omega, we have

      g_{ij} = \int_\Omega \frac{\partial \ln p(\omega)}{\partial \lambda^i} \; \frac{\partial \ln p(\omega)}{\partial \lambda^j} \; p(\omega) \, d \omega

      or if you prefer a more heavy notation that clarifies what depends on a point x \in M:

      g_{ij}(x) = \int_\Omega \frac{\partial \ln p_x(\omega)}{\partial \lambda^i} \; \frac{\partial \ln p_x(\omega)}{\partial \lambda^j} \; p_x(\omega) \, d \omega

      We also saw that g_{ij} is a covariance matrix:

      g_{ij} = \langle (X_i - \langle X_i \rangle) \, (X_j - \langle X_j \rangle) \rangle

      where X_i are some observable-valued functions on M.

      It’s also the matrix of second partial derivatives of the logarithm of the partition function:

      g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \,\ln Z

      All this stuff has a quantum version, too, and that’s what I’ve been emphasizing in this series of posts. In the quantum version the measure space \Omega is replaced by a Hilbert space, but the manifold M remains.
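      As a concrete classical example of these formulas (my own, though it’s a standard one in information geometry): for the family of Gaussian distributions with coordinates (\mu, \sigma), the integral defining g_{ij} can be done numerically on a grid and compared with the well-known answer g = \mathrm{diag}(1/\sigma^2, 2/\sigma^2).

```python
import numpy as np

# Fisher metric of the Gaussian family p(x | mu, sigma), computed by
# numerically integrating  g_ij = integral of (d_i ln p)(d_j ln p) p dx,
# versus the textbook answer diag(1/sigma^2, 2/sigma^2).
mu, sigma = 1.0, 0.7
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]

p = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
dlnp_dmu = (x - mu) / sigma**2
dlnp_dsigma = ((x - mu) ** 2 - sigma**2) / sigma**3

scores = [dlnp_dmu, dlnp_dsigma]
g = np.empty((2, 2))
for i in range(2):
    for j in range(2):
        g[i, j] = np.sum(scores[i] * scores[j] * p) * dx

print(np.allclose(g, np.diag([1 / sigma**2, 2 / sigma**2]), atol=1e-6))  # True
```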

      • Lee Brown Jr. says:

        I have one more observation. Typically, there are two distinctly different symmetric tensors. The metric, and the stress-energy/curvature tensor.

        Could it be that the quantity you calculate would be better described as the equivalent to the stress-energy tensor? Or perhaps the stress-deviator tensor.

        Just tossing that out there.

  2. John F says:

    Why not conjugate one of the lambdas to begin with? I think it’s true that the expectation values of A^* B and B^* A are the same, so g_{ij} is then already symmetric.

    • John Baez says:

      Alas, it’s not true that the expectation values of A^* B and B^* A are the same. If this were true, the expectation values of AB and BA would be the same when A and B are self-adjoint. But I gave a counterexample above: take A to be the position operator and B to be momentum operator. Then:

      \langle q p \rangle = \langle p q \rangle + i

      By the way: in what I wrote above, and indeed in all my posts on this topic, I’m using “observable” to mean “self-adjoint operator” (in quantum mechanics) or “real-valued measurable function on phase space” (in classical mechanics). So, when I was talking about observables A and B, I was assuming that A^* = A and B^* = B… and I used these equations in my proof that

      \langle A B \rangle^* = \langle B A \rangle

      • Lee Brown Jr. says:

        Isn’t what we consider to be actually
        P(A|B)? That is, a conditional probability. This is not symmetric even in classical probability theory. Moreover it corresponds to the measurement process more accurately than just .

        Ordering is important, even in traditional probability theory.

        • Lee Brown Jr. says:

          sorry the formulas got left out in the text above. I was just saying that a correlation is not the same as a conditional probability, although both involve two variables in conjunction.

        • John Baez says:

          I’m not completely sure what you’re trying to say, but yeah: I’m not talking about conditional probabilities here, I’m talking about a kind of “correlation”: the expectation value of a product of two observables,

          \langle A B \rangle

          This is symmetric in the classical case but not the quantum case.

  3. phorgyphynance says:

    (By now I’ve tried to fix those posts, because people often get information from the web in a hasty way, and I don’t want my mistake to spread. But you’ll still see traces of my mistake infecting the comments on those posts.)

    Could you add a similar statement on the previous posts? My comments referred to a different version than what appears now, and I’m not sure I agree with the changes, so I might not have made those comments with the material as it stands now.

  4. phorgyphynance says:

    Is it possible that the Leibniz condition up there contains a typo? I’d expect something more like:

    [F,GH] = [F,G]H + G[F,H].

  5. John Baez says:

    When I mentioned the resemblance of Öttinger’s formalism for non-equilibrium thermodynamics (with its skew-symmetric Poisson bracket and symmetric ‘dissipative bracket’) to the quantum generalization of the Fisher information metric (with its skew-symmetric imaginary part \omega and symmetric real part g), I was far from sure they were really related. But the more I think about it, the more the clues keep piling up! Here are two:

    1) The phase space in Öttinger’s formalism is a space of macrostates, or in other words, mixed states of some underlying system. This is why his phase space has an entropy function defined on it, as well as a Hamiltonian.

    So in fact, Öttinger’s phase space is precisely the sort of space that comes with a Fisher information metric: namely, a manifold M together with a smooth function

    \rho: M \to \{\mathrm{mixed \; states} \}

    2) The skew-symmetric imaginary part \omega of the quantum Fisher information metric arises from the commutators of observables, so it’s indeed closely related to the Poisson bracket in classical mechanics. To see this, recall that

    \omega_{ij} = \mathrm{Im} \,  \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})

    but since the observables X_i are defined by

    X_i = \frac{\partial \ln \rho}{\partial \lambda_i}

    we can write this as

    \omega_{ij} = \mathrm{Im} \,  \mathrm{tr} (\rho X_i X_j)

    or in other words

    \omega_{ij} = \mathrm{Im}\langle X_i  X_j\rangle

    However, I noted in this blog entry that for self-adjoint A, B we have

    \mathrm{Im} \langle A B \rangle =  \frac{1}{2i} \langle [A,B] \rangle

    where [A,B] is the commutator AB -BA. So, we have

    \omega_{ij} = \frac{1}{2i} \langle [X_i , X_j ]\rangle

    In short: \omega comes from taking an expectation value of a commutator! Since Poisson brackets are the classical analogue of commutators, we shouldn’t be surprised that \omega might be what gives Öttinger’s phase space the structure of a Poisson manifold.
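    A quick numerical check of the commutator formula \mathrm{Im} \langle AB \rangle = \langle [A,B] \rangle / 2i, with a random density matrix and random self-adjoint A and B:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4

def rand_herm():
    A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return (A + A.conj().T) / 2

A, B = rand_herm(), rand_herm()

R = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
rho = R @ R.conj().T
rho = rho / np.trace(rho)            # a random density matrix

ev = lambda O: np.trace(rho @ O)
comm = A @ B - B @ A

# <[A,B]> is purely imaginary, and Im<AB> = <[A,B]>/(2i):
lhs = ev(A @ B).imag
rhs = (ev(comm) / 2j).real
print(np.isclose(lhs, rhs))          # True
print(np.isclose(ev(comm).real, 0))  # True: the expected commutator is imaginary
```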

    The story is getting even clearer now that I’m working out an example. I’ll try to report on that soon.

  6. phorgyphynance says:

    I still need to convince myself that, in general,

    -\frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j}\ln Z = \text{Re}\,\langle (X_i -\langle X_i\rangle)(X_j -\langle X_j\rangle)\rangle.

    If true, then I’m back on board with you :)

    • John Baez says:

      You don’t want a minus sign there. You want

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j}\ln Z = \mathrm{Re}\,\langle (X_i -\langle X_i\rangle)(X_j -\langle X_j\rangle)\rangle

      Note, I’m only claiming this equation holds in Crooks’ formalism or its quantum analogue, where M is an open subset of \mathbb{R}^n, and

      \rho : M \to \{ \textrm{mixed \; states} \}

      has the property that \rho_x is the Gibbs state with

      \langle X_i \rangle = x_i

      for some prespecified choice of observables X_i. I’m not claiming it holds in my more general formalism, where M is an arbitrary manifold and

      \rho : M \to \{ \textrm{mixed \; states} \}

      is any smooth function taking values in the interior of the set of mixed states.

      You see, in my general formalism there’s no reason to expect that the second derivatives of the partition function Z can be expressed in terms of the X_i, which are defined using first derivatives of \ln \rho. The function

      \rho : M \to \{ \textrm{mixed \; states} \}

      is so flexible that I see no reason for an equation expressing second derivatives in terms of first derivatives! But in Crooks’ formalism they’re tightly linked.

      Second, note that in Crooks’ original formalism, which is all about classical statistical mechanics, there’s no need to take the real part: the covariance matrix is already real so we have

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j}\ln Z = \langle (X_i -\langle X_i\rangle)(X_j -\langle X_j\rangle)\rangle

      It’s only in the quantum version that we need to take a real part on the right-hand side. And I sketched why this works. First, consider the case i = j and do the usual thermodynamics calculation to show that

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^i}\ln Z = \langle (X_i -\langle X_i\rangle)(X_i -\langle X_i\rangle)\rangle

      Both sides are real here so we also have

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^i}\ln Z = \mathrm{Re}\,\langle (X_i -\langle X_i\rangle)(X_i -\langle X_i\rangle)\rangle

      Then, since this is true in any coordinate system, we conclude

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j}\ln Z = \mathrm{Re}\,\langle (X_i -\langle X_i\rangle)(X_j -\langle X_j\rangle)\rangle

      Here I’m using the fact that both sides are symmetric in i and j, and two symmetric bilinear forms that have the same diagonal entries in any basis must be equal, thanks to the polarization identity.

      But don’t trust me, check it out. Of course you could also just calculate both sides.

    • phorgyphynance says:

      You don’t want a minus sign there.

      You’re right. I was carrying over an errant minus sign from an earlier calculation of mine.

      Of course you could also just calculate both sides.

      I know I’m old school, but I’d like to see this calculation. I’m trying (believe me, I’m even losing sleep), but haven’t yet been able to calculate both sides.

      • phorgyphynance says:

        Just to clarify…

        It is fairly easy to calculate both sides in the classical case. I’m talking about the quantum version.

      • John Baez says:

        Okay, the quantum version. I’m trying to avoid calculating

        \mathrm{Re}\,\langle \, (X_i -\langle X_i\rangle)\,(X_j -\langle X_j\rangle) \,\rangle

        directly, because there are too many noncommuting operators running around: after all,

        \langle X_i X_j \rangle = \mathrm{tr}( \rho X_i X_j )

        and while two noncommuting operators effectively commute inside a trace, that ain’t true for three.

        That’s why I want to use the polarization identity to focus on the case i = j, and bootstrap my way from there.

        So, just to get started, one question is whether you believe

        \frac{\partial^2}{\partial {\lambda^i}^2} \ln Z = \langle (X_i -\langle X_i\rangle) \,(X_i -\langle X_i\rangle) \rangle

        where

        Z = \mathrm{tr} (e^{-\lambda^i X_i})

        and we’re computing expectation values like this:

        \langle A \rangle = \mathrm{tr}(\rho A)

        where

        \rho = \frac{1}{Z} e^{-\lambda^i X_i}

        And I guess an even more basic question is whether you believe

        -\frac{\partial}{\partial \lambda^i} \ln Z = \langle X_i \rangle

        Even this was mildly nerve-racking at first, because on the left side we need to understand

        \frac{\partial}{\partial \lambda^i} Z = \frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^i X_i})

        and now we see all those noncommuting X_i. But I think the cyclic property of the trace saves us, letting us show

        \frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^i X_i}) =  \mathrm{tr} (- X_i e^{-\lambda^i X_i})

        with the help of the power series expansion of the exponential.
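        That first-derivative step can be checked numerically; here’s a sketch with two random noncommuting self-adjoint matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 4

def rand_herm():
    A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    return (A + A.conj().T) / 2

X1, X2 = rand_herm(), rand_herm()   # two noncommuting self-adjoint matrices

def expm_neg(H):
    """e^{-H} for Hermitian H, via the eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-w)) @ V.conj().T

def Z(l1, l2):
    return np.trace(expm_neg(l1 * X1 + l2 * X2)).real

# Numerical derivative of Z in the first coordinate at (0.3, 0.5)...
h = 1e-6
numeric = (Z(0.3 + h, 0.5) - Z(0.3 - h, 0.5)) / (2 * h)

# ...versus the closed form tr(-X1 e^{-lambda . X}) that the cyclic
# property of the trace is supposed to give.
closed = np.trace(-X1 @ expm_neg(0.3 * X1 + 0.5 * X2)).real

print(np.isclose(numeric, closed, rtol=1e-5))   # True
```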

        Even if you agree with me this far, you may worry about the second derivative:

        \frac{\partial^2}{\partial {\lambda^i}^2} Z = \frac{\partial^2}{\partial {\lambda^i}^2} \mathrm{tr}(e^{-\lambda^i X_i})

        In fact, now you’ve got me scared about whether this really equals

        \langle \, (X_i -\langle X_i\rangle) \, (X_i -\langle X_i\rangle)\,\rangle

        I sure thought it did! Maybe not. But in a way I won’t feel too bad if this blows up in my face, since this is foundational stuff about computing expectation values in quantum statistical mechanics, not the new stuff I’m trying to make up. I mean, if you can’t compute variances by taking the second derivative of the log of the partition function in quantum mechanics, that’s not my problem: that’s everyone’s problem!

        What I consider “my problem” is the non-symmetry of

        \langle \, (X_i -\langle X_i\rangle)\,(X_j -\langle X_j\rangle)\, \rangle

        and what this means.

      • phorgyphynance says:

        But I think the cyclic property of the trace saves us, letting us show

        \frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^i X_i}) = \mathrm{tr} (- X_i e^{-\lambda^i X_i})

        with the help of the power series expansion of the exponential.

        Yeah. I convinced myself of this (assuming no problems with infinity regarding the cyclicality of trace). It is kind of neat. However, that doesn’t help us when computing \frac{\partial\rho}{\partial \lambda^i}.

        Even if you agree with me this far, you may worry about second derivative

        Yeah, it’s not that I’m worried, but this is where I was having trouble. I get

        \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j} \ln Z = -\text{tr}(X_j \frac{\partial\rho}{\partial\lambda^i}).

        In the classical case, we could use a trick

        \frac{\partial\rho}{\partial\lambda^i} = \frac{\partial\ln\rho }{\partial\lambda^i} \rho = -(X_i - \langle X_i \rangle) \rho,

        but I’m not sure if we can use the same trick, i.e. the chain rule, in the quantum case.

        • phorgyphynance says:

          but I’m not sure if we can use the same trick, i.e. the chain rule, in the quantum case.

          And the cyclicality of trace doesn’t save us here.

        • phorgyphynance says:

          I got a little further showing

          \frac{\partial\rho}{\partial\lambda^i} = \langle X_i \rangle \rho + \frac{1}{Z} \frac{\partial}{\partial\lambda^i} \exp(-\lambda^k X_k)

          so that

          \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j} \ln Z = -\langle X_i\rangle\langle X_j\rangle - \text{tr}(\frac{X_j}{Z}\frac{\partial}{\partial\lambda^i} \exp(-\lambda^k X_k)).

          Unfortunately, we can’t apply the chain rule, so the second term is not \langle X_j X_i\rangle.

        • phorgyphynance says:

          Ah. But this proves the diagonal element is what you wanted it to be, since cyclicality saves the chain rule again, i.e.

          \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^i} \ln Z =

          -\langle X_i\rangle^2 - \text{tr}(\frac{X_i}{Z}\frac{\partial}{\partial\lambda^i} \exp(-\lambda^k X_k)) =

          \langle X_i^2\rangle -\langle X_i\rangle^2 .

          I was hoping to do things the hard way without the polarization identity.

        • John Baez says:

          I’ll have to check all your formulas — some of them look unfamiliar and pretty cool. But now it’s my bed-time.

          assuming no problems with infinity regarding the cyclicality of trace…

          One does need to be a bit careful here in an infinite-dimensional Hilbert space. If the operator A is trace class and B is bounded, then A B and B A are trace class and

          \mathrm{tr}(AB) = \mathrm{tr}(BA)

          However, in applications to quantum mechanics many of our observables are unbounded self-adjoint operators, so one needs more specialized theorems. And this is not just pedantry, because at phase transitions the derivatives of \ln Z can become infinite, which means that some of the expressions we’re talking about do indeed become undefined, thanks to divergent infinite sums!

          But before worrying about these issues, it’s good to start by assuming all the sums converge, as they do in the finite-dimensional case, and see if the basic ideas are sound. And it sounds like that’s what you just did!

        • John Baez says:

          Okay, I checked all your formulas and I agree with them all! Nice proof.

          So, in quantum statistical mechanics we have

          \frac{\partial^2}{\partial{\lambda^i}^2}\ln Z =  \langle X_i^2\rangle -\langle X_i\rangle^2 =

          \mathrm{Re}\,\langle \, (X_i -\langle X_i\rangle)\,(X_i -\langle X_i\rangle) \,\rangle

          in every coordinate system, so by polarization we may conclude:

          \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z =

          \mathrm{Re}\,\langle \, (X_i -\langle X_i\rangle)\,(X_j -\langle X_j\rangle) \,\rangle

          This should be in a textbook somewhere! Does anyone out there know where it can be found?

        • Arnold Neumaier says:

          So, in quantum statistical mechanics we have

          \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z =  \mathrm{Re}\,\langle \, (X_i -\langle X_i\rangle)\,(X_j -\langle X_j\rangle) \,\rangle

          This should be in a textbook somewhere! Does anyone out there know where it can be found?

          Formulas for quantum covariances can be found in many textbooks on nonequilibrium statistical physics. See, e.g., Sections 5.3 and 6.3 of , where things are phrased, as is conventional, in terms of the Kubo inner product.

          I haven’t seen the formula with the real part, though, so maybe your application of polarization is not justified?

          By the way, I am using a large font to display the text in my Konqueror browser, and am dismayed that the formulas don’t scale with the remaining text. The indices are almost unreadable for me. Perhaps this can be improved!?

        • Mike Stay says:

          Re: enlarging the images

          Install Userscripts for Konqueror

          then write a script that iterates over the images in the page served from and adds or changes the ‘s’ parameter. That will cause them to reload larger.

          (If you don’t know JavaScript, let me know and I’ll write it for you.)

        • John Baez says:

          Arnold wrote:

          I haven’t seen the formula with the real part, though, so maybe your application of polarization is not justified?

          Polarization says that two symmetric bilinear forms B, B' with

          B(v,v) = B'(v,v)

          for all v also have

          B(v,w) = B'(v,w)

          for all v, w. The proof is easy: there’s an explicit formula for B(v,w) in terms of its ‘diagonal’ entries:

          2B(v,w) = B(v+w,v+w)-B(v,v)-B(w,w)

          as long as we’re working over a field that allows division by 2.
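          As a quick numerical sanity check (my own sketch, not part of the original discussion), one can represent a symmetric bilinear form by a random symmetric matrix A, with B(v, w) = vᵀAw, and verify the polarization formula directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random symmetric matrix represents a symmetric bilinear form B(v, w) = v^T A w.
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

def B(v, w):
    return v @ A @ w

v = rng.standard_normal(n)
w = rng.standard_normal(n)

# Polarization identity: 2 B(v, w) = B(v+w, v+w) - B(v, v) - B(w, w)
lhs = 2 * B(v, w)
rhs = B(v + w, v + w) - B(v, v) - B(w, w)
```

          Since the identity is algebraic, it holds for every choice of A, v and w; the random instances here are just illustrative.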

          The 2nd partial derivatives

          \partial_i \partial_j f

          of any smooth function define a symmetric bilinear form, and so does the quantity

          \mathrm{Re} \langle (X_i - \langle X_i \rangle)\,(X_j - \langle X_j \rangle) \rangle

          So I think everything is fine — Eric and I went over it pretty carefully, here on the blog.

          Perhaps this can be improved!?

          I’m sorry, I don’t know how. This problem would presumably arise on any WordPress math blog, e.g. Terry Tao’s blog or the Secret Blogging Seminar, which are both quite popular. So, maybe they know a solution. WordPress blogs produce LaTeX images as png files. png files are rescalable, so presumably a sufficiently smart browser could do it, e.g. with a plugin. I can create pngs in different sizes from this end:

          \int x dx

          should be a lot bigger than

          \int x dx

          But I don’t see anything at my end that helps you rescale the math symbols at your end.

          By the way, to write math equations on a WordPress blog, you just put the word ‘latex’ right after the first dollar sign:

          $latex \sqrt{3} $

          Double dollars don’t work here. So, it’s easy for you to write math on this blog, but maybe not so easy for you to read it.

          I chose to move to this blog instead of a blog with technology more like that of the n-Category Cafe because many mathematicians with old browsers were reluctant to obtain the math fonts necessary to view the posts there, and that limited the readership. I thought this problem would be even worse at Azimuth, since many of the readers aren’t mathematicians.

    • phorgyphynance says:

      *light bulb*

      Now that we know

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^i} \ln Z = \langle X_i^2\rangle -\langle X_i\rangle^2 = \parallel X_i \parallel^2

      it follows directly that

      \begin{aligned}\left(\frac{\partial}{\partial\lambda^i}+\frac{\partial}{\partial\lambda^j}\right)\left(\frac{\partial}{\partial\lambda^i}+\frac{\partial}{\partial\lambda^j}\right) \ln Z &= \langle (X_i+X_j)^2\rangle - \langle X_i + X_j \rangle^2 \\ &= \parallel X_i\parallel^2 + \parallel X_j\parallel^2 + 2\text{Re}\left(\langle X_i X_j\rangle - \langle X_i\rangle\langle X_j\rangle\right), \end{aligned}

      which means that

      \frac{\partial}{\partial\lambda^i}\frac{\partial}{\partial\lambda^j} \ln Z = \text{Re}\left(\langle X_i X_j\rangle - \langle X_i\rangle\langle X_j\rangle\right) = g_{ij}.
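      In the classical case, where the observables commute and no real part is needed, this identity is easy to check numerically. Here is a minimal sketch (the three-state system and the observable values x1, x2 are my own toy choices): the mixed second derivative of \ln Z, estimated by finite differences, matches the covariance in the Gibbs state.

```python
import numpy as np

# Toy classical system: three states, two commuting observables
# taking the values x1_k and x2_k in state k (hypothetical numbers).
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, -1.0, 2.0])

def lnZ(l1, l2):
    """ln of the partition function Z = sum_k exp(-(l1 x1_k + l2 x2_k))."""
    return np.log(np.sum(np.exp(-(l1 * x1 + l2 * x2))))

def covariance(l1, l2):
    """<x1 x2> - <x1><x2> in the Gibbs state at (l1, l2)."""
    p = np.exp(-(l1 * x1 + l2 * x2))
    p /= p.sum()
    return np.sum(p * x1 * x2) - np.sum(p * x1) * np.sum(p * x2)

# Central finite difference for the mixed second derivative of ln Z.
l1, l2, h = 0.3, 0.7, 1e-4
d2 = (lnZ(l1 + h, l2 + h) - lnZ(l1 + h, l2 - h)
      - lnZ(l1 - h, l2 + h) + lnZ(l1 - h, l2 - h)) / (4 * h * h)
```

      With these numbers d2 and covariance(0.3, 0.7) agree to roughly the accuracy of the finite-difference scheme.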

      I’m back on board :)

      • John Baez says:

        Phorgyphynance wrote:

        I’m back on board :)


        Thanks for not believing my claim about the second derivative of \ln Z until you checked it. I’d never checked it before in the quantum case, though physicists seem to use it all the time. Now that you’re on board, I am ready to do some interesting stuff.

        Your argument above, going from the formula for \frac{\partial}{\partial \lambda^i} \frac{\partial}{\partial \lambda^i} \ln Z to the formula for \frac{\partial}{\partial \lambda^i} \frac{\partial}{\partial \lambda^j} \ln Z, uses the same trick as a well-known proof of the polarization identity. Namely: if B is a symmetric bilinear form and

        Q(x) = B(x,x)

        is the corresponding quadratic form, then

        B(x+y,x+y) = B(x,x) + B(y,y) + 2 B(x,y)

        so we can recover the bilinear form from the quadratic form:

        Q(x+y) - Q(x) - Q(y) = 2 B(x,y)

        So, if two symmetric bilinear forms give the same quadratic form, they must be equal.

        (Well, at least when we’re allowed to divide by 2! This is one reason why math over the integers mod 2 is very different than math over the real numbers or even the integers mod 3. Over the integers mod 2, there’s more information in the symmetric bilinear form than the corresponding quadratic form, so all heck breaks loose.)
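        A tiny illustration of that last point (my own example, not from the thread): over the integers mod 2, the symmetric bilinear form B(v, w) = v_1 w_2 + v_2 w_1 is nonzero, yet its quadratic form Q(v) = B(v, v) = 2 v_1 v_2 vanishes identically, so Q cannot recover B.

```python
import itertools

def B(v, w):
    # A symmetric bilinear form over the integers mod 2.
    return (v[0] * w[1] + v[1] * w[0]) % 2

def Q(v):
    # Its quadratic form: Q(v) = B(v, v) = 2*v[0]*v[1] = 0 mod 2.
    return B(v, v)

vectors = list(itertools.product([0, 1], repeat=2))
q_vanishes = all(Q(v) == 0 for v in vectors)               # Q is identically zero
b_nonzero = any(B(v, w) == 1 for v in vectors for w in vectors)  # but B is not
```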

      • Eric says:


        Cool :)

        Thanks for not believing my claim about the second derivative of \ln Z until you checked it.

        Well, everything seemed very clean and pretty, i.e. it “felt right”, until you threw in the \text{Re}. For a second it seemed like you were saying, “Well, what I wanted was symmetric, but I got something unsymmetric, so let’s just symmetrize it.” Symmetrizing things willy-nilly doesn’t “feel right”, so I was happy to see it come out once you recognized that the diagonal elements can be associated with a norm.

        I have some real practical applications of this in mind if things work out the way I hope, so I’m glad to be back on board and looking forward to the rest of the ride :)

  7. streamfortyseven says:

    This is off-topic, but this guy is into math and perhaps you could use some visualisations… So here’s a game he wrote. He must have worked on 2001:A Space Odyssey in a previous life.


  9. Squark says:

    Can you give a quasi-realistic example of a GENERIC system?

    • John Baez says:

      I would like to give a very simple example, but the examples I’ve seen are too complicated for me to summarize here. If you’re a clever guy you can use sneaky tricks to find an online copy of this book:

      • Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.

      and that’s one way to see a bunch of examples. This book is also good:

      • Georgy Lebon, David Jou and J. Casas-Vázquez, Understanding Nonequilibrium Thermodynamics, Springer, 2008.

    • John Baez says:

      By the way, someone pointed out that we don’t need

      [H,F] = \{S,F\} = 0

      for all functions F. To derive the few results I describe, it’s enough to have

      [H,S] = \{S,H\} = 0

      It seems that Öttinger assumes the stronger formulation but only uses the weaker one—see the text before equation (1.22) in his book Beyond Equilibrium Thermodynamics.

      I’m afraid I don’t remember who pointed out this fact, and I can’t find the place on this blog where they did it! But I think it’s important.
