Before moving on, I’d like to clear up a mistake I’d been making in all my previous posts on this subject.

(By now I’ve tried to fix those posts, because people often get information from the web in a hasty way, and I don’t want my mistake to spread. But you’ll still see traces of my mistake infecting the *comments* on those posts.)

So what’s the mistake? It’s embarrassingly simple, but also simple to fix. A Riemannian metric must be symmetric:

$g_{ij} = g_{ji}$

Now, I had defined the Fisher information metric to be the so-called ‘covariance matrix’:

$g_{ij} = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

where $X_i$ are some observable-valued functions on a manifold $M$, and the angle brackets mean “expectation value”, computed using a mixed state $\rho$ that also depends on the point in $M$.

The covariance matrix is symmetric in classical mechanics, since then observables commute, so:

$\langle AB \rangle = \langle BA \rangle$

But it’s not symmetric in quantum mechanics! After all, suppose $q$ is the position operator for a particle, and $p$ is the momentum operator. Then according to Heisenberg

$qp = pq + i$

in units where Planck’s constant is 1. Taking expectation values, we get:

$\langle qp \rangle = \langle pq \rangle + i$

and in particular:

$\langle qp \rangle \ne \langle pq \rangle$

We can use this to get examples where $g_{ij}$ is not symmetric.

However, it turns out that the *real part* of the covariance matrix is symmetric, even in quantum mechanics — and that’s what we should use as our Fisher information metric.

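Why is the real part of the covariance matrix symmetric, even in quantum mechanics? Well, suppose $\rho$ is any density matrix, and $A$ and $B$ are any observables. Then by definition

$\langle AB \rangle = \mathrm{tr}(\rho AB)$

so taking the complex conjugate of both sides

$\langle AB \rangle^* = \mathrm{tr}(\rho AB)^* = \mathrm{tr}((\rho AB)^*) = \mathrm{tr}(B^* A^* \rho^*)$

where I’m using an asterisk both for the complex conjugate of a number and the adjoint of an operator. But our observables are self-adjoint, and so is our density matrix, so we get

$\mathrm{tr}(B^* A^* \rho^*) = \mathrm{tr}(BA\rho) = \mathrm{tr}(\rho BA) = \langle BA \rangle$

where in the second step we used the cyclic property of the trace. In short:

$\langle AB \rangle^* = \langle BA \rangle$

If we take real parts, we get something symmetric:

$\mathrm{Re} \langle AB \rangle = \mathrm{Re} \langle BA \rangle$

So, if we redefine the Fisher information metric to be the *real part* of the covariance matrix:

$g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$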

then it’s symmetric, as it should be.
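If you’d like to see the trace argument at work on actual matrices, here is a minimal numerical sketch. It assumes a finite-dimensional Hilbert space, and the helper names (`random_hermitian`, `random_density_matrix`, `expect`) are made up for this illustration. It draws a random density matrix and two random self-adjoint matrices, then checks that conjugating $\langle AB \rangle$ swaps the order of the operators, so the real parts agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # dimension of the (finite-dimensional) Hilbert space, chosen arbitrarily

def random_hermitian(n):
    """A random self-adjoint n x n matrix, playing the role of an observable."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (z + z.conj().T) / 2

def random_density_matrix(n):
    """A random positive semidefinite matrix with trace 1."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = z @ z.conj().T
    return rho / np.trace(rho).real

rho = random_density_matrix(n)
A, B = random_hermitian(n), random_hermitian(n)

def expect(op):
    """Expectation value <op> = tr(rho op)."""
    return np.trace(rho @ op)

ab, ba = expect(A @ B), expect(B @ A)

# <AB>* = <BA>, hence Re<AB> = Re<BA>
conj_swaps_order = bool(np.isclose(np.conj(ab), ba))
real_part_symmetric = bool(np.isclose(ab.real, ba.real))
print(ab, ba, conj_swaps_order, real_part_symmetric)
```

For generic noncommuting `A` and `B` the two expectation values differ in their imaginary parts, which is exactly the asymmetry the real part removes.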

Last time I mentioned a general setup using von Neumann algebras, that handles the classical and quantum situations simultaneously. That applies here! Taking the real part has no effect in classical mechanics, so we don’t need it there — but it doesn’t hurt, either.

Taking the real part never has any effect when $i = j$, either, since the expected value of the *square* of an observable is a nonnegative number:

$\langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0$

This has two nice consequences.

First, we get

$g_{ii} = \langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0$

and since this is true in *any* coordinate system, our would-be metric $g$ is indeed nonnegative. It’ll be an honest Riemannian metric whenever it’s positive definite.

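Second, suppose we’re working in the special case discussed in Part 2, where our manifold is an open subset of $\mathbb{R}^n$, and $\rho$ at the point $x \in \mathbb{R}^n$ is the Gibbs state with $\langle X_i \rangle = x_i$. Then all the usual rules of statistical mechanics apply. So, we can compute the variance of the observable $X_i$ using the partition function $Z$:

$\langle (X_i - \langle X_i \rangle)^2 \rangle = \frac{\partial^2}{\partial \lambda_i^2} \ln Z$

In other words,

$g_{ii} = \frac{\partial^2}{\partial \lambda_i^2} \ln Z$

But since this is true in *any* coordinate system, we must have

$g_{ij} = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z$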

(Here I’m using a little math trick: two symmetric bilinear forms whose diagonal entries agree in *any* basis must be equal. We’ve already seen that the left side is symmetric, and the right side is symmetric by a famous fact about mixed partial derivatives.)
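Here is a small numerical sanity check of the second-derivative formula, restricted to the *classical* case where all observables commute. It is only a sketch under stated assumptions: a finite phase space with 6 points, two made-up observables stored as rows of `X`, and the sign convention $p \propto \exp(-\sum_i \lambda_i X_i)$ with $Z = \sum \exp(-\sum_i \lambda_i X_i)$. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_obs = 6, 2
X = rng.normal(size=(n_obs, n_states))  # X[i, w] = value of observable i at phase-space point w

def log_Z(lam):
    """Logarithm of the partition function Z = sum_w exp(-sum_i lam_i X_i(w))."""
    return np.log(np.sum(np.exp(-lam @ X)))

def covariance(lam):
    """Covariance matrix of the observables in the Gibbs distribution."""
    weights = np.exp(-lam @ X)
    p = weights / weights.sum()
    centered = X - (X @ p)[:, None]
    return (centered * p) @ centered.T

def hessian(f, lam, h=1e-4):
    """Matrix of second partial derivatives, by central finite differences."""
    k = len(lam)
    steps = np.eye(k) * h
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(lam + ei + ej) - f(lam + ei - ej)
                       - f(lam - ei + ej) + f(lam - ei - ej)) / (4 * h * h)
    return H

lam = np.array([0.3, -0.7])
hess = hessian(log_Z, lam)
cov = covariance(lam)
print(np.round(hess, 6))
print(np.round(cov, 6))
```

The two printed matrices should agree up to finite-difference error. This check says nothing about the quantum case, where the observables need not commute with each other.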

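However, I’m pretty sure this cute formula

$g_{ij} = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z$

only holds in the special case I’m talking about now, where points in $\mathbb{R}^n$ are parametrizing Gibbs states in the obvious way. In general we must use

$g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

or equivalently,

$g_{ij} = \mathrm{Re} \, \mathrm{tr}\left(\rho \, \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j}\right)$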

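Okay. So much for cleaning up Last Week’s Mess. Here’s something new. We’ve seen that whenever $A$ and $B$ are observables (that is, self-adjoint),

$\langle AB \rangle^* = \langle BA \rangle$

We got something symmetric by taking the real part:

$\mathrm{Re} \langle AB \rangle = \mathrm{Re} \langle BA \rangle$

Indeed,

$\mathrm{Re} \langle AB \rangle = \frac{1}{2} \langle AB + BA \rangle$

But by the same reasoning, we get something *antisymmetric* by taking the *imaginary* part:

$\mathrm{Im} \langle AB \rangle = -\mathrm{Im} \langle BA \rangle$

and indeed,

$\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle AB - BA \rangle$

Commutators like $AB - BA$ are important in quantum mechanics, so maybe we shouldn’t just throw out the imaginary part of the covariance matrix in our desperate search for a Riemannian metric! Besides the symmetric tensor on our manifold $M$:

$g_{ij} = \mathrm{Re} \, \mathrm{tr}\left(\rho \, \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j}\right)$

we can also define a skew-symmetric tensor:

$\omega_{ij} = \mathrm{Im} \, \mathrm{tr}\left(\rho \, \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j}\right)$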

This will vanish in the classical case, but not in the quantum case!
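To see the two pieces side by side, here is a sketch for a single qubit. This is my own toy illustration, not a calculation from the earlier posts: it takes the three Pauli matrices as the observables, picks an arbitrary Bloch vector `r` for the mixed state, and splits the covariance matrix of those fixed observables into its real and imaginary parts.

```python
import numpy as np

# Pauli matrices, used here as three noncommuting observables.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
X = [sx, sy, sz]
I2 = np.eye(2)

r = np.array([0.2, -0.1, 0.5])  # Bloch vector with |r| < 1 (an arbitrary choice)
rho = (I2 + sum(ri * s for ri, s in zip(r, X))) / 2

def expect(op):
    """Expectation value <op> = tr(rho op)."""
    return np.trace(rho @ op)

means = [expect(x) for x in X]
cov = np.array([[expect((X[i] - means[i] * I2) @ (X[j] - means[j] * I2))
                 for j in range(3)] for i in range(3)])

g = cov.real       # symmetric part
omega = cov.imag   # skew-symmetric part

# The imaginary part is the expectation value of a commutator, divided by 2i.
commutator_part = np.array([[(expect(X[i] @ X[j] - X[j] @ X[i]) / 2j).real
                             for j in range(3)] for i in range(3)])
print(np.round(g, 3))
print(np.round(omega, 3))
```

For this choice of observables the imaginary part works out to $\omega_{ij} = \epsilon_{ijk} r_k$, so it vanishes exactly when the Bloch vector is zero, and the real part is the identity minus the outer product of `r` with itself.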

If you’ve studied enough geometry, you should now be reminded of things like ‘Kähler manifolds’ and ‘almost Kähler manifolds’. A Kähler manifold is a manifold that’s equipped with a symmetric tensor $g$ and a skew-symmetric tensor $\omega$ which fit together in the best possible way. An almost Kähler manifold is something similar, but not quite as nice. We should probably see examples of these arising in information geometry! And that could be pretty interesting.

But in general, if we start with any old manifold $M$ together with a function $\rho$ taking values in mixed states, we seem to be making $M$ into something even less nice. It gets a symmetric bilinear form $g$ on each tangent space, and a skew-symmetric bilinear form $\omega$, and they vary smoothly from point to point… but they might be degenerate, and I don’t see any reason for them to ‘fit together’ in the nice way we need for a Kähler or almost Kähler manifold.

However, I still think something interesting might be going on here. For one thing, there are *other* situations in physics where a space of states is equipped with a symmetric $g$ and a skew-symmetric $\omega$. They show up in ‘dissipative mechanics’: the study of systems whose entropy increases.

To conclude, let me remind you of some things I said in week295 of This Week’s Finds. This is a huge digression from information geometry, but I’d like to lay out the puzzle pieces in public view, in case it helps anyone get some good ideas.

I wrote:

• Hans Christian Öttinger, *Beyond Equilibrium Thermodynamics*, Wiley, 2005.

I thank Arnold Neumaier for pointing out this book! It considers a fascinating generalization of Hamiltonian mechanics that applies to systems with dissipation: for example, electrical circuits with resistors, or mechanical systems with friction.

In ordinary Hamiltonian mechanics the space of states is a manifold and time evolution is a flow on this manifold determined by a smooth function called the Hamiltonian, which describes the *energy* of any state. In this generalization the space of states is still a manifold, but now time evolution is determined by two smooth functions: the energy and the *entropy*!

In ordinary Hamiltonian mechanics, energy is automatically conserved. In this generalization that’s also true, but energy can go into the form of heat… and entropy automatically *increases*!

Mathematically, the idea goes like this. We start with a Poisson manifold, but in addition to the skew-symmetric Poisson bracket {F,G} of smooth functions on some manifold, we also have a symmetric bilinear bracket [F,G] obeying the Leibniz law

[F,GH] = [F,G]H + G[F,H]

and this positivity condition:

[F,F] ≥ 0

The time evolution of any function is given by a generalization of Hamilton’s equations:

dF/dt = {H,F} + [S,F]

where H is a function called the "energy" or "Hamiltonian", and S is a function called the "entropy". The first term on the right is the usual one. The new second term describes dissipation: as we shall see, it pushes the state towards increasing entropy.

If we require that

[H,F] = {S,F} = 0

for every function F, then we get conservation of energy, as usual in Hamiltonian mechanics:

dH/dt = {H,H} + [S,H] = 0

But we also get the second law of thermodynamics:

dS/dt = {H,S} + [S,S] ≥ 0

Entropy always increases!
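To make the two balance laws above concrete, here is a toy sketch: a damped harmonic oscillator whose friction heats an internal reservoir. Everything specific in it is an assumption made for illustration and is not taken from Öttinger’s book: the state is (q, p, s), the internal energy is U(s) = C exp(s/C), the parameter values are arbitrary, and the equations are written in matrix form dx/dt = L∇H + M∇S with L skew-symmetric and M symmetric and positive semidefinite. That form matches the bracket equation above up to the sign convention chosen for the Poisson bracket.

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the text).
m, k, gamma, C = 1.0, 1.0, 0.5, 1.0

def temperature(s):
    # With internal energy U(s) = C * exp(s / C), the temperature is T = dU/ds.
    return np.exp(s / C)

def energy(x):
    q, p, s = x
    return p**2 / (2 * m) + k * q**2 / 2 + C * np.exp(s / C)

def grad_energy(x):
    q, p, s = x
    return np.array([k * q, p / m, temperature(s)])

GRAD_ENTROPY = np.array([0.0, 0.0, 1.0])  # the entropy function is S(q, p, s) = s

# Skew-symmetric Poisson matrix: the canonical structure on (q, p), degenerate in s.
POISSON = np.array([[0.0, 1.0, 0.0],
                    [-1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])

def friction(x):
    # Symmetric positive semidefinite matrix, built so that friction(x) @ grad_energy(x) = 0.
    q, p, s = x
    T = temperature(s)
    v = np.array([0.0, 1.0, -p / (m * T)])
    return gamma * T * np.outer(v, v)

def rhs(x):
    # Reversible (Poisson) part plus irreversible (dissipative) part.
    return POISSON @ grad_energy(x) + friction(x) @ GRAD_ENTROPY

def rk4_step(x, dt):
    k1 = rhs(x)
    k2 = rhs(x + dt / 2 * k1)
    k3 = rhs(x + dt / 2 * k2)
    k4 = rhs(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, 0.0, 0.0])  # start stretched, at rest, with s = 0
dt, steps = 0.002, 5000
energies, entropies = [energy(x)], [x[2]]
for _ in range(steps):
    x = rk4_step(x, dt)
    energies.append(energy(x))
    entropies.append(x[2])

energy_drift = abs(energies[-1] - energies[0])
entropy_never_decreases = bool(np.all(np.diff(entropies) >= 0))
print(energy_drift, entropy_never_decreases)
```

Written out, the equations of motion are dq/dt = p/m, dp/dt = -kq - γp/m and ds/dt = γp²/(m²T): mechanical energy lost to friction reappears as internal energy, so the total energy stays constant while s can only grow. The two degeneracy conditions hold by construction: the Poisson matrix kills the gradient of S, and the friction matrix kills the gradient of H.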

Öttinger calls this framework “GENERIC” – an annoying acronym for “General Equation for the NonEquilibrium Reversible-Irreversible Coupling”. There are lots of papers about it. But I’m wondering if any geometers have looked into it!

If we didn’t need the equations [H,F] = {S,F} = 0, we could easily get the necessary brackets starting with a Kähler manifold. The imaginary part of the Kähler structure is a symplectic structure, say ω, so we can define

{F,G} = ω(dF,dG)

as usual to get Poisson brackets. The real part of the Kähler structure is a Riemannian structure, say g, so we can define

[F,G] = g(dF,dG)

This satisfies

[F,GH] = [F,G]H + G[F,H]

and

[F,F] ≥ 0

Don’t be fooled: this stuff is not rocket science. In particular, the inequality above has a simple meaning: when we move in the direction of the gradient of F, the function F increases. So adding the second term to Hamilton’s equations has the effect of pushing the system towards increasing entropy.

Note that I’m being a tad unorthodox by letting ω and g eat cotangent vectors instead of tangent vectors – but that’s no big deal. The big deal is this: if we start with a Kähler manifold and define brackets this way, we don’t get [H,F] = 0 or {S,F} = 0 for all functions F unless H and S are constant! That’s no good for applications to physics. To get around this problem, we would need to consider some sort of *degenerate* Kähler structure – one where ω and g are degenerate bilinear forms on the cotangent space.

Has anyone thought about such things? They remind me a little of "Dirac structures" and "generalized complex geometry" – but I don’t know enough about those subjects to know if they’re relevant here.

This GENERIC framework suggests that energy and entropy should be viewed as two parts of a single entity – maybe even its real and imaginary parts! And that in turn reminds me of other strange things, like the idea of using complex-valued Hamiltonians to describe dissipative systems, or the idea of “inverse temperature as imaginary time”. I can’t tell yet if there’s a big idea lurking here, or just a mess….

If the expectation of X is a value obtained by integrating over the manifold, then it is merely a number. Similarly, if the metric is obtained by integrating over the manifold, then it is only a number. This differs from a traditional metric in the sense that a traditional metric varies from point to point.

So, in one case, you have a matrix of numbers, in another, you have a matrix of functions. Right?

It sounds like you’re getting a bit mixed up between two spaces that show up in classical information geometry. It’s easy to do.

1) First we have a *classical phase space* $\Omega$. A point in here is a *pure state* of some physical system: for example, the position and momentum of a particle. In practice $\Omega$ is often a manifold, but I’m not assuming this: I’m just assuming it’s a measure space with some measure that I call $d\omega$. A probability distribution

$p: \Omega \to [0,\infty)$

is called a *mixed state* of the physical system, and a real-valued measurable function

$A: \Omega \to \mathbb{R}$

is called an *observable*. The *expectation value* of an observable in a mixed state is defined by

$\langle A \rangle = \int_\Omega A(\omega) \, p(\omega) \, d\omega$

2) Next we have some space $M$ parametrizing mixed states of our physical system. Mathematically speaking, this is a *statistical manifold*, meaning a smooth manifold $M$ equipped with a smooth function

$p: M \to \{\textrm{probability distributions on } \Omega\}$

In other words: each point $x \in M$ is assigned a probability distribution $p_x$ on $\Omega$. But in my posts I never write the subscript $x$… and usually I write $\rho$ instead of $p$, since I’m trying to discuss the classical and quantum cases together, and $\rho$ is the usual notation for a mixed state in quantum mechanics.

But everything in *this* comment is purely classical.

The Fisher information metric is a metric on $M$, and it’s a ‘traditional metric’: it varies from point to point. It’s defined by doing an integral over $\Omega$.

I’ve given lots of formulas for it, but here are a few more, all equivalent. I’m using local coordinates on the manifold .

For starters,

but since the expectation value is defined as an integral over , we have

or if you prefer a more heavy notation that clarifies what depends on a point :

We also saw that is a covariance matrix:

where are some observable-valued functions on .

It’s also the matrix of second partial derivatives of the logarithm of the partition function:

All this stuff has a quantum version, too, and that’s what I’ve been emphasizing in this series of posts. In the quantum version the measure space is replaced by a Hilbert space, but the manifold remains.

I have one more observation. Typically, there are two distinctly different symmetric tensors. The metric, and the stress-energy/curvature tensor.

Could it be that the quantity you calculate would be better described as the equivalent to the stress-energy tensor? Or perhaps the stress-deviator tensor.

Just tossing that out there.

Why not conjugate one of the lambdas to begin with? I think it’s true that the expectation values of A* B and B* A are the same so gij is then already symmetric.

Alas, it’s not true that the expectation values of $A^* B$ and $B^* A$ are the same. If this were true, the expectation values of $AB$ and $BA$ would be the same when $A$ and $B$ are self-adjoint. But I gave a counterexample above: take $A$ to be the position operator and $B$ to be the momentum operator. Then:

$\langle AB \rangle = \langle BA \rangle + i$

By the way: in what I wrote above, and indeed in all my posts on this topic, I’m using “observable” to mean “self-adjoint operator” (in quantum mechanics) or “real-valued measurable function on phase space” (in classical mechanics). So, when I was talking about observables $A$ and $B$, I was assuming that $A = A^*$ and $B = B^*$… and I used these equations in my proof that

$\langle AB \rangle^* = \langle BA \rangle$

Isn’t what we consider to be actually

P(A|B) ? That is, a conditional probability. This is not symmetric even in classical probability theory. Moreover it corresponds to the measurement process more accurately than just .

Ordering is important, even in traditional probability theory.

Sorry, the formulas got left out in the text above. I was just saying that a correlation is not the same as a conditional probability, although both involve two variables in conjunction.

I’m not completely sure what you’re trying to say, but yeah: I’m not talking about conditional probabilities here, I’m talking about a kind of “correlation”: the expectation value of a product of two observables,

$\langle AB \rangle$

This is symmetric in the classical case but not the quantum case.

Could you add a similar statement on the previous posts? My comments referred to a different version than what appears now and I’m not sure I agree with the changes, so I might not have made those comments with the material as it stands now.

Will do.

Is it possible that the Leibniz condition up there contains a typo? I’d expect something more like:

[F,GH] = [F,G]H + G[F,H]

You’re right. I’ll fix the Leibniz condition here and back in “week295”, which apparently nobody read.

When I mentioned the resemblance of Öttinger’s formalism for non-equilibrium thermodynamics (with its skew-symmetric Poisson bracket and symmetric ‘dissipative bracket’) to the quantum generalization of the Fisher information metric (with its skew-symmetric imaginary part $\omega$ and symmetric real part $g$), I was far from sure they were really related. But the more I think about it, the more the clues keep piling up! Here are two:

1) The phase space in Öttinger’s formalism is a space of macrostates, or in other words, *mixed states* of some underlying system. This is why his phase space has an entropy function defined on it, as well as a Hamiltonian.

So in fact, Öttinger’s phase space is precisely the sort of space that comes with a Fisher information metric: namely, a manifold $M$ together with a smooth function $\rho$ taking values in mixed states.

2) The skew-symmetric imaginary part of the quantum Fisher information metric arises from the commutators of observables, so it’s indeed closely related to the Poisson bracket in classical mechanics. To see this, recall that

but since the observables are defined by

we can write this as

or in other words

However, I noted in this blog entry that for self-adjoint $A, B$ we have

$\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle [A,B] \rangle$

where $[A,B]$ is the commutator $AB - BA$. So, we have

$\omega_{ij} = \frac{1}{2i} \langle [X_i, X_j] \rangle$

In short: $\omega$ comes from taking an expectation value of a commutator! Since Poisson brackets are the classical analogue of commutators, we shouldn’t be surprised that $\omega$ might be what gives Öttinger’s phase space the structure of a Poisson manifold.

The story is getting even clearer now that I’m working out an example. I’ll try to report on that soon.

I still need to convince myself that, in general,

If true, then I’m back on board with you :)

You don’t want a minus sign there. You want

$\frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

Note, I’m only claiming this equation holds in Crooks’ formalism or its quantum analogue, where $M$ is an open subset of $\mathbb{R}^n$, and the function $\rho$ on $M$ has the property that $\rho(x)$ is the Gibbs state with

$\langle X_i \rangle = x_i$

for some prespecified choice of observables $X_i$. I’m not claiming it holds in my more general formalism, where $M$ is an arbitrary manifold and $\rho$ is *any* smooth function taking values in the interior of the set of mixed states.

You see, in my general formalism there’s no reason to expect that the second derivatives of the partition function can be expressed in terms of the $X_i$, which are defined using *first* derivatives of $\rho$. The function $\rho$ is so flexible that I see no reason for an equation expressing second derivatives in terms of first derivatives! But in Crooks’ formalism they’re tightly linked.

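Second, note that in Crooks’ original formalism, which is all about *classical* statistical mechanics, there’s no need to take the real part: the covariance matrix is already real so we have

$\frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

It’s only in the *quantum* version that we need to take a real part on the right-hand side. And I sketched why this works. First, consider the case $i = j$ and do the usual thermodynamics calculation to show that

$\frac{\partial^2}{\partial \lambda_i^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

Both sides are real here so we also have

$\frac{\partial^2}{\partial \lambda_i^2} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)^2 \rangle$

Then, since this is true *in any coordinate system*, we conclude

$\frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

Here I’m using the fact that both sides are symmetric in $i$ and $j$, and two symmetric bilinear forms that have the same diagonal entries *in any basis* must be equal, thanks to the polarization identity.

But don’t trust me, check it out. Of course you could also just calculate both sides.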

You’re right. I was carrying over an errant minus sign from an earlier calculation of mine.

I know I’m old school, but I’d like to see this calculation. I’m trying (believe me, I’m even losing sleep), but haven’t been able yet to calculate both sides.

Just to clarify…

It is fairly easy to calculate both sides in the classical case. I’m talking about the quantum version.

Okay, the quantum version. I’m trying to avoid calculating

directly, because there are too many noncommuting operators running around: after all,

and while expressions built from *two* noncommuting operators act commutative inside a trace, that ain’t true for *three*.

That’s why I want to use the polarization identity to focus on the case $i = j$, and bootstrap my way from there.

So, just to get started, one question is whether you believe

where

and we’re computing expectation values like this:

with

And I guess an even more basic question is whether you believe

Even this was mildly nerve-racking at first, because on the left side we need to understand

and now we see all those noncommuting . But I think the cyclic property of the trace saves us, letting us show

with the help of the power series expansion of the exponential.
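The more basic claim, about the first derivative, is easy to test numerically in finite dimensions. The sketch below rests on assumptions of mine: the sign convention $Z = \mathrm{tr} \exp(-\sum_i \lambda_i X_i)$, two random noncommuting Hermitian 3×3 matrices as the observables, and made-up helper names. It compares a finite-difference derivative of $\ln Z$ with $-\langle X_i \rangle$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

def random_hermitian(n):
    """A random self-adjoint n x n matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (z + z.conj().T) / 2

X = [random_hermitian(n), random_hermitian(n)]  # generically noncommuting observables

def boltzmann(lam):
    """exp(-sum_i lam_i X_i), computed by diagonalizing the Hermitian exponent."""
    H = sum(l * x for l, x in zip(lam, X))
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-vals)) @ vecs.conj().T

def log_Z(lam):
    return np.log(np.trace(boltzmann(lam)).real)

def mean(lam, i):
    """<X_i> in the Gibbs state exp(-sum_j lam_j X_j) / Z."""
    w = boltzmann(lam)
    return (np.trace(w @ X[i]) / np.trace(w)).real

lam = np.array([0.4, -0.3])
h = 1e-5
numeric = np.array([(log_Z(lam + h * e) - log_Z(lam - h * e)) / (2 * h)
                    for e in np.eye(2)])
exact = np.array([-mean(lam, i) for i in range(2)])
print(numeric, exact)
```

The agreement here relies only on the cyclic property of the trace, as described above. It does not by itself settle the second-derivative question.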

Even if you agree with me this far, you may worry about the second derivative:

In fact, now you’ve got me scared about whether this really equals

I sure *thought* it did! Maybe not. But in a way I won’t feel too bad if *this* blows up in my face, since this is foundational stuff about computing expectation values in quantum statistical mechanics, *not* the new stuff I’m trying to make up. I mean, if you can’t compute variances by taking the second derivative of the log of the partition function in quantum mechanics, that’s not my problem: that’s *everyone’s* problem!

What I consider “my problem” is the non-symmetry of

and what this means.

Yeah. I convinced myself of this (assuming no problems with infinity regarding the cyclicality of trace). It is kind of neat. However, that doesn’t help us when computing

Yeah, it’s not that I’m worried, but this is where I was having trouble. I get

In the classical case, we could use a trick

but I’m not sure if we can use the same trick, i.e. the chain rule, in the quantum case.

And the cyclicality of trace doesn’t save us here.

I got a little further showing

so that

Unfortunately, we can’t apply the chain rule, so the second term is not

Ah. But this proves the diagonal element is what you wanted it to be, since cyclicality saves the chain rule again, i.e.

I was hoping to do things the hard way without the polarization identity.

I’ll have to check all your formulas — some of them look unfamiliar and pretty cool. But now it’s my bed-time.

One does need to be a bit careful here in an infinite-dimensional Hilbert space. If the operator $A$ is trace class and $B$ is bounded, then $AB$ and $BA$ are trace class and

$\mathrm{tr}(AB) = \mathrm{tr}(BA)$

However, in applications to quantum mechanics many of our observables are *unbounded* self-adjoint operators, so one needs more specialized theorems. And this is not just pedantry, because at phase transitions the derivatives of $\ln Z$ can become infinite, which means that some of the expressions we’re talking about *do indeed* become undefined, thanks to divergent infinite sums!

But before worrying about these issues, it’s good to start by assuming all the sums converge, as they do in the finite-dimensional case, and see if the basic ideas are sound. And it sounds like that’s what you just did!

Okay, I checked all your formulas and I agree with them all! Nice proof.

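So, in quantum statistical mechanics we have

$\frac{\partial^2}{\partial \lambda_i^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

in every coordinate system, so by polarization we may conclude:

$\frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$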

This should be in a textbook somewhere! Does anyone out there know where it can be found?

John Baez says:

Formulas for quantum covariances can be found in many textbooks on nonequilibrium statistical physics.

See e.g., Section 5.3 and 6.3 of

http://de.arxiv.org/pdf/0810.1019v1, where things are phrased, as is conventional, in terms of the Kubo inner product.

I haven’t seen the formula with the real part, though, so maybe your application of polarization is not justified?

By the way, I am using a large font to display the text in my konqueror browser, and am dismayed that the formulas don’t scale with the remaining text. The indices are almost unreadable for me. Perhaps this can be improved!?

Re: enlarging the images

Install Userscripts for Konqueror

http://kde-apps.org/content/show.php?content=51482

then write a script that iterates over the images in the page served from l.wordpress.com and add or change the ‘s’ parameter. That will cause them to reload larger.

(If you don’t know JavaScript, let me know and I’ll write it for you.)

Arnold wrote:

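Polarization says that two symmetric bilinear forms $g, h$ with

$g(v,v) = h(v,v)$

for all $v$ also have

$g(v,w) = h(v,w)$

for all $v, w$. The proof is easy: there’s an explicit formula for $g(v,w)$ in terms of its ‘diagonal’ entries:

$g(v,w) = \frac{1}{2} \left( g(v+w,v+w) - g(v,v) - g(w,w) \right)$

as long as we’re working over a field that allows division by 2.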

The 2nd partial derivatives

of any smooth function define a symmetric bilinear form, and so does the quantity

So I think everything is fine — Eric and I went over it pretty carefully, here on the blog.

I’m sorry, I don’t know how. This problem would presumably arise on any WordPress math blog, e.g. Terry Tao’s blog or the Secret Blogging Seminar, which are both quite popular. So, maybe they know a solution. WordPress blogs produce LaTeX images as png files. png files are rescalable, so presumably a sufficiently smart browser could do it, e.g. with a plugin. I can create pngs in different sizes from this end:

should be a lot bigger than

But I don’t see anything at my end that helps you rescale the math symbols at your end.

By the way, to write math equations on a WordPress blog, you just put the word ‘latex’ right after the first dollar sign:

$latex \sqrt{3} $

produces

$\sqrt{3}$

Double dollars don’t work here. So, it’s easy for you to write math on this blog, but maybe not so easy for you to read it.

I chose to move to this blog instead of a blog with technology more like that of the n-Category Cafe because many mathematicians with old browsers were reluctant to obtain the math fonts necessary to view the posts there, and that limited the readership. I thought this problem would be even worse at Azimuth, since many of the readers aren’t mathematicians.

*light bulb*

Now that we know

it follows directly that

which means that

I’m back on board :)

Phorgyphynance wrote:

Great!

Thanks for not believing my claim about the second derivative of $\ln Z$ until you checked it. I’d never checked it before in the quantum case, though physicists seem to use it all the time. Now that you’re on board, I am ready to do some interesting stuff.

Your argument above, going from the formula for $g_{ii}$ to the formula for $g_{ij}$, uses the same trick as a well-known proof of the polarization identity. Namely: if $B$ is a symmetric bilinear form and

$Q(v) = B(v,v)$

is the corresponding quadratic form, then

$Q(v+w) = Q(v) + Q(w) + 2B(v,w)$

so we can recover the bilinear form from the quadratic form:

$B(v,w) = \frac{1}{2} \left( Q(v+w) - Q(v) - Q(w) \right)$

So, if two symmetric bilinear forms give the same quadratic form, they must be equal.

(Well, at least when we’re allowed to divide by 2! This is one reason why math over the integers mod 2 is very different than math over the real numbers or even the integers mod 3. Over the integers mod 2, there’s more information in the symmetric bilinear form than the corresponding quadratic form, so all heck breaks loose.)
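The recovery of a symmetric bilinear form from its quadratic form takes only a few lines to check over the real numbers. In this sketch the matrix `B` and the vectors `v`, `w` are random, and the names `Q` and `recovered` are just labels chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.normal(size=(4, 4))
B = (S + S.T) / 2  # matrix of a symmetric bilinear form on R^4

def Q(v):
    """The quadratic form Q(v) = B(v, v)."""
    return v @ B @ v

def recovered(v, w):
    """B(v, w) rebuilt from the quadratic form alone, via polarization."""
    return (Q(v + w) - Q(v) - Q(w)) / 2

v, w = rng.normal(size=4), rng.normal(size=4)
direct = v @ B @ w
print(direct, recovered(v, w))
```

If `B` were not symmetrized first, `recovered` would return the value of the symmetric part of the form, which is why symmetry of both sides matters in the argument.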

Cool :)

Well, everything seemed very clean and pretty, i.e. it “felt right”, until you threw in the $\mathrm{Re}$. For a second it seemed like you were saying, “Well, what I wanted was symmetric, but I got something unsymmetric, so let’s just symmetrize it.” Symmetrizing things willy nilly doesn’t “feel right” so I was happy to see it come out once you recognize the diagonal elements can be associated with a norm.

I have some real practical applications of this in mind if things work out the way I hope, so I’m glad to be back on board and looking forward to the rest of the ride :)

Some real practical applications, eh? Great!

This is off-topic, but this guy is into math and perhaps you could use some visualisations… So here’s a game he wrote. He must have worked on 2001:A Space Odyssey in a previous life. http://dmytry.pandromeda.com/games/index.html


Can you give a quasi realistic example of a GENERIC system?

I would like to give a very simple example, but the examples I’ve seen are too complicated for me to summarize here. If you’re a clever guy you can use sneaky tricks to find an online copy of this book:

• Hans Christian Öttinger, *Beyond Equilibrium Thermodynamics*, Wiley, 2005.

and that’s one way to see a bunch of examples. This book is also good:

• Georgy Lebon, David Jou and J. Casas-Vázquez, *Understanding Non-equilibrium Thermodynamics*, Springer, 2008.

By the way, someone pointed out that we don’t need

[H,F] = {S,F} = 0

for all functions F. To derive the few results I describe, it’s enough to have

[H,S] = {S,H} = 0

It seems that Öttinger assumes the stronger formulation but only uses the weaker one: see the text before equation (1.22) in his book *Beyond Equilibrium Thermodynamics*.

I’m afraid I don’t remember who pointed out this fact, and I can’t find the place on this blog where they did it! But I think it’s important.