Before moving on, I’d like to clear up a mistake I’d been making in all my previous posts on this subject.
(By now I’ve tried to fix those posts, because people often get information from the web in a hasty way, and I don’t want my mistake to spread. But you’ll still see traces of my mistake infecting the comments on those posts.)
So what’s the mistake? It’s embarrassingly simple, but also simple to fix. A Riemannian metric must be symmetric:

$g_{ij} = g_{ji}$

Now, I had defined the Fisher information metric to be the so-called ‘covariance matrix’:

$g_{ij} = \langle X_i X_j \rangle$

where $X_i$ are some observable-valued functions on a manifold $M$, and the angle brackets mean “expectation value”, computed using a mixed state $\rho$ that also depends on the point in $M$.

The covariance matrix is symmetric in classical mechanics, since then observables commute, so:

$\langle X_i X_j \rangle = \langle X_j X_i \rangle$
But it’s not symmetric in quantum mechanics! After all, suppose $q$ is the position operator for a particle, and $p$ is the momentum operator. Then according to Heisenberg

$qp - pq = i$

in units where Planck’s constant is 1. Taking expectation values, we get:

$\langle qp \rangle - \langle pq \rangle = i$

and in particular:

$\langle qp \rangle \ne \langle pq \rangle$

We can use this to get examples where $g_{ij} = \langle X_i X_j \rangle$ is not symmetric.
However, it turns out that the real part of the covariance matrix is symmetric, even in quantum mechanics — and that’s what we should use as our Fisher information metric.
Why is the real part of the covariance matrix symmetric, even in quantum mechanics? Well, suppose $\rho$ is any density matrix, and $A$ and $B$ are any observables. Then by definition

$\langle AB \rangle = \mathrm{tr}(\rho A B)$

so taking the complex conjugate of both sides

$\langle AB \rangle^* = \mathrm{tr}(\rho A B)^* = \mathrm{tr}(B^* A^* \rho^*)$

where I’m using an asterisk both for the complex conjugate of a number and the adjoint of an operator. But our observables are self-adjoint, and so is our density matrix, so we get

$\langle AB \rangle^* = \mathrm{tr}(B A \rho) = \mathrm{tr}(\rho B A)$

where in the second step we used the cyclic property of the trace. In short:

$\langle AB \rangle^* = \langle BA \rangle$

If we take real parts, we get something symmetric:

$\mathrm{Re} \langle AB \rangle = \mathrm{Re} \langle BA \rangle$

So, if we redefine the Fisher information metric to be the real part of the covariance matrix:

$g_{ij} = \mathrm{Re} \langle X_i X_j \rangle$

then it’s symmetric, as it should be.
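Since all of this is finite-dimensional linear algebra when the Hilbert space is $\mathbb{C}^n$, it’s easy to check numerically. Here’s a minimal sketch — the random-matrix setup is mine, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_hermitian(n):
    """A random self-adjoint operator, i.e. an observable."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

# A random density matrix: positive semidefinite with unit trace.
P = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = P @ P.conj().T
rho /= np.trace(rho).real

A, B = random_hermitian(n), random_hermitian(n)
expAB = np.trace(rho @ A @ B)   # <AB>
expBA = np.trace(rho @ B @ A)   # <BA>

print(np.isclose(expAB, expBA))            # False: the covariance is not symmetric
print(np.isclose(np.conj(expAB), expBA))   # True:  <AB>* = <BA>
print(np.isclose(expAB.real, expBA.real))  # True:  the real part is symmetric
```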
Last time I mentioned a general setup using von Neumann algebras, that handles the classical and quantum situations simultaneously. That applies here! Taking the real part has no effect in classical mechanics, so we don’t need it there — but it doesn’t hurt, either.
Taking the real part never has any effect when $i = j$, either, since the expected value of the square of an observable is a nonnegative number:

$\langle X_i^2 \rangle \ge 0$
This has two nice consequences.
First, we get

$g_{ii} = \langle X_i^2 \rangle \ge 0$

and since this is true in any coordinate system, our would-be metric is indeed nonnegative. It’ll be an honest Riemannian metric whenever it’s positive definite.
Second, suppose we’re working in the special case discussed in Part 2, where our manifold is an open subset of $\mathbb{R}^n$, and $\rho$ at the point $\lambda \in \mathbb{R}^n$ is the Gibbs state

$\rho = \frac{1}{Z} e^{-\lambda^i X_i}$

for some prespecified observables $X_i$. Then all the usual rules of statistical mechanics apply. So, we can compute the variance of the observable $X_i$ using the partition function

$Z = \mathrm{tr}(e^{-\lambda^i X_i})$

as follows:

$\langle (X_i - \langle X_i \rangle)^2 \rangle = \frac{\partial^2}{\partial (\lambda^i)^2} \ln Z$

In other words,

$g_{ii} = \frac{\partial^2}{\partial (\lambda^i)^2} \ln Z$

But since this is true in any coordinate system, we must have

$g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

(Here I’m using a little math trick: two symmetric bilinear forms whose diagonal entries agree in any basis must be equal. We’ve already seen that the left side is symmetric, and the right side is symmetric by a famous fact about mixed partial derivatives.)
However, I’m pretty sure this cute formula

$g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

only holds in the special case I’m talking about now, where points in $\mathbb{R}^n$ are parametrizing Gibbs states in the obvious way. In general we must use

$g_{ij} = \mathrm{Re} \langle X_i X_j \rangle$

or equivalently,

$g_{ij} = \frac{1}{2} \langle X_i X_j + X_j X_i \rangle$
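In the classical case the cute formula is easy to check numerically. Here’s a minimal sketch with a finite set of microstates (the toy setup is mine; the quantum version is debated in the comments below): the covariance matrix of the $X_i$ in the Gibbs distribution matches the Hessian of $\ln Z$ computed by finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6                         # number of classical microstates
X = rng.normal(size=(2, K))   # two classical observables X_1, X_2
lam = np.array([0.3, -0.7])   # a point lambda in R^2

def lnZ(lam):
    return np.log(np.sum(np.exp(-lam @ X)))

def gibbs(lam):
    w = np.exp(-lam @ X)
    return w / w.sum()

p = gibbs(lam)
mean = X @ p
cov = (X - mean[:, None]) * p @ (X - mean[:, None]).T  # covariance of fluctuations

# Hessian of ln Z by central finite differences
h = 1e-4
hess = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        e_i, e_j = np.eye(2)[i] * h, np.eye(2)[j] * h
        hess[i, j] = (lnZ(lam + e_i + e_j) - lnZ(lam + e_i - e_j)
                      - lnZ(lam - e_i + e_j) + lnZ(lam - e_i - e_j)) / (4 * h**2)

print(np.allclose(cov, hess, atol=1e-6))  # True: g_ij = d_i d_j ln Z classically
```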
Okay. So much for cleaning up Last Week’s Mess. Here’s something new. We’ve seen that whenever $A$ and $B$ are observables (that is, self-adjoint),

$\langle AB \rangle^* = \langle BA \rangle$

We got something symmetric by taking the real part:

$\mathrm{Re} \langle AB \rangle = \mathrm{Re} \langle BA \rangle$

Indeed,

$\mathrm{Re} \langle AB \rangle = \frac{1}{2} \langle AB + BA \rangle$

But by the same reasoning, we get something antisymmetric by taking the imaginary part:

$\mathrm{Im} \langle AB \rangle = -\mathrm{Im} \langle BA \rangle$

and indeed,

$\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle AB - BA \rangle$
Commutators like $AB - BA$ are important in quantum mechanics, so maybe we shouldn’t just throw out the imaginary part of the covariance matrix in our desperate search for a Riemannian metric! Besides the symmetric tensor on our manifold $M$:

$g_{ij} = \mathrm{Re} \langle X_i X_j \rangle$

we can also define a skew-symmetric tensor:

$\omega_{ij} = \mathrm{Im} \langle X_i X_j \rangle$
This will vanish in the classical case, but not in the quantum case!
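The same sort of numerical check works for the imaginary part — a quick sketch, again with random matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
M1 = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M2 = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M1 + M1.conj().T) / 2           # a random observable
B = (M2 + M2.conj().T) / 2           # another random observable
P = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = P @ P.conj().T
rho /= np.trace(rho).real            # a random density matrix

expAB = np.trace(rho @ A @ B)
expBA = np.trace(rho @ B @ A)
comm = np.trace(rho @ (A @ B - B @ A))           # <[A,B]>

print(np.isclose(expAB.imag, -expBA.imag))       # True: the Im part is antisymmetric
print(np.isclose(expAB.imag, (comm / 2j).real))  # True: Im<AB> = <[A,B]>/2i
```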
If you’ve studied enough geometry, you should now be reminded of things like ‘Kähler manifolds’ and ‘almost Kähler manifolds’. A Kähler manifold is a manifold that’s equipped with a symmetric tensor $g$ and a skew-symmetric tensor $\omega$
which fit together in the best possible way. An almost Kähler manifold is something similar, but not quite as nice. We should probably see examples of these arising in information geometry! And that could be pretty interesting.
But in general, if we start with any old manifold $M$ together with a function $\rho$ taking values in mixed states, we seem to be making $M$ into something even less nice. It gets a symmetric bilinear form $g$ on each tangent space, and a skew-symmetric bilinear form $\omega$, and they vary smoothly from point to point… but they might be degenerate, and I don’t see any reason for them to ‘fit together’ in the nice way we need for a Kähler or almost Kähler manifold.
However, I still think something interesting might be going on here. For one thing, there are other situations in physics where a space of states is equipped with a symmetric bilinear form $g$ and a skew-symmetric bilinear form $\omega$. They show up in ‘dissipative mechanics’ — the study of systems whose entropy increases.
To conclude, let me remind you of some things I said in week295 of This Week’s Finds. This is a huge digression from information geometry, but I’d like to lay out the puzzle pieces in public view, in case it helps anyone get some good ideas.
I wrote:
• Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.
I thank Arnold Neumaier for pointing out this book! It considers a fascinating generalization of Hamiltonian mechanics that applies to systems with dissipation: for example, electrical circuits with resistors, or mechanical systems with friction.
In ordinary Hamiltonian mechanics the space of states is a manifold and time evolution is a flow on this manifold determined by a smooth function called the Hamiltonian, which describes the energy of any state. In this generalization the space of states is still a manifold, but now time evolution is determined by two smooth functions: the energy and the entropy! In ordinary Hamiltonian mechanics, energy is automatically conserved. In this generalization that’s also true, but energy can go into the form of heat… and entropy automatically increases!
Mathematically, the idea goes like this. We start with a Poisson manifold, but in addition to the skew-symmetric Poisson bracket {F,G} of smooth functions on some manifold, we also have a symmetric bilinear bracket [F,G] obeying the Leibniz law
[F,GH] = [F,G]H + G[F,H]
and this positivity condition:
[F,F] ≥ 0
The time evolution of any function is given by a generalization of Hamilton’s equations:
dF/dt = {H,F} + [S,F]
where H is a function called the "energy" or "Hamiltonian", and S is a function called the "entropy". The first term on the right is the usual one. The new second term describes dissipation: as we shall see, it pushes the state towards increasing entropy.
If we require that
[H,F] = {S,F} = 0
for every function F, then we get conservation of energy, as usual in Hamiltonian mechanics:
dH/dt = {H,H} + [S,H] = 0
But we also get the second law of thermodynamics:
dS/dt = {H,S} + [S,S] ≥ 0
Entropy always increases!
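Here’s a minimal numerical sketch of these ideas — a toy example of my own, in the spirit of (but not taken from) Öttinger’s book: a damped harmonic oscillator with state (q, p, e), where e is internal (heat) energy. Take H = (q² + p²)/2 + e and S = e, with the Poisson bracket acting only on (q, p), and the dissipative bracket [F,G] = γ (∂pF − p ∂eF)(∂pG − p ∂eG), which is symmetric, satisfies [F,F] ≥ 0, and obeys [H,F] = {S,F} = 0. Working out dF/dt = {H,F} + [S,F] for F = q, p, e gives the vector field in the code:

```python
import numpy as np

gamma = 0.5                      # friction coefficient

def flow(state):
    q, p, e = state
    # dF/dt = {H,F} + [S,F] worked out for F = q, p, e:
    # dq/dt = p, dp/dt = -q - gamma*p, de/dt = gamma*p^2
    return np.array([p, -q - gamma * p, gamma * p**2])

def rk4_step(state, dt):
    k1 = flow(state)
    k2 = flow(state + dt / 2 * k1)
    k3 = flow(state + dt / 2 * k2)
    k4 = flow(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def energy(state):
    q, p, e = state
    return (q**2 + p**2) / 2 + e   # H: oscillator energy plus internal energy

state = np.array([1.0, 0.0, 0.0])
E0, S_prev = energy(state), state[2]
for _ in range(2000):
    state = rk4_step(state, 0.01)
    assert state[2] >= S_prev - 1e-12   # entropy S = e never decreases
    S_prev = state[2]

print(abs(energy(state) - E0) < 1e-5)   # True: energy is conserved
print(state[2] > 0)                     # True: entropy has increased
```

Note that the dissipative bracket here is a degenerate positive bilinear form on the cotangent space — exactly the sort of degeneracy discussed below.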
Öttinger calls this framework “GENERIC” – an annoying acronym for “General Equation for the NonEquilibrium Reversible-Irreversible Coupling”. There are lots of papers about it. But I’m wondering if any geometers have looked into it!
If we didn’t need the equations [H,F] = {S,F} = 0, we could easily get the necessary brackets starting with a Kähler manifold. The imaginary part of the Kähler structure is a symplectic structure, say ω, so we can define
{F,G} = ω(dF,dG)
as usual to get Poisson brackets. The real part of the Kähler structure is a Riemannian structure, say g, so we can define
[F,G] = g(dF,dG)
This satisfies
[F,GH] = [F,G]H + G[F,H]
and
[F,F] ≥ 0
Don’t be fooled: this stuff is not rocket science. In particular, the inequality above has a simple meaning: when we move in the direction of the gradient of F, the function F increases. So adding the second term to Hamilton’s equations has the effect of pushing the system towards increasing entropy.
Note that I’m being a tad unorthodox by letting ω and g eat cotangent vectors instead of tangent vectors – but that’s no big deal. The big deal is this: if we start with a Kähler manifold and define brackets this way, we don’t get [H,F] = 0 or {S,F} = 0 for all functions F unless H and S are constant! That’s no good for applications to physics. To get around this problem, we would need to consider some sort of degenerate Kähler structure – one where ω and g are degenerate bilinear forms on the cotangent space.
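For what it’s worth, here’s a tiny sketch of the bracket construction itself, on $\mathbb{R}^2$ with its standard (nondegenerate) Kähler structure — flat metric g and symplectic form ω, both eating gradients; the sample function and point are just illustrative:

```python
import numpy as np

# R^2 with its standard Kahler structure: g = identity, omega = rotation by 90 degrees.
g = np.eye(2)
omega = np.array([[0.0, -1.0], [1.0, 0.0]])

def grad(F, x, h=1e-6):
    e = np.eye(2) * h
    return np.array([(F(x + e[i]) - F(x - e[i])) / (2 * h) for i in range(2)])

poisson = lambda F, G, x: grad(F, x) @ omega @ grad(G, x)   # {F,G} = omega(dF,dG)
dissip  = lambda F, G, x: grad(F, x) @ g @ grad(G, x)       # [F,G] = g(dF,dG)

F = lambda x: x[0]**2 + x[1]
x0 = np.array([0.3, -1.2])
print(np.isclose(poisson(F, F, x0), 0.0, atol=1e-8))  # True: {F,F} = 0 (skew-symmetry)
print(dissip(F, F, x0) >= 0)                          # True: [F,F] >= 0
```

With this nondegenerate g and ω, only constant H and S satisfy [H,F] = {S,F} = 0 for all F — which is exactly why some sort of degenerate structure, like the rank-one dissipative bracket in the toy GENERIC example above, seems to be needed.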
Has anyone thought about such things? They remind me a little of "Dirac structures" and "generalized complex geometry" – but I don’t know enough about those subjects to know if they’re relevant here.
This GENERIC framework suggests that energy and entropy should be viewed as two parts of a single entity – maybe even its real and imaginary parts! And that in turn reminds me of other strange things, like the idea of using complex-valued Hamiltonians to describe dissipative systems, or the idea of “inverse temperature as imaginary time”. I can’t tell yet if there’s a big idea lurking here, or just a mess….
If the expectation of X is a value obtained by integrating over the manifold, then it is merely a number. Similarly, if the metric is obtained by integrating over the manifold, then it is only a number. This differs from a traditional metric in the sense that a traditional metric varies from point to point.

So, in one case you have a matrix of numbers, and in the other you have a matrix of functions. Right?
It sounds like you’re getting a bit mixed up between two spaces that show up in classical information geometry. It’s easy to do.
1) First we have a classical phase space $X$. A point in here is a pure state of some physical system: for example, the position and momentum of a particle. In practice $X$ is often a manifold, but I’m not assuming this: I’m just assuming it’s a measure space with some measure that I call $dx$. A probability distribution $p$ on $X$ is called a mixed state of the physical system, and a real-valued measurable function $f$ on $X$ is called an observable. The expectation value of an observable in a mixed state is defined by

$\langle f \rangle = \int_X f(x) \, p(x) \, dx$

2) Next we have some space parametrizing mixed states of our physical system. Mathematically speaking, this is a statistical manifold, meaning a smooth manifold $M$ equipped with a smooth function

$\rho : M \to \{ \textrm{mixed states on } X \}$

In other words: each point $\lambda \in M$ is assigned a probability distribution $\rho_\lambda$ on $X$. But in my posts I never write the subscript $\lambda$… and usually I write $\rho$ instead of $p$, since I’m trying to discuss the classical and quantum cases together, and $\rho$ is the usual notation for a mixed state in quantum mechanics.

But everything in this comment is purely classical.

The Fisher information metric is a metric on $M$, and it’s a ‘traditional metric’: it varies from point to point. It’s defined by doing an integral over $X$.

I’ve given lots of formulas for it, but here are a few more, all equivalent. I’m using local coordinates $\lambda^i$ on the manifold $M$.

For starters,

$g_{ij} = \langle \partial_i \ln \rho \; \partial_j \ln \rho \rangle$

but since the expectation value is defined as an integral over $X$, we have

$g_{ij} = \int_X \partial_i \ln \rho(x) \; \partial_j \ln \rho(x) \; \rho(x) \, dx$

or if you prefer a more heavy notation that clarifies what depends on a point $\lambda \in M$:

$g_{ij}(\lambda) = \int_X \partial_i \ln \rho_\lambda(x) \; \partial_j \ln \rho_\lambda(x) \; \rho_\lambda(x) \, dx$

We also saw that $g_{ij}$ is a covariance matrix:

$g_{ij} = \langle X_i X_j \rangle$

where $X_i = \partial_i \ln \rho$ are some observable-valued functions on $M$.

It’s also the matrix of second partial derivatives of the logarithm of the partition function:

$g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

All this stuff has a quantum version, too, and that’s what I’ve been emphasizing in this series of posts. In the quantum version the measure space $X$ is replaced by a Hilbert space, but the manifold $M$ remains.
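As a concrete instance of the first formula, here’s a sketch computing the Fisher metric for the statistical manifold of unit-variance Gaussians with mean $\lambda$ — a toy example of my own, where the answer should be $g_{11} = 1$ at every point:

```python
import numpy as np

# Statistical manifold M = R: rho_lambda = Gaussian with mean lambda, variance 1.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
lam, h = 0.7, 1e-5

def p(lam):
    return np.exp(-(x - lam)**2 / 2) / np.sqrt(2 * np.pi)

dlnp = (np.log(p(lam + h)) - np.log(p(lam - h))) / (2 * h)  # d/dlambda of ln rho
g11 = np.sum(dlnp**2 * p(lam)) * dx                         # <(d ln rho)^2>
print(np.isclose(g11, 1.0, atol=1e-4))  # True: the Fisher metric is 1 here
```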
I have one more observation. Typically, there are two distinctly different symmetric tensors: the metric, and the stress-energy/curvature tensor.
Could it be that the quantity you calculate would be better described as the equivalent to the stress-energy tensor? Or perhaps the stress-deviator tensor.
Just tossing that out there.
Why not conjugate one of the lambdas to begin with? I think it’s true that the expectation values of $A^* B$ and $B^* A$ are the same, so $g_{ij}$ is then already symmetric.
Alas, it’s not true that the expectation values of $A^* B$ and $B^* A$ are the same. If this were true, the expectation values of $AB$ and $BA$ would be the same when $A$ and $B$ are self-adjoint. But I gave a counterexample above: take $A$ to be the position operator and $B$ to be the momentum operator. Then:

$\langle AB \rangle - \langle BA \rangle = i \ne 0$

By the way: in what I wrote above, and indeed in all my posts on this topic, I’m using “observable” to mean “self-adjoint operator” (in quantum mechanics) or “real-valued measurable function on phase space” (in classical mechanics). So, when I was talking about observables $A$ and $B$, I was assuming that $A^* = A$ and $B^* = B$… and I used these equations in my proof that

$\langle AB \rangle^* = \langle BA \rangle$
Isn’t what we consider to be $\langle AB \rangle$ actually P(A|B)? That is, a conditional probability. This is not symmetric even in classical probability theory. Moreover, it corresponds to the measurement process more accurately than just $\langle AB \rangle$.

Ordering is important, even in traditional probability theory.
Sorry, the formulas got left out in the text above. I was just saying that a correlation is not the same as a conditional probability, although both involve two variables in conjunction.
I’m not completely sure what you’re trying to say, but yeah: I’m not talking about conditional probabilities here, I’m talking about a kind of “correlation”: the expectation value of a product of two observables,

$\langle AB \rangle$
This is symmetric in the classical case but not the quantum case.
Could you add a similar statement on the previous posts? My comments referred to a different version than what appears now, and I’m not sure I agree with the changes, so I might not have made those comments with the material as it stands now.
Will do.
Is it possible that the Leibniz condition up there contains a typo? I’d expect something more like:

[F,GH] = [F,G]H + G[F,H]
You’re right. I’ll fix the Leibniz condition here and back in “week295”, which apparently nobody read.
When I mentioned the resemblance of Öttinger’s formalism for non-equilibrium thermodynamics (with its skew-symmetric Poisson bracket and symmetric ‘dissipative bracket’) to the quantum generalization of the Fisher information metric (with its skew-symmetric imaginary part $\omega_{ij} = \mathrm{Im} \langle X_i X_j \rangle$ and symmetric real part $g_{ij} = \mathrm{Re} \langle X_i X_j \rangle$), I was far from sure they were really related. But the more I think about it, the more the clues keep piling up! Here are two:
1) The phase space in Öttinger’s formalism is a space of macrostates, or in other words, mixed states of some underlying system. This is why his phase space has an entropy function defined on it, as well as a Hamiltonian.
So in fact, Öttinger’s phase space is precisely the sort of space that comes with a Fisher information metric: namely, a manifold $M$ together with a smooth function $\rho$ assigning a mixed state to each point of $M$.
2) The skew-symmetric imaginary part $\omega$ of the quantum Fisher information metric arises from the commutators of observables, so it’s indeed closely related to the Poisson bracket in classical mechanics. To see this, recall that

$\omega_{ij} = \mathrm{Im} \langle X_i X_j \rangle$

but since the observables $X_i$ are defined by

$X_i = \partial_i \ln \rho$

we can write this as

$\omega_{ij} = \mathrm{Im} \langle \partial_i \ln \rho \; \partial_j \ln \rho \rangle$

or in other words

$\omega_{ij} = \mathrm{Im} \, \mathrm{tr}(\rho \, \partial_i \ln \rho \; \partial_j \ln \rho)$

However, I noted in this blog entry that for self-adjoint $A$ and $B$ we have

$\mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle [A, B] \rangle$

where $[A, B]$ is the commutator $AB - BA$. So, we have

$\omega_{ij} = \frac{1}{2i} \langle [\partial_i \ln \rho, \partial_j \ln \rho] \rangle$

In short: $\omega_{ij}$ comes from taking an expectation value of a commutator! Since Poisson brackets are the classical analogue of commutators, we shouldn’t be surprised that $\omega$ might be what gives Öttinger’s phase space the structure of a Poisson manifold.
The story is getting even clearer now that I’m working out an example. I’ll try to report on that soon.
I still need to convince myself that, in general,

$g_{ij} = -\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

If true, then I’m back on board with you :)
You don’t want a minus sign there. You want

$g_{ij} = \frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$
Note, I’m only claiming this equation holds in Crooks’ formalism or its quantum analogue, where $M$ is an open subset of $\mathbb{R}^n$, and $\rho$ has the property that $\rho$ at the point $\lambda \in \mathbb{R}^n$ is the Gibbs state

$\rho = \frac{1}{Z} e^{-\lambda^i X_i}$

for some prespecified choice of observables $X_i$. I’m not claiming it holds in my more general formalism, where $M$ is an arbitrary manifold and $\rho$ is any smooth function taking values in the interior of the set of mixed states.

You see, in my general formalism there’s no reason to expect that the second derivatives of the partition function $Z$ can be expressed in terms of the $X_i$, which are defined using first derivatives of $\rho$. The function $\rho$ is so flexible that I see no reason for an equation expressing second derivatives in terms of first derivatives! But in Crooks’ formalism they’re tightly linked.
Second, note that in Crooks’ original formalism, which is all about classical statistical mechanics, there’s no need to take the real part: the covariance matrix is already real, so we have

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

It’s only in the quantum version that we need to take a real part on the right-hand side. And I sketched why this works. First, consider the case $i = j$ and do the usual thermodynamics calculation to show that

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

Both sides are real here, so we also have

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)^2 \rangle$

Then, since this is true in any coordinate system, we conclude

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

Here I’m using the fact that both sides are symmetric in $i$ and $j$, and two symmetric bilinear forms that have the same diagonal entries in any basis must be equal, thanks to the polarization identity.
But don’t trust me, check it out. Of course you could also just calculate both sides.
You’re right. I was carrying over an errant minus sign from an earlier calculation of mine.
I know I’m old school, but I’d like to see this calculation. I’m trying (believe me, I’m even losing sleep), but I haven’t yet been able to calculate both sides.
Just to clarify…
It is fairly easy to calculate both sides in the classical case. I’m talking about the quantum version.
Okay, the quantum version. I’m trying to avoid calculating

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

directly, because there are too many noncommuting operators running around: after all,

$Z = \mathrm{tr}(e^{-\lambda^i X_i})$

and while expressions built from two noncommuting operators behave as if they commute inside a trace, that ain’t true for three.

That’s why I want to use the polarization identity to focus on the case $i = j$, and bootstrap my way from there.

So, just to get started, one question is whether you believe

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

where

$Z = \mathrm{tr}(e^{-\lambda^i X_i})$

and we’re computing expectation values like this:

$\langle A \rangle = \mathrm{tr}(\rho A)$

with

$\rho = \frac{1}{Z} e^{-\lambda^i X_i}$

And I guess an even more basic question is whether you believe

$\frac{\partial}{\partial \lambda^i} \ln Z = -\langle X_i \rangle$

Even this was mildly nerve-racking at first, because on the left side we need to understand

$\frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^j X_j})$

and now we see all those noncommuting $X_j$’s. But I think the cyclic property of the trace saves us, letting us show

$\frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^j X_j}) = -\mathrm{tr}(X_i \, e^{-\lambda^j X_j})$

with the help of the power series expansion of the exponential.
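This first-derivative formula is easy to test numerically even for noncommuting matrices — a quick sketch of my own, using scipy’s matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

def herm(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

X1, X2 = herm(4), herm(4)            # two noncommuting observables
lam = np.array([0.4, 0.9])
H = lambda l: l[0] * X1 + l[1] * X2  # lambda . X

# Central finite difference of tr(exp(-lambda.X)) in the lambda_1 direction:
h = 1e-6
lhs = (np.trace(expm(-H(lam + [h, 0]))) - np.trace(expm(-H(lam - [h, 0])))) / (2 * h)
rhs = -np.trace(X1 @ expm(-H(lam)))

print(np.isclose(lhs.real, rhs.real, atol=1e-5))  # True, despite noncommutativity
```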
Even if you agree with me this far, you may worry about the second derivative:

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z$

In fact, now you’ve got me scared about whether this really equals

$\langle (X_i - \langle X_i \rangle)^2 \rangle$

I sure thought it did! Maybe not. But in a way I won’t feel too bad if this blows up in my face, since this is foundational stuff about computing expectation values in quantum statistical mechanics, not the new stuff I’m trying to make up. I mean, if you can’t compute variances by taking the second derivative of the log of the partition function in quantum mechanics, that’s not my problem: that’s everyone’s problem!
What I consider “my problem” is the non-symmetry of

$\langle X_i X_j \rangle$

and what this means.
Yeah. I convinced myself of this (assuming no problems with infinity regarding the cyclicality of trace). It is kind of neat. However, that doesn’t help us when computing

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$
Yeah, it’s not that I’m worried, but this is where I was having trouble. I get stuck at

$\frac{\partial}{\partial \lambda^i} \mathrm{tr}(X_j \, e^{-\lambda^k X_k})$

In the classical case, we could use a trick

$\frac{\partial}{\partial \lambda^i} e^{-\lambda^k X_k} = -X_i \, e^{-\lambda^k X_k}$

but I’m not sure if we can use the same trick, i.e. the chain rule, in the quantum case.

And the cyclicality of trace doesn’t save us here.
I got a little further, showing

$\frac{\partial}{\partial \lambda^j} \ln Z = -\frac{1}{Z} \mathrm{tr}(X_j \, e^{-\lambda^k X_k})$

so that

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z = -\langle X_i \rangle \langle X_j \rangle - \frac{1}{Z} \frac{\partial}{\partial \lambda^i} \mathrm{tr}(X_j \, e^{-\lambda^k X_k})$

Unfortunately, we can’t apply the chain rule, so the second term is not obviously

$\langle X_i X_j \rangle$
Ah. But this proves the diagonal element is what you wanted it to be, since cyclicality saves the chain rule again, i.e.

$\frac{\partial}{\partial \lambda^i} \mathrm{tr}(X_i \, e^{-\lambda^k X_k}) = -\mathrm{tr}(X_i^2 \, e^{-\lambda^k X_k})$
I was hoping to do things the hard way without the polarization identity.
I’ll have to check all your formulas — some of them look unfamiliar and pretty cool. But now it’s my bed-time.
One does need to be a bit careful here in an infinite-dimensional Hilbert space. If the operator $A$ is trace class and $B$ is bounded, then $AB$ and $BA$ are trace class and

$\mathrm{tr}(AB) = \mathrm{tr}(BA)$

However, in applications to quantum mechanics many of our observables are unbounded self-adjoint operators, so one needs more specialized theorems. And this is not just pedantry, because at phase transitions the derivatives of $\ln Z$ can become infinite, which means that some of the expressions we’re talking about do indeed become undefined, thanks to divergent infinite sums!
But before worrying about these issues, it’s good to start by assuming all the sums converge, as they do in the finite-dimensional case, and see if the basic ideas are sound. And it sounds like that’s what you just did!
Okay, I checked all your formulas and I agree with them all! Nice proof.
So, in quantum statistical mechanics we have

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

in every coordinate system, so by polarization we may conclude:

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$
This should be in a textbook somewhere! Does anyone out there know where it can be found?
Arnold Neumaier says:
Formulas for quantum covariances can be found in many textbooks on nonequilibrium statistical physics. See, e.g., Sections 5.3 and 6.3 of

http://de.arxiv.org/pdf/0810.1019v1

where things are phrased, as is conventional, in terms of the Kubo inner product.

I haven’t seen the formula with the real part, though, so maybe your application of polarization is not justified?
By the way, I am using a large font to display the text in my konqueror browser, and am dismayed that the formulas don’t scale with the remaining text. The indices are almost unreadable for me. Perhaps this can be improved!?
Re: enlarging the images
Install Userscripts for Konqueror
http://kde-apps.org/content/show.php?content=51482
then write a script that iterates over the images in the page served from l.wordpress.com and add or change the ‘s’ parameter. That will cause them to reload larger.
(If you don’t know JavaScript, let me know and I’ll write it for you.)
Arnold wrote:

maybe your application of polarization is not justified?

Polarization says that two symmetric bilinear forms $g$, $h$ with

$g(v, v) = h(v, v)$

for all $v$ also have

$g(v, w) = h(v, w)$

for all $v, w$. The proof is easy: there’s an explicit formula for $g(v, w)$ in terms of its ‘diagonal’ entries:

$g(v, w) = \frac{1}{2} \big( g(v + w, v + w) - g(v, v) - g(w, w) \big)$

as long as we’re working over a field that allows division by 2.

The 2nd partial derivatives

$\frac{\partial^2 f}{\partial \lambda^i \partial \lambda^j}$

of any smooth function $f$ define a symmetric bilinear form, and so does the quantity

$\mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

So I think everything is fine — Eric and I went over it pretty carefully, here on the blog.
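A two-line numerical check of that explicit formula — a sketch with an arbitrary symmetric matrix and random vectors of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(3, 3))
G = (G + G.T) / 2                          # a symmetric bilinear form g
Q = lambda v: v @ G @ v                    # its 'diagonal' entries g(v, v)

v, w = rng.normal(size=3), rng.normal(size=3)
recovered = (Q(v + w) - Q(v) - Q(w)) / 2   # the polarization formula
print(np.isclose(recovered, v @ G @ w))    # True: g(v,w) recovered from g(v,v)
```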
I’m sorry, I don’t know how. This problem would presumably arise on any WordPress math blog, e.g. Terry Tao’s blog or the Secret Blogging Seminar, which are both quite popular. So, maybe they know a solution. WordPress blogs produce LaTeX images as png files. png files are rescalable, so presumably a sufficiently smart browser could do it, e.g. with a plugin. I can create pngs in different sizes from this end: a $\sqrt{3}$ rendered at a large size should be a lot bigger than one rendered at a small size. But I don’t see anything at my end that helps you rescale the math symbols at your end.
By the way, to write math equations on a WordPress blog, you just put the word ‘latex’ right after the first dollar sign:

$latex \sqrt{3} $

produces

$\sqrt{3}$

Double dollars don’t work here. So, it’s easy for you to write math on this blog, but maybe not so easy for you to read it.
I chose to move to this blog instead of a blog with technology more like that of the n-Category Cafe because many mathematicians with old browsers were reluctant to obtain the math fonts necessary to view the posts there, and that limited the readership. I thought this problem would be even worse at Azimuth, since many of the readers aren’t mathematicians.
*light bulb*
Now that we know

$\frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-\lambda^j X_j}) = -\mathrm{tr}(X_i \, e^{-\lambda^j X_j})$

it follows directly that

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z = \langle (X_i - \langle X_i \rangle)^2 \rangle$

which means that

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$

I’m back on board :)
Phorgyphynance wrote:

I’m back on board :)

Great!

Thanks for not believing my claim about the second derivative of

$\ln Z$
until you checked it. I’d never checked it before in the quantum case, though physicists seem to use it all the time. Now that you’re on board, I am ready to do some interesting stuff.
Your argument above, going from the formula for

$\frac{\partial^2}{\partial (\lambda^i)^2} \ln Z$

to the formula for

$\frac{\partial^2}{\partial \lambda^i \partial \lambda^j} \ln Z$

uses the same trick as a well-known proof of the polarization identity. Namely: if

$B(v, w)$

is a symmetric bilinear form and

$Q(v) = B(v, v)$

is the corresponding quadratic form, then

$Q(v + w) = Q(v) + 2 B(v, w) + Q(w)$

so we can recover the bilinear form from the quadratic form:

$B(v, w) = \frac{1}{2} \big( Q(v + w) - Q(v) - Q(w) \big)$
So, if two symmetric bilinear forms give the same quadratic form, they must be equal.
(Well, at least when we’re allowed to divide by 2! This is one reason why math over the integers mod 2 is very different than math over the real numbers or even the integers mod 3. Over the integers mod 2, there’s more information in the symmetric bilinear form than the corresponding quadratic form, so all heck breaks loose.)
Cool :)
Well, everything seemed very clean and pretty, i.e. it “felt right”, until you threw in the $\mathrm{Re}$. For a second it seemed like you were saying, “Well, what I wanted was symmetric, but I got something unsymmetric, so let’s just symmetrize it.” Symmetrizing things willy-nilly doesn’t “feel right”, so I was happy to see it come out once you recognize the diagonal elements can be associated with a norm.
I have some real practical applications of this in mind if things work out the way I hope, so I’m glad to be back on board and looking forward to the rest of the ride :)
Some real practical applications, eh? Great!
This is off-topic, but this guy is into math and perhaps you could use some visualisations… So here’s a game he wrote. He must have worked on 2001:A Space Odyssey in a previous life. http://dmytry.pandromeda.com/games/index.html
Can you give a quasi realistic example of a GENERIC system?
I would like to give a very simple example, but the examples I’ve seen are too complicated for me to summarize here. If you’re a clever guy you can use sneaky tricks to find an online copy of this book:
• Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.
and that’s one way to see a bunch of examples. This book is also good:
• Georgy Lebon, David Jou and J. Casas-Vázquez, Understanding Non-equilibrium Thermodynamics, Springer, 2008.
By the way, someone pointed out that we don’t need

[H,F] = {S,F} = 0

for all functions F. To derive the few results I describe, it’s enough to have

[S,H] = {H,S} = 0

It seems that Öttinger assumes the stronger formulation but only uses the weaker one — see the text before equation (1.22) in his book Beyond Equilibrium Thermodynamics.
I’m afraid I don’t remember who pointed out this fact, and I can’t find the place on this blog where they did it! But I think it’s important.