So far in this series of posts I’ve been explaining a paper by Gavin Crooks. Now I want to go ahead and explain a little research of my own.
I’m not claiming my results are new — indeed I have no idea whether they are, and I’d like to hear from any experts who might know. I’m just claiming that this is some work I did last weekend.
People sometimes worry that if they explain their ideas before publishing them, someone will ‘steal’ them. But I think this overestimates the value of ideas, at least in esoteric fields like mathematical physics. The problem is not people stealing your ideas: the hard part is giving them away. And let’s face it, people in love with math and physics will do research unless you actively stop them. I’m reminded of this scene from the Marx Brothers movie where Harpo and Chico, playing wandering musicians, walk into a hotel and offer to play:
Groucho: What do you fellows get an hour?
Chico: Oh, for playing we getta ten dollars an hour.
Groucho: I see…What do you get for not playing?
Chico: Twelve dollars an hour.
Groucho: Well, clip me off a piece of that.
Chico: Now, for rehearsing we make special rate. Thatsa fifteen dollars an hour.
Groucho: That’s for rehearsing?
Chico: Thatsa for rehearsing.
Groucho: And what do you get for not rehearsing?
Chico: You couldn’t afford it.
So, I’m just rehearsing in public here — but of course I hope to write a paper about this stuff someday, once I get enough material.
Remember where we were. We had considered a manifold — let’s finally give it a name, say $M$ — that parametrizes Gibbs states of some physical system. By Gibbs state, I mean a state that maximizes entropy subject to constraints on the expected values of some observables. And we had seen that in favorable cases, we get a Riemannian metric on $M$! It looks like this:

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

where $X_1, \dots, X_n$ are our observables, and the angle bracket means ‘expected value’.
All this applies to both classical and quantum mechanics. Crooks wrote down a beautiful formula for this metric in the classical case. But since I’m at the Centre for Quantum Technologies, not the Centre for Classical Technologies, I redid his calculation in the quantum case. The big difference is that in quantum mechanics, observables don’t commute! But in the calculations I did, that didn’t seem to matter much — mainly because I took a lot of traces, which imposes a kind of commutativity:

$$\mathrm{tr}(AB) = \mathrm{tr}(BA)$$
In fact, if I’d wanted to show off, I could have done the classical and quantum cases simultaneously by replacing all operators by elements of any von Neumann algebra equipped with a trace. Don’t worry about this much: it’s just a general formalism for treating classical and quantum mechanics on an equal footing. One example is the algebra of bounded operators on a Hilbert space, with the usual concept of trace. Then we’re doing quantum mechanics as usual. But another example is the algebra of suitably nice functions on a suitably nice space, where taking the trace of a function means integrating it. And then we’re doing classical mechanics!
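Since this trace-as-integral dictionary does real work below, here is a two-line toy illustration (my own, not anything from Crooks): a diagonal density matrix is a probability distribution in disguise, and the quantum formula for an expected value reduces to the classical sum.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])        # a classical probability distribution
f = np.array([1.0, 4.0, 9.0])        # a 'function on the space', i.e. an observable
rho, F = np.diag(p), np.diag(f)      # the same data, packaged as diagonal matrices
print(np.trace(rho @ F), np.sum(p * f))   # both give 5.9: tr(rho F) = sum of p*f
```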
For example, I showed you how to derive a beautiful formula for the metric I wrote down a minute ago:

$$g_{ij} = \mathrm{Re} \, \mathrm{tr}\big( \rho \, \partial_i \ln \rho \, \partial_j \ln \rho \big)$$

But if we want to do the classical version, we can say Hey, presto! and write it down like this:

$$g_{ij} = \int_\Omega p \, \partial_i \ln p \, \partial_j \ln p \, d\omega$$
What did I do just now? I changed the trace to an integral over some space $\Omega$. I rewrote $\rho$ as $p$ to make you think ‘probability distribution’. And I don’t need to take the real part anymore, since everything is already real when we’re doing classical mechanics. Now this metric is the Fisher information metric that statisticians know and love!
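To see the Fisher information metric in action, here is a minimal numerical sketch (mine; the Gaussian family, grid, and step sizes are all just illustrative assumptions). For Gaussians with coordinates $(\mu, \sigma)$ the metric has the well-known closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$, and the integral formula above reproduces it:

```python
import numpy as np

def log_p(x, mu, sigma):
    # log of the Gaussian density p(x; mu, sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def fisher_metric(mu, sigma, h=1e-5):
    x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
    p = np.exp(log_p(x, mu, sigma))
    # d(ln p)/d(lambda^i) by central differences in the parameters (mu, sigma)
    grads = [(log_p(x, mu + h, sigma) - log_p(x, mu - h, sigma)) / (2 * h),
             (log_p(x, mu, sigma + h) - log_p(x, mu, sigma - h)) / (2 * h)]
    # g_ij = integral of p * d_i(ln p) * d_j(ln p)
    return np.array([[np.trapz(p * gi * gj, x) for gj in grads] for gi in grads])

print(fisher_metric(0.0, 2.0))   # ~ [[0.25, 0], [0, 0.5]] = diag(1/sigma^2, 2/sigma^2)
```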
In what follows, I’ll keep talking about the quantum case, but in the back of my mind I’ll be using von Neumann algebras, so everything will apply to the classical case too.
So what am I going to do? I’m going to fix a big problem with the story I’ve told so far.
Here’s the problem: so far we’ve only studied a special case of the Fisher information metric. We’ve been assuming our states are Gibbs states, parametrized by the expectation values of some observables $X_1, \dots, X_n$. Our manifold $M$ was really just some open subset of $\mathbb{R}^n$: a point in here was a list of expectation values.
But people like to work a lot more generally. We could look at any smooth function from a smooth manifold $M$ to the set of density matrices for some quantum system. We can still write down the metric

$$g_{ij} = \mathrm{Re} \, \mathrm{tr}\big( \rho \, \partial_i \ln \rho \, \partial_j \ln \rho \big)$$

in this more general situation. Nobody can stop us! But it would be better if we could derive this formula, as before, starting from a formula like the one we had before:

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$
The challenge is that now we don’t have observables to start with. All we have is a smooth function $\rho$ from some manifold $M$ to some set of states. How can we pull observables out of thin air?
Well, you may remember that last time we had

$$\rho = \frac{e^{-\lambda^i X_i}}{Z}$$

where the $\lambda^i$ were some functions on our manifold and

$$Z = \mathrm{tr}\big( e^{-\lambda^i X_i} \big)$$

was the partition function. Let’s copy this idea.
So, we’ll start with our density matrix $\rho$, but then write it as

$$\rho = \frac{e^{-A}}{Z}$$

where $A$ is some self-adjoint operator and

$$Z = \mathrm{tr}\big( e^{-A} \big)$$

(Note that $A$, like $\rho$, is really an operator-valued function on $M$. So, I should write something like $A(\lambda)$ to denote its value at a particular point $\lambda \in M$, but I won’t usually do that. As usual, I expect some intelligence on your part!)
Now we can repeat some calculations I did last time. As before, let’s take the logarithm of $\rho$:

$$\ln \rho = -A - \ln Z$$

and then differentiate it. Suppose $\lambda^i$ are local coordinates near some point of $M$. Then

$$\frac{\partial}{\partial \lambda^i} \ln \rho = -\frac{\partial A}{\partial \lambda^i} - \frac{\partial \ln Z}{\partial \lambda^i}$$
Last time we had nice formulas for both terms on the right-hand side above. To get similar formulas now, let’s define operators

$$X_i = \frac{\partial A}{\partial \lambda^i}$$

This gives a nice name to the first term on the right-hand side above. What about the second term? We can calculate it out:

$$\frac{\partial \ln Z}{\partial \lambda^i} = \frac{1}{Z} \frac{\partial Z}{\partial \lambda^i} = \frac{1}{Z} \frac{\partial}{\partial \lambda^i} \mathrm{tr}\big( e^{-A} \big) = -\frac{1}{Z} \mathrm{tr}\Big( e^{-A} \frac{\partial A}{\partial \lambda^i} \Big)$$

where in the last step we use the chain rule. Next, use the definition of $\rho$ and $X_i$, and get:

$$\frac{\partial \ln Z}{\partial \lambda^i} = -\mathrm{tr}(\rho X_i) = -\langle X_i \rangle$$
This is just what we got last time! Ain’t it fun to calculate when it all works out so nicely?
So, putting both terms together, we see

$$\frac{\partial}{\partial \lambda^i} \ln \rho = -X_i + \langle X_i \rangle$$

or better:

$$\frac{\partial}{\partial \lambda^i} \ln \rho = -\big( X_i - \langle X_i \rangle \big)$$
This is a nice formula for the ‘fluctuation’ of the observables $X_i$, meaning how much they differ from their expected values. And it looks exactly like the formula we had last time! The difference is that last time we started out assuming we had a bunch of observables, $X_i$, and defined $\rho$ to be the state maximizing the entropy subject to constraints on the expectation values of all these observables.
Now we’re starting with $\rho$ and working backwards.
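Here is a small numerical sanity check of the key step above, $\partial_i \ln Z = -\langle X_i \rangle$ (a sketch of my own: random Hermitian matrices stand in for the observables, and $A$ is taken linear in $\lambda$ so that $\partial_i A = X_i$ exactly). The identity survives noncommutativity precisely because of the cyclic property of the trace:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def rand_herm(n):
    # a random self-adjoint matrix, standing in for an observable
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

X1, X2 = rand_herm(4), rand_herm(4)     # two noncommuting observables
A = lambda l: l[0] * X1 + l[1] * X2     # operator-valued function A(lambda)
Z = lambda l: np.trace(expm(-A(l))).real
rho = lambda l: expm(-A(l)) / Z(l)

l0, h = np.array([0.3, 0.7]), 1e-6
for i, Xi in enumerate((X1, X2)):       # here d_i A = X_i exactly
    e = np.zeros(2); e[i] = h
    dlnZ = (np.log(Z(l0 + e)) - np.log(Z(l0 - e))) / (2 * h)
    print(dlnZ, -np.trace(rho(l0) @ Xi).real)   # the two columns agree
```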
From here on out, it’s easy. As before, we can define $g_{ij}$ to be the real part of the covariance matrix:

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

Using the formula

$$\frac{\partial}{\partial \lambda^i} \ln \rho = -\big( X_i - \langle X_i \rangle \big)$$

we get

$$g_{ij} = \mathrm{Re} \, \Big\langle \frac{\partial \ln \rho}{\partial \lambda^i} \, \frac{\partial \ln \rho}{\partial \lambda^j} \Big\rangle$$

or

$$g_{ij} = \mathrm{Re} \, \mathrm{tr}\Big( \rho \, \frac{\partial \ln \rho}{\partial \lambda^i} \, \frac{\partial \ln \rho}{\partial \lambda^j} \Big)$$
Voilà!
When this matrix is positive definite at every point, we get a Riemannian metric on $M$. Last time I said this is what people call the ‘Bures metric’ — though frankly, now that I examine the formulas, I’m not so sure. But in the classical case, it’s called the Fisher information metric.
Differential geometers like to use $\partial_i$ as a shorthand for $\frac{\partial}{\partial \lambda^i}$, so they’d write down our metric in a prettier way:

$$g_{ij} = \mathrm{Re} \, \mathrm{tr}\big( \rho \, \partial_i \ln \rho \, \partial_j \ln \rho \big)$$
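Continuing the little numerical sketch from above (same assumed setup and names), we can check that this trace formula agrees with the covariance definition, computing $\partial_i \ln \rho$ by finite differences of the matrix logarithm:

```python
from scipy.linalg import logm   # reuses X1, X2, A, Z, rho, l0 from the sketch above

def metric_logm(l, h=1e-5):
    # g_ij = Re tr(rho d_i(ln rho) d_j(ln rho)), derivatives by central differences
    dlog = []
    for i in range(2):
        e = np.zeros(2); e[i] = h
        dlog.append((logm(rho(l + e)) - logm(rho(l - e))) / (2 * h))
    r = rho(l)
    return np.array([[np.trace(r @ di @ dj).real for dj in dlog] for di in dlog])

def metric_cov(l):
    # g_ij = Re <(X_i - <X_i>)(X_j - <X_j>)>
    r = rho(l)
    dX = [Xi - np.trace(r @ Xi).real * np.eye(4) for Xi in (X1, X2)]
    return np.array([[np.trace(r @ di @ dj).real for dj in dX] for di in dX])

print(metric_logm(l0))
print(metric_cov(l0))   # same matrix, up to finite-difference error
```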
Differential geometers like coordinate-free formulas, so let’s also give a coordinate-free formula for our metric. Suppose $x$ is a point in our manifold $M$, and suppose $v, w$ are tangent vectors at this point. Then

$$g(v, w) = \mathrm{Re} \, \mathrm{tr}\big( \rho \, (v \ln \rho)(w \ln \rho) \big)$$

Here $\ln \rho$ is a smooth operator-valued function on $M$, and $v \ln \rho$ means the derivative of this function in the $v$ direction at the point $x$.
So, this is all very nice. To conclude, two more points: a technical one, and a more important philosophical one.
First, the technical point. When I said $\rho$ could be any smooth function from a smooth manifold to some set of states, I was actually lying. That’s an important pedagogical technique: the brazen lie.
We can’t really take the logarithm of every density matrix. Remember, we take the log of a density matrix by taking the log of all its eigenvalues. These eigenvalues are ≥ 0, but if one of them is zero, we’re in trouble! The logarithm of zero is undefined.
On the other hand, there’s no problem taking the logarithm of our density-matrix-valued function $\rho$ when it’s positive definite at each point of $M$. You see, a density matrix is positive definite iff its eigenvalues are all > 0. In this case it has a unique self-adjoint logarithm.
So, we must assume $\rho$ is positive definite. But what’s the physical significance of this ‘positive definiteness’ condition? Well, any density matrix can be diagonalized using some orthonormal basis. It can then be seen as a probabilistic mixture — not a quantum superposition! — of pure states taken from this basis. Its eigenvalues are the probabilities of finding the mixed state to be in one of these pure states. So, saying that its eigenvalues are all > 0 amounts to saying that all the pure states in this orthonormal basis show up with nonzero probability! Intuitively, this means our mixed state is ‘really mixed’. For example, it can’t be a pure state. In math jargon, it means our mixed state is in the interior of the convex set of mixed states.
Second, the philosophical point. Instead of starting with the density matrix $\rho$, I took $A$ as fundamental. But different choices of $A$ give the same $\rho$. After all,

$$\rho = \frac{e^{-A}}{Z}$$

where we cleverly divide by the normalization factor

$$Z = \mathrm{tr}\big( e^{-A} \big)$$

to get $\mathrm{tr}(\rho) = 1$. So, if we multiply $e^{-A}$ by any positive constant, or indeed any positive function on our manifold $M$, $\rho$ will remain unchanged!
So we have added a little extra information when switching from $\rho$ to $A$. You can think of this as ‘gauge freedom’, because I’m saying we can do any transformation like

$$A \mapsto A + f$$

where $f$ is a smooth real-valued function on $M$. This doesn’t change $\rho$, so arguably it doesn’t change the ‘physics’ of what I’m doing. It does change $Z$. It also changes the observables:

$$X_i \mapsto X_i + \partial_i f$$

But it doesn’t change their ‘fluctuations’

$$X_i - \langle X_i \rangle$$

so it doesn’t change the metric $g_{ij}$.
This gauge freedom is interesting, and I want to understand it better. It’s related to something very simple yet mysterious. In statistical mechanics the partition function begins life as ‘just a normalizing factor’. If you change the physics so that $Z$ gets multiplied by some number, the Gibbs state doesn’t change. But then the partition function takes on an incredibly significant role as something whose logarithm you differentiate to get lots of physically interesting information! So in some sense the partition function doesn’t matter much… but changes in the partition function matter a lot.
This is just like the split personality of phases in quantum mechanics. On the one hand they ‘don’t matter’: you can multiply a unit vector by any phase and the pure state it defines doesn’t change. But on the other hand, changes in phase matter a lot.
Indeed the analogy here is quite deep: it’s the analogy between probabilities in statistical mechanics and amplitudes in quantum mechanics, the analogy between $e^{-\beta H}$ in statistical mechanics and $e^{-itH/\hbar}$ in quantum mechanics, and so on. This is part of a bigger story about ‘rigs’ which I told back in the Winter 2007 quantum gravity seminar, especially in week13. So, it’s fun to see it showing up yet again… even though I don’t completely understand it here.
[Note: in the original version of this post, I omitted the real part in my definition of $g_{ij}$, giving a ‘Riemannian metric’ that was neither real nor symmetric in the quantum case. Most of the comments below are based on that original version, not the new fixed one.]

I’m really enjoying this because of the obvious applications to finance.
Now, if only you could wave the wand of category theory and make this all pop out by magic :)
Can we dualize this and get something like information theoretic 1-forms? I’d like to see something like
This is what I was talking about earlier. A stochastic process has a Brownian motion term and a deterministic term:

$$dX_t = \mu \, dt + \sigma \, dW_t$$

This is probably related somehow.
There certainly will be a metric $g^{ij}$ on 1-forms, at least when $g_{ij}$ is nondegenerate as I’m assuming. But what does it mean?
I’ve been doing a lot of local coordinate calculations using observables

$$X_i = \partial_i A$$

But note that in a global approach, we get observable-valued functions by differentiating the operator-valued function $A$ along any vector field $v$. Say:

$$X_v = v A$$

Then our metric is

$$g(v, w) = \mathrm{Re} \, \langle (X_v - \langle X_v \rangle)(X_w - \langle X_w \rangle) \rangle$$
So: we get observables from vector fields… what do we get from 1-forms? Not sure what the best answer is.
By the way, you posted your comment before I was done writing my article! I accidentally hit ‘Post’ — it’s annoyingly easy to do. You might want to reread my post now that it’s done, especially to see the Marx Brothers reference, but also maybe a bit of extra physics.
Wait a second. Which should REALLY be the variance — the first expression or the second? I think it should be the second, in order to relate to stochastic processes.
I’m getting observables from tangent vectors, and the metric from the covariance matrix on observables, so $g_{ij}$ is the covariance of the $i$th observable and the $j$th one, in local coordinates.
Sorry for being dense, but it’s not obvious to me that you have tangent vectors projecting out elements of the covariance matrix. It almost looks like you’ve got components of a 1-form. Maybe there is some Ito trick lying in here somewhere.
Ok! A little latex in the morning is good for the soul.
Now that I write that out, some things have become clearer. I hope the above sparks some thoughts from those who know more than me about this stuff.
Anyway, I have found peace with the idea that
Progress! But in the process, I have also come to peace with
. This is my current thought
These should (probably) be related by
And there is peace in the world (of my head) again.
I’m not very certain about
but the Ito thing (if there is anything to it) is kind of cute and leads directly to
Phorgyphynance writes:
Good, you shouldn’t be, because it doesn’t make sense!
First of all, the $\lambda^i$ are just names for the local coordinates on our manifold $M$. They’re not observables, so it doesn’t make much sense to take their expectation values.
True, you can think of a number as an observable that always takes the same value: that is, an observable with zero variance. So a coordinate function can be seen as a special case of an observable-valued function… but one with the special property that

$$\langle \lambda^i \rangle = \lambda^i$$

So, by the only reasonable interpretation of the left-hand side of your equation, we get something with zero fluctuation — certainly nothing to do with the metric!
The following formulas you wrote also do not make sense:
Sorry to be rude, but I get the feeling that you’re making wild guesses in a rush, instead of waiting until everything is crystal clear. There’s not even anything called $t$ in my formalism! It has nothing to do with the Ito calculus, at least not yet.
Ouch.
You do have stochastic processes whether you choose to recognize them yet or not. Once you recognize the stochastic process, you must recognize $dW_t$.
Perhaps the notation needs some work, e.g. maybe we should write
with
or something, but I’m confident the basic ideas I’m laying out can be made solid with some effort. And I’m also confident they are relevant to what you are doing, but if you don’t want me thinking out loud on your blog, that is understandable. I didn’t see the harm.
Sorry, I guess I have a low tolerance for equations that don’t parse: by avoiding these like the plague, I avoid certain kinds of mistakes. They make me grumpy. I have trained myself to be like that.
I wouldn’t be surprised if there’s a connection between this stuff and stochastic processes, though.
And just for the record, when I say I’m thinking out loud, I don’t mean to imply that I haven’t put serious thought into this before. The first time I became aware of the practical implications of the fact that the covariance matrix was like a metric tensor on a manifold was in the first half of 2005, when I was on Wall Street, and I’ve given several presentations on the subject since then. Believe me, people are very familiar with these connections you’re reinventing.
The thing I haven’t personally thought of before is the relation to partition functions. That is very cool. The mathematical connection between what I’m talking about and what you’re doing is obvious to me, but instead of working out the details in the comments here, I’ll try to sort things out and let them bake a while on my own blog as well as possibly on my personal web space on the nLab.
And I’ll try to forget the comment about parsing of equations :P You will eventually see that everything I’ve written will parse (typos and minor notational adjustments aside) and is in fact fairly standard material.
I’m sorry to have hurt your feelings. I hope you realize what I’m doing. I’m trying to understand the subject called information geometry. A lot is known about it; I doubt I’m doing anything really new yet, but I feel the need to explain it and redevelop it in a slightly different way to understand it. I’m not trying to make connections to stochastic calculus, though they probably exist and are probably very interesting. I already have my hands full just trying to understand some basic concepts like the Fisher information metric.
I felt the need to point out that some equations you wrote made no sense given the definitions I had laid down. I did so in a rather rude way, because they actually made my brain hurt. But if you change something or other, you may wind up stating some interesting and/or well-known facts.
Eric writes:
I’m not sure what you mean here: $X_i$ is not a tangent vector, but $\partial_i$ and $v$ are. Maybe that’s what you were trying to say.
Anyway, starting from the definition of the metric in terms of a covariance matrix

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

I showed that

$$g_{ij} = \mathrm{Re} \, \mathrm{tr}\big( \rho \, \partial_i \ln \rho \, \partial_j \ln \rho \big)$$

and you can read the proof above.
Sure, and that’s a nice way of looking at it. Then, by the usual yoga of 1-forms and tangent vectors, this implies

$$g(v, w) = \mathrm{Re} \, \langle (dA(v) - \langle dA(v) \rangle)(dA(w) - \langle dA(w) \rangle) \rangle$$

But beware: $dA$ is an observable-valued 1-form. I.e., it’s an operator-valued 1-form in the case of quantum statistical mechanics, and a 1-form taking values in functions on the phase space $\Omega$ in the case of classical statistical mechanics.

(If you prefer the language of probability theory to that of classical mechanics, say “random variable” instead of “function on phase space”, and emphasize that $\Omega$ is a measure space. It’s just different terminology for the same thing, at least for what I’m doing here.)
Eric wrote:
There is a source of confusion I’d like to get out of the way, but we’ll need some definitions first:
Let’s say we talk about one-dimensional stochastic processes in a continuous variable $t$. Let $W_t$ be Brownian motion, that is, the only stochastic process with stationary independent increments with mean 0, starting at 0 ($W_0 = 0$), and concentrated on continuous paths.
The problem is that one cannot make sense of $dW_t$ as a differential form, AFAIK. It makes sense only as a shorthand notation for the integral equation

$$X_t = X_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s$$

where we have yet to choose an appropriate definition of the integral; let’s choose the Itô integral for now. (Note that not all stochastic processes have such a representation; the martingale representation theorem tells us that exactly all adapted martingales have one.)
The problem is that the paths of Brownian motion are a.s. not of bounded variation, therefore the integral may not exist pathwise as a Riemann-Stieltjes integral.
What we can do, however, is consider the stochastic processes that are solutions of a family of Itô stochastic integrals described by a finite set of parameters, e.g.

$$X_t = X_0 + \int_0^t p(X_s) \, ds + \int_0^t c \, dW_s$$

where $p$ is a real polynomial of degree at most 4, and $c$ is any real number. Every such stochastic process defines a probability distribution on, say, $C[0, 1]$, the continuous functions on the interval $[0, 1]$. These probability distributions form a statistical manifold where one could, in principle, calculate the Fisher metric (ugh, I don’t think that I can do that).
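For concreteness, here is a quick Euler–Maruyama simulation of one member of such a family (only an illustration; the drift polynomial and noise level are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(poly_coeffs, c, n_steps=1000, T=1.0, x0=0.0):
    # Euler-Maruyama scheme for dX_t = p(X_t) dt + c dW_t
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        drift = np.polyval(poly_coeffs, x[k])            # p(X_t)
        x[k + 1] = x[k] + drift * dt + c * np.sqrt(dt) * rng.normal()
    return x

path = simulate([-1.0, 0.0], c=0.5)   # p(x) = -x: an Ornstein-Uhlenbeck-type process
print(path[:5])
```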
You’re absolutely correct when viewing things from the traditional perspective. However, there IS a way to view the stochastic process as a 1-form, but you need to consider it in the context of noncommutative geometry.
I’m pretty sure I can claim to be the first person to ever apply noncommutative geometry to finance :)
Have a look at this paper I wrote back in 2002:
Noncommutative Geometry and Stochastic Calculus: Applications in Mathematical Finance
There, we find that the stochastic processes are indeed 1-forms and the Ito formula follows from noncommutativity of 0-forms and 1-forms, i.e.

$$[x, dx] = dt$$

This is reminiscent of the common heuristic used when defining the Ito formula in elementary math finance texts:

$$(dx)^2 = dt$$
Then I let the idea rest for 2 years while I was at MIT Lincoln Lab, but came back to it in 2004 (just prior to moving to Wall Street) with a finite version suitable for numerical modeling:
Financial Modelling Using Discrete Stochastic Calculus
I sometimes feel a little apologetic for bringing up math finance here, but I hope it is clear how these techniques (could possibly) apply more generally including this information geometry stuff.
I see: Maybe I should be more careful when I talk about “differential forms”.
In classical differential geometry, given a real, smooth manifold $M$, the differential $df$ of a real smooth function $f$ lives in the cotangent bundle of $M$. We can integrate $df$ along a path and get a real number.

Transforming this to Connes’ quantized calculus, $f$ becomes a selfadjoint operator, $df$ becomes the commutator of $f$ with $F$, where $F$ is a fixed selfadjoint operator of square one. The integral becomes the trace. We can still integrate a differential of order one and get a real number.
Now I don’t see a way to fit $dW_t$ into Connes’ quantized calculus :-) It is not a function, an operator, or a differential of order one, and we cannot integrate it to get a real number (or a complex one) :-)
Of course you are free to define it to be a basis vector of some abstract vector space and introduce additional algebraic structures that mimic Itô calculus.
Hi Tim,
If you have a look at the paper I wrote with Urs
Discrete differential geometry on causal graphs
you’ll find that there is a particular class of spaces we call “diamonds” for which $df$ is the commutator of $f$ with the “graph operator”.
There is a particularly nice “diamond” we examine in Section 2.9 (which I expanded in the “Discrete Stochastic Calculus” paper), i.e. the binary tree or 2-diamond. The continuum stochastic calculus is the continuum limit of this discrete calculus on a binary tree. This is the bridge you’re looking for.
One more note:
This is not what happens though. For a given directed graph, there is a corresponding calculus. There is no choice in the matter at all.
The calculus that corresponds automatically to the binary tree is stochastic calculus. There is no arbitrariness.
The story how we came to this is kind of funny. Urs and I were having fun playing around with this stuff and we asked the inverse question. If I give you a graph, you can determine the corresponding calculus. What if I hand you a calculus, can you reverse engineer things and give me a graph that corresponds to this calculus? Just as Urs left for a bike ride across Europe we asked what graph corresponds to stochastic calculus. When he returned, we had both arrived at the answer: the binary tree is the graph that corresponds to stochastic calculus. It is obvious in retrospect.
Eric pointed out this paper:
Thanks! I hope I’ll have time to read it next weekend (the Monday is a holiday), I’ll report back then :-)
Hi Tim,
I’ve written a pretty simple explanation of this on my blog at
Discrete stochastic calculus and commutators
If you’re interested, please feel free to discuss it over there. I love discussing this stuff so feel free to lob over any questions and I’m happy to do my best to answer them.
I’m getting dizzy from all these interconnections.
I looked for additional reading material: curiously, I did not find a unified treatment using von Neumann algebras, although it seems to be pretty simple and elegant (then again, everything looks simple and elegant if explained by John). Instead there is an introduction to quantum information theory using finite Hilbert spaces in
Amari, Shun-ichi; Nagaoka, Hiroshi: Methods of information geometry (available on google books, too).
Or see Denes Petz, Catalin Ghinea: Introduction to quantum Fisher information.
And to make the story even more interconnected, people use information geometry to study black hole thermodynamics, see e.g. Jan E. Aman, Narit Pidokrajt: Geometry of Higher-Dimensional Black Hole Thermodynamics.
Thanks for the references! I wanted to invent some of my own stuff before getting brainwashed by the existing literature, but I’ll look at these and see what they say.
Well, when it comes to this sort of thing, if it doesn’t make you dizzy it must not be done yet.
Is there sufficient gauge freedom to fix problems in A and/or states, or are the problems where the physics is? Can a dynamics of the gauge enable modelling wavefunction collapse?
John F wrote:
I don’t understand what ‘problems’ you’re referring to, but right now I’m happy that I can use the gauge freedom in $A$ to choose $A$ so that

$$Z = 1$$

that is, $\mathrm{tr}(e^{-A}) = 1$. This implies

$$\ln \rho = -A$$

and it also implies that our observables $X_i = \partial_i A$ have vanishing expectation values:

$$\langle X_i \rangle = 0$$

This makes them boring if you like nonzero expectation values, but their covariances are still plenty interesting: they are the metric $g_{ij}$.
This is the simplest gauge choice.
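In the little numerical sketch from my post (reusing its assumed names $A$, $Z$, $\rho$), you can watch this gauge fixing happen: replace $A$ by $A + (\ln Z) 1$ at each point, and both $Z = 1$ and the vanishing means drop out:

```python
def A_fixed(l):                          # the gauge transform A -> A + (ln Z) 1
    return A(l) + np.log(Z(l)) * np.eye(4)

print(np.trace(expm(-A_fixed(l0))).real)     # ~ 1: the new Z is identically 1
for i in range(2):
    e = np.zeros(2); e[i] = 1e-6
    Xp = (A_fixed(l0 + e) - A_fixed(l0 - e)) / 2e-6   # new X_i, by finite differences
    print(np.trace(rho(l0) @ Xp).real)                # ~ 0: <X_i> vanishes
```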
Sorry, I can’t make sense of that.
By problems I mostly meant zeroes of the weights. I guess this may not require anomalies or singularities in A, but maybe at least caustics. FWIW Berry (again) did nice work in multispectral holograms, e.g.
http://www.pnas.org/content/93/6/2614.full.pdf
Sometimes it seems like everything is gauges, phases, conformal terms, etc; sometimes nothing.
Okay — yeah, I spent an hour yesterday trying to figure out what to do about the metric $g_{ij}$ when the density matrix $\rho$ has some zero eigenvalues. This question is important because we can think of pure states as mixed states with a lot of zero eigenvalues! I don’t think changing the gauge on $A$ helps at all, since the above formula for the metric is explicitly gauge-independent.
Right now I’d guess the metric $g_{ij}$ becomes singular at pure states, so we need a different metric if we want to include pure states in our story. And right now my most promising lead comes from Uhlmann’s papers. He seems to be studying metrics on the space of density matrices in a very thorough, general way, synthesizing and extending other people’s work in a systematic framework.
Ok, Uhlmann’s 1995 paper is helping me understand his 1993 paper, after also reading his ’85 and ’86 (etc.) For some reason this reminds me of an old joke I’ve been wanting to repeat recently but haven’t found a venue for, even if it doesn’t quite fit. A preacher at my church was clowning and mentioned he was sure we’d agree that each of his sermons was better than the next.
Anyway, two questions. 1) Do you agree with (Uhlmann 1995) “in deviating from the pure to the mixed states … coherence and correlations will not be destroyed suddenly but gradually, continuously.”? 2) He uses lots of positive square roots (one specifically in Equation 37). Do they *have* to be positive?
Here is a thought…
You have discovered a gauge freedom. This gauge freedom simply shifts the mean. I am pretty confident this is a manifestation of the Girsanov theorem.
Ack! I posted the Wikipedia link before reading it assuming the page was decent, but the page actually sucks.
Given a stochastic process

$$dX_t = \mu \, dt + \sigma \, dW_t$$

Girsanov’s theorem says that changing the measure modifies the above to

$$dX_t = \tilde\mu \, dt + \sigma \, d\tilde W_t$$

i.e. the covariance structure is unchanged, but the change of measure changes the mean $\mu$.
I strongly suspect your gauge freedom represents a change of measure.
PS: The trick you used to gauge transform the mean to zero is the same trick used in finance to convert a stochastic process into a martingale. So it seems that renormalizing the partition function via the gauge freedom turns your observable into a martingale.
My gauge freedom does indeed shift the mean; doing the gauge transform

$$A \mapsto A + f$$

changes the observable-valued function $X_i$ as follows:

$$X_i \mapsto X_i + \partial_i f$$

At each point $\lambda \in M$, this just takes the observable $X_i(\lambda)$ and adds to it the number $\partial_i f(\lambda)$. So, its mean gets shifted by $\partial_i f(\lambda)$, but ‘nothing else changes’.
However, there is no time variable in my formalism, so all remarks about stochastic processes, martingales, Ito’s formula, etc. are irrelevant here — or, optimistically speaking, ‘premature’.
Maybe you can introduce a time-dependence into my formalism and make those remarks meaningful somehow. For example, maybe you can take $M = \mathbb{R}$ and call the coordinate on this manifold $t$ for ‘time’. Then what I’ve been calling ‘observable-valued functions’ can be renamed ‘time-dependent random variables’, at least in the classical case.
Something inside me aches when I see beautiful mathematical physics laid out in front of me, but know I just don’t have the time to learn about it. And I would really like to, because thermodynamics was the one part of undergraduate physics that never really sat well with me.
Wow, sometimes I miss being a PhD student — no crummy grant applications to deal with, no pressure of collaborators waiting for me to write things. Three years ago, I would have been all over this.
Hi, Jamie! That’s sad, I’ve almost always felt I’ve had plenty of time to learn new stuff. The only really bad stretch was a couple years ago when I had way too many papers to finish writing.
The problem is working with coauthors. When I work by myself I write up ideas as I go, so the paper is done when it’s done. With coauthors it’s different: it’s lots of fun when we’re together, dreaming up ideas and working out the details, but not fun at all later on when I’m slowly writing them up. The real problem is feeling guilty that my procrastination may be hurting someone else’s CV — especially when it’s a grad student or postdoc, desperate for a job.
I’m trying to avoid new collaborative papers for this reason, and so far I’m succeeding. This has freed up a lot of time for learning new stuff, and explaining it here. And overall, I think that’s a better use of my time.
The first three posts on information geometry don’t assume much background, especially not much in statistical physics – although John used some physics mumbo jumbo that may scare mathematicians that see it for the first time.
But it’s possible to translate all of it to a pure mathematical language, which is easier to do of course in the presence of specific questions like “I don’t understand symbol X on line Z”.
I know Jamie; he’s good at mathematical physics, so I don’t think he’d have trouble understanding what I wrote. I think he’s just too busy.
Last night, I had some fun working out all the formulas appearing in this series. John was right. I do feel smarter after doing that :)
However, I still do not have a perfect handle on the nature of all the variables appearing. For instance, I got stuck when writing down $d\rho$. Should this be expanded in terms of $d\lambda^i$, $dX_i$, $dt$, $dx$, …?
In Crooks’ paper, I see the $\lambda^i$ are functions of $t$ and the $X_i$ are functions of $x$.
Should $d\rho$ be expanded using Newtonian calculus or stochastic calculus (via Ito)?
Are you talking about
1) the formalism I’m presenting in this post,
2) the formalism in Gavin Crooks’ first paper, or
3) the formalism in Gavin Crooks’ second paper?
They’re different. I can’t tell you the rules of the game until you tell me which game you’re playing.
For example:
1) There are no variables $t$ or $x$ in anything I explained in this post. I use $\lambda^i$ to coordinatize an arbitrary manifold $M$. $\rho$ and $A$ are smooth operator-valued functions on this manifold. I’m using plain old derivatives, no Ito calculus.
2) In Crooks’ first paper and my first post explaining it, $x$ is used to denote a vector: a list of expectation values of a fixed set of observables $X_i$. The variables $x^i$ thus serve to coordinatize an open set in $\mathbb{R}^n$. The probability measure $\rho$ is a function of $x$, but the observables $X_i$ are not. The expectation values $\langle X_i \rangle$ equal the coordinate functions $x^i$ on this open set. The $\lambda^i$ are some other functions on this open set. The variable $t$ is used not for time — he’s doing equilibrium thermodynamics — but to parametrize a path in this open set. He’s using plain old derivatives, no Ito calculus.
3) I will not attempt to describe the rules of the game in Crooks’ second paper, since I haven’t read it carefully yet.
If you don’t mind, I will stick to formalism 1). In this formalism,

$$d \ln \rho = -dA - d \ln Z = -\big( dA - \langle dA \rangle \big)$$

The second step is the main calculation I did in this post.
Thanks! This helps.
I haven’t looked at Crooks’ second paper yet and I thought 1) and 2) were intended to be the same (or at least consistent).
Then why did he say that $t$ was the time? :)
I see now that the $\lambda^i$ are like spatial coordinates, so 1-forms should have coordinate bases $d\lambda^i$, but I’m still not quite sure whether $dA$ should contain a $dt$ term.
This may not be relevant, but it is curious enough that I think I’ll share it. The Ito differential would look something like

$$dA = \partial_i A \, d\lambda^i + \Big( \partial_t A + \tfrac{1}{2} g^{ij} \partial_i \partial_j A \Big) dt$$

The spatial component is the same as you had it, but note the second term of the temporal component. Since

$$X_i = \partial_i A$$

you have a term

$$\tfrac{1}{2} g^{ij} \partial_i X_j$$

which seems curious. I need to learn the relation (if any) between operator-valued functions and stochastic processes, but I’ll work on that elsewhere.
Phorgyphynance wrote:
The current blog entry generalizes Crooks’ first paper in ways that require a significant shift in viewpoint. I wrote:

“The challenge is that now we don’t have observables to start with. … How can we pull observables out of thin air?”
And the answer is to let $\lambda^i$ be arbitrary local coordinates on $M$, not Lagrange multipliers as Crooks was using. Then, define

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

where $X_i$ is a certain observable-valued function on $M$, related to $\rho$ and the partition function $Z$ as follows:

$$X_i = \partial_i A, \qquad \rho = \frac{e^{-A}}{Z}, \qquad Z = \mathrm{tr}\big( e^{-A} \big)$$

Note that these $X_i$ are not fixed observables, as they were in Crooks’ formalism! Now they are observable-valued functions on the manifold $M$.
We need these changes to think of the Fisher information metric as always coming from a covariance matrix.
So, you gotta be careful here.
Thanks for explaining. I hope it is clear that I am very excited about what you are doing and trying my best to understand it. I’ve found that simple “geometry” from the covariance matrix is already very useful in applications and now seeing a deeper level coming from a density matrix is very very cool. I hope to incorporate it into my work.
[…] post is in response to a question from Tim van Beek over at the Azimuth Project blog hosted by my friend John Baez regarding my […]
John wrote:
Phorgyphynance wrote:
I think he uses $t$ to mean time only in this one paragraph. I hadn’t even noticed that.
Of course, Gavin Crooks can say for himself what he’s done, but here’s my take on it:
In the bulk of the paper he’s doing equilibrium thermodynamics, time plays no real role, and he uses $t$ as a parameter for a path in his manifold of thermodynamic states. The main point of the paper — which I didn’t even get around to discussing — is to provide an operational procedure for measuring the arclength of such a path. This involves changing the thermodynamic state, but so slowly that we may consider it as remaining in equilibrium all along.
So, I stand by what I said. But if you want to make some stuff time-dependent, to get some $dt$ terms to show up, you can do that.
John wrote in the main post:
Two simple questions:
Isn’t the assumption that the density matrix is positive definite correct for the grand canonical ensemble in “physically realistic” situations? I mean, every state contributes, because we fix the mean energy and mean particle count only, so that every state of the system – regardless of the energy and particle count needed – has a nonzero probability…
On the other hand, do we really need to assume that the density matrix is positive definite to take the logarithm? Let’s say we have a point $x$ on our manifold such that $\rho(x)$ has a basis of eigenvectors, and we keep all eigenvectors with eigenvalue $\neq 0$; let this set be $S$. Then we can write

$$\rho(x) = \sum_k p_k \langle \psi_k, \cdot \rangle \psi_k$$

summing over the eigenvectors $\psi_k \in S$ with eigenvalues $p_k$ (I hope I get the latex correct; what I intend to write is the representation of the density matrix according to the spectral theorem of compact operators, as here on the nLab).

Now we can define $\ln \rho$ eigenvector-wise, as you did in the first post on information geometry, setting $\ln 0 := 0$.
We can further differentiate this logarithm iff we assume that $x$ has a neighborhood on which $S$ is invariant, that is, the set of eigenvectors with nonzero eigenvalue does not change. Maybe there is some way to define the differential even in the case that $S$ does change, I don’t know… but is it possible to relax the condition on $\rho$ from being positive definite to having an invariant $S$?
Tim wrote:
Great question! Yes, that’s true.
Right. Or if we just consider the canonical ensemble, where each state with energy E shows up with probability proportional to exp(-E/kT), we’ll get a nonzero contribution from every state…
… except in the unphysical but nonetheless incredibly interesting limiting case where T = 0. Then the system goes down to its ground state, or a mixture of its ground states if there’s more than one.
It would be very nice to be able to understand the metric I’m discussing in this limiting case. Why? Well, I explained a metric on pure states in my post on the geometry of quantum phase transitions. It’s called the ‘fidelity metric’ or ‘Fubini-Study metric’, and it’s very nice. It would be cool to relate that metric to the one I’m discussing here!
Just so nobody gets the wrong idea: I did not set $\ln 0 = 0$.

(I know you’re not saying I did: you’re just saying I defined $\ln \rho$ eigenvector-wise. But people reading your sentence out of context might be confused. I don’t want ’em to think I’m even dumber than I actually am.)
I don’t think formally setting $\ln 0$ equal to zero will help me with the question I’m interested in. I want to understand what the metric on the interior of the set of mixed states does as we approach the boundary. I would like it to extend smoothly or at least continuously.
Alas, I suspect it becomes singular. Think about this formula:

$$g(v, w) = \mathrm{Re} \, \mathrm{tr}\big( \rho \, (v \ln \rho)(w \ln \rho) \big)$$

Now I’m using this to define a Riemannian metric on the interior of the set of mixed states. (The pullback of this metric to my manifold $M$ is the metric I was talking about in my post.) Now $v$ and $w$ are tangent vectors to some density matrix $\rho$.

Everything is fine when none of the eigenvalues of $\rho$ are zero. What happens when some eigenvalues approach zero? We have one factor of $\rho$ to help us out — but we’ve got two factors involving a derivative of $\ln \rho$. So, I’d expect behavior like

$$p \cdot \frac{1}{p} \cdot \frac{1}{p} = \frac{1}{p}$$

as an eigenvalue $p \to 0$, which is singular.
But this is just a hand-waving argument.
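To make the hand-waving a bit more concrete, here is the simplest classical toy check I could think of: for the two-outcome family $p(\lambda) = (\lambda, 1 - \lambda)$, the Fisher metric is $1/\lambda + 1/(1 - \lambda)$, which blows up as we approach the pure states at $\lambda = 0$ and $\lambda = 1$:

```python
import numpy as np

def fisher_g(lam, h=1e-7):
    # g = sum_x p(x) (d_lambda ln p(x))^2 for p(lambda) = (lambda, 1 - lambda)
    lp = lambda l: np.log(np.array([l, 1 - l]))
    p = np.array([lam, 1 - lam])
    dlnp = (lp(lam + h) - lp(lam - h)) / (2 * h)
    return float(np.sum(p * dlnp ** 2))

for lam in (0.5, 0.1, 0.01, 0.001):
    print(lam, fisher_g(lam), 1 / lam + 1 / (1 - lam))   # diverges like 1/lambda
```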
Yes, true. But that’s not very helpful for the kind of thing I’m interested in, e.g. extending the metric on the interior of the set of mixed states to the boundary. On the boundary we have pure states, and as we move around on that boundary, the eigenvectors you’re talking about change.
I think these papers should help a lot:
• Anna Jencova, Geodesic distances on density matrices.
• A. Uhlmann, Density operators as an arena for differential geometry, Rep. Math. Phys. 33 (1993), 253–263.
• A. Uhlmann, Geometric phases and related structures, Rep. Math. Phys. 36 (1995), 461–481.
I could also do some calculations. It ain’t rocket science, it’s basically just matrix multiplication and some calculus. But sometimes those things can be pretty tiring.
I’d have to at least skim those papers, but I don’t have any time to do that now, so here is an unqualified response:
I’m very much convinced that the metric gets singular on the boundary, but I’m not sure about what that really means.
One interpretation, of course, is that it is possible to find out for sure if your system is in a mixed state instead of a pure state. In the classical case, points on a statistical manifold with finite Fisher distance cannot be distinguished for sure, using a finite set of measurements.
So, while one interesting question is “how do we get rid of singularities?”, another one may be “what statistical resp. physical interpretation does a singularity of the metric have?”.
Pure states may be described, in this context, as points with lower complexity in the sense that they are specified by fewer parameters than mixed states.
I wrote:
Ugh, that’s wrong, unless we assume the classical analogue of an invariant set of eigenvectors: Namely, that the probability distributions of our statistical manifold all have the same support.
Shouldn’t all this talk of setting $\ln 0 = 0$ really be setting $0 \ln 0 = 0$ (which really is the limit $\lim_{p \to 0^+} p \ln p = 0$)?
Setting $0 \ln 0 = 0$ is fine, and that’s what people normally do when defining the entropy of a mixed state to be

$$S(\rho) = -\mathrm{tr}(\rho \ln \rho)$$

When the density matrix $\rho$ has an eigenvalue equal to 0, they define the operator $\rho \ln \rho$ to equal 0 when applied to that eigenvector.

However, here Tim and I are trying to interpret the expression

$$\mathrm{tr}\big( \rho \, \partial_i \ln \rho \, \partial_j \ln \rho \big)$$

and this is trickier. Indeed it appears, in general, to be ill-defined when $\rho$ has zero eigenvalues.
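In code the $0 \ln 0 = 0$ convention is painless: work eigenvalue-wise and let the zero eigenvalues contribute nothing (a minimal sketch; the $10^{-12}$ cutoff is an arbitrary choice of mine). No such trick rescues the metric formula, which needs $\ln \rho$ itself rather than $\rho \ln \rho$:

```python
import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -tr(rho ln rho), with the 0 ln 0 = 0 convention
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]              # zero eigenvalues contribute nothing
    return -np.sum(p * np.log(p))

pure, mixed = np.diag([1.0, 0.0]), np.diag([0.5, 0.5])
print(von_neumann_entropy(pure), von_neumann_entropy(mixed))   # 0.0 and ln 2
```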
John wrote:
Yes, that’s the problem… we need to define the differential of $\ln \rho$. We could try to define the logarithm of $\rho$ eigenvector-wise, so that we don’t need $\rho$ to be positive definite: on each eigenvector, $\ln p$ is defined to be 0 for $p = 0$, and is the usual logarithm otherwise.

But if we have a path $x(t)$ in our manifold such that there is an eigenvector with an eigenvalue $p(t)$ with

$$p(t) = 0 \;\text{ for }\; t \leq 0$$

and

$$p(t) > 0 \;\text{ for }\; t > 0$$

I very much doubt we can find a way to define the differential of $\ln \rho$ along this path at $t = 0$.
(Gee, hope the latex works out.)
Tim: I fixed a bunch of typos in your latex, but the big problem was this: in systems that mix latex and html, like this blog and also the n-Category Café, you have to be incredibly careful about < and > signs, since these play an important role in html.
In the n-Café you have to be smart enough to use the latex codes \lt and \gt instead of < and > — otherwise you’ll get in trouble.
In this blog you can’t even use \lt and \gt — apparently they get translated into < and > and they then cause trouble!
What you have to do here is use the html codes &lt; and &gt;, even inside latex expressions!
Having told you this, I now expect that you and everyone reading this will remember it forever and never make that mistake again.
I made a big mistake in this blog entry — and possibly the preceding one!
It can be seen most easily here:

$$g_{ij} = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

A Riemannian metric must be symmetric:

$$g_{ij} = g_{ji}$$

On the other hand, the expression on the right-hand side is not symmetric: in general

$$\langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle \neq \langle (X_j - \langle X_j \rangle)(X_i - \langle X_i \rangle) \rangle$$

except in the classical case where the observables all commute.
Remember, while I was using quantum notation, my setup was supposed to work for both classical and quantum mechanics. Alas, it only works in the classical case.
You might think the cyclic property of the trace saves the day:

$$\mathrm{tr}(ABC) = \mathrm{tr}(BCA)$$

But this property does not mean everything commutes inside a trace: in general

$$\mathrm{tr}(ABC) \neq \mathrm{tr}(BAC)$$
So what did I do wrong?
I think it was simply defining the ‘metric’ by

$$g_{ij} = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

This is symmetric in the classical case, but not the quantum case!
So, I need to go back to the drawing board, at least when it comes to the quantum case.
By the way, for a while I thought my mistake lay here:

$$\frac{\partial}{\partial \lambda^i} e^{-A} = -e^{-A} \frac{\partial A}{\partial \lambda^i}$$

In fact we don’t have this unless $A$ and $\partial_i A$ commute. However, I believe the cyclic property of the trace is enough to show

$$\frac{\partial}{\partial \lambda^i} \mathrm{tr}\big( e^{-A} \big) = -\mathrm{tr}\Big( e^{-A} \frac{\partial A}{\partial \lambda^i} \Big)$$

So, it’s possible that all my calculations are correct, and the only problem is that I’m working with a bizarre asymmetric (and possibly degenerate) version of a ‘Riemannian metric’.
But I need to think more.
In your more general framework, could you still define
?
In the original case of Fisher information metric, we had
so that
and we have
In the more general case you’re considering, maybe you could have
in which case, it may make more sense to define
If we then use the gauge freedom to choose
such that
, then this reduces to
which is kind of neat and gives me a sense of deja vu.
Phorgyphynance made various suggestions, including:
This is a great idea! It’s automatically symmetric. To see when it’s positive definite, and understand what it means, I will compute it in terms of the observables $X_i$ that I defined.
This is not a great idea! This doesn’t parse if $g$ is supposed to be a tensor (and hopefully a Riemannian metric). Why? Because $X_i - \langle X_i \rangle$ is an observable-valued function on $M$. For example, in the quantum case — the only case I’m having problems with — it’s an operator-valued function.
So then your $g$ is an operator-valued rank 2 symmetric tensor on $M$, and while that’s probably good for something, it’s not what I’m looking for.
In the classical case it’s a random-variable-valued rank 2 symmetric tensor, which is again not what I’m after.
So I prefer the first idea.
I’ve also figured out a bunch of stuff myself, but I’ll let it settle for a while before I write about it.
Ok. You’re probably right, but it “feels” better to me for some reason. The fact that it is operator-valued doesn’t bother me too much. For one reason, the metric in my paper with Urs was also a self-adjoint operator.
I’m looking forward to the next part of this series :)
John wrote:
In fact this mistake affected all three blog entries. Luckily, it’s easy to fix.
The problem was that this matrix is not symmetric in the quantum case:

$$g_{ij} = \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$

The solution is to take the real part! So now I’ve redefined it:

$$g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle$$
I’ve tried to fix this problem everywhere it shows up in my blog entries — but not in the comments.
I’ve explained this in more detail in part 4, so if you have any questions, that’s the place to ask ’em!
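Reusing the random-matrix sketch from the post (its assumed names again), you can watch the fix work: the raw covariances come out complex, and swapping $i$ and $j$ gives the complex conjugate, so only the real part is symmetric:

```python
r = rho(l0)   # from the earlier sketch
dX = [Xi - np.trace(r @ Xi).real * np.eye(4) for Xi in (X1, X2)]
c01 = np.trace(r @ dX[0] @ dX[1])       # <(X_1 - <X_1>)(X_2 - <X_2>)>
c10 = np.trace(r @ dX[1] @ dX[0])       # indices swapped
print(c01, c10)                         # complex conjugates, not equal
print(np.isclose(c01.real, c10.real))   # True: the real part is symmetric
```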
[…] Part 1 • Part 2 • Part 3 • Part 4 • Part […]
Now I want to describe how the Fisher information metric is related to relative entropy […]
About zero eigenvalues in the density matrix…
— When there aren’t any, then it all seems to boil down to the statement that “all ensembles are Gibbs ensembles”. In retrospect, this seems obvious, yet it also seems rather remarkable, because no one ever seems to come out and say it much (well, in physics, they do, but not in other contexts). So I think I learned something new here. (My definition of a “Gibbs ensemble” here is “those ensembles which don’t have a zero in the density matrix”, which seems to be a correct definition, right? Or did I miss something?)
— When there are zero eigenvalues, then I have two knee-jerk reactions:
The first, naive one is “well, gee, you should leave those states out of your Hilbert space; it was an error to include them in the first place”.
That attitude fails because the density depends on lambda, and perhaps, as one moves around on the manifold, the density matrix goes from being positive-definite in “most” locations, to having zero eigenvalues in some. But if that’s the case, this too strikes me as being remarkable, at least from the physics point of view.
So, envisioning lambda as a proxy for the temperature, or the fugacity, or whatever, this is saying that, for some values of lambda, there is a state that mixes well into the ensemble, and for other lambdas, it fails to mix at all. I don’t know of any physical system that behaves like this (but maybe my experience is limited). It’s like saying that, while manipulating the temperature, there is one state that suddenly completely stops interacting with the rest of the system. Wow! This seems like one heck of a discontinuity, suggesting some phase-transition-like behavior. Call me naive, but describing a model of some kind that exhibits this behaviour seems to be publication-worthy, to me. Unless there’s a “well-known” one you know…
At any rate, it still suggests a way for proceeding forward: you carve up the manifold into pieces, the edges of which are those values of lambda where the density matrix has a zero eigenvalue. As one crosses those edges, one should discard (or add back in) the detached pure state(s) from your Hilbert space, and otherwise proceed with the usual Gibbs-state calculations.
It would be even more remarkable and amazing if one couldn’t carve up the manifold, say, for example, because the points (values of lambda) for which the density matrix had zeros and where it didn’t were “dense” in each other (in the general topology sense, like rationals dense in reals), or if there was an accumulation point. Of course, mathematically, I suppose this is possible, but as a physical situation, this would seem to be truly remarkable…