So far in this series of posts I’ve been explaining a paper by Gavin Crooks. Now I want to go ahead and explain a little research of my own.

I’m not claiming my results are new — indeed I have no idea whether they are, and I’d like to hear from any experts who might know. I’m just claiming that this is some work I did last weekend.

People sometimes worry that if they explain their ideas before publishing them, someone will ‘steal’ them. But I think this overestimates the value of ideas, at least in esoteric fields like mathematical physics. The problem is not people stealing your ideas: the hard part is *giving them away*. And let’s face it, people in love with math and physics will do research unless you actively stop them. I’m reminded of this scene from the Marx Brothers movie where Harpo and Chico, playing wandering musicians, walk into a hotel and offer to play:

Groucho: What do you fellows get an hour?

Chico: Oh, for playing we getta ten dollars an hour.

Groucho: I see…What do you get for not playing?

Chico: Twelve dollars an hour.

Groucho: Well, clip me off a piece of that.

Chico: Now, for rehearsing we make special rate. Thatsa fifteen dollars an hour.

Groucho: That’s for rehearsing?

Chico: Thatsa for rehearsing.

Groucho: And what do you get for not rehearsing?

Chico: You couldn’t afford it.

So, I’m just rehearsing in public here, but of course I hope to write a paper about this stuff someday, once I get enough material.

Remember where we were. We had considered a manifold (let’s finally give it a name, say $M$) that parametrizes Gibbs states of some physical system. By **Gibbs state**, I mean a state that maximizes entropy subject to constraints on the expected values of some observables. And we had seen that in favorable cases, we get a Riemannian metric on $M$! It looks like this:

$$ g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle $$

where $X_i$ are our observables, and the angle bracket means ‘expected value’.

All this applies to both classical and quantum mechanics. Crooks wrote down a beautiful formula for this metric in the classical case. But since I’m at the Centre for *Quantum* Technologies, not the Centre for Classical Technologies, I redid his calculation in the quantum case. The big difference is that in quantum mechanics, observables don’t commute! But in the calculations I did, that didn’t seem to matter much, mainly because I took a lot of traces, which imposes a kind of commutativity:

$$ \mathrm{tr}(AB) = \mathrm{tr}(BA) $$

In fact, if I’d wanted to show off, I could have done the classical and quantum cases simultaneously by replacing all operators by elements of any von Neumann algebra equipped with a trace. Don’t worry about this much: it’s just a general formalism for treating classical and quantum mechanics on an equal footing. One example is the algebra of bounded operators on a Hilbert space, with the usual concept of trace. Then we’re doing quantum mechanics as usual. But another example is the algebra of suitably nice functions on a suitably nice space, where taking the trace of a function means *integrating* it. And then we’re doing classical mechanics!
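The ‘kind of commutativity’ that traces impose can be checked numerically. Here is a minimal sketch, assuming NumPy; the helper `random_hermitian` and the 3×3 size are hypothetical choices made only for illustration. It builds two observables that do not commute and confirms that the trace still cannot tell $AB$ from $BA$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    """Return a random n x n self-adjoint matrix (a stand-in for an observable)."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

A = random_hermitian(3)
B = random_hermitian(3)

# The observables themselves do not commute...
commutator_norm = np.linalg.norm(A @ B - B @ A)
# ...but the trace gives the same answer for both orderings.
trace_gap = abs(np.trace(A @ B) - np.trace(B @ A))

print(f"||AB - BA||       = {commutator_norm:.3f}")
print(f"|tr(AB) - tr(BA)| = {trace_gap:.2e}")
```

In the classical example, where ‘trace’ means integration of functions, the same identity holds trivially because functions already commute.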

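For example, I showed you how to derive a beautiful formula for the metric I wrote down a minute ago:

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}\left( \rho \, \frac{\partial \ln \rho}{\partial \lambda^i} \frac{\partial \ln \rho}{\partial \lambda^j} \right) $$

But if we want to do the classical version, we can say *Hey, presto!* and write it down like this:

$$ g_{ij} = \int_\Omega p(\omega) \, \frac{\partial \ln p(\omega)}{\partial \lambda^i} \frac{\partial \ln p(\omega)}{\partial \lambda^j} \, d\omega $$

What did I do just now? I changed the trace to an integral over some space $\Omega$. I rewrote $\rho$ as $p$ to make you think ‘probability distribution’. And I don’t need to take the real part anymore, since everything is already real when we’re doing classical mechanics. Now this metric is the **Fisher information metric** that statisticians know and love!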
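To make the classical formula concrete, here is a small sketch, assuming NumPy and a *finite* space $\Omega$ so that the integral becomes a sum. The function names `fisher_metric` and `bernoulli`, the one-parameter coin-flip family, and the finite-difference step are all illustrative assumptions, not part of the derivation. For a coin with bias $\theta$, the result should be close to the textbook value $1/(\theta(1-\theta))$.

```python
import numpy as np

def fisher_metric(p, lam, eps=1e-6):
    """g_ij = sum_w p(w) d_i ln p(w) d_j ln p(w) on a finite sample space.

    `p` maps a parameter vector to a vector of probabilities; the
    derivatives of ln p are approximated by central differences.
    """
    lam = np.asarray(lam, dtype=float)
    n = lam.size
    dlogp = []
    for i in range(n):
        step = np.zeros(n)
        step[i] = eps
        dlogp.append((np.log(p(lam + step)) - np.log(p(lam - step))) / (2 * eps))
    dlogp = np.array(dlogp)                      # shape (n, |Omega|)
    return np.einsum("w,iw,jw->ij", p(lam), dlogp, dlogp)

def bernoulli(lam):
    """Hypothetical example family: a coin with P(heads) = theta."""
    theta = lam[0]
    return np.array([1.0 - theta, theta])

theta = 0.3
g = fisher_metric(bernoulli, [theta])
exact = 1.0 / (theta * (1.0 - theta))
print(g[0, 0], exact)
```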

In what follows, I’ll keep talking about the quantum case, but in the back of my mind I’ll be using von Neumann algebras, so everything will apply to the classical case too.

So what am I going to do? I’m going to fix a big problem with the story I’ve told so far.

Here’s the problem: so far we’ve only studied a special case of the Fisher information metric. We’ve been assuming our states are Gibbs states, parametrized by the expectation values of some observables $X_1, \dots, X_n$. Our manifold $M$ was really just some open subset of $\mathbb{R}^n$: a point in here was a list of expectation values.

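But people like to work a lot more generally. We could look at *any* smooth function $\rho$ from a smooth manifold $M$ to the set of density matrices for some quantum system. We can still write down the metric

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}\left( \rho \, \frac{\partial \ln \rho}{\partial \lambda^i} \frac{\partial \ln \rho}{\partial \lambda^j} \right) $$

in this more general situation. Nobody can stop us! But it would be better if we could *derive* this formula, as before, starting from a formula like the one we had before:

$$ g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle $$

The challenge is that now we don’t have observables $X_i$ to start with. All we have is a smooth function $\rho$ from some manifold to some set of states. How can we pull observables out of thin air?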

Well, you may remember that last time we had

$$ \rho = \frac{1}{Z} e^{-\lambda^i X_i} $$

where $\lambda^i$ were some functions on our manifold and

$$ Z = \mathrm{tr}(e^{-\lambda^i X_i}) $$

was the partition function. Let’s copy this idea.

So, we’ll start with our density matrix $\rho$, but then write it as

$$ \rho = \frac{1}{Z} e^{-A} $$

where $A$ is some self-adjoint operator and

$$ Z = \mathrm{tr}(e^{-A}) $$

(Note that $A$, like $\rho$, is really an operator-valued function on $M$. So, I should write something like $A(x)$ to denote its value at a particular point $x \in M$, but I won’t usually do that. As usual, I expect some intelligence on your part!)

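Now we can repeat some calculations I did last time. As before, let’s take the logarithm of $\rho$:

$$ \ln \rho = -A - \ln Z $$

and then differentiate it. Suppose $\lambda^i$ are local coordinates near some point of $M$. Then

$$ \frac{\partial}{\partial \lambda^i} \ln \rho = -\frac{\partial A}{\partial \lambda^i} - \frac{1}{Z} \frac{\partial Z}{\partial \lambda^i} $$

Last time we had nice formulas for both terms on the right-hand side above. To get similar formulas now, let’s define operators

$$ X_i = \frac{\partial A}{\partial \lambda^i} $$

This gives a nice name to the first term on the right-hand side above. What about the second term? We can calculate it out:

$$ \frac{1}{Z} \frac{\partial Z}{\partial \lambda^i} = \frac{1}{Z} \frac{\partial}{\partial \lambda^i} \mathrm{tr}(e^{-A}) = -\frac{1}{Z} \mathrm{tr}\left( e^{-A} \frac{\partial A}{\partial \lambda^i} \right) $$

where in the last step we use the chain rule (which is fine inside a trace, thanks to the cyclic property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, even when $A$ and its derivative don’t commute). Next, use the definition of $\rho$ and $X_i$, and get:

$$ \frac{1}{Z} \frac{\partial Z}{\partial \lambda^i} = -\mathrm{tr}(\rho X_i) = -\langle X_i \rangle $$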

This is just what we got last time! Ain’t it fun to calculate when it all works out so nicely?
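The identity $\frac{1}{Z} \frac{\partial Z}{\partial \lambda^i} = -\langle X_i \rangle$, with $X_i = \partial A / \partial \lambda^i$, can be sanity-checked on a single qubit. The sketch below assumes NumPy; the particular family $A(\lambda) = \lambda^1 \sigma_x + \lambda^2 \sigma_z + \lambda^1 \lambda^2 \sigma_y$ is a made-up example, chosen so that the $X_i$ depend on the point and do not commute with $A$. The left side is computed by central differences of $\ln Z$, the right side from the trace.

```python
import numpy as np

# Pauli matrices, used only to build a hypothetical example family A(lambda).
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def A(lam):
    a, b = lam
    return a * SX + b * SZ + a * b * SY

def X(lam):
    """X_i = dA/dlambda^i, differentiated by hand for this family."""
    a, b = lam
    return [SX + b * SY, SZ + a * SY]

def exp_minus(H):
    """exp(-H) for a self-adjoint H, via its eigen-decomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-w)) @ v.conj().T

def Z(lam):
    return np.trace(exp_minus(A(lam))).real

def rho(lam):
    return exp_minus(A(lam)) / Z(lam)

lam = np.array([0.4, -0.7])
eps = 1e-6

lhs = []  # (1/Z) dZ/dlambda^i = d(ln Z)/dlambda^i, by central differences
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    lhs.append((np.log(Z(lam + step)) - np.log(Z(lam - step))) / (2 * eps))

rhs = [-np.trace(rho(lam) @ Xi).real for Xi in X(lam)]  # -<X_i>

print(lhs)
print(rhs)
```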

So, putting both terms together, we see

$$ \frac{\partial}{\partial \lambda^i} \ln \rho = -X_i + \langle X_i \rangle $$

or better:

$$ X_i - \langle X_i \rangle = -\frac{\partial}{\partial \lambda^i} \ln \rho $$

This is a nice formula for the ‘fluctuation’ of the observables $X_i$, meaning how much they differ from their expected values. And it looks exactly like the formula we had last time! The difference is that last time we *started out* assuming we had a bunch of observables, $X_i$, and defined $\rho$ to be the state maximizing the entropy subject to constraints on the expectation values of all these observables.

Now we’re starting with $\rho$ and working backwards.

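From here on out, it’s easy. As before, we can define $g_{ij}$ to be the real part of the covariance matrix:

$$ g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle $$

Using the formula

$$ X_i - \langle X_i \rangle = -\frac{\partial}{\partial \lambda^i} \ln \rho $$

we get

$$ g_{ij} = \mathrm{Re} \, \left\langle \frac{\partial \ln \rho}{\partial \lambda^i} \frac{\partial \ln \rho}{\partial \lambda^j} \right\rangle $$

or

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}\left( \rho \, \frac{\partial \ln \rho}{\partial \lambda^i} \frac{\partial \ln \rho}{\partial \lambda^j} \right) $$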

*Voilà!*

When this matrix is positive definite at every point, we get a Riemannian metric on $M$. Last time I said this is what people call the ‘Bures metric’, though frankly, now that I examine the formulas, I’m not so sure. But in the classical case, it’s called the Fisher information metric.
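Here is a numerical sketch of the two ways of writing the metric, assuming NumPy and a hypothetical one-qubit family $A(\lambda) = \lambda^1 \sigma_x + \lambda^2 \sigma_z + \lambda^1 \lambda^2 \sigma_y$ (my choice of example, as are all the function names). It computes $g_{ij}$ once as the real part of the covariance matrix of the $X_i = \partial A / \partial \lambda^i$, and once as $\mathrm{Re} \, \mathrm{tr}(\rho \, \partial_i \ln \rho \, \partial_j \ln \rho)$ with finite-difference derivatives of $\ln \rho$, then prints both so they can be compared.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def A(lam):
    a, b = lam
    return a * SX + b * SZ + a * b * SY

def X(lam):
    """X_i = dA/dlambda^i for the example family above."""
    a, b = lam
    return [SX + b * SY, SZ + a * SY]

def herm_fun(H, f):
    """Apply a real function f to a self-adjoint matrix via its eigenvalues."""
    w, v = np.linalg.eigh(H)
    return (v * f(w)) @ v.conj().T

def rho(lam):
    e = herm_fun(A(lam), lambda w: np.exp(-w))
    return e / np.trace(e).real

def log_rho(lam):
    return herm_fun(rho(lam), np.log)

lam = np.array([0.4, -0.7])
r = rho(lam)

# Form 1: real part of the covariance matrix of the observables X_i.
fluct = [Xi - np.trace(r @ Xi).real * I2 for Xi in X(lam)]
g_cov = np.array([[np.trace(r @ fi @ fj).real for fj in fluct] for fi in fluct])

# Form 2: Re tr(rho d_i ln rho d_j ln rho), with central differences.
eps = 1e-5
dlog = []
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    dlog.append((log_rho(lam + step) - log_rho(lam - step)) / (2 * eps))
g_log = np.array([[np.trace(r @ di @ dj).real for dj in dlog] for di in dlog])

print(g_cov)
print(g_log)
print("eigenvalues:", np.linalg.eigvalsh(g_cov))
```

At this particular point the matrix has strictly positive eigenvalues, so it does define an inner product on the tangent space there.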

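Differential geometers like to use $\partial_i$ as a shorthand for $\frac{\partial}{\partial \lambda^i}$, so they’d write down our metric in a prettier way:

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}( \rho \, \partial_i (\ln \rho) \, \partial_j (\ln \rho) ) $$

Differential geometers like coordinate-free formulas, so let’s also give a coordinate-free formula for our metric. Suppose $x \in M$ is a point in our manifold, and suppose $v, w$ are tangent vectors at this point. Then

$$ g(v, w) = \mathrm{Re} \, \mathrm{tr}( \rho \, v(\ln \rho) \, w(\ln \rho) ) $$

Here $\ln \rho$ is a smooth operator-valued function on $M$, and $v(\ln \rho)$ means the derivative of this function in the $v$ direction at the point $x$.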

So, this is all very nice. To conclude, two more points: a technical one, and a more important philosophical one.

First, the technical point. When I said $\rho$ could be *any* smooth function from a smooth manifold to some set of states, I was actually lying. That’s an important pedagogical technique: the brazen lie.

We can’t really take the logarithm of *every* density matrix. Remember, we take the log of a density matrix by taking the log of all its eigenvalues. These eigenvalues are ≥ 0, but if one of them is zero, we’re in trouble! The logarithm of zero is undefined.

On the other hand, there’s no problem taking the logarithm of our density-matrix-valued function $\rho$ when it’s positive definite at each point of $M$. You see, a density matrix is positive definite iff its eigenvalues are all > 0. In this case it has a unique self-adjoint logarithm.

So, we must assume $\rho$ is positive definite. But what’s the physical significance of this ‘positive definiteness’ condition? Well, any density matrix can be diagonalized using some orthonormal basis. It can then be seen as a probabilistic mixture (not a quantum superposition!) of pure states taken from this basis. Its eigenvalues are the probabilities of finding the mixed state to be in one of these pure states. So, saying that its eigenvalues are all > 0 amounts to saying that all the pure states in this orthonormal basis show up with *nonzero* probability! Intuitively, this means our mixed state is ‘really mixed’. For example, it can’t be a pure state. In math jargon, it means our mixed state is in the *interior* of the convex set of mixed states.
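The recipe ‘take the log of all the eigenvalues’ is easy to sketch in code, and doing so shows where pure states cause trouble. This is only an illustration, assuming NumPy; the function name `log_density_matrix` and the tolerance `tol` are hypothetical choices.

```python
import numpy as np

def log_density_matrix(rho, tol=1e-12):
    """Self-adjoint logarithm of a positive definite density matrix.

    Diagonalize, take the log of each eigenvalue, and rotate back.
    Raises ValueError if some eigenvalue is (numerically) zero.
    """
    w, v = np.linalg.eigh(rho)
    if np.min(w) <= tol:
        raise ValueError("density matrix is not positive definite; log is undefined")
    return (v * np.log(w)) @ v.conj().T

mixed = np.eye(2) / 2                       # maximally mixed qubit: 'really mixed'
pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # pure state: one eigenvalue is zero

log_mixed = log_density_matrix(mixed)       # equals -ln(2) times the identity

try:
    log_density_matrix(pure)
    pure_has_log = True
except ValueError:
    pure_has_log = False

print(log_mixed)
print("pure state has a logarithm:", pure_has_log)
```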

Second, the philosophical point. Instead of starting with the density matrix $\rho$, I took $A$ as fundamental. But different choices of $A$ give the same $\rho$. After all,

$$ \rho = \frac{1}{Z} e^{-A} $$

where we cleverly divide by the normalization factor

$$ Z = \mathrm{tr}(e^{-A}) $$

to get $\mathrm{tr} \, \rho = 1$. So, if we multiply $e^{-A}$ by any positive constant, or indeed any positive *function* on our manifold $M$, $\rho$ will remain unchanged!

So we have added a little extra information when switching from $\rho$ to $A$. You can think of this as ‘gauge freedom’, because I’m saying we can do any transformation like

$$ A \mapsto A + f $$

where

$$ f \colon M \to \mathbb{R} $$

is a smooth function. This doesn’t change $\rho$, so arguably it doesn’t change the ‘physics’ of what I’m doing. It *does* change $Z$. It also changes the observables

$$ X_i = \frac{\partial A}{\partial \lambda^i} $$

But it doesn’t change their ‘fluctuations’

$$ X_i - \langle X_i \rangle $$

so it doesn’t change the metric $g_{ij}$.
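The gauge freedom $A \mapsto A + f$ can be tested numerically too. In the sketch below (assuming NumPy), the one-qubit family `A` and the gauge function `f` are invented purely for illustration. It checks that the density matrix is unchanged, that $Z$ picks up a factor $e^{-f}$, and that the fluctuations $X_i - \langle X_i \rangle$ agree before and after the transformation.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def A(lam):
    a, b = lam
    return a * SX + b * SZ + a * b * SY

def f(lam):
    """Hypothetical gauge function: any smooth real function would do."""
    a, b = lam
    return np.sin(a) + b ** 2

def A_gauged(lam):
    return A(lam) + f(lam) * I2

def state(Aop):
    """Return (rho, Z) with rho = exp(-A)/Z and Z = tr exp(-A)."""
    w, v = np.linalg.eigh(Aop)
    e = (v * np.exp(-w)) @ v.conj().T
    Z = np.trace(e).real
    return e / Z, Z

def fluctuations(Afun, lam, eps=1e-6):
    """X_i - <X_i>, with X_i = dA/dlambda^i taken by central differences."""
    r, _ = state(Afun(lam))
    out = []
    for i in range(len(lam)):
        step = np.zeros(len(lam))
        step[i] = eps
        Xi = (Afun(lam + step) - Afun(lam - step)) / (2 * eps)
        out.append(Xi - np.trace(r @ Xi).real * I2)
    return out

lam = np.array([0.4, -0.7])
rho_a, Z_a = state(A(lam))
rho_b, Z_b = state(A_gauged(lam))
fl_a = fluctuations(A, lam)
fl_b = fluctuations(A_gauged, lam)

print("rho unchanged:       ", np.allclose(rho_a, rho_b))
print("Z ratio vs exp(-f):  ", Z_b / Z_a, np.exp(-f(lam)))
print("fluctuations agree:  ", all(np.allclose(u, v, atol=1e-6) for u, v in zip(fl_a, fl_b)))
```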

This gauge freedom is interesting, and I want to understand it better. It’s related to something very simple yet mysterious. In statistical mechanics the partition function $Z$ begins life as ‘just a normalizing factor’. If you change the physics so that $Z$ gets multiplied by some number, the Gibbs state doesn’t change. But then the partition function takes on an incredibly significant role as something whose logarithm you differentiate to get lots of physically interesting information! So in some sense the partition function doesn’t matter much… but *changes* in the partition function matter a lot.

This is just like the split personality of phases in quantum mechanics. On the one hand they ‘don’t matter’: you can multiply a unit vector by any phase and the pure state it defines doesn’t change. But on the other hand, *changes* in phase matter a lot.

Indeed the analogy here is quite deep: it’s the analogy between probabilities in statistical mechanics and amplitudes in quantum mechanics, the analogy between $e^{-\beta H}$ in statistical mechanics and $e^{-itH/\hbar}$ in quantum mechanics, and so on. This is part of a bigger story about ‘rigs’ which I told back in the Winter 2007 quantum gravity seminar, especially in week13. So, it’s fun to see it showing up yet again… even though I don’t completely understand it here.

[Note: in the original version of this post, I omitted the real part in my definition of $g_{ij}$, giving a ‘Riemannian metric’ that was neither real nor symmetric in the quantum case. Most of the comments below are based on that original version, not the new fixed one.]

I’m really enjoying this because of the obvious applications to finance.

Now, if only you could wave the wand of category theory and make this all pop out by magic :)

Can we dualize this and get something like information theoretic 1-forms? I’d like to see something like

This is what I was talking about earlier. A stochastic process has a Brownian motion term and a deterministic term

This is probably related somehow.

There certainly will be a metric $g^{ij}$, at least when $g_{ij}$ is nondegenerate as I’m assuming. But what does it mean?

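I’ve been doing a lot of local coordinate calculations using observables

$$ X_i = \frac{\partial A}{\partial \lambda^i} $$

But note that in a global approach, we get observable-valued functions by differentiating the operator-valued function $A$ along any vector field $v$. Say:

$$ X_v = v(A) $$

Then our metric is

$$ g(v, w) = \mathrm{Re} \, \langle (X_v - \langle X_v \rangle)(X_w - \langle X_w \rangle) \rangle $$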

So: we get observables from vector fields… what do we get from 1-forms? Not sure what the best answer is.

By the way, you posted your comment before I was done writing my article! I accidentally hit ‘Post’ — it’s annoyingly easy to do. You might want to reread my post now that it’s done, especially to see the Marx Brothers reference, but also maybe a bit of extra physics.

Wait a second. Which should REALLY be the variance? or ? I think it should be in order to relate to stochastic processes.

I’m getting observables from tangent vectors, and the metric from the covariance matrix on observables, so $g_{ij}$ is the covariance of the *i*th observable and the *j*th one, in local coordinates.

Sorry for being dense, but it’s not obvious to me you have tangent vectors projecting out elements of the covariance matrix. It almost looks like you’ve got components of a 1-form

Then

Maybe there is some Ito trick laying in here somewhere so that you get something like

and

Ok! A little latex in the morning is good for the soul.

Now that I write that out, some things have become clearer. I hope the above sparks some thoughts from those who know more than me about this stuff.

Anyway, I have found peace with the idea that

Progress! But in the process, I have also come to peace with . This is my current thought

These should (probably) be related by

and

And there is peace in the world (of my head) again.

I’m not very certain about

but the Ito thing (if there is anything to it) is kind of cute and leads directly to

Phorgyphynance writes:

Good, you shouldn’t be, because it doesn’t make sense!

First of all, are just names for the local coordinates on our manifold . They’re not observables, so it doesn’t make much sense to take their expectation values.

True, you can think of a number as an observable that always takes the same value: that is, an observable with zero variance. So a coordinate function

*can* be seen as a special case of an observable-valued function… but one with the special property that

So, by the only reasonable interpretation of the left-hand side of your equation, we get

Certainly nothing to do with the metric!

The following formulas you wrote also do not make sense:

Sorry to be rude, but I get the feeling that you’re making wild guesses in a rush, instead of waiting until everything is crystal clear. There’s not even anything called in my formalism! It has nothing to do with the Ito calculus, at least not yet.

Ouch.

You do have stochastic processes whether you choose to recognize them yet or not. Once you recognize the stochastic process, you must recognize .

Perhaps the notation needs some work, e.g. maybe we should write

with or something, but I’m confident the basic ideas I’m laying out can be made solid with some effort. And I’m also confident they are relevant to what you are doing, but if you don’t want me thinking out loud on your blog, that is understandable. I didn’t see the harm.

Sorry, I guess I have a low tolerance for equations that don’t parse: by avoiding these like the plague, I avoid certain kinds of mistakes. They make me grumpy. I have trained myself to be like that.

I wouldn’t be surprised if there’s a connection between this stuff and stochastic processes, though.

And just for the record, when I say I’m thinking out loud, I don’t mean to imply that I haven’t put serious thought into this before. The first time I became aware of the practical implications of the fact that the covariance matrix was like a metric tensor on a manifold was in the first half of 2005 when I was on Wall Street and I’ve given several presentations on the subject since then. Believe me, people are very familiar with these connections you’re reinventing.

The thing I haven’t personally thought of before is the relation to partition functions. That is very cool. The mathematical connection between what I’m talking about and what you’re doing is obvious to me, but instead of working out the details in the comments here, I’ll try to sort things out and let them bake a while on my own blog as well as possibly on my personal web space on the nLab.

And I’ll try to forget the comment about parsing of equations :P You will eventually see that everything I’ve written will parse (typos and minor notational adjustments aside) and is in fact fairly standard material.

I’m sorry to have hurt your feelings. I hope you realize what I’m doing. I’m trying to understand the subject called information geometry. A lot is known about it; I doubt I’m doing anything really new yet, but I feel the need to explain it and redevelop it in a slightly different way to understand it. I’m not trying to make connections to stochastic calculus, though they probably exist and are probably very interesting. I already have my hands full just trying to understand some basic concepts like the Fisher information metric.

I felt the need to point out that some equations you wrote made no sense given the definitions I had laid down. I did so in a rather rude way, because they actually made my brain hurt. But if you change something or other, you may wind up stating some interesting and/or well-known facts.

Eric writes:

I’m not sure what you mean here: is not a tangent vector, but and are. Maybe that’s what you were trying to say.

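Anyway, starting from the *definition* of the metric in terms of a covariance matrix

$$ g_{ij} = \mathrm{Re} \, \langle (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle) \rangle $$

I showed that

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}( \rho \, \partial_i (\ln \rho) \, \partial_j (\ln \rho) ) $$

and you can read the proof above.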

Sure, and that’s a nice way of looking at it. Then, by the usual yoga of 1-forms and tangent vectors, this implies

But beware: this is an *observable-valued* 1-form. I.e., it’s an operator-valued 1-form in the case of quantum statistical mechanics, and a 1-form taking values in functions on the phase space in the case of classical statistical mechanics.

(If you prefer the language of probability theory to that of classical statistical mechanics, say “random variable” instead of “function on phase space”, and emphasize that the phase space is a measure space. It’s just different terminology for the same thing, at least for what I’m doing here.)

Eric wrote:

There is a source of confusion I’d like to get out of the way, but we’ll need some definitions first:

Let’s say we talk about one dimensional stochastic processes in a continuous variable t. Let $W_t$ be Brownian motion, that is the only stochastic process with stationary independent increments with mean 0, starting at 0 ($W_0 = 0$), and concentrated on continuous paths.

The problem is that one cannot make sense of as a differential form, AFAIK. It makes sense only as a short hand notation of the

*integral equation*

where we have yet to choose an appropriate integral definition; let’s choose the Itô integral for now. (Note that not *all* stochastic processes have such a representation; the martingale representation theorem tells us that exactly all adapted martingales have one.)

The problem is that the paths of Brownian motion are a.s. not of bounded variation, therefore the integral may not exist pathwise as a Riemann-Stieltjes integral.

What we *can* do however is consider the stochastic processes that are solutions of a family of Itô stochastic integrals described by a finite set of parameters, e.g.

where is a real polynomial of a degree of at most 4, and is any real number. Every such stochastic process defines a probability distribution on, say, , the continuous functions of the interval . These probability distributions form a stochastic manifold where one could, in principle, calculate the Fisher metric (ugh, I don’t think that I can do that).

You’re absolutely correct when viewing things from the traditional perspective. However, there IS a way to view the stochastic process as a 1-form, but you need to consider it in the context of noncommutative geometry.

I’m pretty sure I can claim to be the first person to ever apply noncommutative geometry to finance :)

Have a look at this paper I wrote back in 2002:

Noncommutative Geometry and Stochastic Calculus: Applications in Mathematical Finance

There, we find that the stochastic processes are indeed 1-forms and the Ito formula follows from noncommutativity of 0-forms and 1-forms, i.e.

This is reminiscent of the common heuristic used when defining Ito formula in elementary math finance texts

Then I let the idea rest for 2 years while I was at MIT Lincoln Lab, but came back to it in 2004 (just prior to moving to Wall Street) with a finite version suitable for numerical modeling:

Financial Modelling Using Discrete Stochastic Calculus

I sometimes feel a little apologetic for bringing up math finance here, but I hope it is clear how these techniques (could possibly) apply more generally including this information geometry stuff.

I see: Maybe I should be more careful when I talk about “differential forms”.

In classical differential geometry, given a real, smooth manifold $M$, the differential $df$ of a real smooth function $f$ lives in the cotangent bundle of $M$.

We can integrate $df$ along a path and get a real number.

Transforming this to Connes’ quantized calculus, $f$ becomes a selfadjoint operator, $df$ becomes the commutator of $f$ with $F$, where $F$ is a fixed selfadjoint operator of square one. The integral becomes the trace. We can still integrate a differential of order one and get a real number.

Now I don’t see a way to fit $dW_t$ into Connes’ quantized calculus :-)

It is not a function, an operator, or a differential of order one and we cannot integrate it to get a real number (or a complex one) :-)

Of course you are free to *define* it to be a basis vector of some abstract vector space and introduce additional algebraic structures that mimic Itô calculus.

Hi Tim,

If you have a look at the paper I wrote with Urs

Discrete differential geometry on causal graphs

you’ll find that there is a particular class of spaces we call “diamonds” for which is the commutator of with the “graph operator” .

There is a particularly nice “diamond” we examine in Section 2.9 (which I expanded in the “Discrete Stochastic Calculus” paper), i.e. the binary tree or 2-diamond. The continuum stochastic calculus is the continuum limit of this discrete calculus on a binary tree. This is the bridge you’re looking for.

One more note:

This is not what happens though. For a given directed graph, there is a corresponding calculus. There is no choice in the matter at all.

The calculus that corresponds automatically to the binary tree is stochastic calculus. There is no arbitrariness.

The story how we came to this is kind of funny. Urs and I were having fun playing around with this stuff and we asked the inverse question. If I give you a graph, you can determine the corresponding calculus. What if I hand you a calculus, can you reverse engineer things and give me a graph that corresponds to this calculus? Just as Urs left for a bike ride across Europe we asked what graph corresponds to stochastic calculus. When he returned, we had both arrived at the answer: the binary tree is the graph that corresponds to stochastic calculus. It is obvious in retrospect.

Eric pointed out this paper:

Thanks! I hope I’ll have time to read it next weekend (the Monday is a holiday), I’ll report back then :-)

Hi Tim,

I’ve written a pretty simple explanation of this on my blog at

Discrete stochastic calculus and commutators

If you’re interested, please feel free to discuss it over there. I love discussing this stuff so feel free to lob over any questions and I’m happy to do my best to answer them.

I’m getting dizzy from all these interconnections.

I looked for additional reading material: Curiously I did not find a unified treatment using von Neumann algebras, although it seems to be pretty simple and elegant (then again everything looks simple and elegant if explained by John). Instead there is an introduction of quantum information theory using finite Hilbert spaces in

Amari, Shun-ichi; Nagaoka, Hiroshi: Methods of information geometry (available on google books, too).

Or see Denes Petz, Catalin Ghinea: Introduction to quantum Fisher information.

And to make the story even more interconnected, people use information geometry to study black hole thermodynamics, see e.g. Jan E. Aman, Narit Pidokrajt: Geometry of Higher-Dimensional Black Hole Thermodynamics.

Thanks for the references! I wanted to invent some of my own stuff before getting brainwashed by the existing literature, but I’ll look at these and see what they say.

Well, when it comes to this sort of thing, if it doesn’t make you dizzy it must not be done yet.

Is there sufficient gauge freedom to fix problems in A and/or states, or are the problems where the physics is? Can a dynamics of the gauge enable modelling wavefunction collapse?

John F wrote:

I don’t understand what ‘problems’ you’re referring to, but right now I’m happy that I can use the gauge freedom in A to choose A so that

$$ \rho = e^{-A} $$

that is, $A = -\ln \rho$. This implies

$$ Z = \mathrm{tr}(e^{-A}) = 1 $$

and it also implies that our observables

$$ X_i = \frac{\partial A}{\partial \lambda^i} $$

have vanishing expectation values:

$$ \langle X_i \rangle = 0 $$

This makes them boring if you like nonzero expectation values, but their covariances are still plenty interesting: they are the metric $g_{ij}$.

This is the simplest gauge choice.
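A quick numerical check of this gauge choice, as a sketch assuming NumPy: start from some positive definite $\rho(\lambda)$ (here built from a made-up one-qubit family, with hypothetical function names throughout), set $A = -\ln \rho$, and confirm that $\mathrm{tr}(e^{-A}) = 1$ and that the observables $X_i = \partial A / \partial \lambda^i$, computed by central differences, have expectation values that vanish up to numerical error.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def herm_fun(H, f):
    """Apply a real function f to a self-adjoint matrix via its eigenvalues."""
    w, v = np.linalg.eigh(H)
    return (v * f(w)) @ v.conj().T

def rho(lam):
    """A hypothetical positive definite density-matrix-valued function."""
    a, b = lam
    e = herm_fun(a * SX + b * SZ + a * b * SY, lambda w: np.exp(-w))
    return e / np.trace(e).real

def A_gauge(lam):
    """The gauge choice A = -ln(rho)."""
    return -herm_fun(rho(lam), np.log)

lam = np.array([0.4, -0.7])
r = rho(lam)

# In this gauge the partition function is 1.
Z_gauge = np.trace(herm_fun(A_gauge(lam), lambda w: np.exp(-w))).real

# X_i = dA/dlambda^i by central differences, then <X_i> = tr(rho X_i).
eps = 1e-5
means = []
for i in range(2):
    step = np.zeros(2)
    step[i] = eps
    Xi = (A_gauge(lam + step) - A_gauge(lam - step)) / (2 * eps)
    means.append(np.trace(r @ Xi).real)

print("Z =", Z_gauge)
print("<X_i> =", means)
```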

Sorry, I can’t make sense of that.

By problems I mostly meant zeroes of the weights. I guess this may not require anomalies or singularities in A, but maybe at least caustics. FWIW Berry (again) did nice work in multispectral holograms, e.g.

Sometimes it seems like everything is gauges, phases, conformal terms, etc; sometimes nothing.

Okay, yeah, I spent an hour yesterday trying to figure out what to do about the metric

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr}( \rho \, \partial_i (\ln \rho) \, \partial_j (\ln \rho) ) $$

when the density matrix $\rho$ has some zero eigenvalues. This question is important because we can think of pure states as mixed states with a lot of zero eigenvalues! I don’t think changing the gauge on $A$ helps at all, since the above formula for the metric is explicitly gauge-independent.

Right now I’d guess the metric becomes singular at pure states, so we need a different metric if we want to include pure states in our story. And right now my most promising lead comes from Uhlmann’s papers. He seems to be studying metrics on the space of density matrices in a very thorough, general way, synthesizing and extending other people’s work in a systematic framework.

Ok, Uhlmann’s 1995 paper is helping me understand his 1993 paper, after also reading his ’85 and ’86 (etc.) For some reason this reminds me of an old joke I’ve been wanting to repeat recently but haven’t found a venue for, even if it doesn’t quite fit. A preacher at my church was clowning and mentioned he was sure we’d agree that each of his sermons was better than the next.

Anyway two questions. 1) Do you agree with (Uhlmann 1995) “in deviating from the pure to the mixed states … coherence and correlations will not be destroyed suddenly but gradually, continuously.”? 2) He uses lots of positive square roots (one specifically in Equation 37). Do they *have* to be positive?

Here is a thought…

You have discovered a gauge freedom. This gauge freedom simply shifts the mean. I am pretty confident this is a manifestation of the Girsanov theorem.

Ack! I posted the Wikipedia link before reading it assuming the page was decent, but the page actually sucks.

Given a stochastic process

Girsanov’s theorem says that changing the measure modifies the above to

i.e. the covariance structure is unchanged, but the change of measure changes the mean .

I strongly suspect your gauge freedom represents a change of measure.

PS: The trick you used to gauge transform the mean to zero is the same trick used in finance to convert a stochastic process into a martingale. So it seems that renormalizing the partition function via the gauge freedom turns your observable into a martingale.

My gauge freedom does indeed shift the mean; doing the gauge transform

$$ A \mapsto A + f $$

changes the observable-valued function $X_i$ as follows:

$$ X_i \mapsto X_i + \partial_i f $$

At each point $x \in M$, this just takes the observable $X_i(x)$ and adds to it the number $\partial_i f(x)$. So, its mean gets shifted by $\partial_i f(x)$, but ‘nothing else changes’.

However, there is no time variable in my formalism, so all remarks about stochastic processes, martingales, Ito’s formula, etc. are irrelevant here — or, optimistically speaking, ‘premature’.

Maybe you can introduce a time-dependence into my formalism and make those remarks meaningful somehow. For example, maybe you can take

$$ M = \mathbb{R} $$

and call the coordinate on this manifold $t$ for ‘time’. Then what I’ve been calling ‘observable-valued functions’ can be renamed ‘time-dependent random variables’, at least in the classical case.

Something inside me aches when I see beautiful mathematical physics laid out in front of me, but know I just don’t have the time to learn about it. And I would really like to, because thermodynamics was the one part of undergraduate physics that never really sat well with me.

Wow, sometimes I miss being a PhD student — no crummy grant applications to deal with, no pressure of collaborators waiting for me to write things. Three years ago, I would have been all over this.

Hi, Jamie! That’s sad, I’ve almost always felt I’ve had plenty of time to learn new stuff. The only really bad stretch was a couple years ago when I had way too many papers to finish writing.

The problem is working with coauthors. When I work by myself I write up ideas as I go, so the paper is done when it’s done. With coauthors it’s different: it’s lots of fun when we’re together, dreaming up ideas and working out the details, but not fun at all later on when I’m slowly writing them up. The real problem is feeling guilty that my procrastination may be hurting someone else’s CV, especially when it’s a grad student or postdoc, desperate for a job.

I’m trying to avoid new collaborative papers for this reason, and so far I’m succeeding. This has freed up a lot of time for learning new stuff, and explaining it here. And overall, I think that’s a better use of my time.

The first three posts on information geometry don’t assume much background, especially not much in statistical physics – although John used some physics mumbo jumbo that may scare mathematicians that see it for the first time.

But it’s possible to translate all of it to a pure mathematical language, which is easier to do of course in the presence of specific questions like “I don’t understand symbol X on line Z”.

I know Jamie; he’s good at mathematical physics, so I don’t think he’d have trouble understanding what I wrote. I think he’s just *too busy*.

Last night, I had some fun working out all the formulas appearing in this series. John was right. I do feel smarter after doing that :)

However, I still do not have a perfect handle on the nature of all variables appearing. For instance, I got stuck when writing down

Should this be expanded in terms of , , , ,…?

In Crooks’ paper, I see the are functions of and are functions of .

Should

be expanded using Newtonian calculus or stochastic calculus (via Ito)?

Are you talking about

1) the formalism I’m presenting in this post,

2) the formalism in Gavin Crooks’ first paper, or

3) the formalism in Gavin Crooks’ second paper?

They’re different. I can’t tell you the rules of the game until you tell me which game you’re playing.

For example:

1) There are no variables or in anything I explained in this post. I use to coordinatize an arbitrary manifold . and are smooth operator-valued functions on this manifold. I’m using plain old derivatives, no Ito calculus.

2) In Crooks’ first paper and my first post explaining it, is used to denote a vector: a list of expectation values of a fixed set of observables . The variables thus serve to coordinatize an open set in . The probability measure is a function of , but the observables are not. The expectation values equal the coordinate functions on this open set. The are some other functions on this open set. The variable is used not for time — he’s doing equilibrium thermodynamics — but to parametrize a path in this open set. He’s using plain old derivatives, no Ito calculus.

3) I will not attempt to describe the rules of the game in Crooks’ second paper, since I haven’t read it carefully yet.

If you don’t mind, I will stick to formalism 1). In this formalism,

The second step is the main calculation I did in this post.

Thanks! This helps.

I haven’t looked at Crooks’ second paper yet and I thought 1) and 2) were intended to be the same (or at least consistent).

Then why did he say

? :)

I see now that are like spatial coordinates so 1-forms should have coordinate bases , but I’m still not quite sure whether

should contain a term.

This may not be relevant, but it is curious enough that I think I’ll share it. The Ito differential would look something like

The spatial component is the same as you had it, but note the second term of the temporal component. Since

you have a term

which seems curious. I need to learn the relation (if any) between operator-valued functions and stochastic processes, but I’ll work on that elsewhere.

Phorgyphynance wrote:

The current blog entry generalizes Crooks’ first paper in ways that require a significant shift in viewpoint. I wrote:

And the answer is to let be arbitrary local coordinates on , *not* Lagrange multipliers as Crooks was using. Then, define

where is a certain observable-valued function on , related to and the partition function as follows:

Note that these are *not* fixed observables, as they were in Crooks’ formalism! Now they are observable-valued *functions* on the manifold .

We need these changes to think of the Fisher information metric as *always* coming from a covariance matrix.

So, you gotta be careful here.

Thanks for explaining. I hope it is clear that I am very excited about what you are doing and trying my best to understand it. I’ve found that simple “geometry” from the covariance matrix is already very useful in applications and now seeing a deeper level coming from a density matrix is very very cool. I hope to incorporate it into my work.

[…] post is in response to a question from Tim van Beek over at the Azimuth Project blog hosted by my friend John Baez regarding my […]

John wrote:

Phorgyphynance wrote:

I think he uses to mean time only in this one paragraph. I hadn’t even noticed that.

Of course, Gavin Crooks can say for himself what he’s done, but here’s my take on it:

In the bulk of the paper he’s doing equilibrium thermodynamics, time plays no real role, and he uses as a parameter for a path in his manifold of thermodynamic states. The main point of the paper — which I didn’t even get around to discussing — is to provide an operational procedure for measuring the arclength of such a path. This involves changing the thermodynamic state, but so slowly that we may consider it as remaining in equilibrium all along.

So, I stand by what I said. But if you want to make some stuff time-dependent, to get some terms to show up, you can do that.

John wrote in the main post:

Two simple questions:

Isn’t the assumption that the density matrix is positive definite correct for the grand canonical ensemble in “physically realistic” situations? I mean, every state contributes, because we fix the mean energy and mean particle count only, so that every state of the system – regardless of the energy and particle count needed – has a nonzero probability…

On the other hand, do we really need to assume that the density matrix is positive definite to take the logarithm? Let’s say we have a point on our manifold such that has a basis of eigenvectors, and we keep all eigenvectors with eigenvalue , let this set be . Then we can write

(I hope I get the latex correct, what I intend to write is the representation of the density matrix according to the spectral theorem of compact operators, as here on the nLab).

Now we can define eigenvector-wise, as you did in the first post on information geometry, setting .

We can further differentiate this logarithm iff we assume that has a neighborhood such that is invariant on this neighborhood, that is the set of eigenvectors with nonzero eigenvalue does not change.

Maybe there is some way to define the differential even in the case that *does* change, I don’t know…but is it possible to relax the condition on from being positive definite to having an invariant ?

Tim wrote:

Great question! Yes, that’s true.

Right. Or if we just consider the canonical ensemble, where each state with energy E shows up with probability proportional to exp(-E/kT), we’ll get a nonzero contribution from every state…

… *except* in the unphysical but nonetheless incredibly interesting limiting case where T = 0. Then the system goes down to its ground state, or a mixture of its ground states if there’s more than one.

It would be very nice to be able to understand the metric I’m discussing in this limiting case. Why? Well, I explained a metric on pure states in my post on the geometry of quantum phase transitions. It’s called the ‘fidelity metric’ or ‘Fubini-Study metric’, and it’s very nice. It would be cool to relate that metric to the one I’m discussing here!

Just so nobody gets the wrong idea: I did *not* set .

(I know you’re not saying I did: you’re just saying I defined eigenvector-wise. But people reading your sentence out of context might be confused. I don’t want ’em to think I’m even dumber than I actually am.)

I don’t think formally setting equal to zero will help me with the question I’m interested in. I want to understand what the metric on the interior of the set of mixed states does as we approach the boundary. I would like it to extend smoothly or at least continuously.

Alas, I suspect it becomes singular. Think about this formula:

Now I’m using this to define a Riemannian metric on the interior of the set of mixed states. (The pullback of this metric to my manifold is the metric I was talking about in my post.) Now and are tangent vectors to some density matrix .

Everything is fine when none of the eigenvalues of are zero. What happens when some eigenvalues approach zero? We have one factor of to help us out — but we’ve got two factors involving a derivative of . So, I’d expect behavior like

which is singular.

But this is just a hand-waving argument.
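A minimal numerical sketch of this hand-waving argument, using the simplest classical (diagonal) analogue rather than the quantum metric itself: a two-outcome distribution (p, 1 − p) with p as the only coordinate. The function name `fisher_information` is made up for this illustration. One factor of each probability multiplies two factors of the derivative of its logarithm, and the result grows like 1/p as p approaches the boundary.

```python
def fisher_information(p):
    """Fisher information of the two-outcome distribution (p, 1 - p) w.r.t. p."""
    probs = [p, 1.0 - p]
    dprobs = [1.0, -1.0]  # derivatives of the two probabilities with respect to p
    # sum_k p_k * (d ln p_k / dp)^2, where d ln p_k / dp = (dp_k / dp) / p_k
    return sum(pk * (dk / pk) ** 2 for pk, dk in zip(probs, dprobs))

# The value blows up roughly like 1/p as p approaches the boundary p = 0.
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, fisher_information(p))
```

In this toy case the closed form is 1/p + 1/(1 − p) = 1/(p(1 − p)), so the metric component is finite in the interior and diverges at both endpoints.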

Yes, true. But that’s not very helpful for the kind of thing I’m interested in, e.g. extending the metric on the interior of the set of mixed states to the boundary. On the boundary we have pure states, and as we move around on that boundary, the eigenvectors you’re talking about change.

I think these papers should help a lot:

• Anna Jencova, Geodesic distances on density matrices.

• A. Uhlmann, Density operators as an arena for differential geometry, *Rep. Math. Phys.* 33 (1993), 253–263.

• A. Uhlmann, Geometric phases and related structures, *Rep. Math. Phys.* 36 (1995), 461–481.

I could also do some calculations. It ain’t rocket science, it’s basically just matrix multiplication and some calculus. But sometimes those things can be pretty tiring.

I’ll have to at least skim those papers, but I don’t have any time to do it now, so here is an unqualified response:

I’m very much convinced that the metric gets singular on the boundary, but I’m not sure about what that really means.

One interpretation, of course, is that it is possible to find out *for sure* if your system is in a mixed state instead of a pure state. In the classical case, points on a statistical manifold with *finite* Fisher distance cannot be distinguished *for sure*, using a *finite set* of measurements.

So, while one interesting question is “how do we get rid of singularities?”, another one may be “what statistical or physical interpretation does a singularity of the metric have?”.

Pure states may be described, in this context, as points with lower complexity in the sense that they are specified by fewer parameters than mixed states.

I wrote:

Ugh, that’s wrong, unless we assume the classical analogue of an invariant set of eigenvectors: Namely, that the probability distributions of our statistical manifold all have the same support.

Shouldn’t all this talk of setting really be setting (which really is the limit )?

Setting is fine, and that’s what people normally do when defining the entropy of a mixed state to be

When the density matrix has an eigenvalue equal to 0, they define the operator to equal 0 when applied to that eigenvector.
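Here is a small sketch of that convention, working directly with a list of eigenvalues of the density matrix rather than with the matrix itself. The function name `entropy` and the sample spectra are illustrative assumptions, not anything from the post. Zero eigenvalues are simply skipped, which is the same as declaring that 0 ln 0 = 0.

```python
import math

def entropy(eigenvalues):
    """Return -sum p ln p over the eigenvalues, using the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in eigenvalues if p > 0)

print(entropy([0.5, 0.5, 0.0]))  # the zero eigenvalue contributes nothing
print(entropy([1.0, 0.0]))       # a pure state has zero entropy

# Without the convention, the logarithm itself is undefined at 0:
try:
    math.log(0.0)
except ValueError as err:
    print("ln 0 is not defined:", err)
```

The convention is harmless here because p ln p tends to 0 as p tends to 0, so the entropy extends continuously to the boundary. The logarithm on its own has no such limit.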

However, here Tim and I are trying to interpret the expression

and this is trickier. Indeed it appears, in general, to be ill-defined when has zero eigenvalues.

John wrote:

Yes, that’s the problem…we need to define the differential of . We could try to define the logarithm of to be eigenvector-wise, so that we don’t need to be positive definite. is defined to be 0 for and otherwise.

But if we have a path in our manifold such that there is an eigenvector with an eigenvalue with and I very much doubt we can find a way to define the differential of along this path at .

(Gee, hope the latex works out.)

Tim: I fixed a bunch of typos in your latex, but the big problem was this: in systems that mix latex and html, like this blog and also the *n*-Category Café, you have to be incredibly careful about < and > signs, since these play an important role in html.

In the *n*-Café you have to be smart enough to use the latex codes \lt and \gt instead of < and >; otherwise you’ll get in trouble.

In this blog you can’t even use \lt and \gt: apparently they get translated into < and > and *they* then cause trouble!

What you have to do here is use the html codes &lt; and &gt;, even inside latex expressions!

Having told you this, I now expect that you and everyone reading this will remember it forever and never make that mistake again.
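For anyone who prefers to automate this, here is a small sketch using Python’s standard `html.escape`. The helper name `wordpress_latex` is hypothetical, and the `$latex ... $` wrapper is assumed to be the shortcode this blog uses; adjust it if your setup differs.

```python
import html

def wordpress_latex(expr):
    """Wrap a latex expression for a blog comment, escaping <, > and & as html codes."""
    # quote=False leaves quotation marks alone and escapes only &, < and >.
    return "$latex " + html.escape(expr, quote=False) + "$"

print(wordpress_latex(r"0 < p_i < 1"))
```

The escaping is done on the latex source before it is pasted into the comment box, so the blog software never sees a bare < or >.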

I made a big mistake in this blog entry, and possibly the preceding one!

It can be seen most easily here:

A Riemannian metric must be symmetric:

On the other hand, the expression on the right-hand side is *not* symmetric:

except in the *classical* case where the observables all commute.

Remember, while I was using quantum notation, my setup was supposed to work for both classical and quantum mechanics. Alas, it only works in the classical case.

You might think that the cyclic property of the trace saves the day:

But this property does *not* mean everything commutes inside a trace:

So what did I do wrong?

I think it was simply defining the ‘metric’ by

This is symmetric in the classical case, but not the quantum case!

So, I need to go back to the drawing board, at least when it comes to the quantum case.

By the way, for a while I thought my mistake lay here:

In fact we don’t have

unless and commute. However, I believe the cyclic property of the trace *is* enough to show

So, it’s possible that all my calculations are correct, and the only problem is that I’m working with a bizarre asymmetric (and possibly degenerate) version of a ‘Riemannian metric’.

But I need to think more.
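A quick numerical check of the asymmetry, as a sketch under stated assumptions: it takes the ‘metric’ to be the covariance-type expression tr(ρ (X_i − ⟨X_i⟩)(X_j − ⟨X_j⟩)) from the top of the post, and it uses a hand-picked 2×2 density matrix with Pauli matrices as observables. The helper names (`matmul`, `trace`, `centered`, `covariance`) and the particular matrices are chosen only for this illustration.

```python
# Pure-Python sketch with 2x2 complex matrices stored as nested lists.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def centered(rho, x):
    """Return x - <x> * identity, where <x> = tr(rho x)."""
    mean = trace(matmul(rho, x))
    n = len(x)
    return [[x[i][j] - (mean if i == j else 0) for j in range(n)]
            for i in range(n)]

def covariance(rho, observables):
    """Matrix of tr(rho (X_i - <X_i>)(X_j - <X_j>)), with no symmetrization."""
    cs = [centered(rho, x) for x in observables]
    return [[trace(matmul(rho, matmul(ci, cj))) for cj in cs] for ci in cs]

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]
rho = [[0.75, 0], [0, 0.25]]  # a mixed state, (1 + sigma_z / 2) / 2

# Non-commuting observables: the off-diagonal entries differ.
h_quantum = covariance(rho, [sigma_x, sigma_y])
# Commuting (diagonal) observables: the off-diagonal entries agree.
h_classical = covariance(rho, [sigma_z, [[2, 0], [0, 5]]])

print(h_quantum[0][1], h_quantum[1][0])
print(h_classical[0][1], h_classical[1][0])

# Cyclic property: tr(ABC) = tr(BCA), but in general tr(ABC) != tr(ACB).
abc = trace(matmul(rho, matmul(sigma_x, sigma_y)))
bca = trace(matmul(sigma_x, matmul(sigma_y, rho)))
acb = trace(matmul(rho, matmul(sigma_y, sigma_x)))
print(abc, bca, acb)
```

In this example the two off-diagonal quantum entries are complex conjugates of each other, so they differ only in their imaginary parts, while the classical entries coincide.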

In your more general framework, could you still define

?

In the original case of Fisher information metric, we had

so that

and we have

In the more general case you’re considering, maybe you could have

in which case, it may make more sense to define

If we then use the gauge freedom to choose such that , then this reduces to

which is kind of neat and gives me a sense of deja vu.

Phorgyphynance made various suggestions, including:

This is a great idea! It’s automatically symmetric. To see when it’s positive definite, and understand what it *means*, I will compute it in terms of the observables that I defined.

This is not a great idea! This doesn’t parse if is supposed to be a tensor (and hopefully a Riemannian metric). Why? Because is an *observable-valued* function on . For example, in the quantum case (the only case I’m having problems with) it’s an *operator-valued* function.

So then is an *operator-valued rank 2 symmetric tensor* on , and while that’s probably good for something, it’s not what I’m looking for.

In the classical case is a *random-variable-valued rank 2 symmetric tensor*, which is again not what I’m after.

So I prefer the first idea.

I’ve also figured out a bunch of stuff myself, but I’ll let it settle for a while before I write about it.

Ok. You’re probably right, but

“feels” better to me for some reason. The fact that it is operator-valued doesn’t bother me too much. For one reason, the metric in my paper with Urs was also a self-adjoint operator.

I’m looking forward to the next part of this series :)

John wrote:

In fact this mistake affected all three blog entries. Luckily, it’s easy to fix.

The problem was that this matrix is not symmetric in the quantum case:

The solution is to take the real part! So now I’ve redefined it:

I’ve tried to fix this problem everywhere it shows up in my blog entries — but not in the comments.

I’ve explained this in more detail in part 4, so if you have any questions, that’s the place to ask ’em!
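For readers who want to see why taking the real part repairs the symmetry, here is a short derivation. It assumes the unsymmetrized quantity has the covariance form from the top of the post, with ρ a density matrix and the X_i self-adjoint; the symbols h_{ij} and A_i are introduced only for this sketch.

```latex
\begin{aligned}
h_{ij} &:= \mathrm{tr}(\rho\, A_i A_j), \qquad A_i := X_i - \langle X_i \rangle, \\
\overline{h_{ij}} &= \mathrm{tr}\big((\rho\, A_i A_j)^\dagger\big)
                   = \mathrm{tr}(A_j A_i\, \rho)
                   = \mathrm{tr}(\rho\, A_j A_i) = h_{ji}, \\
g_{ij} &:= \mathrm{Re}\, h_{ij} = \tfrac{1}{2}\,(h_{ij} + h_{ji})
        = \tfrac{1}{2}\, \mathrm{tr}\big(\rho\,(A_i A_j + A_j A_i)\big) = g_{ji}.
\end{aligned}
```

The second line uses only self-adjointness and the cyclic property of the trace. When the A_i commute, h_{ij} is already real and nothing changes, which matches the classical case.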



About zero eigenvalues in the density matrix…

— When there aren’t any, then it all seems to boil down to the statement that “all ensembles are gibbs ensembles”. In retrospect, this seems obvious, yet it also seems rather remarkable, because no one ever seems to come out and say it much (well, in physics, they do, but not in other contexts). So I think I learned something new here. (My definition of a “gibbs ensemble” here is “those ensembles which don’t have a zero in the density matrix”, which seems to be a correct definition, right? Or did I miss something?)

— When there are zero eigenvalues, then I have two knee-jerk reactions:

The first, naive one is “well, gee, you should leave those states out of your Hilbert space; it was an error to include them in the first place”.

That attitude fails because the density depends on lambda, and perhaps, as one moves around on the manifold, the density matrix goes from being positive-definite in “most” locations, to having zero eigenvalues in some. But if that’s the case, this too strikes me as being remarkable, at least from the physics point of view.

So, envisioning lambda as a proxy for the temperature, or the fugacity, or whatever, this is saying that, for some values of lambda, there is a state that mixes well into the ensemble, and for other lambdas, it fails to mix at all. I don’t know of any physical system that behaves like this (but maybe my experience is limited). It’s like saying that, while manipulating the temperature, there is one state that suddenly completely stops interacting with the rest of the system. Wow! This seems like one heck of a discontinuity, suggesting some phase-transition-like behavior. Call me naive, but describing a model of some kind that exhibits this behaviour seems to be publication-worthy, to me. Unless there’s a “well-known” one you know…

At any rate, it still suggests a way of proceeding: you carve up the manifold into pieces, the edges of which are those values of lambda where the density matrix has a zero eigenvalue. As one crosses those edges, one should discard (or add back in) the detached pure state(s) from the Hilbert space, and otherwise proceed with the usual Gibbs-state calculations.

It would be even more remarkable and amazing if one couldn’t carve up the manifold, say, for example, because the points (values of lambda) for which the density matrix had zeros and where it didn’t were “dense” in each other (in the general topology sense, like rationals dense in reals), or if there was an accumulation point. Of course, mathematically, I suppose this is possible, but as a physical situation, this would seem to be truly remarkable…