A Noether Theorem for Markov Processes

7 March, 2012

I’ll start you off with two puzzles. Their relevance should become clear by the end of this post:

Puzzle 1. Suppose I have a box of jewels. The average value of a jewel in the box is $10. I randomly pull one out of the box. What’s the probability that its value is at least $100?

Puzzle 2. Suppose I have a box full of numbers—they can be arbitrary real numbers. Their average is zero, and their standard deviation is 10. I randomly pull one out. What’s the probability that it’s at least 100?

Before you complain, I’ll admit: in both cases, you can’t actually tell me the probability. But you can say something about the probability! What’s the most you can say?

Noether theorems

Some good news: Brendan Fong, who worked here with me, has now gotten a scholarship to do his PhD at the University of Oxford! He’s talking with people like Bob Coecke and Jamie Vicary, who work on diagrammatic and category-theoretic approaches to quantum theory.

But we’ve also finished a paper on good old-fashioned probability theory:

• John Baez and Brendan Fong, A Noether theorem for Markov processes.

This is based on a result Brendan proved in the network theory series on this blog. But we go further in a number of ways.

What’s the basic idea?

For months now I’ve been pushing the idea that we can take ideas from quantum mechanics and carry them over to ‘stochastic mechanics’, which differs in that we work with probabilities rather than amplitudes. Here we do this for Noether’s theorem.

I should warn you: here I’m using ‘Noether’s theorem’ in an extremely general way to mean any result relating symmetries and conserved quantities. There are many versions. We prove a version that applies to Markov processes, which are random processes of the nicest sort: those where the rules don’t change with time, and the state of the system in the future only depends on its state now, not the past.

In quantum mechanics, there’s a very simple relation between symmetries and conserved quantities: an observable commutes with the Hamiltonian if and only if its expected value remains constant in time for every state. For Markov processes this is no longer true. But we show the next best thing: an observable commutes with the Hamiltonian if and only if both its expected value and standard deviation are constant in time for every state!

Now, we explained this stuff very simply and clearly back in Part 11 and Part 13 of the network theory series. We also tried to explain it clearly in the paper. So now let me explain it in a complicated, confusing way, for people who prefer that.

(Judging from the papers I read, that’s a lot of people!)

I’ll start by stating the quantum theorem we’re trying to mimic, and then state the version for Markov processes.

Noether’s theorem: quantum versions

For starters, suppose both our Hamiltonian H and the observable O are bounded self-adjoint operators. Then we have this:

Noether’s Theorem, Baby Quantum Version. Let H and O be bounded self-adjoint operators on some Hilbert space. Then

[H,O] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value \langle \psi(t), O \psi(t) \rangle is constant as a function of t.
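
If you’d like to see this concretely, here’s a minimal numerical check (a sketch, not part of the paper) using NumPy and SciPy, with a small made-up Hamiltonian and an observable built to commute with it; for finite matrices, ‘bounded self-adjoint’ comes for free:

```python
import numpy as np
from scipy.linalg import expm

# Build a made-up self-adjoint Hamiltonian H and an observable O that
# commutes with it: both are diagonal in the same (random) orthonormal basis.
rng = np.random.default_rng(0)
V = np.linalg.qr(rng.normal(size=(3, 3)))[0]     # a random orthogonal basis
H = V @ np.diag([1.0, 2.0, 5.0]) @ V.T
O = V @ np.diag([0.3, -1.0, 4.0]) @ V.T

psi = np.array([1.0, 1.0j, 0.5])
psi /= np.linalg.norm(psi)

for t in [0.0, 0.7, 3.0]:
    psi_t = expm(-1j * t * H) @ psi              # solve Schrödinger's equation
    print(t, np.vdot(psi_t, O @ psi_t).real)     # <psi(t), O psi(t)> stays constant
```

Swap O for an observable that doesn’t commute with H and the printed expected value will generally start drifting with t.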

What if O is an unbounded self-adjoint operator? That’s no big deal: we can get a bounded one by taking f(O) where f is any bounded measurable function. But Hamiltonians are rarely bounded for fully realistic quantum systems, and we can’t mess with the Hamiltonian without changing Schrödinger’s equation! So, we definitely want a version of Noether’s theorem that lets H be unbounded.

It’s a bit tough to make the equation [H,O] = 0 precise in a useful way when H is unbounded, because then H is only densely defined. If O doesn’t map the domain of H to itself, it’s hard to know what [H,O] = HO - OH even means! We could demand that O preserves the domain of H, but a better workaround is to say that

[\mathrm{exp}(-itH), O] = 0

for all t. Then we get this:

Noether’s Theorem, Full-fledged Quantum Version. Let H and O be self-adjoint operators on some Hilbert space, with O being bounded. Then

[\mathrm{exp}(-itH),O] = 0

if and only if for all states

\psi(t) = \mathrm{exp}(-itH) \psi

the expected value \langle \psi(t), O \psi(t) \rangle is constant as a function of t.

Here of course we’re using the fact that \mathrm{exp}(-itH) \psi is what we get when we solve Schrödinger’s equation with initial data \psi.

But in fact, this version of Noether’s theorem follows instantly from a simpler one:

Noether’s Theorem, Simpler Quantum Version. Let U be a unitary operator and let O be a bounded self-adjoint operator on some Hilbert space. Then

[U,O] = 0

if and only if for all states \psi,

\langle U \psi, O U \psi \rangle = \langle \psi, O \psi \rangle.

This version applies to a single unitary operator U instead of the 1-parameter unitary group

U(t) = \exp(-i t H)

It’s incredibly easy to prove. And this is the easiest version to copy over to the Markov case! However, the proof over there is not quite so easy.

Noether’s theorem: stochastic versions

In stochastic mechanics we describe states using probability distributions, not vectors in a Hilbert space. We also need a new concept of ‘observable’, and unitary operators will be replaced by ‘stochastic operators’.

Suppose that X is a \sigma-finite measure space with a measure we write simply as dx. Then probability distributions \psi on X lie in L^1(X). Let’s define an observable O to be any element of the dual space L^\infty(X), allowing us to define the expected value of O in the probability distribution \psi to be

\langle O, \psi \rangle = \int_X O(x) \psi(x) \, dx

The angle brackets are supposed to remind you of quantum mechanics, but we don’t have an inner product on a Hilbert space anymore! Instead, we have a pairing between L^1(X) and L^\infty(X). Probability distributions live in L^1(X), while observables live in L^\infty(X). But we can also think of an observable O as a bounded operator on L^1(X), namely the operator of multiplying by the function O.

Let’s say an operator

U : L^1(X) \to L^1(X)

is stochastic if it’s bounded and it maps probability distributions to probability distributions. Equivalently, U is stochastic if it’s linear and it obeys

\psi \ge 0 \implies U \psi \ge 0

and

\int_X (U\psi)(x) \, dx = \int_X \psi(x) \, dx

for all \psi \in L^1(X).
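
For a finite state space, say X = \{1, \dots, k\} with counting measure, a stochastic operator is just a k \times k matrix with nonnegative entries whose columns each sum to 1. Here’s a tiny sketch, with a matrix I made up, checking the two conditions above:

```python
import numpy as np

# A made-up stochastic operator on a 3-point state space: nonnegative entries,
# each column summing to 1.
U = np.array([[0.9, 0.2, 0.0],
              [0.1, 0.5, 0.3],
              [0.0, 0.3, 0.7]])

assert np.all(U >= 0)                    # psi >= 0  implies  U psi >= 0
assert np.allclose(U.sum(axis=0), 1.0)   # integrals (here: sums) are preserved

psi = np.array([0.5, 0.25, 0.25])        # a probability distribution
print(U @ psi, (U @ psi).sum())          # still nonnegative, still sums to 1
```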

A Markov process, or technically a Markov semigroup, is a collection of operators

U(t) : L^1(X) \to L^1(X)

for t \ge 0 such that:

U(t) is stochastic for all t \ge 0.

U(t) depends continuously on t.

U(s+t) = U(s)U(t) for all s,t \ge 0.

U(0) = I.

By the Hille–Yosida theorem, any Markov semigroup may be written as

U(t) = \exp(tH)

for some operator H, called its Hamiltonian. However, H is typically unbounded and only densely defined. This makes it difficult to work with the commutator [H,O]. So, we should borrow a trick from quantum mechanics and work with the commutator [\exp(tH),O] instead. This amounts to working directly with the Markov semigroup instead of its Hamiltonian. And then we have:

Noether’s Theorem, Full-fledged Stochastic Version. Suppose X is a \sigma-finite measure space and

U(t) : L^1(X) \to L^1(X)

is a Markov semigroup. Suppose O is an observable. Then

[U(t),O] = 0

for all t \ge 0 if and only if for all probability distributions \psi on X, \langle O, U(t) \psi \rangle and \langle O^2, U(t) \psi \rangle are constant as functions of t.

In plain English: time evolution commutes with an observable if and only if the mean and standard deviation of that observable never change with time. As in the quantum case, this result follows instantly from a simpler one, which applies to a single stochastic operator:

Noether’s Theorem, Simpler Stochastic Version. Suppose X is a \sigma-finite measure space and

U : L^1(X) \to L^1(X)

is a stochastic operator. Suppose O is an observable. Then

[U,O] = 0

if and only if for all probability distributions \psi on X,

\langle O, U \psi \rangle = \langle O, \psi \rangle

and

\langle O^2, U \psi \rangle = \langle O^2, \psi \rangle

It looks simple, but the proof is a bit tricky! It’s easy to see that [U,O] = 0 implies those other equations; the work lies in showing the converse. Note that [U,O] = 0 actually implies

\langle O^n, U \psi \rangle = \langle O^n, \psi \rangle

for all n, not just n = 1 and 2. The expected values of the powers of O are more or less what people call its moments. So the converse says that all the moments of O are unchanged when we apply U to an arbitrary probability distribution, given only that we know this for the first two.
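
Here’s a numerical illustration of the theorem for a finite state space (a sketch with made-up rate matrices; I’m using the standard fact that for finitely many states the Hamiltonian is an ‘infinitesimal stochastic’ matrix: nonnegative off-diagonal entries, columns summing to zero). First we take an observable that commutes with U(t) = \exp(tH), so its mean and variance both stay put; then a symmetric random walk, where the mean of the position stays put but the variance grows, so position cannot commute with the time evolution:

```python
import numpy as np
from scipy.linalg import expm

def mean_and_variance(O, psi):
    """Mean and variance of the observable O in the probability distribution psi."""
    m = np.dot(O, psi)
    return m, np.dot(O**2, psi) - m**2

# 1) A made-up 4-state process that never mixes states {0,1} with states {2,3},
#    and an observable O that is constant on each block, so [U(t), O] = 0.
H_block = np.array([[-1.0,  2.0,  0.0,  0.0],
                    [ 1.0, -2.0,  0.0,  0.0],
                    [ 0.0,  0.0, -3.0,  1.0],
                    [ 0.0,  0.0,  3.0, -1.0]])
O_block = np.array([5.0, 5.0, -2.0, -2.0])
psi0 = np.array([0.4, 0.1, 0.3, 0.2])
for t in [0.0, 1.0, 4.0]:
    psi_t = expm(t * H_block) @ psi0
    print("commuting case, t =", t, mean_and_variance(O_block, psi_t))

# 2) A symmetric random walk: nearest-neighbour hopping at rate 1 on a long
#    path, started in the middle.  Far from the ends this mimics a walk on Z.
k = 101
H_walk = np.zeros((k, k))
for i in range(k - 1):
    H_walk[i + 1, i] += 1.0          # hop right at rate 1 ...
    H_walk[i, i] -= 1.0
    H_walk[i, i + 1] += 1.0          # ... and hop left at rate 1
    H_walk[i + 1, i + 1] -= 1.0
O_pos = np.arange(k, dtype=float)    # the position observable
psi0 = np.zeros(k)
psi0[k // 2] = 1.0
for t in [0.0, 1.0, 4.0]:
    psi_t = expm(t * H_walk) @ psi0
    print("random walk, t =", t, mean_and_variance(O_pos, psi_t))
```

In the first case both numbers stay constant. In the second the mean stays at 50 while the variance grows roughly like 2t: the walk conserves the expected position but not its standard deviation, so it can’t commute with the position observable.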

The proof is fairly technical but also sort of cute: we use Chebyshev’s inequality, which says that the probability of a random variable taking a value at least k standard deviations away from its mean is less than or equal to 1/k^2. I’ve always found this to be an amazing fact, but now it seems utterly obvious. You can figure out the proof yourself if you do the puzzles at the start of this post.
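
If you want to play with Chebyshev’s inequality numerically before (or after) tackling the puzzles, here’s a little sketch checking the bound P(|X - \mathrm{mean}| \ge k \sigma) \le 1/k^2 on a few made-up ‘boxes of numbers’:

```python
import numpy as np

rng = np.random.default_rng(42)

boxes = [
    rng.normal(0.0, 10.0, size=10_000),                      # bell-shaped
    rng.exponential(10.0, size=10_000),                       # skewed
    np.concatenate([np.zeros(9_900), np.full(100, 500.0)]),   # mostly zeros, a few huge values
]

for box in boxes:
    mean, std = box.mean(), box.std()
    for k in [2.0, 5.0, 10.0]:
        p = np.mean(np.abs(box - mean) >= k * std)            # empirical probability
        print(f"k = {k}: P = {p:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```

No matter how nasty a box you cook up, the bound holds.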

But now I’ll let you read our paper! And I’m really hoping you’ll spot mistakes, or places it can be improved.


The Education of a Scientist

29 February, 2012

Why are scientists like me getting so worked up over Elsevier and other journal publishers? It must seem strange from the outside. This cartoon explains it very clearly. It’s hilarious—except that it’s TRUE!!! This is why we need a revolution.

(It’s true except for one small thing: in math and physics, Elsevier and Springer let us put our papers on our websites and free electronic archives… though not the final version, only the near-final draft. This is a concession we had to fight for.)

What can you do? Two easy things:

• If you’re an academic, add your name to the boycott of Elsevier.

• If you’re a US citizen, sign this White House petition before March 9.

Why the problem is hard

Why is it so hard to solve the journal problem? Here’s a quick simplified explanation for outsiders—people who don’t live in the world of university professors.

There are lots of open-access journals that are free to read but charge the author a fee. There are even lots that are free to read and free for the author. Why doesn’t everyone switch to publishing in these? Lots of us have. But most haven’t. Two reasons:

1) These journals aren’t as “prestigious” as the journals owned by the evil Big Three publishers: Elsevier, Springer, and Wiley-Blackwell. In the last 30 years the Big Three bought most of the really “prestigious” journals – and a journal can’t become “prestigious” overnight, so while things are changing, they’re changing slowly.

Publishing in a “prestigious” journal helps you get hired, promoted, and get grants. “Prestige” is not a vague thing: it’s even measured numerically using something called the Impact Factor. It may be baloney, but it is collectively agreed-upon baloney. Trying to make it go away is like trying to make money go away: people would not know what to do without it.

2) It’s not the professors who pay the outrageous subscription fees for journals – it’s the university libraries. So nothing instantly punishes professors for publishing in “prestigious” but highly expensive journals, except the nasty rules about resharing journal articles, and those rules are invisible if you live in a world of professors where everyone has library access!

So, the problem is hard to solve. The fight will be hard.

But we’ll win anyway, because the current situation is just too outrageous to tolerate. We have strategies and we’re pursuing lots of them. You can help by doing those two easy things.


Research Works Act Dead — What Next?

28 February, 2012

A larger victory for the rebel forces!

One day after Elsevier dropped its support for the Research Works Act, the people pushing this ugly bill—who coincidentally get regular contributions of cash from Elsevier—have decided to let it die! At least for now.

Here’s the joint statement from Representatives Darrell Issa and Carolyn B. Maloney, who proposed this bill… together with my translation into plain English:

“The introduction of HR 3699 has spurred a robust, expansive debate on the topics of scientific and scholarly publishing, intellectual property protection, and public access to federally funded research. Since its introduction, we have heard from numerous stakeholders and interested parties on both sides of this important issue.

Translation: the Association of American Publishers supported this bill because it would crush the Public Access Policy that makes taxpayer-funded medical research freely accessible online… and stop this practice from spreading to other kinds of research.

But then, scholars and ordinary people worldwide erupted in a wave of revulsion, even using this bill as an extra reason for boycotting the publisher Elsevier—the most vocal supporter of this bill.

For example, all the way across the Atlantic, an editorial in the Guardian shouted: “The result would be an ethical disaster: preventable deaths in developing countries, and an incalculable loss for science in the USA and worldwide. The only winners would be publishing corporations such as Elsevier (£724m profits on revenues of £2b in 2010—an astounding 36% of revenue taken as profit).”

Since Elsevier is a global corporation, this is not what they want people to read in British newspapers.

As the costs of publishing continue to be driven down by new technology, we will continue to see a growth in open access publishers. This new and innovative model appears to be the wave of the future.

Translation: the big publishers are doomed in the long run.

The transition must be collaborative…

Translation: but Elsevier makes $100 million in profits every month now, so let's not move too fast.

… and must respect copyright law and the principles of open access. The American people deserve to have access to research for which they have paid.

Translation: we’re not evil. We’re for everything that sounds good, even things in direct contradiction to the bill we proposed!

This conversation needs to continue and we have come to the conclusion that the Research Works Act has exhausted the useful role it can play in the debate.

We’ve been beaten, for now.

As such, we want Americans concerned about access to research and other participants in this debate to know we will not be taking legislative action on HR 3699, the Research Works Act. We do intend to remain involved in efforts to examine and study the protection of intellectual property rights and open access to publicly funded research.

But watch out: we’re not giving up.

So, we need to heed the words of open-access advocate Peter Suber:

This is a victory for what The Economist called the Academic Spring. It shows that academic discontent—expressed in blogs, social media, mainstream news media, and open letters to Congress—can defeat legislation supported by a determined and well-funded lobby. Let’s remember that, and let’s prove that this political force can go beyond defeating bad legislation, like the Research Works Act, to enacting good legislation, like the Federal Research Public Access Act.

So, folks, please:

1) Learn about the Federal Research Public Access Act. Blog about it, tweet about it: this is one of the few really good bills that Congress has considered for a long time.

2) If you’re a US citizen, sign the White House petition supporting the Federal Research Public Access Act. If 25,000 sign by March 9th, the president will review it.

3) No matter where you are, add a comment supporting the Federal Research Public Access Act to the Alliance for Taxpayer Access petition. And if you live in the US, sign the petition!

4) If you teach or study at a university, click on the picture below to get a PDF file of a poster explaining the boycott. Print it out and put it on your office door. While for some of us the Elsevier boycott is old news, a surprising number of people who should know haven’t heard of it yet! Within the field of mathematics, professional associations are planning a PR blitz to solve that problem. But if you’re in one of the bigger sciences, like biology and chemistry, we really need you to help publicize what’s going on.


Elsevier Gives Up On Research Works Act

27 February, 2012

A small victory for the rebel forces:

Elsevier Withdraws Support for the Research Works Act.

Yay! Let’s keep up the pressure, crush the Research Works Act, and move to take the offensive!

In case you haven’t heard yet: this nasty bill would stop the National Institutes of Health from making taxpayer-funded research freely available to US taxpayers. It’s supported by the Association of American Publishers (AAP)—but various AAP members, including MIT Press, Rockefeller University Press, Nature Publishing Group, and the American Association for the Advancement of Science have already come out against it. Now, put under pressure by the spreading boycott, Elsevier has dropped its support.

However, they make it crystal clear that this is just a tactical retreat:

… while withdrawing support for the Research Works Act, we will continue to join with those many other nonprofit and commercial publishers and scholarly societies that oppose repeated efforts to extend mandates through legislation.

I don’t know what position Springer and Wiley-Blackwell take on this bill: besides Elsevier they’re the biggest science publishers. If they all drop their support, the bill may die. And then we can take the offensive and push for the Federal Research Public Access Act.

This bill would make sure the people who pay for U.S. government-funded research—us, the taxpayers—don’t have to pay again just to see what we bought. It would do this by expanding what’s already standard practice at the National Institutes of Health to some other big funding agencies, like the National Science Foundation.

On Google+, open-access hero Peter Suber writes:

This is a victory for what The Economist called the Academic Spring. It shows that academic discontent—expressed in blogs, social media, mainstream news media, and open letters to Congress—can defeat legislation supported by a determined and well-funded lobby. Let’s remember that, and let’s prove that this political force can go beyond defeating bad legislation, like the Research Works Act, to enacting good legislation, like the Federal Research Public Access Act.

Indeed, for companies like Elsevier, the great thing about bills like the Research Works Act is that they make us work hard just to keep the status quo, instead of what we really want: changing the status quo for the better. And they’re perfectly happy to stage a tactical retreat in a little skirmish like this if it distracts us from our real goals.

So, let’s keep at it! For starters, if you teach or study at a university, you can click on the picture below, get a PDF file of a poster that explains the boycott, print it out, and put it on your door. While for some of us the Elsevier boycott is old news, a surprising number of people who should know haven’t heard of it yet!

Luckily, a PR blitz in various math journals will start to change that, at least in the field of mathematics. And soon I’ll talk about some exciting plans being developed on Math 2.0. But if you’re a biologist or chemist, for example, you really need to start the revolution over in your field.


Elsevier and Springer Sue University Library

20 February, 2012

The battle is heating up! Now Elsevier, Springer and a smaller third publisher are suing a major university in Switzerland, the Eidgenössische Technische Hochschule Zürich, or ETH Zürich for short. Why? Because this university’s library is distributing copies of their journal articles at a lower cost than the publishers themselves.

Aren’t university libraries supposed to make journal articles available? Over on Google+, Willie Wong explains:

My guess is that they are complaining about how the ETH Library (as well as many other libraries in the NEBIS system) offers Electronic Document Delivery.

It is free for staff and researchers, and private individuals who purchase a library membership can ask for articles for a fee. It is a nice service: otherwise most of us would just go to the library, borrow the printed journal, and scan it ourselves (when the electronic copy is not part of the library’s subscription). This way the library does the scanning for us (so we benefit from time better used) and the library benefits from less undesired wear-and-tear and loss from their paper copies.

The publishers probably think the library is illegally reselling their journal articles! But here’s an article by the head of the ETH library, making his side of the case:

• Wolfram Neubauer, A thorn in the side for science publishers, ETH Life, 17 February 2012.

He says the delivery of electronic copies of documents is allowed by the Swiss Copyright Act. He also makes a broader moral case:

• More or less all scientifically relevant journals rely on the results of publicly funded research.

• The brunt of evaluating scientific findings (i.e. peer reviewing) is borne by the scientific community, with the publishers playing only a supporting role.

• By far the most important customers for all major science publishers are academic libraries, the vast majority of which are themselves supported by public funding.

He concludes:

In the legal proceedings, the aim must therefore be to strike a balance between the services provided by the ETH-Bibliothek for the benefit of science and research on the one hand and the commercial interests of the publishers on the other.

It’ll be interesting to see how this goes in court. Either way, a kind of precedent will be set.


Quantropy (Part 3)

18 February, 2012

I’ve been talking a lot about ‘quantropy’. Last time we figured out a trick for how to compute it starting from the partition function of a quantum system. But it’s hard to get a feeling for this concept without some examples.

So, let’s compute the partition function of a free particle on a line, and see what happens…

The partition function of a free particle

Suppose we have a free particle on a line tracing out some path as time goes by:

q: [0,T] \to \mathbb{R}

Then its action is just the time integral of its kinetic energy:

\displaystyle{ A(q) = \int_0^T \frac{mv(t)^2}{2} \; dt }

where

\displaystyle{ v(t) = \frac{d q(t)}{d t} }

is its velocity. The partition function is then

Z = \displaystyle{\int e^{i A(q) / \hbar} \; Dq }

where we integrate an exponential involving the action over the space of all paths q. Unfortunately, the space of all paths is infinite-dimensional, and the thing we’re integrating oscillates wildly. Integrals like this tend to make mathematicians run from the room screaming. For example, nobody is quite sure what Dq means in this expression. There is no ‘Lebesgue measure’ on an infinite-dimensional vector space.

There is a lot to say about this, but if we just want to get some answers, it’s best to sneak up on the problem gradually.

Discretizing time

We’ll start by treating time as discrete—a trick Feynman used in his original work. We’ll consider n time intervals of length \Delta t. Say the position of our particle at the ith time step is q_i \in \mathbb{R}. We’ll require that the particle keeps a constant velocity between these time steps. This will reduce the problem of integrating over ‘all’ paths—whatever that means, exactly—to the more manageable problem of integrating over a finite-dimensional space of paths. Later we can study what happens as the time steps get shorter and more numerous.

Let’s call the particle’s velocity between the (i-1)st and ith time steps v_i.

\displaystyle{ v_i = \frac{q_i - q_{i-1}}{\Delta t} }

The action, defined as an integral, is now equal to a finite sum:

\displaystyle{ A(q) = \sum_{i = 1}^n \frac{mv_i^2}{2} \; \Delta t }
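
In code, the discretized action is just a little sum. Here’s a minimal sketch (the mass, time step and positions are all made up) that turns a list of positions q_0, \dots, q_n into the finite sum above:

```python
import numpy as np

def discretized_action(q, m, dt):
    """Sum of (1/2) m v_i^2 dt, where v_i = (q_i - q_{i-1}) / dt."""
    v = np.diff(q) / dt
    return np.sum(0.5 * m * v**2 * dt)

# A made-up history with n = 4 time steps of length dt = 0.1:
q = np.array([0.0, 0.2, 0.1, 0.4, 0.4])
print(discretized_action(q, m=1.0, dt=0.1))
```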

We’ll consider histories of the particle where its initial position is

q_0 = 0

but its final position q_n is arbitrary. Why? If we don’t ‘nail down’ the particle at some particular time, our path integrals will diverge. So, our space of histories is

X = \mathbb{R}^n

and now we’re ready to apply the formulas we developed last time!

We saw last time that the partition function is the key to all wisdom, so let’s start with that. Naively, it’s

\displaystyle{  Z = \int_X e^{- \beta A(q)} Dq }

where

\displaystyle{ \beta = \frac{1}{i \hbar} }

But there’s a subtlety here. Doing this integral requires a measure on our space of histories. Since the space of histories is just \mathbb{R}^n with coordinates q_1, \dots, q_n, an obvious guess for a measure would be

Dq = dq_1 \cdots dq_n    \qquad \qquad \qquad \qquad \quad \textrm{(obvious first guess)}

However, the partition function should be dimensionless! You can see why from the discussion of units last time. The quantity \beta A(q), and thus its exponential, is dimensionless, so our measure had better be dimensionless too. But dq_1 \cdots dq_n has units of length^n. To deal with this we can introduce a length scale, which I’ll call \Delta x, and use the measure

Dq = \displaystyle{ \frac{1}{(\Delta x)^n} \, dq_1 \cdots dq_n }   \qquad \qquad \qquad  \textrm{(what we'll actually use)}

I should however emphasize that despite the notation \Delta x, I’m not discretizing space, just time. We could also discretize space, but it would make the calculation a lot harder. I’m only introducing this length scale \Delta x to make our measure on the space of histories dimensionless.

Now let’s compute the partition function. For starters, we have

\begin{array}{ccl} Z &=& \displaystyle{ \int_X e^{-\beta A(q)} \; Dq } \\  \\ &=& \displaystyle{  \frac{1}{(\Delta x)^n} \int e^{-\beta \sum_{i=1}^n m \, \Delta t \, v_i^2 /2} \; dq_1 \cdots dq_n } \end{array}

Normally when I see an integral bristling with annoying constants like this, I switch to a system of units where most of them equal 1. But I’m trying to get a physical feel for quantropy, so I’ll leave them all in. That way, we can see how they affect the final answer.

Since

\displaystyle{ v_i = \frac{q_i - q_{i-1}}{\Delta t} }

we can show that

dq_1 \cdots dq_n = (\Delta t)^n \; dv_1 \cdots dv_n

To show this, we need to work out the Jacobian of the transformation from the q_i coordinates to the v_i coordinates on our space of histories—but this is easy to do, since the determinant of a triangular matrix is the product of its diagonal entries.

We can rewrite the path integral using this change of variables:

Z = \displaystyle{\left(\frac{\Delta t}{\Delta x}\right)^n \int e^{-\beta \sum_{i=1}^n m \, \Delta t \, v_i^2 /2}  \; dv_1 \cdots dv_n }

But since an exponential of a sum is a product of exponentials, this big fat n-tuple integral is really just a product of n ordinary integrals. And all these integrals are equal, so we just get some integral to the nth power! Let’s call the variable in this integral v, since it could be any of the v_i:

Z =  \displaystyle{ \left(\frac{\Delta t}{\Delta x}  \int_{-\infty}^\infty e^{-\beta \, m \, \Delta t \, v^2 /2} \; dv \right)^n }

How do we do the integral here? Well, that’s easy…

Integrating Gaussians

We should all know the integral of our favorite Gaussian. As a kid, my favorite was this:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2} \; d x = \sqrt{\pi} }

because this looks the simplest. But now, I prefer this:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \; d x = \sqrt{2 \pi} }

They’re both true, so why did my preference change? First, I now like 2\pi better than \pi. There’s a whole manifesto about this, and I agree with it. Second, x^2/2 is better than x^2 for what we’re doing, since kinetic energy is one half the mass times the velocity squared. Originally physicists like Descartes and Leibniz defined kinetic energy to be m v^2, but the factor of 1/2 turns out to make everything work better. Nowadays every Hamiltonian or Lagrangian with a quadratic term in it tends to have a 1/2 in front—basically because the first thing you do with it is differentiate it, and the 1/2 cancels the resulting 2. The factor of 1/2 is just a convention, even in the definition of kinetic energy, but if we didn’t make that convention we’d be punished with lots of factors of 2 all over.

Of course it doesn’t matter much: you just need to remember the integral of some Gaussian, or at least know how to calculate it. And you’ve probably read this quote:

A mathematician is someone to whom

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \; d x = \sqrt{2 \pi} }

is as obvious as 2+2=4 is to you and me. – Lord Kelvin

You’ve probably learned the trick for doing this integral, so you can call yourself a mathematician.

Stretching the above Gaussian by a factor of \sqrt{\alpha} increases the integral by a factor of \sqrt{\alpha}, so we get

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2\alpha} \; d x = \sqrt{2 \pi \alpha}  }

This is clear when \alpha is positive, but soon we’ll apply it when \alpha is imaginary! That makes some mathematicians sweaty and nervous. For example, we’re saying that

\displaystyle{ \int_{-\infty}^\infty e^{i x^2 / 2} \, dx = \sqrt{2 \pi i}}

But this integral doesn’t converge if you slap absolute values on the function inside: in math jargon, the function inside isn’t ‘Lebesgue integrable’. But we can tame it in various ways. We can impose a ‘cutoff’ and then let it go to infinity:

\displaystyle{ \lim_{M \to + \infty} \int_{-M}^M e^{i x^2 / 2} \, dx = \sqrt{2 \pi i} }

or we can damp the oscillations, and then let the amount of damping go to zero:

\displaystyle{ \lim_{\epsilon \downarrow 0} \int_{-\infty}^\infty e^{(i - \epsilon) x^2 / 2} \, dx = \sqrt{2 \pi i} }

We get the same answer either way, or indeed using many other methods. Since such tricks work for all the integrals I’ll write down, I won’t engage in further hand-wringing over this issue. We’ve got bigger things to worry about, like: what’s the physical meaning of quantropy?
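
If you’d like to see the damping trick at work, here’s a quick numerical sketch: it approximates \int e^{(i - \epsilon) x^2/2} \, dx by a Riemann sum on a big interval for a few values of \epsilon, and compares the answers with \sqrt{2 \pi i}:

```python
import numpy as np

x = np.linspace(-60.0, 60.0, 1_200_001)       # a fine grid on a big interval
dx = x[1] - x[0]
target = np.sqrt(2 * np.pi * 1j)              # sqrt(2 pi i) = sqrt(pi) (1 + i)

for eps in [0.1, 0.03, 0.01]:
    integrand = np.exp((1j - eps) * x**2 / 2)  # damped oscillatory Gaussian
    approx = np.sum(integrand) * dx            # simple Riemann sum
    print(eps, approx)

print("target:", target)
```

As \epsilon shrinks, the Riemann sums creep toward \sqrt{2 \pi i} \approx 1.77 + 1.77 i.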

Computing the partition function

Where were we? We had this formula for the partition function:

Z =  \displaystyle{ \left( \frac{\Delta t}{\Delta x} \int_{-\infty}^\infty e^{-\beta \, m \, \Delta t \, v^2 /2}  \; dv \right)^n }

and now we’re letting ourselves use this formula:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2\alpha} \; d x = \sqrt{2 \pi \alpha}  }

even when \alpha is imaginary, so we get

Z = \displaystyle{ \left( \frac{\Delta t}{\Delta x} \sqrt{ \frac{2 \pi}{\beta m \, \Delta t}} \right)^n =  \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right)^{n/2}  }

And a nice thing about keeping all these constants floating around is that we can use dimensional analysis to check our work. The partition function should be dimensionless, and it is! To see this, just remember that \beta = 1/i\hbar has dimensions of inverse action, or T/(M L^2).

Expected action

Now that we’ve got the partition function, what do we do with it? We can compute everything we care about. Remember, in statistical mechanics there’s a famous formula:

free energy = expected energy – temperature × entropy

and last time we saw that similarly, in quantum mechanics we have:

free action = expected action – classicality × quantropy

where the classicality is

1/\beta = i \hbar

In other words:

\displaystyle{ F = \langle A \rangle - \frac{1}{\beta}\, Q }

Last time I showed you how to compute F and \langle A \rangle starting from the partition function. So, we can use the above formula to work out the quantropy as well:

• Expected action: \langle A \rangle = - \frac{d}{d \beta} \ln Z

• Free action: F = -\frac{1}{\beta} \ln Z

• Quantropy: Q = \ln Z - \beta \,\frac{d }{d \beta}\ln Z
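
If you’d rather let the computer do the calculus in what follows, here’s a small SymPy sketch applying these three formulas to our partition function, with everything that doesn’t depend on \beta lumped into a constant C (so C = 2 \pi \Delta t / (m (\Delta x)^2) for us), and \beta treated as a formal variable:

```python
import sympy as sp

beta, n, C = sp.symbols('beta n C', positive=True)

Z = (C / beta)**(n / 2)                    # our partition function

lnZ = sp.log(Z)
expected_action = -sp.diff(lnZ, beta)      # <A> = -d/dbeta ln Z
free_action = -lnZ / beta                  # F = -(1/beta) ln Z
quantropy = lnZ - beta * sp.diff(lnZ, beta)

print(sp.simplify(expected_action))        # should print n/(2*beta)
print(sp.simplify(free_action))
print(sp.simplify(quantropy))
```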

But let’s start with the expected action. The answer will be so amazingly simple, yet strange, that I’ll want to spend the rest of this post discussing it.

Using our hard-won formula

\displaystyle{ Z = \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right)^{n/2}  }

we get

\begin{array}{ccl} \langle A \rangle &=& \displaystyle{ -\frac{d}{d \beta} \ln Z } \\  \\  &=& \displaystyle{ -\frac{n}{2}  \frac{d}{d \beta}  \ln \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right) } \\  \\ &=& \displaystyle{ -\frac{n}{2}  \frac{d}{d \beta} \left( \ln \left(\frac{2 \pi \Delta t}{m \, (\Delta x)^2}\right) - \ln \beta \right) } \\   \\  &=& \displaystyle{ \frac{n}{2} \; \frac{1}{\beta} }  \\  \\ &=& \displaystyle{ n\;  \frac{i \hbar}{2} }  \end{array}

Wow! When we get an answer this simple, it must mean something! This formula is saying that the expected action of our freely moving quantum particle is proportional to n, the number of time steps. Each time step contributes i \hbar / 2 to the expected action. The mass of the particle, the time step \Delta t, and the length scale \Delta x don’t matter at all!

Why don’t they matter? Well, you can see from the above calculation that they just disappear when we take the derivative of the logarithm containing them. That’s not a profound philosophical explanation, but it implies that our action could be any quadratic function like this:

A : \mathbb{R}^n \to \mathbb{R}

\displaystyle{ A(x) = \sum_{i = 1}^n \frac{c_i x_i^2}{2} }

where c_i are positive numbers, and we’d still get the same expected action:

\langle A \rangle = \displaystyle{ n\; \frac{i \hbar}{2} }

The numbers c_i don’t matter!
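
Here’s one way to convince yourself of this numerically, at least in the statistical-mechanics analogue where \beta is real and positive (a minimal sketch; the coefficients c_i are random): the distribution proportional to e^{-\beta A(x)} is a product of Gaussians, so we can sample from it and check that the average of A comes out to n/(2\beta) no matter what the c_i are.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10
beta = 2.0                                  # real and positive, as in statistical mechanics
c = rng.uniform(0.5, 5.0, size=n)           # random positive coefficients c_i

# exp(-beta * A(x)) factors into Gaussians: x_i has variance 1/(beta * c_i).
x = rng.normal(0.0, 1.0 / np.sqrt(beta * c), size=(200_000, n))
A = 0.5 * np.sum(c * x**2, axis=1)          # A(x) = sum_i c_i x_i^2 / 2

print(A.mean(), "vs n/(2*beta) =", n / (2 * beta))
```

With \beta = 1/i\hbar the same algebra gives n \, i \hbar / 2, which is exactly the answer above.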

The quadratic function we’re talking about here is an example of a quadratic form. Because the numbers c_i are positive, it’s a positive definite quadratic form. And since we can diagonalize any positive definite quadratic form, we can state our result in a fancier, more elegant way:

Whenever the action is a positive definite quadratic form on an n-dimensional vector space of histories, the expected action is n times i \hbar / 2.

For example, take a free particle in 3d Euclidean space, and discretize time into n steps as we’ve done here. Then the action is a positive definite quadratic form on a 3n-dimensional vector space:

\displaystyle{ A(q) = \sum_{i = 1}^n \frac{m \vec{v}_i \cdot \vec{v}_i}{2} \; \Delta t }

since now each velocity \vec{v}_i is a vector with 3 components. So, the expected action is 3n times i \hbar / 2.

Poetically speaking, 3n is the total number of ‘decisions’ our particle makes throughout its history. What do I mean by that? In the path integral approach to quantum mechanics, a system can trace out any history it wants. But it takes a bunch of real numbers to determine a specific history. Each number counts as one ‘decision’. And in the situation we’ve described, each decision contributes i \hbar / 2 to the expected action.

So here’s a more intuitive way to think about our result:

In the path integral approach to quantum theory, each ‘decision’ made by the system contributes i \hbar / 2 to the expected action… as long as the action is given by a positive definite quadratic form on some vector space of histories.

There’s a lot more to say about this. For example, in the harmonic oscillator the action is a quadratic form, but it’s not positive definite. What happens then? But three more immediate questions leap to my mind:

1) Why is the expected action imaginary?

2) Should we worry that it diverges as n \to \infty?

3) Is this related to the heat capacity of an ideal gas?

So, let me conclude this post by trying to answer those.

Why is the expected action imaginary?

The action A is real. How in the world can its expected value be imaginary?

The reason is that we’re not taking its expected value with respect to a probability measure, but instead, with respect to a complex-valued measure. Last time we gave this very general definition:

\langle A \rangle = \displaystyle{  \frac{\int_X A(x) e^{-\beta A(x)} \, dx }{\int_X e^{-\beta A(x)} \, dx }}

The action A is real, but \beta = 1 / i \hbar is imaginary, so it’s not surprising that this ‘expected value’ is complex-valued.

Later we’ll see a good reason why it has to be purely imaginary.

Why does it diverge as n → ∞?

Consider our particle on a line, with time discretized into n time steps. Its expected action is

\langle A \rangle = \displaystyle{ n\; \frac{i \hbar}{2} }

To take the continuum limit we must let n \to \infty while simultaneously letting \Delta t \to 0 in such a way that n \Delta t stays constant. Some quantities will converge when we take this limit, but the expected action will not. It will go to infinity!

That’s a bit sad, but not unexpected. It’s a lot like how the expected length of the path of a particle carrying out Brownian motion is infinite: in 3 dimensions, a typical Brownian path is so jagged that its length diverges.


In fact the free quantum particle is just a ‘Wick-rotated’ version of Brownian motion, where we replace time by imaginary time, so the analogy is fairly close. The action we’re considering now is not exactly analogous to the arclength of a path:

\displaystyle{ \int_0^T \left| \frac{d q}{d t} \right| \; dt }

Instead, it’s proportional to this quadratic form:

\displaystyle{ \int_0^T \left| \frac{d q}{d t} \right|^2 \; dt }

However, both these quantities diverge when we discretize Brownian motion and then take the continuum limit.

How sad should we be that the expected action is infinite in the continuum limit? Not too sad, I think. Any result that applies to all discretizations of a continuum problem should, I think, say something about that continuum problem. For us the expected action diverges, but the ‘expected action per decision’ is constant, and that’s something we can hope to understand even in the continuum limit!

Is this related to the heat capacity of an ideal gas?

That may seem like a strange question, unless you remember some formulas about the thermodynamics of an ideal gas!

Let’s say we’re in 3d Euclidean space. (Most of us already are, but some of my more spacy friends will need to pretend.) If we have an ideal gas made of n point particles at temperature T, its expected energy is

\frac{3}{2} n k T

where k is Boltzmann’s constant. This is a famous fact, which lets people compute the heat capacity of a monatomic ideal gas.

On the other hand, we’ve seen that in quantum mechanics, a single point particle will have an expected action of

\frac{3}{2} n i \hbar

after n time steps.

These results look awfully similar. Are they related?

Yes! These are just two special cases of the same result! The energy of the ideal gas is a quadratic form on a 3n-dimensional vector space; so is the action of our discretized point particle. The ideal gas is a problem in statistical mechanics; the point particle is a problem in quantum mechanics. In statistical mechanics we have

\displaystyle{ \beta = \frac{1}{k T} }

while in quantum mechanics we have

\displaystyle{ \beta = \frac{1}{i \hbar} }

Mathematically, they are the exact same problem except that \beta is real in one case, imaginary in the other. This is another example of the analogy between statistical mechanics and quantum mechanics—the analogy that motivated quantropy in the first place!

And this makes it even more obvious that the expected action must be imaginary… at least when the action is a positive definite quadratic form.


Math 2.0

16 February, 2012

Building on the Elsevier boycott, a lot of people are working on positive steps to make expensive journals obsolete. My email is flooded with discussions, different groups making different plans.

Email is great, but not for everything. So Andrew Stacey (the technical mastermind behind the nLab, Azimuth Wiki and Azimuth Forum) and Scott Morrison (one of the brains behind MathOverflow, an important math question-and-answer website) have started a forum to talk about the many issues involved:

Math 2.0.

That’s good, because these guys actually do stuff, not just talk! Andrew describes the idea here:

The purpose of Math 2.0 is to provide a forum for discussion of the future of mathematical publishing. It’s something that I’ve viewed as an important issue for years, and have had many, many interesting conversations about, but somehow nothing much seems to happen. I’m hoping that the momentum from Tim Gowers’ recent blog posts might lead to something and I’d like to capitalise on that.

However, most of the discussion currently is happening in the comments on blog posts. This is hard to follow, and hard to separate out the new suggestions from the discussions on old ones. I think that forums are much better for discussion, hence this one.

The name, Math2.0, is intended to signify two things: that it’s time for an upgrade of the mathematical environment and that I think we can learn a lot from looking at how software—particularly open source software—works. By “mathematical environment”, I don’t mean how we actually do the mathematics but what happens next, particularly communicating the ideas that we create. This is where the internet can really change things for the better (as it has started to do with the arXiv), but where I think that we have yet to figure out how to make best use of it.

This doesn’t just include journals, but I think that that’s an obvious place to start.

So: welcome to Math2.0. Please join in. It’s important.

Andrew Stacey has also emphasized a principle that’s good for reducing chat about starry-eyed visions and focusing on what we can do now:

In all these discussions, there is one point that I would like to make at the start and which I think is relevant to any proposal to set up something new for mathematicians (or more generally, for academics). That is that whatever system is set up it must be:

Useful at the point of use

This is something that I’ve learnt from administering the nLab over the past few years. It keeps going and there is no sign of it slowing down. The secret of its success, I maintain, is that it is useful at the point of use. When I write something on the nLab, I benefit immediately. I can link to previous things I’ve written, to definitions that others have written, and so link my ideas to many others. It means that if I want to talk to someone about something, the thing we are talking about is easily visible and accessible to both (or all) of us. If I want to remember what it was I was thinking about a year ago, I can easily find it. The fact that when I come back the next day, whatever I’ve added has been improved, polished, and added to, is a bonus—but it would still be useful if that didn’t happen.

For other things, then I need more of an incentive to participate. MathOverflow was a lot of fun in the beginning, but now I find that a question needs to be such that it’s fairly clear that I’m one of the few people in the world who can answer. It’s not that my enthusiasm for the site has gone down, just that everything else keeps pushing it out of the way. So a new system has to be useful to those who use it, and ideally the usefulness should be proportional to the amount of effort that one puts in.

A corollary of this is that it should be useful even if only a small number of people use it. The number of core users of the nLab is not large, but nevertheless the nLab is still extremely useful to us. I can imagine that when a proposal for something new is made, there will be a variety of reactions ranging from “That’ll never work” through to “Sounds interesting, but …” with only a few saying “Count me in!”. To have a chance of succeeding, it has to be the case that those few can get it off the ground and demonstrate that it works, without the input of the wider sceptical community.

So: if you’re a mathematician or programmer interested in revolutionizing the future of math publishing, go to Math 2.0, register, and join the conversation! You’ll see there are a number of concrete proposals on the table, including one by Chris Lee, and Marc Harper and myself.

I’ll say more about those later. But I want to add a principle of my own to Andrew’s ‘useful at the point of use’. The goal is not to get a universal consensus on the future of math publishing! Instead, we need a healthy dissensus in which different groups of people develop different systems—so we can see which ones work.

In biology, evolution happens when some change is useful at the point of use—and it doesn’t happen by consensus, either. When some fish gradually became amphibians, they didn’t wait for all fish to agree this was a good move. And indeed it’s good that we still have fish.

Jan Velterop has some interesting thoughts on the evolution of scholarly publishing, which you can read here:

• Richard Poynder, The open access interviews: Jan Velterop, February 2012.

Velterop writes:

As a geologist I go so far as to say that I see analogies with the Permian-Triassic boundary and the Cretaceous-Tertiary boundary, when life on Earth changed dramatically due to fundamental and sudden changes in the environment.

Those boundary events, as they are known, resulted in mass extinctions, and that’s an unavoidable evolutionary consequence of sudden dramatic environmental changes.

But they also open up ecological niches for new, or hitherto less successful, forms of life. In this regard, it is interesting to see the recent announcement of F1000 Research, which intends to address the major issues afflicting scientific publishing.

[…]

The evolution of scientific communication will go on, without any doubt, and although that may not mean the total demise of the traditional models, these models will necessarily change. After all, some dinosaur lineages survived as well. We call them birds. And there are some very attractive ones. They are smaller than the dinosaurs they evolved from, though. Much smaller.

