Fluid Flows and Infinite-Dimensional Manifolds (Part 2)

12 May, 2012

Or: ideal fluids—dry water?

guest post by Tim van Beek

Last time in this series, we set the stage by explaining infinite dimensional manifolds. Then we looked at a simple example: the inviscid Burgers equation. We saw this was the equation for geodesics in the diffeomorphism group of the circle.

Now let’s look at a more interesting example! It will still be a simplified model of fluid flow: it will describe an ideal fluid that is incompressible. I’ll start by explaining these concepts. We will then see how the equation of motion for ideal incompressible fluids can be interpreted as a geodesic equation.

En route I will also repeat some stuff from classical vector analysis, mostly for my own sake. The last time I seriously had to calculate with it was when I attended a class on “classical electrodynamics”, which was almost 15 years ago!

When we delve into differential geometry, it is always a good idea to look both at the “coordinate free” formulation using abstract concepts like differential forms, and at the “classical vector analysis” formulation, which is best for calculating things once suitable coordinates have been chosen. Our fluid flows will take place in a smooth, orientable, compact, n-dimensional Riemannian manifold M, possibly with a smooth boundary \partial M.

I will frequently think of M as an open set in \mathbb{R}^2 or \mathbb{R}^3, so I will use the globally defined coordinate chart of Euclidean coordinates on \mathbb{R}^n denoted by x, y (and z, if needed) without further warning.

Before we continue: Last time our reader “nick” pointed out a blog post by Terence Tao about the same topic as ours, but—as could be expected—assuming a little bit more of a mathematical background: The Euler-Arnold equation. If you are into math, you might like to take a look at it.

So, let us start with the first important concept: the ‘ideal fluid’.

What is an ideal fluid?

When you are a small parcel in a fluid flow, you will feel two kinds of forces:

external forces like gravity that are there whether or not your fellow fluid parcels surround you or are absent,

internal forces that come from your interaction with the other fluid parcels.

If there is friction between you and other fluid parcels, for example, then there will be a force slowing down faster parcels and speeding up slower parcels. This is called viscosity. I already explained it back in the post Eddy Who? High viscosity means that there is a lot of friction: think of honey.

The presence of viscosity leads to shear stress whenever there are differences in the velocities of nearby fluid parcels. These lead to the formation of eddies and therefore to turbulence. This complicates matters considerably! For this reason, sometimes people like to simplify matters and to assume that the fluid flow that they consider has zero viscosity. This leads us to the physics definition of an ideal fluid:

An ideal fluid (as physicists say) is a fluid with zero viscosity.

As you can guess, I also have a mathematical definition in store for you:

An ideal fluid (as mathematicians say) is a fluid with the following property: For any motion of the fluid there is a (real valued) function p(x, t) called the pressure such that if S is a surface in the fluid with a chosen unit normal n, the force of stress exerted across the surface S per unit area at x \in S at time t is p(x,t) n.

This implies that there is no force acting tangentially to the surface S:

pressure in an ideal fluid

This picture is from

• Alexandre Chorin and Jerrold E. Marsden, A Mathematical Introduction to Fluid Mechanics, 3rd edition, Springer, New York 1993.

An ideal fluid cannot form eddies by itself without the help of external forces, nor can eddies vanish once they are present. So this simplification excludes a lot of very interesting phenomena, including everything that is usually associated with the term ‘turbulence’. But it is a necessary simplification for describing fluid flow using geodesic equations, because something moving along a geodesic doesn’t lose energy due to friction! So we will have to stick with it for now.

Historically, ideal fluids were almost exclusively studied during the 19th century, because the mathematics of viscous fluids seemed to be too hard—which it still is, although there has been a lot of progress. This led to a schism of theoretical hydrodynamics and engineering hydrodynamics, because engineers had to handle effects like turbulence that ideal fluids cannot model. A very problematic aspect is that no body with a subsonic velocity feels any drag force in an ideal fluid. This is known as D’Alembert’s paradox. This means that one cannot find out anything about optimal design of ships or aircraft or cars using ideal fluids as a model. This situation was overcome by the invention of ‘boundary layer techniques’ by the physicist Ludwig Prandtl at the beginning of the 20th century.

John von Neumann is cited by Richard Feynman in his physics lectures as having said that ideal fluids are like “dry water”, because they are so unlike real water. This is what the subtitle of this post alludes to. I don’t think this is quite fair to say. Along these lines one could say that quantum mechanics is the theory of stagnant light, because it does not include relativistic effects like quantum field theory does. Of course every mathematical model is always just an approximation to a Gedankenexperiment. And ideal fluids still have their role to play.

Maybe I will tell you more about this in a follow-up post, but before this one gets too long, let us move on to our second topic: incompressible fluids and ‘volume preserving’ diffeomorphisms.

What is an incompressible fluid flow?

If you are a parcel of an incompressible fluid, this means that your volume does not change over time. But your shape may, so if you start out as a sphere, after some time you may end up as an ellipsoid. Let’s make this mathematically precise.

But first note that “incompressible” in the sense above means that the density of a given fluid parcel does not change over time. It does not mean that the density of the whole fluid is everywhere the same. A fluid like that is actually called homogeneous. So we have two different notions:

incompressible means that the volume of an infinitesimal fluid parcel does not change as it moves along the fluid flow,

homogeneous means that the density at a given time is everywhere the same, that is: constant in space.

This distinction is important, but for now we will study fluid flows that are both homogeneous and incompressible.

Let us see how we can make the notion of “incompressible” mathematically precise:

Remember from the last post: The flow of each fluid parcel is described by a path on M parametrized by time, so that for every time t \ge t_0 there is a diffeomorphism

g^t : M \to M

defined by the requirement that it maps the initial position x of each fluid parcel to its position g^t(x) at time t:

schematic fluid flow

Now let’s assume our fluid flow is incompressible. What does that mean for the diffeomorphisms that describe the flow? Assuming that we have a volume form \mu on M, these diffeomorphisms must conserve it:

\mathrm{SDiff}(M) := \{ f \in \mathrm{Diff}(M): f^* \mu = \mu \}

For people who need a reminder of the concepts involved (which includes me), here it is:

Remember that M is a smooth orientable Riemannian manifold of dimension n. A volume form \mu is an n-form that vanishes nowhere. In \mathbb{R}^3 with Cartesian coordinates x, y, z the canonical example would be

\mu = d x \wedge  d y \wedge  d z

The dual basis of d x, d y, d z is denoted by \partial_x, \partial_y, \partial_z in our example.

Given two manifolds M, N and a differentiable map f: M \to N, we can pull back a differential form \mu on N to one on M via

f^{*} \mu_p (v_1, ..., v_n) = \mu_{f(p)} (d f(v_1), ..., d f(v_n))
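In coordinates, pulling back the standard volume form just multiplies it by the Jacobian determinant: f^*(d x \wedge d y) = \det(D f)\, d x \wedge d y. Here is a little sympy sketch of my own (not part of the argument, just a sanity check): a shear preserves the volume form, while a dilation does not.

```python
# Sketch: in R^2 with mu = dx ∧ dy, a map f pulls mu back to det(Df) dx ∧ dy.
import sympy as sp

x, y = sp.symbols('x y')

def pullback_factor(f):
    """det of the Jacobian of f: R^2 -> R^2, i.e. f*(dx ∧ dy) = det(Df) dx ∧ dy."""
    return sp.Matrix(f).jacobian([x, y]).det()

shear = (x + 3*y, y)          # volume preserving: det(Df) = 1
dilation = (2*x, 2*y)         # scales areas by 4

print(pullback_factor(shear))     # 1
print(pullback_factor(dilation))  # 4
```

So the shear lies in (the finite-dimensional shadow of) \mathrm{SDiff}, while the dilation does not.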

For the übernerds out there: remember that we see the group of diffeomorphisms \mathrm{Diff}(M) as a Fréchet Lie group modelled on the Fréchet space of vector fields on M, \mathrm{Vec}(M). For those who would like to read more about this concept, try this:

• Karl-Hermann Neeb, Monastir Summer School: infinite-dimensional Lie groups.

\mathrm{SDiff}(M) is clearly a subgroup of \mathrm{Diff}(M). It is less obvious, but true, that it is a closed subgroup and therefore itself a Lie group. What about its Lie algebra? For a vector field to give a flow that’s volume preserving, it must have zero divergence. So, the vector fields that form the tangent space T_{\mathrm{id}} \mathrm{SDiff}(M) consist of all smooth vector fields V with zero divergence:

\mathrm{div}(V) = 0

These vector fields form a vector space we denote by \mathrm{SVec}(M). Remember T_{\mathrm{id}} stands for the tangent space at the identity element of the group \mathrm{SDiff}(M), which is the identity diffeomorphism \mathrm{id} of M. The tangent space at the identity of a Lie group is a Lie algebra, so \mathrm{SVec}(M) is a Lie algebra.

I will need a little refresher about the definition of divergence. Then I will point you to a proof of the claim above, namely that zero-divergence vector fields form the Lie algebra of volume preserving diffeomorphisms. This may seem obvious on an intuitive level, if you ever learned that the zero-divergence vector fields have ‘no sinks and no sources’, for example in a course on classical electromagnetism.

So, what is the divergence, again? You’ve probably seen it somewhere if you’ve survived reading this so far, but you may not have seen it in full generality.

The divergence of a vector field V with respect to a volume form \mu is the unique scalar function \mathrm{div}(V) such that:

\mathrm{div}(V)\, \mu = d (i_V \mu)

Here, i_V \mu is the contraction of \mu with V. Contraction means that you feed the vector V into the first slot of the differential form \mu, thereby reducing the function \mu of n vector fields to one of n-1 vector fields.

When we use our standard example M = \mathbb{R}^3, we of course write a vector field as

V = V_x \partial_x + V_y\partial_y + V_z \partial_z

where V_x, V_y and V_z are smooth real-valued functions. The divergence of V is then

\mathrm{div}(V) = \partial_x  V_x + \partial_y V_y + \partial_z V_z

which we get if we plug in the expression for V into the formula d(i_V \mu).
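Here is a quick symbolic check of the coordinate formula (my own sketch, using sympy): a rigid rotation about the z-axis is divergence-free, while the outward radial field has divergence 3.

```python
# div(V) = ∂x Vx + ∂y Vy + ∂z Vz in Cartesian coordinates on R^3
import sympy as sp

x, y, z = sp.symbols('x y z')

def div(V):
    return sp.diff(V[0], x) + sp.diff(V[1], y) + sp.diff(V[2], z)

rotation = (-y, x, 0)      # rigid rotation about the z-axis: divergence-free
radial = (x, y, z)         # outward radial field: a source everywhere

print(div(rotation))  # 0
print(div(radial))    # 3
```

The rotation field is exactly the kind of vector field that lives in \mathrm{SVec}(\mathbb{R}^3); the radial field is not.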

So, how does one see that ‘zero divergence’ of a vector field is equivalent to ‘volume preserving’ for the flow it generates?

If we write

\phi(t) = (x(t), y(t), z(t))

for the path of a fluid particle and u for its velocity, then of course we have:

\displaystyle{ \frac{d \phi}{d t} = u }

For a scalar function f(t, x(t), y(t), z(t)) we get

\displaystyle{ \frac{d f}{d t} = \frac{\partial f}{\partial t} + u \cdot \mathrm{grad}(f) }

Here \cdot is the inner product. The latter part is often written with the help of the nabla operator \nabla as

u \cdot \mathrm{grad}(f) = u \cdot \nabla \; f

This is really just a handy short notation, there is no mystery behind it: it’s just like how we write the divergence as \mathrm{div}(X) = \nabla \cdot X and the curl as \mathrm{curl}(X) = \nabla \times X.

The operator

D_t = \partial_t + u \cdot \nabla

appears so often that it has its own name: it is called the material derivative.

Why ‘material’? Because if we follow a little bit of material—what we’re calling a parcel of fluid—something about it can change with time for two different reasons. First, this quantity can explicitly depend on time: that’s what the first term, \partial_t, is about. Second, this quantity can depend on where you are, so it changes as the parcel moves: that’s what u \cdot \nabla is about.
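The chain-rule computation behind the material derivative can be checked symbolically. In this little sketch of mine, the parcel moves along the rigid rotation u = (-y, x, 0), and the derivative of f along the trajectory agrees with \partial_t f + u \cdot \nabla f:

```python
# Follow the parcel phi(t) = (cos t, sin t, 0) in the flow u = (-y, x, 0)
# and check d/dt f(t, phi(t)) = ∂t f + u·grad f.
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
u = (-y, x, 0)
f = t * x**2 + y            # an arbitrary scalar field

# material derivative D_t f = ∂t f + u·∇f
Dtf = sp.diff(f, t) + u[0]*sp.diff(f, x) + u[1]*sp.diff(f, y) + u[2]*sp.diff(f, z)

# chain rule along the trajectory phi(t) = (cos t, sin t, 0)
phi = {x: sp.cos(t), y: sp.sin(t), z: 0}
along_path = sp.diff(f.subs(phi), t)

print(sp.simplify(Dtf.subs(phi) - along_path))  # 0
```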

Now suppose we have a little parcel of fluid. We’ve been talking about it intuitively, but mathematically we can describe it at time zero as an open set W_0 in our manifold. After a time t, it will be mapped by the fluid flow g^t to

W_t :=  g^t (W_0)

This describes how our parcel moves. We define the fluid flow to be incompressible if, for every choice of W_0, the volume of W_t is constant in time, that is:

\displaystyle{ 0 = \frac{d}{d t} \int_{W_t} d \mu }

If we write J^t for the Jacobian determinant of g^t, then we have

\displaystyle{ 0 = \frac{d}{d t} \int_{W_t} d \mu = \frac{d}{d t} \int_{W_0} J^t d \mu }

So in a first step we get that a fluid flow is incompressible iff the Jacobian determinant J^t is 1 for all times, which is true iff g^t is volume preserving.

It is not that hard to show by a direct calculation that

\displaystyle{ \left. \partial_t J\right|_{t=0} = \mathrm{div}(u) J }

If you don’t want to do it yourself, you can look it up in a book that I already mentioned:

• Alexandre Chorin and Jerrold E. Marsden, A Mathematical Introduction to Fluid Mechanics, 3rd edition, Springer-Verlag, New York 1993.
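You can also spot-check the claim symbolically (my own sketch): for the divergence-free field u = (-y, x) the flow is rotation, so J^t is identically 1, while for u = (x, y) with \mathrm{div}(u) = 2 the Jacobian determinant satisfies \partial_t J = 2 J.

```python
# Two explicit flows and their Jacobian determinants
import sympy as sp

t, x, y = sp.symbols('t x y')

# flow of the divergence-free field u = (-y, x): rotation by angle t
gt = (x*sp.cos(t) - y*sp.sin(t), x*sp.sin(t) + y*sp.cos(t))
J = sp.Matrix(gt).jacobian([x, y]).det()
print(sp.simplify(J))  # 1: volume preserving

# compressible contrast: u = (x, y) has div(u) = 2, flow (x e^t, y e^t)
gt2 = (x*sp.exp(t), y*sp.exp(t))
J2 = sp.Matrix(gt2).jacobian([x, y]).det()
print(sp.simplify(sp.diff(J2, t) - 2*J2))  # 0, i.e. ∂t J = div(u) J
```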

This is the connection between ‘volume preserving’ and ‘zero divergence’! Inserting this into our equation of incompressibility, we finally get:

\begin{array}{ccl}   0 &=& \displaystyle{ \frac{d}{d t} \int_{W_t} d \mu } \\ \\  &=& \displaystyle{\frac{d}{d t} \int_{W_0} J^t d \mu } \\ \\  &=& \displaystyle{\int_{W_0} \mathrm{div}(u) J d \mu  }  \end{array}

which is true for all open sets W_0 iff \mathrm{div}(u) = 0. The equation of continuity for a fluid flow is:

\displaystyle{ \frac{\partial \rho}{\partial t} + \mathrm{div}(\rho u) = 0 }

This says that mass is conserved. Written with the material derivative it is:

\displaystyle{ \frac{D \rho}{D t} + \rho \, \mathrm{div}(u) = 0 }

So, since we’re assuming \mathrm{div}(u) = 0, we get

\displaystyle{  \frac{D \rho}{D t} = 0 }

which is what we intuitively expect, namely that the density is constant for a fluid parcel following the fluid flow.

Euler’s equation for the ideal incompressible fluid

The equation of motion for an ideal incompressible fluid is Euler’s equation:

\partial_t u + (u \cdot \nabla) u = - \nabla p

Here p is the pressure function mentioned in the mathematical definition of an ideal fluid above. As I already mentioned, to be precise I should say that we also assume that the fluid is homogeneous. This means that the density \rho is constant both in space and time and therefore can be cancelled from the equation of motion.

If M has a nonempty (smooth) boundary \partial M, the equation is supplemented by the boundary condition that u is tangential to \partial M.
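As a sanity check (my own toy example, not part of the derivation): the steady rigid rotation u = (-y, x, 0) solves Euler’s equation with pressure p = (x^2 + y^2)/2, since \partial_t u = 0 and (u \cdot \nabla) u = -\nabla p.

```python
# Verify that u = (-y, x, 0), p = (x² + y²)/2 satisfy (u·∇)u = -∇p
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
u = sp.Matrix([-y, x, 0])
p = (x**2 + y**2) / 2

# (u·∇)u, computed componentwise
advect = sp.Matrix([sum(u[k] * sp.diff(u[i], coords[k]) for k in range(3))
                    for i in range(3)])
grad_p = sp.Matrix([sp.diff(p, c) for c in coords])

# ∂t u = 0 for this steady flow, so Euler's equation reads (u·∇)u + ∇p = 0
print(sp.simplify(advect + grad_p))  # the zero vector
```

The pressure gradient supplies exactly the centripetal force that keeps each parcel on its circular orbit.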

How can we turn this equation into a geodesic equation on \mathrm{SDiff}(M)? Our strategy will be the same as last time when we handled the diffeomorphism group of the circle. We will define the necessary gadgets of differential geometry on \mathrm{SDiff}(M) using the already existing ones on M. First we define them on T_{\mathrm{id}}\mathrm{SDiff}(M). Then, for any diffeomorphism \phi \in \mathrm{SDiff}(M), we use right translation by \phi to define them on T_{\phi}\mathrm{SDiff}(M). After that, we can use the abstract version of the geodesic equation for right invariant metrics to calculate the explicit differential equation behind it.

Let us start by defining right invariant vector fields on \mathrm{SDiff}(M). A right invariant vector field U is one for which there is a u \in \mathrm{SVec}(M) with U_{\phi} = u \circ \phi. In the following, we restrict ourselves to right invariant vector fields only.

We define the usual L^2 inner product of vector fields u, v on M just as last time:

\displaystyle{ \langle u, v \rangle = \int_M \langle u_x, v_x \rangle \; d \mu (x) }

The inner product used on the right is of course the one on M.

For two right invariant vector fields U, V with U_{\phi} = u \circ \phi and V_{\phi} = v \circ \phi, we define the inner product on T_{\phi}\mathrm{SDiff}(M) by

\langle U, V \rangle_{\phi} = \langle u, v \rangle

This definition induces a right invariant metric on \mathrm{SDiff}(M). Note that it is right invariant because we are only considering volume preserving diffeomorphisms. It is not right invariant on the larger group of all diffeomorphims \mathrm{Diff}(M)!

For an incompressible ideal fluid without external fields the only kind of energy one has to consider is the kinetic energy. The inner product that we use is actually proportional to the kinetic energy of the whole fluid flow at a fixed time. So geodesics with respect to the induced metric will correspond to Hamilton’s extremal principle. In fact it is possible to formulate all this in the language of Hamiltonian systems, but I will stop here and return to the quest of calculating the geodesic equation.

Last but not least, we define the following right invariant connection:

\nabla_{U_{\phi}} V_{\phi} = (\nabla_{u} v) \circ \phi

Here \nabla on the right is the connection on M—sorry, this is not quite the same as the \nabla we’d been using earlier! But in \mathbb{R}^3 or Euclidean space of any other dimension, \nabla_u v is just another name for (u \cdot \nabla) v, so don’t get scared.

Remember from last time that the geodesic equation says

\nabla_u u = 0

where u is the velocity vector of our geodesic, say

\displaystyle{ u(t) = \frac{d}{d t} \gamma(t) }

where \gamma is the curve describing our geodesic. We saw that for a right-invariant metric on a Lie group, this equation says

\partial_t u = \mathrm{ad}^*_u u

where the coadjoint operator \mathrm{ad}^* is defined by

\langle \mathrm{ad}^*_u v, w \rangle = \langle v, \mathrm{ad}_u w \rangle = \langle v, [u, w] \rangle

For simplicity, let us specialize to \mathbb{R}^3, or an open set in there. What can we say about the right hand side of the above equation in this case? First, we have the vector identity

\nabla \times (u \times w) = - [u, w] + u \; \nabla \cdot w - w \; \nabla \cdot u

Since we are talking about divergence-free vector fields, we actually have

[u, w] = - \nabla \times (u \times w)

Also note that for a scalar function f and the divergence-free vector field u we have

\begin{array}{ccl} \langle u, \nabla f \rangle &=& \int_M \langle u(x), \nabla f(x) \rangle \; d \mu (x) \\ \\ &=& \int_M \nabla \cdot (f(x) u(x)) \; d \mu (x) \\ \\ &=& \int_{\partial M} f(x) \; \langle u, n \rangle \; d S (x) \\ \\ &=& 0 \end{array}

The last term is zero because of our boundary condition, which says that the velocity field u is tangent to \partial M.
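The middle step above uses the product rule \nabla \cdot (f u) = u \cdot \nabla f + f \, \nabla \cdot u together with the divergence theorem. The product rule itself is easy to verify symbolically (a small sympy sketch of mine):

```python
# Check ∇·(f u) = u·∇f + f ∇·u for arbitrary smooth f and u
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
f = sp.Function('f')(x, y, z)
u = [sp.Function(name)(x, y, z) for name in ('u1', 'u2', 'u3')]

div_fu = sum(sp.diff(f * u[k], coords[k]) for k in range(3))    # ∇·(f u)
u_grad_f = sum(u[k] * sp.diff(f, coords[k]) for k in range(3))  # u·∇f
f_div_u = f * sum(sp.diff(u[k], coords[k]) for k in range(3))   # f ∇·u

print(sp.simplify(div_fu - u_grad_f - f_div_u))  # 0
```

For divergence-free u the second term drops out, which is exactly what we used.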

So, now I am ready to formulate my claim that

\mathrm{ad}^*_u v = - (\nabla \times v) \times u + \nabla f

for some yet undetermined scalar function f. This can be verified by a direct calculation:

\begin{array}{ccl} \langle \mathrm{ad}^*_u v, w \rangle &=& \langle v, \mathrm{ad}_u w \rangle \\ \\ &=& \langle v, [u, w] \rangle \\  \\  &=&  \int_M \langle v_x, [u, w]_x \rangle \;d\mu(x)  \\ \\ &=& - \int_M \langle v_x, (\nabla \times (u \times w))_x \rangle \;d \mu(x)  \end{array}

What next? We can use the following 3 dimensional version of Green’s theorem for the curl operator:

\int_M ( \langle \nabla \times a, b  \rangle - \langle a, \nabla \times b \rangle ) d \mu = \int_{\partial M} \langle a \times b, n \rangle d S

That is, the curl operator is symmetric when acting on vector fields that have no component that is tangent to \partial M. Note that I deliberately forgot to talk about function spaces that our vector fields need to belong to and the regularity assumptions on the domain M and its boundary, because this is a blog post and not a math lecture. But the operators we use on vector fields obviously depend on such assumptions.

If you are interested in how to extend the symmetric curl operator to a self-adjoint operator, for example, you could look it up here:

• R. Hiptmair, P. R. Kotiuga, S. Tordeux, Self-adjoint curl operators.

Since our vector fields are supposed to be tangent to \partial M, we have that the boundary term in our case is

\int_{\partial M} \langle (u_x \times w_x) \times v_x, n \rangle \; dS = 0

because u_x \times w_x is normal to \partial M, and therefore (u_x \times w_x) \times v_x is tangent to \partial M, so its inner product with the normal vector n is zero.

So we can shift the curl operator from right to left like this:

\begin{array}{ccl} - \int_M \langle v_x, (\nabla \times (u \times w))_x \rangle \;d \mu(x) &=& - \int_M \langle (\nabla \times v)_x, (u \times w)_x \rangle \;d \mu(x) \\ \\ &=& - \int_M \langle (\nabla \times v)_x \times u_x, w_x \rangle \;d \mu(x) \end{array}

In the last step we used the cyclic symmetry of the scalar triple product, which relates the vector product to the volume spanned by three vectors:

\langle a \times b, c \rangle = \mu(a, b, c) = \mu (c, a, b) = \langle c \times a, b \rangle

This verifies the claim, since the part \nabla f does not contribute, as stated above.

And now, yet another vector identity comes to our rescue:

(\nabla \times v) \times u = (u \cdot \nabla) v - u_k \nabla v_k
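Here u_k \nabla v_k means the sum over k. This identity can be checked symbolically for arbitrary smooth fields; here is a sympy sketch of my own:

```python
# Verify (∇×v) × u = (u·∇)v − Σ_k u_k ∇v_k for arbitrary smooth u, v
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
u = sp.Matrix([sp.Function(n)(x, y, z) for n in ('u1', 'u2', 'u3')])
v = sp.Matrix([sp.Function(n)(x, y, z) for n in ('v1', 'v2', 'v3')])

curl_v = sp.Matrix([sp.diff(v[2], y) - sp.diff(v[1], z),
                    sp.diff(v[0], z) - sp.diff(v[2], x),
                    sp.diff(v[1], x) - sp.diff(v[0], y)])

lhs = curl_v.cross(u)                                           # (∇×v) × u
advect = sp.Matrix([sum(u[k] * sp.diff(v[i], coords[k]) for k in range(3))
                    for i in range(3)])                         # (u·∇)v
grad_term = sp.Matrix([sum(u[k] * sp.diff(v[k], c) for k in range(3))
                       for c in coords])                        # Σ_k u_k ∇v_k

print(sp.simplify(lhs - advect + grad_term))  # the zero vector
```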

So, we finally end up with this:

\begin{array}{ccl} \mathrm{ad}^*_u u &=& - (u \cdot \nabla) u - u_k \nabla u_k + \nabla f \\ \\ &=& - (u \cdot \nabla) u + \nabla g \end{array}

for some function g. Why? The middle term u_k \nabla u_k = \frac{1}{2} \nabla |u|^2 is itself a gradient, so we can absorb it together with \nabla f into a single gradient term \nabla g.

Thanks to this formula we derived, the abstract and elegant equation for a geodesic on any Lie group

\partial_t u = \mathrm{ad}^*_u u

becomes, in this special case

\partial_t u = - (u \cdot \nabla) u + \nabla g

If we can convince ourselves that -g is the pressure p of our fluid, we get Euler’s equation:

\partial_t u + (u \cdot \nabla) u = - \nabla p

Wow! Starting with abstract stuff about infinite-dimensional Lie groups, we’ve almost managed to derive Euler’s equation as the geodesic equation on \mathrm{SDiff}(M)! We’re not quite done: we still need to talk about the role of the function g, and why it’s minus the pressure. But that will have to wait for another post.


A Noether Theorem for Markov Processes

7 March, 2012

I’ll start you off with two puzzles. Their relevance should become clear by the end of this post:

Puzzle 1. Suppose I have a box of jewels. The average value of a jewel in the box is $10. I randomly pull one out of the box. What’s the probability that its value is at least $100?

Puzzle 2. Suppose I have a box full of numbers—they can be arbitrary real numbers. Their average is zero, and their standard deviation is 10. I randomly pull one out. What’s the probability that it’s at least 100?

Before you complain, I’ll admit: in both cases, you can’t actually tell me the probability. But you can say something about the probability! What’s the most you can say?

Noether theorems

Some good news: Brendan Fong, who worked here with me, has now gotten a scholarship to do his PhD at the University of Oxford! He’s talking with people like Bob Coecke and Jamie Vicary, who work on diagrammatic and category-theoretic approaches to quantum theory.

But we’ve also finished a paper on good old-fashioned probability theory:

• John Baez and Brendan Fong, A Noether theorem for Markov processes.

This is based on a result Brendan proved in the network theory series on this blog. But we go further in a number of ways.

What’s the basic idea?

For months now I’ve been pushing the idea that we can take ideas from quantum mechanics and push them over to ‘stochastic mechanics’, which differs in that we work with probabilities rather than amplitudes. Here we do this for Noether’s theorem.

I should warn you: here I’m using ‘Noether’s theorem’ in an extremely general way to mean any result relating symmetries and conserved quantities. There are many versions. We prove a version that applies to Markov processes, which are random processes of the nicest sort: those where the rules don’t change with time, and the state of the system in the future only depends on its state now, not the past.

In quantum mechanics, there’s a very simple relation between symmetries and conserved quantities: an observable commutes with the Hamiltonian if and only if its expected value remains constant in time for every state. For Markov processes this is no longer true. But we show the next best thing: an observable commutes with the Hamiltonian if and only if both its expected value and standard deviation are constant in time for every state!

Now, we explained this stuff very simply and clearly back in Part 11 and Part 13 of the network theory series. We also tried to explain it clearly in the paper. So now let me explain it in a complicated, confusing way, for people who prefer that.

(Judging from the papers I read, that’s a lot of people!)

I’ll start by stating the quantum theorem we’re trying to mimic, and then state the version for Markov processes.

Noether’s theorem: quantum versions

For starters, suppose both our Hamiltonian H and the observable O are bounded self-adjoint operators. Then we have this:

Noether’s Theorem, Baby Quantum Version. Let H and O be bounded self-adjoint operators on some Hilbert space. Then

[H,O] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value \langle \psi(t), O \psi(t) \rangle is constant as a function of t.

What if O is an unbounded self-adjoint operator? That’s no big deal: we can get a bounded one by taking f(O) where f is any bounded measurable function. But Hamiltonians are rarely bounded for fully realistic quantum systems, and we can’t mess with the Hamiltonian without changing Schrödinger’s equation! So, we definitely want a version of Noether’s theorem that lets H be unbounded.

It’s a bit tough to make the equation [H,O] = 0 precise in a useful way when H is unbounded, because then H is only densely defined. If O doesn’t map the domain of H to itself, it’s hard to know what [H,O] = HO - OH even means! We could demand that O does preserve the domain of H, but a better workaround is instead to say that

[\mathrm{exp}(-itH), O] = 0

for all t. Then we get this:

Noether’s Theorem, Full-fledged Quantum Version. Let H and O be self-adjoint operators on some Hilbert space, with O being bounded. Then

[\mathrm{exp}(-itH),O] = 0

if and only if for all states

\psi(t) = \mathrm{exp}(-itH) \psi

the expected value \langle \psi(t), O \psi(t) \rangle is constant as a function of t.

Here of course we’re using the fact that \mathrm{exp}(-itH) \psi is what we get when we solve Schrödinger’s equation with initial data \psi.

But in fact, this version of Noether’s theorem follows instantly from a simpler one:

Noether’s Theorem, Simpler Quantum Version. Let U be a unitary operator and let O be a bounded self-adjoint operator on some Hilbert space. Then

[U,O] = 0

if and only if for all states \psi,

\langle U \psi, O U \psi \rangle = \langle \psi, O \psi \rangle.

This version applies to a single unitary operator U instead of the 1-parameter unitary group

U(t) = \exp(-i t H)

It’s incredibly easy to prove. And this is the easiest version to copy over to the Markov case! However, the proof over there is not quite so easy.
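Here is a tiny finite-dimensional illustration (my own numerical sketch, not a proof): build U and O diagonal in the same basis, so they commute, and watch the expected value survive time evolution.

```python
# If U is unitary, O self-adjoint, and [U, O] = 0, then <Uψ, O Uψ> = <ψ, O ψ>.
import numpy as np

rng = np.random.default_rng(0)
n = 4

# commuting U and O: diagonal in the same (random unitary) basis Q
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
O = Q @ np.diag(rng.normal(size=n)) @ Q.conj().T               # self-adjoint
U = Q @ np.diag(np.exp(1j * rng.normal(size=n))) @ Q.conj().T  # unitary

assert np.allclose(U @ O, O @ U)   # they commute by construction

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

before = np.vdot(psi, O @ psi).real
after = np.vdot(U @ psi, O @ (U @ psi)).real
print(np.isclose(before, after))  # True
```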

Noether’s theorem: stochastic versions

In stochastic mechanics we describe states using probability distributions, not vectors in a Hilbert space. We also need a new concept of ‘observable’, and unitary operators will be replaced by ‘stochastic operators’.

Suppose that X is a \sigma-finite measure space with a measure we write simply as dx. Then probability distributions \psi on X lie in L^1(X). Let’s define an observable O to be any element of the dual space L^\infty(X), allowing us to define the expected value of O in the probability distribution \psi to be

\langle O, \psi \rangle = \int_X O(x) \psi(x) \, dx

The angle brackets are supposed to remind you of quantum mechanics, but we don’t have an inner product on a Hilbert space anymore! Instead, we have a pairing between L^1(X) and L^\infty(X). Probability distributions live in L^1(X), while observables live in L^\infty(X). But we can also think of an observable O as a bounded operator on L^1(X), namely the operator of multiplying by the function O.

Let’s say an operator

U : L^1(X) \to L^1(X)

is stochastic if it’s bounded and it maps probability distributions to probability distributions. Equivalently, U is stochastic if it’s linear and it obeys

\psi \ge 0 \implies U \psi \ge 0

and

\int_X (U\psi)(x) \, dx = \int_X \psi(x) \, dx

for all \psi \in L^1(X).
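When X is a finite set, a stochastic operator is just a matrix with nonnegative entries whose columns sum to 1 (with distributions as column vectors). A minimal numerical sketch of my own:

```python
# A stochastic matrix maps probability distributions to probability distributions
import numpy as np

U = np.array([[0.9, 0.2],
              [0.1, 0.8]])          # entries >= 0, columns sum to 1

psi = np.array([0.3, 0.7])          # a probability distribution
out = U @ psi

assert np.all(out >= 0)             # positivity preserved
assert np.isclose(out.sum(), 1.0)   # total probability conserved
print(out)                          # another probability distribution
```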

A Markov process, or technically a Markov semigroup, is a collection of operators

U(t) : L^1(X) \to L^1(X)

for t \ge 0 such that:

U(t) is stochastic for all t \ge 0.

U(t) depends continuously on t.

U(s+t) = U(s)U(t) for all s,t \ge 0.

U(0) = I.
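On a finite set these axioms are easy to realize by exponentiating a matrix whose off-diagonal entries are nonnegative and whose columns sum to zero. The following sketch is my own (it hand-rolls the matrix exponential as a truncated power series so it only needs numpy; scipy.linalg.expm would do the same job):

```python
# Exponentiating an "infinitesimal stochastic" matrix gives a Markov semigroup
import numpy as np

def expm_series(A, terms=60):
    """exp(A) via its power series -- fine for small, well-scaled matrices."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

H = np.array([[-1.0,  2.0],
              [ 1.0, -2.0]])        # off-diagonal >= 0, columns sum to 0

for t in (0.0, 0.5, 1.0):
    U = expm_series(t * H)
    assert np.all(U >= -1e-12)                 # positivity (up to roundoff)
    assert np.allclose(U.sum(axis=0), 1.0)     # columns still sum to 1

print(expm_series(0.5 * H) @ np.array([1.0, 0.0]))  # a probability distribution
```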

By the Hille–Yosida theorem, any Markov semigroup may be written as

U(t) = \exp(tH)

for some operator H, called its Hamiltonian. However, H is typically unbounded and only densely defined. This makes it difficult to work with the commutator [H,O]. So, we should borrow a trick from quantum mechanics and work with the commutator [\exp(tH),O] instead. This amounts to working directly with the Markov semigroup instead of its Hamiltonian. And then we have:

Noether’s Theorem, Full-fledged Stochastic Version. Suppose X is a \sigma-finite measure space and

U(t) : L^1(X) \to L^1(X)

is a Markov semigroup. Suppose O is an observable. Then

[U(t),O] = 0

for all t \ge 0 if and only if for all probability distributions \psi on X, \langle O, U(t) \psi \rangle and \langle O^2, U(t) \psi \rangle are constant as a function of t.

In plain English: time evolution commutes with an observable if and only if the mean and standard deviation of that observable never change with time. As in the quantum case, this result follows instantly from a simpler one, which applies to a single stochastic operator:

Noether’s Theorem, Simpler Stochastic Version. Suppose X is a \sigma-finite measure space and

U : L^1(X) \to L^1(X)

is a stochastic operator. Suppose O is an observable. Then

[U,O] = 0

if and only if for all probability distributions \psi on X,

\langle O, U \psi \rangle = \langle O, \psi \rangle

and

\langle O^2, U \psi \rangle = \langle O^2, \psi \rangle

It looks simple, but the proof is a bit tricky! It’s easy to see that [U,O] = 0 implies those other equations; the work lies in showing the converse. The reason is that [U,O] = 0 implies

\langle O^n, U \psi \rangle = \langle O^n, \psi \rangle

for all n, not just 1 and 2. The expected values of the powers of O are more or less what people call its moments. So, we’re saying all the moments of O are unchanged when we apply U to an arbitrary probability distribution, given that we know this fact for the first two.
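Here is a small finite-dimensional illustration of my own: take O diagonal and U stochastic and block-diagonal on the level sets of O, so that [U, O] = 0; then every moment of O is preserved.

```python
# A commuting stochastic operator preserves all moments of the observable
import numpy as np

O = np.diag([2.0, 2.0, 5.0])                 # observable, as a multiplication operator
U = np.array([[0.7, 0.4, 0.0],               # mixes only states where O takes the
              [0.3, 0.6, 0.0],               # same value, so U commutes with O
              [0.0, 0.0, 1.0]])

assert np.allclose(U @ O, O @ U)

psi = np.array([0.2, 0.3, 0.5])              # a probability distribution
o = np.diag(O)
for n in range(1, 5):
    assert np.isclose(o**n @ (U @ psi), o**n @ psi)   # <O^n, Uψ> = <O^n, ψ>
print("all moments preserved")
```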

The proof is fairly technical but also sort of cute: we use Chebyshev’s inequality, which says that the probability of a random variable taking a value at least k standard deviations away from its mean is less than or equal to 1/k^2. I’ve always found this to be an amazing fact, but now it seems utterly obvious. You can figure out the proof yourself if you do the puzzles at the start of this post.
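If you’d rather see Chebyshev’s inequality in action than prove it, here is a quick empirical check (my own sketch) with exponentially distributed samples:

```python
# Chebyshev: P(|X - mean| >= k·std) <= 1/k²
import numpy as np

rng = np.random.default_rng(1)
samples = rng.exponential(scale=1.0, size=200_000)   # mean 1, std 1

mean, std = samples.mean(), samples.std()
for k in (2.0, 3.0, 5.0):
    freq = np.mean(np.abs(samples - mean) >= k * std)
    assert freq <= 1 / k**2
    print(f"k={k}: observed {freq:.4f} <= bound {1/k**2:.4f}")
```

For this heavily skewed distribution the bound is far from tight, but it always holds.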

But now I’ll let you read our paper! And I’m really hoping you’ll spot mistakes, or places it can be improved.


Quantropy (Part 3)

18 February, 2012

I’ve been talking a lot about ‘quantropy’. Last time we figured out a trick for how to compute it starting from the partition function of a quantum system. But it’s hard to get a feeling for this concept without some examples.

So, let’s compute the partition function of a free particle on a line, and see what happens…

The partition function of a free particle

Suppose we have a free particle on a line tracing out some path as time goes by:

q: [0,T] \to \mathbb{R}

Then its action is just the time integral of its kinetic energy:

\displaystyle{ A(q) = \int_0^T \frac{mv(t)^2}{2} \; dt }

where

\displaystyle{ v(t) = \frac{d q(t)}{d t} }

is its velocity. The partition function is then

Z = \displaystyle{\int e^{i A(q) / \hbar} \; Dq }

where we integrate an exponential involving the action over the space of all paths q. Unfortunately, the space of all paths is infinite-dimensional, and the thing we’re integrating oscillates wildly. Integrals like this tend to make mathematicians run from the room screaming. For example, nobody is quite sure what Dq means in this expression. There is no ‘Lebesgue measure’ on an infinite-dimensional vector space.

There is a lot to say about this, but if we just want to get some answers, it’s best to sneak up on the problem gradually.

Discretizing time

We’ll start by treating time as discrete—a trick Feynman used in his original work. We’ll consider n time intervals of length \Delta t. Say the position of our particle at the ith time step is q_i \in \mathbb{R}. We’ll require that the particle keeps a constant velocity between these time steps. This will reduce the problem of integrating over ‘all’ paths—whatever that means, exactly—to the more manageable problem of integrating over a finite-dimensional space of paths. Later we can study what happens as the time steps get shorter and more numerous.

Let’s call the particle’s velocity between the (i-1)st and ith time steps v_i.

\displaystyle{ v_i = \frac{q_i - q_{i-1}}{\Delta t} }

The action, defined as an integral, is now equal to a finite sum:

\displaystyle{ A(q) = \sum_{i = 1}^n \frac{mv_i^2}{2} \; \Delta t }

We’ll consider histories of the particle where its initial position is

q_0 = 0

but its final position q_n is arbitrary. Why? If we don’t ‘nail down’ the particle at some particular time, our path integrals will diverge. So, our space of histories is

X = \mathbb{R}^n

and now we’re ready to apply the formulas we developed last time!
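If you like checking such setups numerically, here is a minimal sketch in Python of the discretized action. The function name and sample values are mine, not part of the post:

```python
import numpy as np

def discretized_action(q, m=1.0, dt=0.1):
    """A(q) = sum_i (m/2) v_i^2 * dt for a piecewise-linear path.

    q holds the positions (q_1, ..., q_n) after each time step;
    the path is nailed down at q_0 = 0.
    """
    q = np.concatenate(([0.0], np.asarray(q, dtype=float)))  # prepend q_0 = 0
    v = np.diff(q) / dt                                      # v_i = (q_i - q_{i-1}) / dt
    return 0.5 * m * np.sum(v**2) * dt

# A path with constant velocity v = 1 over n steps has A = (m/2) * v^2 * n * dt
n, dt = 10, 0.1
q = np.arange(1, n + 1) * dt          # q_i = i * dt, so every v_i = 1
print(discretized_action(q, m=2.0, dt=dt))   # (m/2) v^2 n dt = 1.0
```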

We saw last time that the partition function is the key to all wisdom, so let’s start with that. Naively, it’s

\displaystyle{  Z = \int_X e^{- \beta A(q)} Dq }

where

\displaystyle{ \beta = \frac{1}{i \hbar} }

But there’s a subtlety here. Doing this integral requires a measure on our space of histories. Since the space of histories is just \mathbb{R}^n with coordinates q_1, \dots, q_n, an obvious guess for a measure would be

Dq = dq_1 \cdots dq_n    \qquad \qquad \qquad \qquad \quad \textrm{(obvious first guess)}

However, the partition function should be dimensionless! You can see why from the discussion of units last time. The quantity \beta A(q), and thus its exponential, is dimensionless, so our measure had better be dimensionless too. But dq_1 \cdots dq_n has units of length^n. To deal with this we can introduce a length scale, which I’ll call \Delta x, and use the measure

Dq = \displaystyle{ \frac{1}{(\Delta x)^n} \, dq_1 \cdots dq_n }   \qquad \qquad \qquad  \textrm{(what we'll actually use)}

I should however emphasize that despite the notation \Delta x, I’m not discretizing space, just time. We could also discretize space, but it would make the calculation a lot harder. I’m only introducing this length scale \Delta x to make our measure on the space of histories dimensionless.

Now let’s compute the partition function. For starters, we have

\begin{array}{ccl} Z &=& \displaystyle{ \int_X e^{-\beta A(q)} \; Dq } \\  \\ &=& \displaystyle{  \frac{1}{(\Delta x)^n} \int e^{-\beta \sum_{i=1}^n m \, \Delta t \, v_i^2 /2} \; dq_1 \cdots dq_n } \end{array}

Normally when I see an integral bristling with annoying constants like this, I switch to a system of units where most of them equal 1. But I’m trying to get a physical feel for quantropy, so I’ll leave them all in. That way, we can see how they affect the final answer.

Since

\displaystyle{ v_i = \frac{q_i - q_{i-1}}{\Delta t} }

we can show that

dq_1 \cdots dq_n = (\Delta t)^n \; dv_1 \cdots dv_n

To show this, we need to work out the Jacobian of the transformation from the q_i coordinates to the v_i coordinates on our space of histories—but this is easy to do, since the determinant of a triangular matrix is the product of its diagonal entries.
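Here is a quick symbolic check of that Jacobian for n = 3, a sketch of mine using sympy:

```python
import sympy as sp

dt = sp.symbols('Delta_t', positive=True)
q = sp.symbols('q1 q2 q3')
q_full = (0,) + q                     # q_0 is nailed down at 0

# v_i = (q_i - q_{i-1}) / dt
v = [(q_full[i] - q_full[i - 1]) / dt for i in range(1, 4)]

J = sp.Matrix(v).jacobian(sp.Matrix(q))   # lower triangular, 1/dt on the diagonal
print(J.det())   # 1/Delta_t**3, so dq_1 dq_2 dq_3 = Delta_t**3 dv_1 dv_2 dv_3
```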

We can rewrite the path integral using this change of variables:

Z = \displaystyle{\left(\frac{\Delta t}{\Delta x}\right)^n \int e^{-\beta \sum_{i=1}^n m \, \Delta t \, v_i^2 /2}  \; dv_1 \cdots dv_n }

But since an exponential of a sum is a product of exponentials, this big fat n-tuple integral is really just a product of n ordinary integrals. And all these integrals are equal, so we just get some integral to the nth power! Let’s call the variable in this integral v, since it could be any of the v_i:

Z =  \displaystyle{ \left(\frac{\Delta t}{\Delta x}  \int_{-\infty}^\infty e^{-\beta \, m \, \Delta t \, v^2 /2} \; dv \right)^n }

How do we do the integral here? Well, that’s easy…

Integrating Gaussians

We should all know the integral of our favorite Gaussian. As a kid, my favorite was this:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2} \; d x = \sqrt{\pi} }

because this looks the simplest. But now, I prefer this:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \; d x = \sqrt{2 \pi} }

They’re both true, so why did my preference change? First, I now like 2\pi better than \pi. There’s a whole manifesto about this, and I agree with it. Second, x^2/2 is better than x^2 for what we’re doing, since kinetic energy is one half the mass times the velocity squared. Originally physicists like Descartes and Leibniz defined kinetic energy to be m v^2, but the factor of 1/2 turns out to make everything work better. Nowadays every Hamiltonian or Lagrangian with a quadratic term in it tends to have a 1/2 in front—basically because the first thing you do with it is differentiate it, and the 1/2 cancels the resulting 2. The factor of 1/2 is just a convention, even in the definition of kinetic energy, but if we didn’t make that convention we’d be punished with lots of factors of 2 all over.

Of course it doesn’t matter much: you just need to remember the integral of some Gaussian, or at least know how to calculate it. And you’ve probably read this quote:

A mathematician is someone to whom

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2} \; d x = \sqrt{2 \pi} }

is as obvious as 2+2=4 is to you and me. – Lord Kelvin

So, if you learned the trick for doing this integral, you can call yourself a mathematician.

Stretching the above Gaussian by a factor of \sqrt{\alpha} increases the integral by a factor of \sqrt{\alpha}, so we get

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2\alpha} \; d x = \sqrt{2 \pi \alpha}  }

This is clear when \alpha is positive, but soon we’ll apply it when \alpha is imaginary! That makes some mathematicians sweaty and nervous. For example, we’re saying that

\displaystyle{ \int_{-\infty}^\infty e^{i x^2 / 2} \, dx = \sqrt{2 \pi i}}

But this integral doesn’t converge if you slap absolute values on the function inside: in math jargon, the function inside isn’t ‘Lebesgue integrable’. But we can tame it in various ways. We can impose a ‘cutoff’ and then let it go to infinity:

\displaystyle{ \lim_{M \to + \infty} \int_{-M}^M e^{i x^2 / 2} \, dx = \sqrt{2 \pi i} }

or we can damp the oscillations, and then let the amount of damping go to zero:

\displaystyle{ \lim_{\epsilon \downarrow 0} \int_{-\infty}^\infty e^{(i - \epsilon) x^2 / 2} \, dx = \sqrt{2 \pi i} }

We get the same answer either way, or indeed using many other methods. Since such tricks work for all the integrals I’ll write down, I won’t engage in further hand-wringing over this issue. We’ve got bigger things to worry about, like: what’s the physical meaning of quantropy?
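For the skeptical, here is a numerical version of the damping trick, a sketch of mine assuming scipy is available. It checks that \int e^{(i - \epsilon) x^2/2} \, dx approaches \sqrt{2 \pi i} = \sqrt{2\pi} \, e^{i\pi/4} as \epsilon shrinks:

```python
import numpy as np
from scipy.integrate import quad

def damped_integral(eps):
    """Integral of exp((i - eps) x^2 / 2) over the real line,
    split into real and imaginary parts."""
    re, _ = quad(lambda x: np.exp(-eps * x**2 / 2) * np.cos(x**2 / 2),
                 -np.inf, np.inf, limit=500)
    im, _ = quad(lambda x: np.exp(-eps * x**2 / 2) * np.sin(x**2 / 2),
                 -np.inf, np.inf, limit=500)
    return re + 1j * im

sqrt_2pi_i = np.sqrt(2 * np.pi) * np.exp(1j * np.pi / 4)   # principal sqrt(2 pi i)
for eps in [0.5, 0.2, 0.1]:
    print(eps, abs(damped_integral(eps) - sqrt_2pi_i))     # the gap shrinks as eps -> 0
```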

Computing the partition function

Where were we? We had this formula for the partition function:

Z =  \displaystyle{ \left( \frac{\Delta t}{\Delta x} \int_{-\infty}^\infty e^{-\beta \, m \, \Delta t \, v^2 /2}  \; dv \right)^n }

and now we’re letting ourselves use this formula:

\displaystyle{ \int_{-\infty}^\infty e^{-x^2/2\alpha} \; d x = \sqrt{2 \pi \alpha}  }

even when \alpha is imaginary, so we get

Z = \displaystyle{ \left( \frac{\Delta t}{\Delta x} \sqrt{ \frac{2 \pi}{\beta m \, \Delta t}} \right)^n =  \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right)^{n/2}  }

And a nice thing about keeping all these constants floating around is that we can use dimensional analysis to check our work. The partition function should be dimensionless, and it is! To see this, just remember that \beta = 1/i\hbar has dimensions of inverse action, or T/ML^2.
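As a sanity check on this closed form, we can redo the integral numerically with a real \beta standing in for 1/i\hbar, so everything converges absolutely. This is my own sketch, and the parameter values are arbitrary:

```python
import numpy as np
from scipy.integrate import quad

m, dt, dx, beta, n = 1.3, 0.1, 0.2, 2.0, 4    # arbitrary real test values

# One factor of the product: (dt/dx) times the integral of exp(-beta m dt v^2 / 2) dv
one_step, _ = quad(lambda v: np.exp(-beta * m * dt * v**2 / 2), -np.inf, np.inf)
Z_numeric = (dt / dx * one_step)**n

Z_closed = (2 * np.pi * dt / (beta * m * dx**2))**(n / 2)
print(Z_numeric, Z_closed)   # agree up to quadrature error
```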

Expected action

Now that we’ve got the partition function, what do we do with it? We can compute everything we care about. Remember, in statistical mechanics there’s a famous formula:

free energy = expected energy – temperature × entropy

and last time we saw that similarly, in quantum mechanics we have:

free action = expected action – i \hbar × quantropy

where

\displaystyle{ \frac{1}{\beta} = i \hbar }

plays the role of the temperature.

In other words:

\displaystyle{ F = \langle A \rangle - \frac{1}{\beta}\, Q }

Last time I showed you how to compute F and \langle A \rangle starting from the partition function. So, we can use the above formula to work out the quantropy as well:

Expected action \langle A \rangle = - \frac{d}{d \beta} \ln Z
Free action F = -\frac{1}{\beta} \ln Z
Quantropy Q = \ln Z - \beta \,\frac{d }{d \beta}\ln Z

But let’s start with the expected action. The answer will be so amazingly simple, yet strange, that I’ll want to spend the rest of this post discussing it.

Using our hard-won formula

\displaystyle{ Z = \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right)^{n/2}  }

we get

\begin{array}{ccl} \langle A \rangle &=& \displaystyle{ -\frac{d}{d \beta} \ln Z } \\  \\  &=& \displaystyle{ -\frac{n}{2}  \frac{d}{d \beta}  \ln \left(\frac{2 \pi \Delta t}{\beta m \, (\Delta x)^2}\right) } \\  \\ &=& \displaystyle{ -\frac{n}{2}  \frac{d}{d \beta} \left( \ln \left(\frac{2 \pi \Delta t}{m \, (\Delta x)^2}\right) - \ln \beta \right) } \\   \\  &=& \displaystyle{ \frac{n}{2} \; \frac{1}{\beta} }  \\  \\ &=& \displaystyle{ n\;  \frac{i \hbar}{2} }  \end{array}

Wow! When we get an answer this simple, it must mean something! This formula is saying that the expected action of our freely moving quantum particle is proportional to n, the number of time steps. Each time step contributes i \hbar / 2 to the expected action. The mass of the particle, the time step \Delta t, and the length scale \Delta x don’t matter at all!

Why don’t they matter? Well, you can see from the above calculation that they just disappear when we take the derivative of the logarithm containing them. That’s not a profound philosophical explanation, but it implies that our action could be any quadratic function like this:

A : \mathbb{R}^n \to \mathbb{R}

\displaystyle{ A(x) = \sum_{i = 1}^n \frac{c_i x_i^2}{2} }

where c_i are positive numbers, and we’d still get the same expected action:

\langle A \rangle = \displaystyle{ n\; \frac{i \hbar}{2} }

The numbers c_i don’t matter!
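We can see this numerically too. Here is a sketch of mine with a real \beta playing the role of the imaginary classicality (the Gaussian algebra is identical): each quadratic term contributes 1/2\beta to \langle A \rangle no matter what its coefficient is.

```python
import numpy as np
from scipy.integrate import quad

beta = 2.0    # a real stand-in for the classicality 1/(i hbar)

def expected_term(c):
    """<c x^2 / 2> in the weight exp(-beta c x^2 / 2):
    always 1/(2 beta), whatever c > 0 is."""
    A = lambda x: c * x**2 / 2
    w = lambda x: np.exp(-beta * A(x))
    num, _ = quad(lambda x: A(x) * w(x), -np.inf, np.inf)
    den, _ = quad(w, -np.inf, np.inf)
    return num / den

for c in [0.1, 1.0, 25.0]:
    print(c, expected_term(c))    # 0.25 = 1/(2 * beta) every time
```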

The quadratic function we’re talking about here is an example of a quadratic form. Because the numbers c_i are positive, it’s a positive definite quadratic form. And since we can diagonalize any positive definite quadratic form, we can state our result in a fancier, more elegant way:

Whenever the action is a positive definite quadratic form on an n-dimensional vector space of histories, the expected action is n times i \hbar / 2.

For example, take a free particle in 3d Euclidean space, and discretize time into n steps as we’ve done here. Then the action is a positive definite quadratic form on a 3n-dimensional vector space:

\displaystyle{ A(q) = \sum_{i = 1}^n \frac{m \vec{v}_i \cdot \vec{v}_i}{2} \; \Delta t }

since now each velocity \vec{v}_i is a vector with 3 components. So, the expected action is 3n times i \hbar / 2.

Poetically speaking, 3n is the total number of ‘decisions’ our particle makes throughout its history. What do I mean by that? In the path integral approach to quantum mechanics, a system can trace out any history it wants. But it takes a bunch of real numbers to determine a specific history. Each number counts as one ‘decision’. And in the situation we’ve described, each decision contributes i \hbar / 2 to the expected action.

So here’s a more intuitive way to think about our result:

In the path integral approach to quantum theory, each ‘decision’ made by the system contributes i \hbar / 2 to the expected action… as long as the action is given by a positive definite quadratic form on some vector space of histories.

There’s a lot more to say about this. For example, in the harmonic oscillator the action is a quadratic form, but it’s not positive definite. What happens then? But three more immediate questions leap to my mind:

1) Why is the expected action imaginary?

2) Should we worry that it diverges as n \to \infty?

3) Is this related to the heat capacity of an ideal gas?

So, let me conclude this post by trying to answer those.

Why is the expected action imaginary?

The action A is real. How in the world can its expected value be imaginary?

The reason is that we’re not taking its expected value with respect to a probability measure, but instead, with respect to a complex-valued measure. Last time we gave this very general definition:

\langle A \rangle = \displaystyle{  \frac{\int_X A(x) e^{-\beta A(x)} \, dx }{\int_X e^{-\beta A(x)} \, dx }}

The action A is real, but \beta = 1 / i \hbar is imaginary, so it’s not surprising that this ‘expected value’ is complex-valued.

Later we’ll see a good reason why it has to be purely imaginary.

Why does it diverge as n → ∞?

Consider our particle on a line, with time discretized into n time steps. Its expected action is

\langle A \rangle = \displaystyle{ n\; \frac{i \hbar}{2} }

To take the continuum limit we must let n \to \infty while simultaneously letting \Delta t \to 0 in such a way that n \Delta t stays constant. Some quantities will converge when we take this limit, but the expected action will not. It will go to infinity!

That’s a bit sad, but not unexpected. It’s a lot like how the expected length of the path of a particle carrying out Brownian motion is infinite: in 3 dimensions, a typical Brownian path is a jagged curve that wiggles at every scale.

In fact the free quantum particle is just a ‘Wick-rotated’ version of Brownian motion, where we replace time by imaginary time, so the analogy is fairly close. The action we’re considering now is not exactly analogous to the arclength of a path:

\displaystyle{ \int_0^T \left| \frac{d q}{d t} \right| \; dt }

Instead, it’s proportional to this quadratic form:

\displaystyle{ \int_0^T \left| \frac{d q}{d t} \right|^2 \; dt }

However, both these quantities diverge when we discretize Brownian motion and then take the continuum limit.

How sad should we be that the expected action is infinite in the continuum limit? Not too sad, I think. Any result that applies to all discretizations of a continuum problem should, I think, say something about that continuum problem. For us the expected action diverges, but the ‘expected action per decision’ is constant, and that’s something we can hope to understand even in the continuum limit!

Is this related to the heat capacity of an ideal gas?

That may seem like a strange question, unless you remember some formulas about the thermodynamics of an ideal gas!

Let’s say we’re in 3d Euclidean space. (Most of us already are, but some of my more spacy friends will need to pretend.) If we have an ideal gas made of n point particles at temperature T, its expected energy is

\frac{3}{2} n k T

where k is Boltzmann’s constant. This is a famous fact, which lets people compute the heat capacity of a monatomic ideal gas.

On the other hand, we’ve seen that in quantum mechanics, a single point particle will have an expected action of

\frac{3}{2} n i \hbar

after n time steps.

These results look awfully similar. Are they related?

Yes! These are just two special cases of the same result! The energy of the ideal gas is a quadratic form on a 3n-dimensional vector space; so is the action of our discretized point particle. The ideal gas is a problem in statistical mechanics; the point particle is a problem in quantum mechanics. In statistical mechanics we have

\displaystyle{ \beta = \frac{1}{k T} }

while in quantum mechanics we have

\displaystyle{ \beta = \frac{1}{i \hbar} }

Mathematically, they are the exact same problem except that \beta is real in one case, imaginary in the other. This is another example of the analogy between statistical mechanics and quantum mechanics—the analogy that motivated quantropy in the first place!

And this makes it even more obvious that the expected action must be imaginary… at least when the action is a positive definite quadratic form.


Quantropy (Part 2)

10 February, 2012

In my first post in this series, we saw that filling in a well-known analogy between statistical mechanics and quantum mechanics requires a new concept: ‘quantropy’. To get some feeling for this concept, we should look at some examples. But to do that, we need to develop some tools to compute quantropy. That’s what we’ll do today.

All these tools will be borrowed from statistical mechanics. So, let me remind you how to compute the entropy of a system in thermal equilibrium if we know the energy of every state. Then we’ll copy this and get a formula for the quantropy of a system if we know the action of every history.

Computing entropy

Everything in this section is bog-standard. In case you don’t know, that’s British slang for ‘extremely, perhaps even depressingly, familiar’. Apparently it rains so much in England that bogs are not only standard, they’re the standard of what counts as standard!

Let X be a measure space: physically, the set of states of some system. In statistical mechanics we suppose the system occupies states with probabilities given by some probability distribution

p : X \to [0,\infty)

where of course

\int_X p(x) \, dx = 1

The entropy of this probability distribution is

S = - \int_X p(x) \ln(p(x)) \, dx

There’s a nice way to compute the entropy when our system is in thermal equilibrium. This idea makes sense when we have a function

H : X \to \mathbb{R}

saying the energy of each state. Our system is in thermal equilibrium when p maximizes entropy subject to a constraint on the expected value of energy:

\langle H \rangle = \int_X H(x) p(x) \, dx

A famous calculation shows that thermal equilibrium occurs precisely when p is the so-called Gibbs state:

\displaystyle{ p(x) = \frac{e^{-\beta H(x)}}{Z} }

for some real number \beta, where Z is a normalization factor called the partition function:

Z = \int_X e^{-\beta H(x)} \, dx

The number \beta is called the coolness, since physical considerations say that

\displaystyle{ \beta = \frac{1}{T} }

where T is the temperature in units where Boltzmann’s constant is 1.

There’s a famous way to compute the entropy of the Gibbs state; I don’t know who did it first, but it’s both straightforward and tremendously useful. We take the formula for entropy

S = - \int_X p(x) \ln(p(x)) \, dx

and substitute the Gibbs state

\displaystyle{ p(x) = \frac{e^{-\beta H(x)}}{Z} }

getting

\begin{array}{ccl} S &=& \int_X p(x) \left( \beta H(x) + \ln Z \right)\, dx \\   \\  &=& \beta \, \langle H \rangle + \ln Z \end{array}

Reshuffling this a little bit, we obtain:

- T \ln Z = \langle H \rangle - T S

If we define the free energy by

F = - T \ln Z

then we’ve shown that

F = \langle H \rangle - T S

This justifies the term ‘free energy’: it’s the expected energy minus the energy in the form of heat, namely T S.

It’s nice that we can compute the free energy purely in terms of the partition function and the temperature, or equivalently the coolness \beta:

\displaystyle{ F = - \frac{1}{\beta} \ln Z }

Can we also do this for the entropy? Yes! First we’ll do it for the expected energy:

\begin{array}{ccl} \langle H \rangle &=& \displaystyle{ \int_X H(x) p(x) \, dx } \\   \\  &=& \displaystyle{ \frac{1}{Z} \int_X H(x) e^{-\beta H(x)} \, dx } \\   \\  &=& \displaystyle{ -\frac{1}{Z} \frac{d}{d \beta} \int_X e^{-\beta H(x)} \, dx } \\ \\  &=& \displaystyle{ -\frac{1}{Z} \frac{dZ}{d \beta} } \\ \\  &=& \displaystyle{ - \frac{d}{d \beta} \ln Z } \end{array}

This gives

\begin{array}{ccl} S &=& \beta \, \langle H \rangle + \ln Z \\ \\ &=& \displaystyle{ \ln Z - \beta \, \frac{d \ln Z}{d \beta} }\end{array}

So, if we know the partition function of a system in thermal equilibrium as a function of the temperature, we can work out its entropy, expected energy and free energy.
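For a system with finitely many states, all of this is easy to check numerically. Here is a sketch of mine with made-up energy levels:

```python
import numpy as np

H = np.array([0.0, 1.0, 1.0, 2.5])   # energies of a 4-state toy system (made up)
beta = 0.7

weights = np.exp(-beta * H)
Z = weights.sum()
p = weights / Z                       # the Gibbs state

S_direct = -np.sum(p * np.log(p))     # S = -sum_x p(x) ln p(x)

# S = ln Z - beta dlnZ/dbeta, with the derivative taken by a small finite difference
h = 1e-6
lnZ = lambda b: np.log(np.exp(-b * H).sum())
S_formula = np.log(Z) - beta * (lnZ(beta + h) - lnZ(beta - h)) / (2 * h)

print(S_direct, S_formula)   # agree to high accuracy
```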

Computing quantropy

Now we’ll repeat everything for quantropy! The idea is simply to replace the energy by action and the temperature T by i \hbar where \hbar is Planck’s constant. It’s harder to get the integrals to converge in interesting examples. But we’ll worry about that next time, when we actually do an example!

It’s annoying that in physics S stands for both entropy and action, since in this article we need to think about both. People also use H to stand for entropy, but that’s no better, since that letter also stands for ‘Hamiltonian’! To avoid this let’s use A to stand for action. This letter is also used to mean ‘Helmholtz free energy’, but we’ll just have to live with that. It would be a real bummer if we failed to unify physics just because we ran out of letters.

Let X be a measure space: physically, the set of histories of some system. In quantum mechanics we suppose the system carries out histories with amplitudes given by some function

a : X \to \mathbb{C}

where perhaps surprisingly

\int_X a(x) \, dx = 1

The quantropy of this function is

Q = - \int_X a(x) \ln(a(x)) \, dx

There’s a nice way to compute the quantropy in Feynman’s path integral formalism. This formalism makes sense when we have a function

A : X \to \mathbb{R}

saying the action of each history. Feynman proclaimed that in this case we have

\displaystyle{ a(x) = \frac{e^{i A(x)/\hbar}}{Z} }

where \hbar is Planck’s constant and Z is a normalization factor called the partition function:

Z = \int_X e^{i A(x)/\hbar} \, dx

Last time I showed that we obtain Feynman’s prescription for a by demanding that it’s a stationary point for the quantropy

Q = - \int_X a(x) \, \ln (a(x)) \, dx

subject to a constraint on the expected action:

\langle A \rangle = \int_X A(x) a(x) \, dx

As I mentioned last time, the formula for quantropy is dangerous, since we’re taking the logarithm of a complex-valued function. There’s not really a ‘best’ logarithm for a complex number: if we have one choice we can add any multiple of 2 \pi i and get another. So in general, to define quantropy we need to pick a choice of \ln (a(x)) for each point x \in X. That’s a lot of ambiguity!

Luckily, the ambiguity is much less when we use Feynman’s prescription for a. Why? Because then a(x) is defined in terms of an exponential, and it’s easy to take the logarithm of an exponential! So, we can declare that

\ln (a(x)) = \displaystyle{ \ln \left( \frac{e^{iA(x)/\hbar}}{Z}\right) } = \frac{i}{\hbar} A(x) - \ln Z

Once we choose a logarithm for Z, this formula will let us define \ln (a(x)) and thus the quantropy.

So let’s do this, and say the quantropy is

\displaystyle{ Q = - \int_X a(x) \left( \frac{i}{\hbar} A(x) - \ln Z \right)\, dx }

We can simplify this a bit, since the integral of a is 1:

\displaystyle{ Q = \frac{1}{i \hbar} \langle A \rangle + \ln Z }

Reshuffling this a little bit, we obtain:

- i \hbar \ln Z = \langle A \rangle - i \hbar Q

By analogy to free energy in statistical mechanics, let’s define the free action by

F = - i \hbar \ln Z

I’m using the same letter for free energy and free action, but they play exactly analogous roles, so it’s not so bad. Indeed we now have

F = \langle A \rangle - i \hbar Q

which is the analogue of a formula we saw for free energy in thermodynamics.

It’s nice that we can compute the free action purely in terms of the partition function and Planck’s constant. Can we also do this for the quantropy? Yes!

It’ll be convenient to introduce a parameter

\displaystyle{ \beta = \frac{1}{i \hbar} }

which is analogous to ‘coolness’. We could call it ‘quantum coolness’, but a better name might be classicality, since it’s big when our system is close to classical. Whatever we call it, the main thing is that unlike ordinary coolness, it’s imaginary!

In terms of classicality, we have

\displaystyle{ a(x) = \frac{e^{- \beta A(x)}}{Z} }

Now we can compute the expected action just as we computed the expected energy in thermodynamics:

\begin{array}{ccl} \langle A \rangle &=& \displaystyle{ \int_X A(x) a(x) \, dx } \\ \\  &=& \displaystyle{ \frac{1}{Z} \int_X A(x) e^{-\beta A(x)} \, dx } \\   \\  &=& \displaystyle{ -\frac{1}{Z} \frac{d}{d \beta} \int_X e^{-\beta A(x)} \, dx } \\ \\  &=& \displaystyle{ -\frac{1}{Z} \frac{dZ}{d \beta} } \\ \\  &=& \displaystyle{ - \frac{d}{d \beta} \ln Z } \end{array}

This gives:

\begin{array}{ccl} Q &=& \beta \,\langle A \rangle + \ln Z \\ \\ &=& \displaystyle{ \ln Z - \beta \,\frac{d \ln Z}{d \beta} } \end{array}

So, if we can compute the partition function in the path integral approach to quantum mechanics, we can also work out the quantropy, expected action and free action!
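Since the formulas make sense for any measure space, we can test them on a toy system with four histories and a genuinely imaginary \beta. This is my own numerical sketch; the action values are made up, and I use Feynman’s prescription to pick the logarithm:

```python
import numpy as np

A = np.array([0.0, 1.0, 2.0, 3.5])   # actions of four toy histories (made up)
hbar = 1.0
beta = 1 / (1j * hbar)               # the classicality: imaginary!

amps = np.exp(-beta * A)
Z = amps.sum()
a = amps / Z                         # Feynman's amplitudes, summing to 1

# Feynman's prescription fixes the branch: ln a(x) = -beta A(x) - ln Z
ln_a = -beta * A - np.log(Z)
Q_direct = -np.sum(a * ln_a)

# Compare with Q = ln Z - beta dlnZ/dbeta, differentiating numerically
h = 1e-6
lnZ = lambda b: np.log(np.exp(-b * A).sum())
Q_formula = np.log(Z) - beta * (lnZ(beta + h) - lnZ(beta - h)) / (2 * h)

print(Q_direct, Q_formula)   # agree up to numerical error
```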

Next time I’ll use these formulas to compute quantropy in an example: the free particle. We’ll see some strange and interesting things.

Summary

Here’s where our analogy stands now:

Statistical Mechanics | Quantum Mechanics
states: x \in X | histories: x \in X
probabilities: p: X \to [0,\infty) | amplitudes: a: X \to \mathbb{C}
energy: H: X \to \mathbb{R} | action: A: X \to \mathbb{R}
temperature: T | Planck’s constant times i: i \hbar
coolness: \beta = 1/T | classicality: \beta = 1/i \hbar
partition function: Z = \sum_{x \in X} e^{-\beta H(x)} | partition function: Z = \sum_{x \in X} e^{-\beta A(x)}
Boltzmann distribution: p(x) = e^{-\beta H(x)}/Z | Feynman sum over histories: a(x) = e^{-\beta A(x)}/Z
entropy: S = - \sum_{x \in X} p(x) \ln(p(x)) | quantropy: Q = - \sum_{x \in X} a(x) \ln(a(x))
expected energy: \langle H \rangle = \sum_{x \in X} p(x) H(x) | expected action: \langle A \rangle = \sum_{x \in X} a(x) A(x)
free energy: F = \langle H \rangle - TS | free action: F = \langle A \rangle - i \hbar Q
\langle H \rangle = - \frac{d}{d \beta} \ln Z | \langle A \rangle = - \frac{d}{d \beta} \ln Z
F = -\frac{1}{\beta} \ln Z | F = -\frac{1}{\beta} \ln Z
S = \ln Z - \beta \,\frac{d}{d \beta}\ln Z | Q = \ln Z - \beta \,\frac{d}{d \beta}\ln Z

I should also say a word about units and dimensional analysis. There’s enormous flexibility in how we do dimensional analysis. Amateurs often don’t realize this, because they’ve just learned one system, but experts take full advantage of this flexibility to pick a setup that’s convenient for what they’re doing. The fewer independent units you use, the fewer dimensionful constants like the speed of light, Planck’s constant and Boltzmann’s constant you see in your formulas. That’s often good. But here I don’t want to set Planck’s constant equal to 1 because I’m treating it as analogous to temperature—so it’s important, and I want to see it. I’m also finding dimensional analysis useful to check my formulas.

So, I’m using units where mass, length and time count as independent dimensions in the sense of dimensional analysis. On the other hand, I’m not treating temperature as an independent dimension: instead, I’m setting Boltzmann’s constant to 1 and using that to translate from temperature into energy. This is fairly common in some circles. And for me, treating temperature as an independent dimension would be analogous to treating Planck’s constant as having its own independent dimension! I don’t feel like doing that.

So, here’s how the dimensional analysis works in my setup:

Statistical Mechanics | Quantum Mechanics
probabilities: dimensionless | amplitudes: dimensionless
energy: ML^2/T^2 | action: ML^2/T
temperature: ML^2/T^2 | Planck’s constant: ML^2/T
coolness: T^2/ML^2 | classicality: T/ML^2
partition function: dimensionless | partition function: dimensionless
entropy: dimensionless | quantropy: dimensionless
expected energy: ML^2/T^2 | expected action: ML^2/T
free energy: ML^2/T^2 | free action: ML^2/T

I like this setup because I often think of entropy as closely allied to information, measured in bits or nats depending on whether I’m using base 2 or base e. From this viewpoint, it should be dimensionless.

Of course, in thermodynamics it’s common to put a factor of Boltzmann’s constant in front of the formula for entropy. Then entropy has units of energy/temperature. But I’m using units where Boltzmann’s constant is 1 and temperature has the same units as energy! So for me, entropy is dimensionless.


Quantizing Electrical Circuits

2 February, 2012

As you may know, there’s a wonderful and famous analogy between classical mechanics and electrical circuit theory. I explained it back in “week288”, so I won’t repeat that story now. If you don’t know what I’m talking about, take a look!

This analogy opens up the possibility of quantizing electrical circuits by straightforwardly copying the way we quantize classical mechanics problems. I’d often wondered if this would be useful.

It is, and people have done it:

• Michel H. Devoret, Quantum fluctuations in electrical circuits.

Michel Devoret, Rob Schoelkopf and others call this idea quantronics: the study of mesoscopic electronic effects in which collective degrees of freedom like currents and voltages behave quantum mechanically.

I just learned about this from a talk by Sean Barrett here in Coogee. There are lots of cool applications, but right now I’m mainly interested in how this extends the set of analogies between different physical theories.

One interesting thing is how they quantize circuits with resistors. Over in classical mechanics, this corresponds to systems with friction. These systems, called ‘dissipative’ systems, don’t have a conserved energy. More precisely, energy leaks out of the system under consideration and gets transferred to the environment in the form of heat. It’s hard to quantize systems where energy isn’t conserved, so people in quantronics model resistors as infinite chains of inductors and capacitors: see the ‘LC ladder circuit’ on page 15 of Devoret’s notes. This idea is also the basis of the Caldeira–Leggett model of a particle coupled to a heat bath made of harmonic oscillators: it amounts to including the environment as part of the system being studied.


Entropic Forces

1 February, 2012

 

In 2009, Erik Verlinde argued that gravity is an entropic force. This created a big stir—and it helped him win about $6,500,000 in prize money and grants! But what the heck is an ‘entropic force’, anyway?

Entropic forces are nothing unusual: you’ve felt one if you’ve ever stretched a rubber band. Why does a rubber band pull back when you stretch it? You might think it’s because a stretched rubber band has more energy than an unstretched one. That would indeed be a fine explanation for a metal spring. But rubber doesn’t work that way. Instead, a stretched rubber band mainly has less entropy than an unstretched one—and this too can cause a force.

You see, molecules of rubber are like long chains. When unstretched, these chains can curl up in lots of random wiggly ways. ‘Lots of random ways’ means lots of entropy. But when you stretch one of these chains, the number of ways it can be shaped decreases, until it’s pulled taut and there’s just one way! Only past that point does stretching the molecule take a lot of energy; before that, you’re mainly decreasing its entropy.

So, the force of a stretched rubber band is an entropic force.

But how can changes in either energy or entropy give rise to forces? That’s what I want to explain. But instead of talking about force, I’ll start out talking about pressure. This too arises both from changes in energy and changes in entropy.

Entropic pressure — a sloppy derivation

If you’ve ever studied thermodynamics you’ve probably heard about an ideal gas. You can think of this as a gas consisting of point particles that almost never collide with each other—because they’re just points—and bounce elastically off the walls of the container they’re in. If you have a box of gas like this, it’ll push on the walls with some pressure. But the cause of this pressure is not that slowly making the box smaller increases the energy of the gas inside: in fact, it doesn’t! The cause is that making the box smaller decreases the entropy of the gas.

To understand how pressure has an ‘energetic’ part and an ‘entropic’ part, let’s start with the basic equation of thermodynamics:

d U = T d S - P d V

What does this mean? It means the internal energy U of a box of stuff changes when you heat or cool it, meaning that you change its entropy S, but also when you shrink or expand it, meaning that you change its volume V. Increasing its entropy raises its internal energy at a rate proportional to its temperature T. Increasing its volume lowers its internal energy at a rate proportional to its pressure P.

We can already see that both changes in energy, U, and entropy, S, can affect P d V. Pressure is like force—indeed it’s just force per area—so we should try to solve for P.

First let’s do it in a sloppy way. One reason people don’t like thermodynamics is that they don’t understand partial derivatives when there are lots of different coordinate systems floating around—which is what thermodynamics is all about! So, they manipulate these partial derivatives sloppily, feeling a sense of guilt and unease, and sometimes it works, but other times it fails disastrously. The cure is not to learn more thermodynamics; the cure is to learn about differential forms. All the expressions in the basic equation d U = T d S - P d V are differential forms. If you learn what they are and how to work with them, you’ll never get in trouble with partial derivatives in thermodynamics—as long as you proceed slowly and carefully.

But let’s act like we don’t know this! Let’s start with the basic equation

d U = T d S - P d V

and solve for P. First we get

P d V = T d S - d U

This is fine. Then we divide by d V and get

\displaystyle{ P = T \frac{d S}{d V} - \frac{d U}{d V} }

This is not so fine: here the guilt starts to set in. After all, we’ve been told that we need to use ‘partial derivatives’ when we have functions of several variables—and the main fact about partial derivatives, the one that everybody remembers, is that these are written with curly d’s, not ordinary letter d’s. So we must have done something wrong. So, we make the d’s curly:

\displaystyle{ P = T \frac{\partial S}{\partial V} - \frac{\partial U}{\partial V} }

But we still feel guilty. First of all, who gave us the right to make those d’s curly? Second of all, a partial derivative like \frac{\partial S}{\partial V} makes no sense unless V is one of a set of coordinate functions: only then can we talk about how much some function changes as we change V while keeping the other coordinates fixed. The value of \frac{\partial S}{\partial V} actually depends on what other coordinates we’re keeping fixed! So what coordinates are we using?

Well, it seems like one of them is V, and the other is… we don’t know! It could be S, or P, or T, or perhaps even U. This is where real unease sets in. If we’re taking a test, we might in desperation think something like this: “Since the easiest things to control about our box of stuff are its volume and its temperature, let’s take these as our coordinates!” And then we might write

\displaystyle{ P = T \left.\frac{\partial S}{\partial V}\right|_T - \left.\frac{\partial U}{\partial V}\right|_T }

And then we might do okay on this problem, because this formula is in fact correct! But I hope you agree that this is an unsatisfactory way to manipulate partial derivatives: we’re shooting in the dark and hoping for luck.

Entropic pressure and entropic force

So, I want to show you a better way to get this result. But first let’s take a break and think about what it means. It means there are two possible reasons a box of gas may push back with pressure as we try to squeeze it smaller while keeping its temperature constant. One is that the energy may go up:

\displaystyle{ -\left.\frac{\partial U}{\partial V}\right|_T }

will be positive if the internal energy goes up as we squeeze the box smaller. But the other reason is that entropy may go down:

\displaystyle{  T \left.\frac{\partial S}{\partial V}\right|_T }

will be positive if the entropy goes down as we squeeze the box smaller, assuming T > 0.

Let’s turn this fact into a result about force. Remember that pressure is just force per area. Say we have some stuff in a cylinder with a piston on top. Say the position of the piston is given by some coordinate x, and its area is A. Then the stuff will push on the piston with a force

F = P A

and the change in the cylinder’s volume as the piston moves is

d V = A d x

Then

\displaystyle{  P = T \left.\frac{\partial S}{\partial V}\right|_T - \left.\frac{\partial U}{\partial V}\right|_T }

gives us

\displaystyle{ F = T \left.\frac{\partial S}{\partial x}\right|_T - \left.\frac{\partial U}{\partial x}\right|_T }

So, the force consists of two parts: the energetic force

\displaystyle{ F_{\mathrm{energetic}} = - \left.\frac{\partial U}{\partial x}\right|_T }

and the entropic force:

\displaystyle{ F_{\mathrm{entropic}} =  T \left.\frac{\partial S}{\partial x}\right|_T}

Energetic forces are familiar from classical statics: for example, a rock pushes down on the table because its energy would decrease if it could go down. Entropic forces enter the game when we generalize to thermal statics, as we’re doing now. But when we set T = 0, these entropic forces go away and we’re back to classical statics!

Entropic pressure—a better derivation

Okay, enough philosophizing. To conclude, let’s derive

\displaystyle{ P = T \left.\frac{\partial S}{\partial V}\right|_T - \left.\frac{\partial U}{\partial V}\right|_T }

in a less sloppy way. We start with

d U = T d S - P d V

which is true no matter what coordinates we use. We can choose 2 of the 5 variables here as local coordinates, generically at least, so let’s choose V and T. Then

\displaystyle{ d U = \left.\frac{\partial U}{\partial V}\right|_T d V + \left.\frac{\partial U}{\partial T}\right|_V d T }

and similarly

\displaystyle{ d S = \left.\frac{\partial S}{\partial V}\right|_T d V + \left.\frac{\partial S}{\partial T}\right|_V d T }

Using these, our equation

d U = T d S - P d V

becomes

\displaystyle{ \left.\frac{\partial U}{\partial V}\right|_T d V + \left.\frac{\partial U}{\partial T}\right|_V d T = T \left(\left.\frac{\partial S}{\partial V}\right|_T d V + \left.\frac{\partial S}{\partial T}\right|_V d T \right) - P dV }

If you know about differential forms, you know that the differentials of the coordinate functions, namely d T and d V, form a basis of 1-forms. Thus we can equate the coefficients of d V in the equation above and get:

\displaystyle{ \left.\frac{\partial U}{\partial V}\right|_T = T \left.\frac{\partial S}{\partial V}\right|_T - P }

and thus:

\displaystyle{ P = T \left.\frac{\partial S}{\partial V}\right|_T - \left.\frac{\partial U}{\partial V}\right|_T }

which is what we wanted! There should be no bitter aftertaste of guilt this time.
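If you like, you can have a computer re-run this derivation. Here’s a sketch of my own in sympy: treat V and T as coordinates, expand d U and d S in the basis 1-forms d V and d T (represented here as formal symbols), impose d U = T d S - P d V, and equate coefficients of d V.

```python
import sympy as sp

# A check of the derivation above (my own sketch, not from the post).
V, T, P = sp.symbols('V T P')
U = sp.Function('U')(V, T)   # internal energy in (V, T) coordinates
S = sp.Function('S')(V, T)   # entropy in (V, T) coordinates

# basis 1-forms, treated as formal symbols
dV, dT = sp.symbols('dV dT')
dU = sp.diff(U, V)*dV + sp.diff(U, T)*dT
dS = sp.diff(S, V)*dV + sp.diff(S, T)*dT

lhs = sp.expand(dU)
rhs = sp.expand(T*dS - P*dV)

# equate the coefficients of dV on both sides and solve for P
P_solved = sp.solve(sp.Eq(lhs.coeff(dV), rhs.coeff(dV)), P)[0]
print(P_solved)  # T times (dS/dV at constant T) minus (dU/dV at constant T)
```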

The big picture

That’s almost all I want to say: a simple exposition of well-known stuff that’s not quite as well-known as it should be. If you know some thermodynamics and are feeling mildly ambitious, you can now work out the pressure of an ideal gas and show that it’s completely entropic in origin: only the first term on the right-hand side above is nonzero. If you’re feeling a lot more ambitious, you can try to read Verlinde’s papers and explain them to me. But my own goal was not to think about gravity. Instead, it was to ponder a question raised by Allen Knutson: how does the ‘entropic force’ idea fit into my ruminations on classical mechanics versus thermodynamics?
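Here’s a numerical sketch of that mildly ambitious exercise, assuming two textbook facts not derived in this post: for a monatomic ideal gas, U = (3/2) N k T is independent of V, and the volume dependence of the entropy is N k ln V. Then the entire pressure comes from the entropy term:

```python
from math import log

# Check (my own toy sketch) that the ideal gas pressure is purely entropic.
k = 1.380649e-23   # Boltzmann constant, J/K
N = 1e22           # number of atoms
T = 300.0          # temperature, K
V = 1e-3           # volume, m^3

def S(vol):        # only the V-dependent part matters for dS/dV
    return N * k * log(vol)

def U(vol):        # monatomic ideal gas: no V-dependence at all
    return 1.5 * N * k * T

# central finite differences for the two partial derivatives at constant T
h = 1e-9
entropic = T * (S(V + h) - S(V - h)) / (2 * h)
energetic = -(U(V + h) - U(V - h)) / (2 * h)

P = entropic + energetic
print(P, N * k * T / V)   # the two agree: P = NkT/V, all from the entropy term
assert energetic == 0.0
assert abs(P - N * k * T / V) < 1e-6 * P
```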

It seems to fit in this way: as we go from classical statics (governed by the principle of least energy) to thermal statics at fixed temperature (governed by the principle of least free energy), the definition of force familiar in classical statics must be adjusted. In classical statics we have

\displaystyle{ F_i = - \frac{\partial U}{\partial q^i}}

where

U: Q \to \mathbb{R}

is the energy as a function of some coordinates q^i on the configuration space of our system, some manifold Q. But in thermal statics at temperature T our system will try to minimize, not the energy U, but the Helmholtz free energy

A = U - T S

where

S : Q \to \mathbb{R}

is the entropy. So now we should define force by

\displaystyle{ F_i = - \frac{\partial A}{\partial q^i}}

and we see that force has an entropic part and an energetic part:

\displaystyle{ F_i = T \frac{\partial S}{\partial q^i} - \frac{\partial U}{\partial q^i} }

When T = 0, the entropic part goes away and we’re back to classical statics!


I’m subject to the natural forces. – Lyle Lovett


A Quantum Hammersley–Clifford Theorem

29 January, 2012

I’m at this workshop:

Sydney Quantum Information Theory Workshop: Coogee 2012, 30 January – 2 February 2012, Coogee Bay Hotel, Coogee, Sydney, organized by Stephen Bartlett, Gavin Brennen, Andrew Doherty and Tom Stace.

Right now David Poulin is speaking about a quantum version of the Hammersley–Clifford theorem, which is a theorem about Markov networks. Let me quickly say a bit about what he proved! This will be a bit rough, since I’m doing it live…

The mutual information between two random variables is

I(A:B) = S(A) + S(B) - S(A,B)

The conditional mutual information between two random variables A and B, given a third random variable C, is

I(A:B|C) = \sum_c p(C=c) I(A:B|C=c)

It’s the average amount of information about B learned by measuring A when you already knew C.
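To make the definition concrete, here’s a small sketch with toy numbers of my own, using the standard equivalent entropy formula I(A:B|C) = H(A,C) + H(C,B) - H(A,B,C) - H(C). For a chain A → C → B, conditioning on C screens A off from B, so the result is zero:

```python
from math import log2
from itertools import product

# A tiny Markov chain A -> C -> B of binary variables (my own toy example):
# p(a, c, b) = p(a) p(c|a) p(b|c).  Knowing C screens A off from B.
pA = {0: 0.5, 1: 0.5}
pC_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pB_given_C = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

p = {(a, c, b): pA[a] * pC_given_A[a][c] * pB_given_C[c][b]
     for a, c, b in product((0, 1), repeat=3)}

def H(marginal):
    """Shannon entropy of a dict of probabilities."""
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def marginal(keep):
    """Marginal distribution over the variables named in `keep` (subset of 'acb')."""
    m = {}
    for (a, c, b), q in p.items():
        key = tuple(v for v, name in zip((a, c, b), 'acb') if name in keep)
        m[key] = m.get(key, 0.0) + q
    return m

# I(A:B|C) = H(A,C) + H(C,B) - H(A,B,C) - H(C)
I_AB_given_C = (H(marginal('ac')) + H(marginal('cb'))
                - H(marginal('acb')) - H(marginal('c')))
print(I_AB_given_C)  # vanishes, up to floating-point error
assert abs(I_AB_given_C) < 1e-9
```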

All this works for both classical (Shannon) and quantum (von Neumann) entropy. So, when we say ‘random variable’ above, we could mean it in the traditional classical sense or in the quantum sense.

If I(A:B|C) = 0 then A, C, B has the following Markov property: if you know C, learning A tells you nothing new about B. In condensed matter physics, say a spin system, we get (quantum) random variables from measuring what’s going on in regions, and we have short range entanglement if I(A:B|C) = 0 when C corresponds to some sufficiently thick region separating the regions A and B. We’ll get this in any Gibbs state of a spin chain with a local Hamiltonian.

A Markov network is a graph with random variables at vertices (and thus subsets of vertices) such that I(A:B|C) = 0 whenever C is a subset of vertices that completely ‘shields’ the subset A from the subset B: any path from A to B goes through a vertex in C.

The Hammersley–Clifford theorem says that in the classical case we can get any Markov network from the Gibbs state

\exp(-\beta H)

of a local Hamiltonian H, and vice versa. Here a Hamiltonian is local if it is a sum of terms, one depending on the degrees of freedom in each clique in the graph:

H = \sum_{C \in \mathrm{cliques}} h_C

Hayden, Jozsa, Petz and Winter gave a quantum generalization of one direction of this result to graphs that are just ‘chains’, like this:

o—o—o—o—o—o—o—o—o—o—o—o

Namely: for such graphs, any quantum Markov network is the Gibbs state of some local Hamiltonian. Now Poulin has shown the same for all graphs. But the converse is, in general, false. If the different terms h_C in a local Hamiltonian all commute, its Gibbs state will have the Markov property. But otherwise, it may not.
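The easy classical direction is simple to check by hand. Here’s a sketch with a toy example of mine: a 3-spin Ising chain A – C – B. Its Gibbs distribution factors over the edges of the chain, so the middle spin shields the two ends and I(A:B|C) vanishes:

```python
from math import exp, log2
from itertools import product

# Classical sanity check of one direction of Hammersley-Clifford (my own
# toy example): the Gibbs state of a local Hamiltonian on the chain
# A - C - B has the Markov property I(A:B|C) = 0.
beta, J = 1.0, 1.0
spins = (-1, 1)

def energy(a, c, b):
    # local Hamiltonian: one term per edge (clique) of the chain
    return -J * (a * c + c * b)

Z = sum(exp(-beta * energy(a, c, b)) for a, c, b in product(spins, repeat=3))
p = {(a, c, b): exp(-beta * energy(a, c, b)) / Z
     for a, c, b in product(spins, repeat=3)}

def H(m):
    """Shannon entropy of a dict of probabilities."""
    return -sum(q * log2(q) for q in m.values() if q > 0)

def marg(keep):
    """Marginal over the spins named in `keep` (subset of 'acb')."""
    m = {}
    for (a, c, b), q in p.items():
        key = tuple(v for v, name in zip((a, c, b), 'acb') if name in keep)
        m[key] = m.get(key, 0.0) + q
    return m

cmi = H(marg('ac')) + H(marg('cb')) - H(marg('acb')) - H(marg('c'))
print(cmi)  # vanishes, up to floating-point error: C shields A from B
assert abs(cmi) < 1e-9
```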

For some related material, see:

• David Poulin, Quantum graphical models and belief propagation.

