Network Theory (Part 12)

Last time we proved a version of Noether’s theorem for stochastic mechanics. Now I want to compare that to the more familiar quantum version.

But to do this, I need to say more about the analogy between stochastic mechanics and quantum mechanics. And whenever I try, I get pulled toward explaining some technical issues involving analysis: whether sums converge, whether derivatives exist, and so on. I’ve been trying to avoid such stuff—not because I dislike it, but because I’m afraid you might. But the more I put off discussing these issues, the more they fester and make me unhappy. In fact, that’s why it’s taken so long for me to write this post!

So, this time I will gently explore some of these issues. But don’t be scared: I’ll mainly talk about some simple big ideas. Next time I’ll discuss Noether’s theorem. I hope that by getting the technicalities out of my system, I’ll feel okay about hand-waving whenever I want.

And if you’re an expert on analysis, maybe you can help me with a question.

Stochastic mechanics versus quantum mechanics

First, we need to recall the analogy we began sketching in Part 5, and push it a bit further. The idea is that stochastic mechanics differs from quantum mechanics in two big ways:

• First, instead of complex amplitudes, stochastic mechanics uses nonnegative real probabilities. The complex numbers form a ring; the nonnegative real numbers form a mere rig, which is a ‘ring without negatives’. Rigs are much neglected in the typical math curriculum, but unjustly so: they’re almost as good as rings in many ways, and there are lots of important examples, like the natural numbers \mathbb{N} and the nonnegative real numbers, [0,\infty). For probability theory, we should learn to love rigs.

But there are, alas, situations where we need to subtract probabilities, even when the answer comes out negative: namely when we’re taking the time derivative of a probability. So sometimes we need \mathbb{R} instead of just [0,\infty).

• Second, while in quantum mechanics a state is described using a ‘wavefunction’, meaning a complex-valued function obeying

\int |\psi|^2 = 1

in stochastic mechanics it’s described using a ‘probability distribution’, meaning a nonnegative real function obeying

\int \psi = 1

So, let’s try our best to present the theories in close analogy, while respecting these two differences.

States

We’ll start with a set X whose points are states that a system can be in. Last time I assumed X was a finite set, but this post is so mathematical I might as well let my hair down and assume it’s a measure space. A measure space lets you do integrals, but a finite set is a special case, and then these integrals are just sums. So, I’ll write things like

\int f

and mean the integral of the function f over the measure space X, but if X is a finite set this just means

\sum_{x \in X} f(x)

Now, I’ve already defined the word ‘state’, but both quantum and stochastic mechanics need a more general concept of state. Let’s call these ‘quantum states’ and ‘stochastic states’:

• In quantum mechanics, the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\int | \psi |^2 = 1

We call \psi: X \to \mathbb{C} obeying this equation a quantum state.

• In stochastic mechanics, the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\int \psi = 1

We call \psi: X \to [0,\infty) obeying this equation a stochastic state.

In quantum mechanics we often use this abbreviation:

\langle \phi, \psi \rangle = \int \overline{\phi} \psi

so that a quantum state has

\langle \psi, \psi \rangle = 1

Similarly, we could introduce this notation in stochastic mechanics:

\langle \psi \rangle = \int \psi

so that a stochastic state has

\langle \psi \rangle = 1

But this notation is a bit risky, since angle brackets of this sort often stand for expectation values of observables. So, I’ve been writing \int \psi, and I’ll keep on doing this.

In quantum mechanics, \langle \phi, \psi \rangle is well-defined whenever both \phi and \psi live in the vector space

L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int |\psi|^2 < \infty \}

In stochastic mechanics, \langle \psi \rangle is well-defined whenever \psi lives in the vector space

L^1(X) =  \{ \psi: X \to \mathbb{R} \; : \; \int |\psi| < \infty \}

You’ll notice I wrote \mathbb{R} rather than [0,\infty) here. That’s because in some calculations we’ll need functions that take negative values, even though our stochastic states are nonnegative.

Observables

A state is a way our system can be. An observable is something we can measure about our system. They fit together: we can measure an observable when our system is in some state. If we repeat this we may get different answers, but there’s a nice formula for the average or ‘expected’ answer.

• In quantum mechanics, an observable is a self-adjoint operator A on L^2(X). The expected value of A in the state \psi is

\langle \psi, A \psi \rangle

Here I’m assuming that we can apply A to \psi and get a new vector A \psi \in L^2(X). This is automatically true when X is a finite set, but in general we need to be more careful.

• In stochastic mechanics, an observable is a real-valued function A on X. The expected value of A in the state \psi is

\int A \psi

Here we’re using the fact that we can multiply A and \psi and get a new vector A \psi \in L^1(X), at least if A is bounded. Again, this is automatic if X is a finite set, but not otherwise.
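Here's a minimal sketch of the two expected-value formulas when X is a finite set with three points, so that every integral is just a sum. The particular numbers and the helper names `stochastic_expectation` and `quantum_expectation` are made up for illustration. To make the comparison fair, the quantum observable is the diagonal matrix that multiplies by the same function used on the stochastic side.

```python
# Finite X = {0, 1, 2}: every integral over X is a plain sum.

def stochastic_expectation(A, psi):
    """Expected value of a real-valued function A in the stochastic state psi."""
    return sum(a * p for a, p in zip(A, psi))

def quantum_expectation(A, psi):
    """Expected value <psi, A psi> of a matrix A in the quantum state psi."""
    n = len(psi)
    A_psi = [sum(A[i][j] * psi[j] for j in range(n)) for i in range(n)]
    return sum(psi[i].conjugate() * A_psi[i] for i in range(n))

# A stochastic state: nonnegative numbers that sum to 1.
p = [0.5, 0.25, 0.25]

# A quantum state: complex amplitudes whose squared absolute values sum to 1.
psi = [complex(0.5 ** 0.5, 0.0), complex(0.0, 0.5), complex(-0.5, 0.0)]

# The same observable in both settings: a function on X, and the
# diagonal (hence self-adjoint) matrix that multiplies by that function.
A_function = [0.0, 1.0, 2.0]
A_matrix = [[A_function[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

stochastic_mean = stochastic_expectation(A_function, p)
quantum_mean = quantum_expectation(A_matrix, psi)

print(sum(p))                          # total probability
print(sum(abs(z) ** 2 for z in psi))   # total squared amplitude
print(stochastic_mean, quantum_mean)
```

The two means agree here only because the squared amplitudes |\psi(x)|^2 were chosen to match the probabilities and the observable is diagonal. For a non-diagonal self-adjoint matrix there's no function on X to compare with.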

Symmetries

Besides states and observables, we need ‘symmetries’, which are transformations that map states to states. We use these to describe how our system changes when we wait a while, for example.

• In quantum mechanics, an isometry is a linear map U: L^2(X) \to L^2(X) such that

\langle U \phi, U \psi \rangle = \langle \phi, \psi \rangle

for all \psi, \phi \in L^2(X). If U is an isometry and \psi is a quantum state, then U \psi is again a quantum state.

• In stochastic mechanics, a stochastic operator is a linear map U: L^1(X) \to L^1(X) such that

\int U \psi = \int \psi

and

\psi \ge 0 \; \; \Rightarrow \; \; U \psi \ge 0

for all \psi \in L^1(X). If U is stochastic and \psi is a stochastic state, then U \psi is again a stochastic state.
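When X is a finite set, the two conditions defining a stochastic operator can be checked directly on a matrix: every column sums to 1, and every entry is nonnegative. The sketch below assumes the convention (U \psi)_i = \sum_j U_{i j} \psi_j. The matrices and the helper names `is_stochastic` and `apply` are invented for this example.

```python
def is_stochastic(U, tol=1e-12):
    """Check both conditions: columns sum to 1 and all entries are nonnegative."""
    n = len(U)
    columns_sum_to_one = all(
        abs(sum(U[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    entries_nonnegative = all(U[i][j] >= -tol for i in range(n) for j in range(n))
    return columns_sum_to_one and entries_nonnegative

def apply(U, psi):
    """Matrix-vector product (U psi)_i = sum_j U_ij psi_j."""
    return [sum(U[i][j] * psi[j] for j in range(len(psi))) for i in range(len(U))]

# Stochastic: columns sum to 1 and entries are nonnegative.
U = [[0.9, 0.5],
     [0.1, 0.5]]

# Preserves the integral (columns sum to 1) but not positivity.
V = [[1.2, 0.5],
     [-0.2, 0.5]]

psi = [0.2, 0.8]            # a stochastic state
U_psi = apply(U, psi)       # again a stochastic state
V_delta = apply(V, [1.0, 0.0])  # total is still 1, but one entry is negative

print(is_stochastic(U), U_psi)
print(is_stochastic(V), V_delta)
```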

In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these, and their inverses are always isometries. There are, however, very few stochastic operators whose inverses are stochastic:

Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible, and its inverse is again an isometry.

Puzzle 2. Suppose X is a finite set. Which stochastic operators U: L^1(X) \to L^1(X) have stochastic inverses?

This is why we usually think of time evolution as being reversible in quantum mechanics, but not in stochastic mechanics! In quantum mechanics we often describe time evolution using a ‘1-parameter group’, while in stochastic mechanics we describe it using a 1-parameter semigroup… meaning that we can run time forwards, but not backwards.
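To see the irreversibility concretely, here's a small sketch with 2 \times 2 matrices. It inverts one stochastic matrix chosen for illustration and one permutation matrix. The helper names `inverse_2x2` and `is_stochastic` are my own, and two examples of course prove nothing in general.

```python
def inverse_2x2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def is_stochastic(M, tol=1e-12):
    """Columns sum to 1 and all entries are nonnegative."""
    n = len(M)
    columns_ok = all(abs(sum(M[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    entries_ok = all(M[i][j] >= -tol for i in range(n) for j in range(n))
    return columns_ok and entries_ok

# An invertible stochastic matrix that mixes the two states.
U = [[0.9, 0.5],
     [0.1, 0.5]]
U_inv = inverse_2x2(U)

# A permutation matrix: it just swaps the two states.
swap = [[0.0, 1.0],
        [1.0, 0.0]]
swap_inv = inverse_2x2(swap)

print(U_inv, is_stochastic(U_inv))        # columns still sum to 1, but entries go negative
print(swap_inv, is_stochastic(swap_inv))  # the swap undoes itself
```

The inverse of the mixing matrix still preserves total probability, but it needs negative ‘probabilities’ to do so.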

But let’s see how this works in detail!

Time evolution in quantum mechanics

In quantum mechanics there’s a beautiful relation between observables and symmetries, which goes like this. Suppose that for each time t we want a unitary operator U(t) :  L^2(X) \to L^2(X) that describes time evolution. Then it makes a lot of sense to demand that these operators form a 1-parameter group:

Definition. A collection of linear operators U(t) (t \in \mathbb{R}) on some vector space forms a 1-parameter group if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s,t \in \mathbb{R}.

Note that these conditions force all the operators U(t) to be invertible.

Now suppose our vector space is a Hilbert space, like L^2(X). Then we call a 1-parameter group a 1-parameter unitary group if the operators involved are all unitary.

It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice! So, we always focus on the continuous case:

Definition. A 1-parameter unitary group is strongly continuous if U(t) \psi depends continuously on t for all \psi, in this sense:

t_i \to t \;\; \Rightarrow \; \;\|U(t_i) \psi - U(t) \psi \| \to 0

Then we get a classic result proved by Marshall Stone back in the early 1930s. You may not know him, but he was so influential at the University of Chicago during this period that it’s often called the “Stone Age”. And here’s one reason why:

Stone’s Theorem. There is a one-to-one correspondence between strongly continuous 1-parameter unitary groups on a Hilbert space and self-adjoint operators on that Hilbert space, given as follows. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(-i t H)

for a unique self-adjoint operator H. Conversely, any self-adjoint operator determines a strongly continuous 1-parameter unitary group this way. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }

When U(t) = \exp(-i t H) describes the evolution of a system in time, H is called the Hamiltonian, and it has the physical meaning of ‘energy’. The equation I just wrote down is then called Schrödinger’s equation.

So, simply put, in quantum mechanics we have a correspondence between observables and nice one-parameter groups of symmetries. Not surprisingly, our favorite observable, energy, corresponds to our favorite symmetry: time evolution!

However, if you were paying attention, you noticed that I carefully avoided explaining how we define \exp(- i t H). I didn’t even say what a self-adjoint operator is. This is where the technicalities come in: they arise when H is unbounded, and not defined on all vectors in our Hilbert space.

Luckily, these technicalities evaporate for finite-dimensional Hilbert spaces, such as L^2(X) for a finite set X. Then we get:

Stone’s Theorem (Baby Version). Suppose we are given a finite-dimensional Hilbert space. In this case, a linear operator H on this space is self-adjoint iff it’s defined on the whole space and

\langle \phi , H \psi \rangle = \langle H \phi, \psi \rangle

for all vectors \phi, \psi. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(- i t H)

for a unique self-adjoint operator H, where

\displaystyle{ \exp(-i t H) \psi = \sum_{n = 0}^\infty \frac{(-i t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any self-adjoint operator on our space determines a strongly continuous 1-parameter unitary group this way. For all vectors \psi in our space we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

and if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }
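The baby version is easy to test numerically. The sketch below sums the power series for \exp(-i t H) for one 2 \times 2 self-adjoint matrix H, picked arbitrarily, and then measures how far the result is from being unitary, from obeying the group law, and from having derivative -i H at t = 0. The helper names and the cutoff of 60 terms are my own choices, not part of the theorem.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(M, terms=60):
    """Partial sum of the power series sum_n M^n / n!."""
    n = len(M)
    total = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        # term becomes M^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def adjoint(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def max_abs_diff(A, B):
    n = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))

# A self-adjoint matrix: it equals its own conjugate transpose.
H = [[1 + 0j, 1j],
     [-1j, 2 + 0j]]

def U(t):
    return mat_exp([[-1j * t * h for h in row] for row in H])

identity = [[1 + 0j, 0j], [0j, 1 + 0j]]

# U(t) should be unitary: U(t)* U(t) = 1.
unitarity_error = max_abs_diff(mat_mul(adjoint(U(0.7)), U(0.7)), identity)

# Group law: U(s + t) = U(s) U(t).
group_law_error = max_abs_diff(U(0.4 + 0.7), mat_mul(U(0.4), U(0.7)))

# Finite-difference estimate of the derivative at t = 0, compared with -i H.
h = 1e-6
U_h = U(h)
derivative = [[(U_h[i][j] - identity[i][j]) / h for j in range(2)] for i in range(2)]
minus_i_H = [[-1j * x for x in row] for row in H]
derivative_error = max_abs_diff(derivative, minus_i_H)

print(unitarity_error, group_law_error, derivative_error)
```

The first two errors should be at the level of floating-point roundoff. The third is limited by the step size h, since it's only a finite-difference estimate.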

Time evolution in stochastic mechanics

We’ve seen that in quantum mechanics, time evolution is usually described by a 1-parameter group of operators that comes from an observable: the Hamiltonian. Stochastic mechanics is different!

First, since stochastic operators aren’t usually invertible, we typically describe time evolution by a mere ‘semigroup’:

Definition. A collection of linear operators U(t) (t \in [0,\infty)) on some vector space forms a 1-parameter semigroup if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s, t \ge 0.

Now suppose this vector space is L^1(X) for some measure space X. We want to focus on the case where the operators U(t) are stochastic and depend continuously on t in the same sense we discussed earlier.

Definition. A 1-parameter strongly continuous semigroup of stochastic operators U(t) : L^1(X) \to L^1(X) is called a Markov semigroup.

What’s the analogue of Stone’s theorem for Markov semigroups? I don’t know a fully satisfactory answer! If you know, please tell me.

Later I’ll say what I do know—I’m not completely clueless—but for now let’s look at the ‘baby’ case where X is a finite set. Then the story is neat and complete:

Theorem. Suppose we are given a finite set X. In this case, a linear operator H on L^1(X) is infinitesimal stochastic iff it’s defined on the whole space,

\int H \psi = 0

for all \psi \in L^1(X), and the matrix of H in terms of the obvious basis obeys

H_{i j} \ge 0

for all j \ne i. Given a Markov semigroup U(t) on L^1(X), we can always write

U(t) = \exp(t H)

for a unique infinitesimal stochastic operator H, where

\displaystyle{ \exp(t H) \psi = \sum_{n = 0}^\infty \frac{(t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any infinitesimal stochastic operator on our space determines a Markov semigroup this way. For all \psi \in L^1(X) we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

and if we set

\psi(t) = \exp(t H) \psi

we have the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }
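Here's the same kind of numerical check for the stochastic case, with a two-state example. The matrix H below is just one infinitesimal stochastic matrix chosen for illustration: its columns sum to zero and its off-diagonal entries are nonnegative. It happens to obey H^2 = -3H, so the power series can be summed by hand to give \exp(t H) = 1 + \frac{1 - e^{-3t}}{3} H, and the code compares the series against that closed form. The helper names are hypothetical.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(M, terms=40):
    """Partial sum of the power series sum_n M^n / n!."""
    n = len(M)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        # term becomes M^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Infinitesimal stochastic: columns sum to 0, off-diagonal entries >= 0.
H = [[-1.0, 2.0],
     [1.0, -2.0]]

def U(t):
    return mat_exp([[t * h for h in row] for row in H])

U_half = U(0.5)

# U(t) should be stochastic: columns sum to 1, entries nonnegative.
column_sums = [sum(U_half[i][j] for i in range(2)) for j in range(2)]
smallest_entry = min(min(row) for row in U_half)

# Closed form, valid for this particular H because H*H = -3*H.
c = (1.0 - math.exp(-3.0 * 0.5)) / 3.0
closed_form = [[(1.0 if i == j else 0.0) + c * H[i][j] for j in range(2)] for i in range(2)]
closed_form_error = max(abs(U_half[i][j] - closed_form[i][j]) for i in range(2) for j in range(2))

# Semigroup law: U(1) = U(0.5) U(0.5).
U_one = U(1.0)
U_half_squared = mat_mul(U_half, U_half)
semigroup_error = max(abs(U_one[i][j] - U_half_squared[i][j]) for i in range(2) for j in range(2))

print(U_half)
print(column_sums, smallest_entry, closed_form_error, semigroup_error)
```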

In short, time evolution in stochastic mechanics is a lot like time evolution in quantum mechanics, except it’s typically not invertible, and the Hamiltonian is typically not an observable.

Why not? Because we defined an observable to be a function A: X \to \mathbb{R}. We can think of this as giving an operator on L^1(X), namely the operator of multiplication by A. That’s a nice trick, which we used to good effect last time. However, at least when X is a finite set, this operator will be diagonal in the obvious basis consisting of functions that equal 1 at one point of X and zero elsewhere. So, it can only be infinitesimal stochastic if it’s zero!

Puzzle 3. If X is a finite set, show that any operator on L^1(X) that’s both diagonal and infinitesimal stochastic must be zero.

The Hille–Yosida theorem

I’ve now told you everything you really need to know… but not everything I want to say. What happens when X is not a finite set? What are Markov semigroups like then? I can’t abide letting this question go unresolved! Unfortunately I only know a partial answer.

We can get a certain distance using the Hille–Yosida theorem, which is much more general.

Definition. A Banach space is a vector space with a norm such that any Cauchy sequence converges.

Examples include Hilbert spaces like L^2(X) for any measure space, but also other spaces like L^1(X) for any measure space!

Definition. If V is a Banach space, a 1-parameter semigroup of operators U(t) : V \to V is called a contraction semigroup if it’s strongly continuous and

\| U(t) \psi \| \le \| \psi \|

for all t \ge 0 and all \psi \in V.

Examples include strongly continuous 1-parameter unitary groups, but also Markov semigroups!

Puzzle 4. Show any Markov semigroup is a contraction semigroup.
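Before trying to prove this, it may help to watch it happen in a two-state example. The sketch below uses the infinitesimal stochastic matrix H with columns (-1, 1) and (2, -2), chosen only for illustration. Since this H obeys H^2 = -3H, the exponential has the closed form \exp(t H) = 1 + \frac{1 - e^{-3t}}{3} H, which the code uses directly. The function names are my own. It's a sanity check, not a solution to the puzzle.

```python
import math

# Infinitesimal stochastic: columns sum to 0, off-diagonal entries >= 0.
H = [[-1.0, 2.0],
     [1.0, -2.0]]

def U(t):
    """exp(tH), using the closed form valid for this H because H*H = -3*H."""
    c = (1.0 - math.exp(-3.0 * t)) / 3.0
    return [[(1.0 if i == j else 0.0) + c * H[i][j] for j in range(2)] for i in range(2)]

def apply(M, psi):
    return [sum(M[i][j] * psi[j] for j in range(len(psi))) for i in range(len(M))]

def l1_norm(psi):
    """The L^1 norm on a finite set: the sum of absolute values."""
    return sum(abs(x) for x in psi)

times = [0.0, 0.1, 1.0, 5.0]

# A function in L^1 that takes both signs: its norm can shrink.
signed = [1.0, -1.0]
signed_norms = [l1_norm(apply(U(t), signed)) for t in times]

# A stochastic state: its norm is preserved exactly.
state = [0.3, 0.7]
state_norms = [l1_norm(apply(U(t), state)) for t in times]

print(signed_norms)
print(state_norms)
```

For the signed function, the positive and negative parts partly cancel as they get mixed, so the norm drops. For a nonnegative function there's nothing to cancel.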

The Hille–Yosida theorem generalizes Stone’s theorem to contraction semigroups. In my misspent youth, I spent a lot of time carrying around Yosida’s book Functional Analysis. Furthermore, Einar Hille was the advisor of my thesis advisor, Irving Segal. Segal generalized the Hille–Yosida theorem to nonlinear operators, and I used this generalization a lot back when I studied nonlinear partial differential equations. So, I feel compelled to tell you this theorem:

Hille–Yosida Theorem. Given a contraction semigroup U(t) we can always write

U(t) = \exp(t H)

for some densely defined operator H such that H - \lambda I has an inverse and

\displaystyle{ \| (H - \lambda I)^{-1} \psi \| \le \frac{1}{\lambda} \| \psi \| }

for all \lambda > 0 and \psi \in V. Conversely, any such operator determines a contraction semigroup. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = U(t) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

If you like, you can take the stuff at the end of this theorem to be what we mean by saying U(t) = \exp(t H). When U(t) = \exp(t H), we say that H generates the semigroup U(t).
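In finite dimensions the resolvent bound in the theorem can be checked by hand or by machine. The sketch below takes V = L^1(X) for a two-point set X, where the operator norm is the largest absolute column sum, and uses the infinitesimal stochastic matrix H with columns (-1, 1) and (2, -2) as an illustrative generator. The helper names `resolvent` and `l1_operator_norm` are hypothetical.

```python
def resolvent(H, lam):
    """(H - lam*I)^{-1} for a 2x2 matrix H, by the explicit 2x2 inverse formula."""
    a, b = H[0][0] - lam, H[0][1]
    c, d = H[1][0], H[1][1] - lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def l1_operator_norm(M):
    """Operator norm on L^1 of a finite set: the largest absolute column sum."""
    n = len(M)
    return max(sum(abs(M[i][j]) for i in range(n)) for j in range(n))

# Infinitesimal stochastic: columns sum to 0, off-diagonal entries >= 0.
H = [[-1.0, 2.0],
     [1.0, -2.0]]

lambdas = [0.1, 1.0, 10.0]
norms = [l1_operator_norm(resolvent(H, lam)) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(lam, norm, 1.0 / lam)
```

For this H the determinant of H - \lambda I is \lambda(\lambda + 3), and each column of the inverse has absolute sum (\lambda + 3)/(\lambda(\lambda + 3)) = 1/\lambda. So the bound holds with equality, up to roundoff.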

But now suppose V = L^1(X). Besides the conditions in the Hille–Yosida theorem, what extra conditions on H are necessary and sufficient for it to generate a Markov semigroup? In other words, what’s a definition of ‘infinitesimal stochastic operator’ that’s suitable not only when X is a finite set, but an arbitrary measure space?

I asked this question on Mathoverflow a few months ago, and so far the answers have not been completely satisfactory.

Some people mentioned the Hille–Yosida theorem, which is surely a step in the right direction, but not the full answer.

Others discussed the special case when \exp(t H) extends to a bounded self-adjoint operator on L^2(X). When X is a finite set, this special case happens precisely when the matrix H_{i j} is symmetric: the probability of hopping from j to i equals the probability of hopping from i to j. This is a fascinating special case, not least because when H is both infinitesimal stochastic and self-adjoint, we can use it as a Hamiltonian for both stochastic mechanics and quantum mechanics! Someday I want to discuss this. However, it’s just a special case.

After grabbing people by the collar and insisting that I wanted to know the answer to the question I actually asked—not some vaguely similar question—the best answer seems to be Martin Gisser’s reference to this book:

• Zhi-Ming Ma and Michael Röckner, Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer, Berlin, 1992.

This book provides a very nice self-contained proof of the Hille–Yosida theorem. On the other hand, it does not answer my question in general, but only when the skew-symmetric part of H is dominated (in a certain sense) by the symmetric part.

So, I’m stuck on this front, but that needn’t bring the whole project to a halt. We’ll just sidestep this question.

For a good well-rounded introduction to Markov semigroups and what they’re good for, try:

• Ryszard Rudnicki, Katarzyna Pichór and Marta Tyran-Kamińska, Markov semigroups and their applications.

16 Responses to Network Theory (Part 12)

  1. John Baez says:

    Continuous time versus discrete time—that’s one of those big issues that keeps coming up in many guises.

    Suppose you have a Markov chain, where time is discrete, and you want to ‘embed’ it in a Markov process, where time is continuous. Then mathematically you have a stochastic operator U and you’re trying to write it as \exp(t H) for some infinitesimal stochastic H. If you can, we say U is embeddable.

    Not every stochastic operator is embeddable! This was discussed over on Math Stackexchange. They gave this reference:

    • E. B. Davies, Embeddable Markov matrices.

    Abstract: We give an account of some results, both old and new, about any n\times n Markov matrix that is embeddable in a one-parameter Markov semigroup. These include the fact that its eigenvalues must lie in a certain region in the unit ball. We prove that a well-known procedure for approximating a non-embeddable Markov matrix by an embeddable one is optimal in a certain sense.

    A Markov matrix is essentially the same as what I’m calling a stochastic operator on L^1(X) when X is a finite set.

    So that’s the bad news: not every stochastic operator is of the form \exp(t H) for H infinitesimal stochastic. But here’s a wee bit of good news, provided by Jacob Biamonte:

    • Inheung Chong, Infinitesimally generated stochastic totally positive matrices, Comm. Korean Math. Soc. 12 (1997), 269–273.

    Abstract: We show that each element in the semigroup of all n \times n nonsingular stochastic totally positive matrices is generated by the infinitesimal elements, which form a cone consisting of all n \times n Jacobi intensity matrices.

    A Jacobi intensity matrix is essentially the same as what I’m calling an infinitesimal stochastic operator on L^1(X) when X is a finite set.

    So, Chong is saying that when X is a finite set, every nonsingular totally positive stochastic operator on L^1(X) is a finite product of those of the form \exp(t H) where H is infinitesimal stochastic.

    I’m not sure if that’s good for anything, but it’s the kind of thing mathematicians like to know!

    • Florifulgurator says:

      Another notion of embedding is Skorokhod embedding or stopping. First it represents a random variable as Brownian motion evaluated at some stopping time. That can be iterated to embed a discrete time martingale in Brownian motion by an increasing series of stopping times.

      In that special form Skorokhod embedding is fun stuff. The general Markovian view almost inevitably leads to grueling French-school technicalities about measure and potential theory of general processes – which make your “technical issues involving analysis” pale. (E.g. Fitzsimmons)

      –Martin Gisser

      • Florifulgurator says:

        Oops, bad HTML: My second link goes to Obloj’s magnificent overview The Skorokhod embedding problem and its offspring.

      • John Baez says:

        The link works now, but I’ll keep your comment since it tells our readers that the link called “stopping” actually leads to a “magnificent overview”.

        Thanks! By the way, I got ahold of Introduction to the Theory of (Non-Symmetric) Dirichlet Forms today, and it seems very nice, but I’m sad that it doesn’t contain a characterization of the generators of arbitrary Markov semigroups. Can it really be so difficult? Hille–Yosida tells us there’s a generator H with certain properties, so then we need to find the right generalization of the conditions

        \sum_i H_{i j} = 0

        i \ne j  \Rightarrow H_{i j} \ge 0

        from n \times n matrices to operators on L^1(X) where X is an arbitrary measure space (or perhaps a ‘nice’ one, like a \sigma-finite one or something, if necessary).

  2. John Baez says:

    You can’t unflip a coin. While the unitary operators form a group, the stochastic operators only form a semigroup: they rarely have inverses!

    But the definition of ‘stochastic operator’ has two clauses. In terms of matrices, one clause says

    \sum_i U_{i j} = 1

    while the other says

    U_{i j} \ge 0

    And if we consider invertible matrices obeying only the first property, we get a Lie group: the Markov group M(n,\mathbb{R}). Someone has studied this:

    • Joseph E. Johnson, Markov-type Lie groups in GL(n,\mathbb{R}), Jour. Math. Phys. 26 (1985), 252–257.

    Again I’m not sure if it’s good for anything… but you never know.

  3. To get Schrödinger’s equation you could proceed as follows: Use the space shift symmetry group (which is strongly continuous) to get a commutation relation for momentum and displacement which are both observables. Next use the Stone-von Neumann theorem to get a representation for these observables. Plug this into the action and minimize. You get a (semi-)group with generator the Legendre transform of the action (the energy). This is probably equivalent to your approach however it is not clear to me how you avoid the usage of an action.

    Puzzle 2 seems to be permutation matrices however my incompetency to prove the nontrivial direction (in finite time) indicates that this might be false :-(

    In your Mathoverflow question you also deal with the finite dimensional case. To that purpose consider (\lambda - A)^{-1}e=\int_0^\infty e^{-\lambda t} T(t)e \, d t for large positive \lambda and with e being the vector with entries all equal to 1. Thus A generates a stochastic semigroup T if and only if \lambda-A is an M-Matrix (for large \lambda) and A e=0. That is what you believe to be true at MathOverflow. The above proof might carry over to the infinite dimensional case for those spaces with an interior point (= e). I think this is true for compact X, however you should consult the references given by Andras Batkai.

    • John Baez says:

      Uwe wrote:

      Puzzle 2 seems to be permutation matrices however my incompetency to prove the nontrivial direction (in finite time) indicates that this might be false :-(

      Actually Graham Jones pointed out a mistake in Puzzle 2! I had written:

      In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these. There are, however, very few invertible stochastic operators:

      Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible.

      Puzzle 2. Suppose X is a finite set. What are the invertible stochastic operators U: L^1(X) \to L^1(X)?

      In fact there are lots of invertible stochastic operators whose inverses are not stochastic! I didn’t mean to include these in the discussion. So, I’ve corrected my puzzles as follows:

      In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these, and their inverses are always isometries. There are, however, very few stochastic operators whose inverses are stochastic:

      Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible, and its inverse is again an isometry.

      Puzzle 2. Suppose X is a finite set. Which stochastic operators U: L^1(X) \to L^1(X) have stochastic inverses?

      I won’t give away the answers to these reformulated puzzles yet, but here are two proofs that there are lots of invertible stochastic operators.

      First, every stochastic operator U that’s ‘close to the identity’ in this sense:

      \| U - I \| < 1

      (where the norm is the operator norm) will be invertible, simply because every operator obeying this inequality is invertible! After all, if this inequality holds, we have a convergent geometric series:

      \displaystyle{ U^{-1} = \frac{1}{I - (I - U)} = \sum_{n = 0}^\infty (I - U)^n }

      Second, suppose X is a finite set and H is an infinitesimal stochastic operator on L^1(X). Then H is bounded, so the stochastic operator

      \exp(t H)  \qquad t > 0

      will always have an inverse, namely

      \exp(-t H)

      But for t sufficiently small, this inverse \exp(-tH) will only be stochastic if -H is infinitesimal stochastic, and that’s only true if H = 0.

      In something more like plain English: when you’ve got a finite set of states, you can formally run any Markov process backwards in time, but a lot of those ‘backwards-in-time’ operators will involve negative probabilities for the system to hop from one state to another!

  4. Said paper is also here:

    • Charles J. K. Batty and Derek W. Robinson, Positive one-parameter semigroups on ordered Banach spaces, Acta Appl. Math. 2 (1984), 221–296.

    (And Scotty has kindly provided me with a virtual tunnel to read it.)

    Theorem 2.2.1 on p. 261 there is not what you want (being simply about positivity preserving semigroups). But it’s a start: Methinks the start is to study positivity preserving semigroups first – perhaps not necessarily the paper.

    I’ll read more of the paper later.

    Plus, I’ll check that yellowish wrinkled piece of penciled paper promising a generalized Kato-Simon-Shigekawa criterion for semigroup domination. Alas it is in L^2 and symmetric – but there’s a simple formula for the generators which could be generalized to other duality. It’s all about the distributional Laplacian of the norm. (Whoa, currently I’m not even sure about what the dual of L^1 is (or vice versa). That was more than a dozen years back.)

    Which reminds me of the rig in the stochastics vs. quantum picture: Perhaps the point is the ring’s “involution”. E.g. the absolute value in the reals (a nonlinear involution) vs. complex involution. The absolute value involution leads to positivity preserving semigroups. Another involution, cutting off outside the unit interval leads to Markovian semigroups. (Perhaps cf. Reed-Simon, where I guess it’s split up that way.)

    • ((Oops, this stuff has induced/coincided a major flashback in my poor brain. (And its math demon had been sleeping almost all summertime, until a few hours ago.) I’m still clearing up confusion: On first reading there was still the picture in my mind that I got indoctrinated in stochastic analysis, long ago: Markovian being positivity preserving and contractive on L^\infty.))

      It looks like you are “only” interested in Markov semigroups which are the dual of a positive contraction semigroup on L^1. So, said Theorem 2.2.1 possibly is it? But the “dissipative conditions” look suspicious. (Plus, I don’t recall ever having known about the paper.)

      ((Need postpone further flashback and reading by a day or two.))

    • John Baez says:

      Thanks for the comments, Martin.

      Since probability distributions on a measure space live in L^1, it seems natural to think of Markov semigroups as consisting of operators on L^1. And indeed this is how various papers seem to define them. For example:

      • Ryszard Rudnicki, Katarzyna Pichór and Marta Tyran-Kamińska, Markov semigroups and their applications.

      Defining them on L^\infty seems weird – that’s really how they did it when you were a kid?

      Whoa, currently I’m not even sure about what the dual of L^1 is (or vice versa).

      For a \sigma-finite measure space the dual of L^1 is L^\infty, but not vice versa. If you have trouble remembering this—and don’t feel bad, I do too!—just remember that a Banach limit is a strange sort of continuous linear functional that lets you define a kind of ‘limit’ for any sequence in \ell^\infty. This shows that the dual of L^\infty contains elements that aren’t in L^1.

      I say that Banach limits are ‘strange’ because you can’t actually construct them: you need to use the axiom of choice, or at least some weaker but still nonconstructive principle, to get your hands on them! In fact, I have a vague memory that no elements of the dual of \ell^\infty can be ‘explicitly given’, other than elements of \ell^1. Does anyone remember?

      But I’m digressing. I’ll look at the paper you pointed me to. Thanks!

  5. John Baez says:

    Here are the answers to Puzzles 3 and 4. The first is so easy that probably nobody wanted to do it! The second is an easy exercise in the triangle inequality if you’ve done your time in a real analysis class, but otherwise it might seem tricky.

    Puzzle 3. If X is a finite set, show that any operator on L^1(X) that’s both diagonal and infinitesimal stochastic must be zero.

    Answer. We are thinking of operators on L^1(X) as matrices with respect to the obvious basis of functions that equal 1 at one point and 0 elsewhere. If H_{i j} is an infinitesimal stochastic matrix, the sum of the entries in each column is zero. If it’s diagonal, there’s at most one nonzero entry in each column. So, we must have H = 0.

    Puzzle 4. Show any Markov semigroup U(t): L^1(X) \to L^1(X) is a contraction semigroup.

    Answer. We need to show

    \|U(t) \psi\| \le \| \psi \|

    for all t \ge 0 and \psi \in L^1(X). Here the norm is the L^1 norm, so more explicitly we need to show

    \int |U(t) \psi | \le \int |\psi|

    We can split \psi into its positive and negative parts:

    \psi = \psi_+ - \psi_-

    where

    \psi_{\pm} \ge 0

    Since U(t) is stochastic we have

    U(t) \psi_{\pm} \ge 0

    and

    \int U(t) \psi_\pm = \int \psi_\pm

    so

    \begin{array}{ccl}  \int |U(t) \psi | &=& \int |U(t) \psi_+ - U(t) \psi_-| \\  &\leq & \int |U(t) \psi_+| + |U(t) \psi_-| \\  &=& \int  U(t) \psi_+ + U(t) \psi_-  \\ &=& \int \psi_+ + \psi_- \\ &=&  \int |\psi|  \end{array}

  6. John Baez says:

    Okay, here are the answers to Puzzles 1 and 2. The first is easy linear algebra, while Graham Jones cracked the second over on the Azimuth Forum:

    Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible, and its inverse is again an isometry.

    Answer. Remember that U being an isometry means that it preserves the inner product:

    \langle U \psi, U \phi \rangle = \langle \psi, \phi \rangle

    and thus it preserves the L^2 norm

    \|U \psi \| = \| \psi \|

    given by \| \psi \| = \langle \psi, \psi \rangle^{1/2}. It follows that if U\psi = 0, then \psi = 0, so U is one-to-one. Since U is a linear operator from a finite-dimensional vector space to itself, U must therefore also be onto. Thus U is invertible, and because U preserves the inner product, so does its inverse: given \psi, \phi \in L^2(X) we have

    \langle U^{-1} \phi, U^{-1} \psi \rangle = \langle \phi, \psi \rangle

    since we can write \phi' = U^{-1} \phi, \psi' = U^{-1} \psi and then the above equation says

    \langle \phi' , \psi' \rangle = \langle U \phi' , U \psi' \rangle

    Puzzle 2. Suppose X is a finite set. Which stochastic operators U: L^1(X) \to L^1(X) have stochastic inverses?

    Answer. Graham wrote:

    OK, a sketch proof: A stochastic operator U must map the unit simplex S into itself. If its inverse is also a stochastic operator this must be a bijection. A bijective linear map S \to S must take a vertex to a vertex. So U must be a permutation matrix.

    Expanding that a bit, suppose the set X has n points. Then the set of stochastic states

    S = \{ \psi : X \to \mathbb{R} \; : \; \psi \ge 0, \quad \int \psi = 1 \}

    is a simplex. It’s an equilateral triangle when n = 3, a regular tetrahedron when n = 4, and so on.

    In general, S has n corners, which are the functions \psi that equal 1 at one point of X and zero elsewhere. Mathematically speaking, S is a convex set, and its corners are its extreme points: the points that can’t be written as convex combinations of other points of S in a nontrivial way.

    Any stochastic operator U must map S into itself, so if U has an inverse that’s also a stochastic operator, it must give a bijection U : S \to S. Any linear transformation acting as a bijection between convex sets must map extreme points to extreme points (this is easy to check), so U must map corners to corners in a bijective way. This implies that it comes from a permutation of the points in X.

    In other words, any stochastic matrix with an inverse that’s also stochastic is a permutation matrix: a square matrix with every entry 0 except for a single 1 in each row and each column. So, Uwe Stroinski’s intuition was right!
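    To see this in the smallest interesting case, here is a sketch that inverts two 2×2 stochastic matrices exactly and checks whether the inverses are stochastic. The helper names and the two example matrices are hypothetical choices for illustration.

```python
from fractions import Fraction as F


def inverse_2x2(M):
    """Exact inverse of a 2x2 matrix, or None if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]


def is_stochastic(M):
    """Nonnegative entries and every column summing to 1."""
    n = len(M)
    nonnegative = all(x >= 0 for row in M for x in row)
    columns_ok = all(sum(M[i][j] for i in range(n)) == 1 for j in range(n))
    return nonnegative and columns_ok


swap = [[F(0), F(1)], [F(1), F(0)]]            # a permutation matrix
mix = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]  # stochastic, not a permutation

print(is_stochastic(inverse_2x2(swap)))
print(inverse_2x2(mix), is_stochastic(inverse_2x2(mix)))
```

    The inverse of the mixing matrix still has columns summing to 1, but it picks up negative entries, so it maps some points of the simplex outside the simplex.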

  7. Arjun Jain says:

    As in part 5, I am having some difficulties understanding this post. I will try to be as precise as I can and ask questions in parts.

    1. What is the difference between an operator and a function ?

    2. As I have understood, X is the set of states. Then \psi: X \to [0,\infty) is the set of ordered pairs (x,\psi (x)), which in the case of finite X, can be summarized as \sum \psi_x x or x.\psi (where x is a matrix with the xs as columns, and \psi is a column vector with the \psi (x)s as elements). Now L^1(X) is the vector space with those \psis as elements for which

    \sum \psi_x <\infty

    If this sum is 1, the corresponding \psi is called a stochastic state. Is all this correct?

    3. Like U, shouldn't the observable O be L^1(X) \to L^1(X) instead of being a function on X? Even if O: X \to \mathbb{R}, then (O\psi) (x) has to be defined as O(x)\psi (x) instead of the usual definition of function composition. Instead for an o: X \to \mathbb{R}, if (O\psi) (x)=o(x)\psi (x), we can say that O is diagonal.

    • John Baez says:

      Arjun wrote:

      1. What is the difference between an operator and a function?

      Given sets X and Y, a function f : X \to Y assigns to each element x of the set X a unique element f(x) of the set Y.

      When we talk about a function on a set X and don’t specify the set Y, we usually mean Y = \mathbb{R} or Y = \mathbb{C}. These are called ‘real-valued functions’ and ‘complex-valued functions’ if we want to be more clear.

      An operator is a linear function from a vector space to a vector space.

      By the way, definitions of standard math terms can be looked up on Wikipedia (see the links). It’s better if you ask me questions about my work, instead of questions about this standard stuff.

      2. […] Now L^1(X) is the vector space with those \psis as elements for which \sum \psi_x < \infty. If this sum is 1, the corresponding \psi is called a stochastic state. Is all this correct?

      It’s almost all correct. One mistake is that since we want L^1(X) to be a vector space, it consists of all functions \psi: X \to \mathbb{R} with

      \sum_{x \in X} |\psi_x| < \infty.

      We need the absolute value here!

      But stochastic states are elements of L^1(X) with \psi_x \ge 0 and

      \sum_{x \in X} |\psi_x| = 1

      just as you said.

      Another possible mistake is that you said:

      or x.\psi (where x is a matrix with the xs as columns, and \psi is a column vector with the \psi (x)s as elements).

      I don’t understand this, because I don’t know how you’re making elements x of a set X into columns of a matrix. More importantly, I never use this way of thinking in any of these blog articles, so it’s probably best if you don’t think this way when trying to understand what I’m saying.

      Like U, shouldn’t the observable O be L^1(X) \to L^1(X) instead of being a function on X?

      In quantum mechanics, an observable is an operator

      O:  L^2(X) \to L^2(X),

      but not just any operator: it needs to be self-adjoint! There are good and well-known reasons for this.

      Now we’re talking about stochastic mechanics, which is a subject I’m just inventing. It makes sense to guess that an observable should be an operator

      O:  L^1(X) \to L^1(X),

      but probably not just any operator!

      What kind of operator can be an observable in stochastic mechanics? My guess, made for very good reasons, is that it’s one of this form:

      (O \psi)_x = o(x) \psi_x

      for some measurable function o: X \to \mathbb{R}.

      In Part 11 Brendan and I explained this in the special case where X was a finite set. In this case any operator

      O:  L^1(X) \to L^1(X)

      can be described by a matrix

      O_{ij}

      where i,j \in X. And the special operators called observables correspond to diagonal matrices. So for an observable we have

      O_{i j} = \left\{ \begin{array}{ccl}  o(i) & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

      for some function o: X \to \mathbb{R}. Then we have

      (O\psi)_x = o(x) \psi_x

      as desired.
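      For a finite set this is easy to check by hand or by machine. The following is only a sketch: it represents o as a list of values, builds the diagonal matrix, and confirms that applying it to \psi agrees with pointwise multiplication. The particular values of o and \psi are arbitrary.

```python
from fractions import Fraction as F


def diagonal_operator(o):
    """Matrix O_ij equal to o(i) when i = j and 0 otherwise."""
    n = len(o)
    return [[o[i] if i == j else 0 for j in range(n)] for i in range(n)]


def apply(O, psi):
    """Matrix-vector product (O psi)_i = sum_j O_ij psi_j."""
    return [sum(O[i][j] * psi[j] for j in range(len(psi))) for i in range(len(O))]


o = [2, -1, 5]                        # values of the observable on a 3-point set
psi = [F(1, 2), F(1, 4), F(1, 4)]     # a stochastic state

O = diagonal_operator(o)
via_matrix = apply(O, psi)
pointwise = [o[x] * psi[x] for x in range(len(psi))]

print(via_matrix == pointwise)
```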

      But in Part 11 we called the function O instead of o, and we wrote O_x instead of o(x) for this function’s values. This may be confusing, but it’s efficient: it’s annoying to have two names O and o for two ways of thinking about the same thing! We explained this as follows:

      In stochastic mechanics an observable is simply a function assigning a number O_i to each state i \in X.

      However, in quantum mechanics we often think of observables as matrices, so it’s nice to do that here, too. It’s easy: we just create a matrix whose diagonal entries are the values of the function O. And just to confuse you, we’ll also call this matrix O. So:

      O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

      Whenever I say “just to confuse you”, it’s a joke: I’m actually warning that some notation might be confusing, so pay careful attention!

      You can see a more detailed explanation in our paper.

  8. Arjun Jain says:

    Continued..

    4. Will we be only considering time independent observables here?

    5. You wrote: “It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice!”. Can you please expand on this, in plainer English? I’m interested.

    6. If a strongly continuous 1-parameter group is defined by this additional property: t_i \to t \;\; \Rightarrow \; \;\|U(t_i) \psi - U(t) \psi \| \to 0, how is a continuous 1-parameter group defined?

    7. Why is L^2(X) finite dimensional for a finite set X?

    8. How do we extend the results for finite sets to countably infinite sets, as is true for all the examples in the previous posts?

    • John Baez says:

      Arjun wrote:

      4. Will we be only considering time independent observables here?

      Yes, more precisely only those without ‘explicit time dependence’. Clearly the values of most interesting observables change with time because the state changes with time. For example, in quantum mechanics the expected value \langle \Psi(t) , O \Psi(t) \rangle changes with time even though O does not.

      5. You wrote: “It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice!”. Can you please expand on this ?, in plainer English – I’m interested.

      Plainer than what?

      You can see this most simply as follows. Suppose you want a function

      f: \mathbb{R} \to \mathbb{R}

      obeying this equation

      f(s + t) = f(s) + f(t)

      for all s, t \in \mathbb{R}. There are some obvious solutions:

      f(t) = c t

      for any real number c. All these solutions are continuous. But if you use the axiom of choice, it’s easy to prove there are infinitely many other solutions, that are not continuous.

      However, it is impossible to write down a formula for any of these other solutions! And if the axiom of choice is false, they might not really exist!

      For more details, read this:

      Hamel basis and additive functions.

      See Theorem 5.

      This immediately has consequences for one-parameter unitary groups. Now suppose you want a function

      U: \mathbb{R} \to \mathbb{C}

      that obeys these three equations:

      |U(s)| = 1

      and

      U(s+t) = U(s) U(t)

      for all s, t, and also

      U(0) = 1

      (The third equation follows from the other two: setting s = t = 0 in the second gives U(0) = U(0)^2, and the first rules out U(0) = 0. I include it just so you can more easily see that U can be seen as a 1-parameter unitary group of 1×1 matrices.)

      Which functions U obey all three equations? The obvious solutions are

      U(t) = \exp(i c t)

      where c \in \mathbb{R}. All these functions are continuous. But the axiom of choice implies there are infinitely many other solutions, that are not continuous. Again, it’s impossible to write down a formula for any of them.

      So, if you use the axiom of choice, you can prove there are lots of 1-parameter unitary groups that are not continuous. However, you can’t write down a formula for any of them, they’re not even measurable (see Theorem 7 in that paper), and if you use some other axioms you can prove they don’t exist. So it’s best to ignore them.

      6. If a strongly continuous 1-parameter group is defined by this additional property: t_i \to t \;\; \Rightarrow \; \;\|U(t_i) \psi - U(t) \psi \| \to 0, how is a continuous 1-parameter group defined?

      There are various different topologies on the set of operators on a Hilbert space or Banach space, so there are different kinds of continuity for 1-parameter groups. The only important ones are strong continuity and uniform continuity, also known as norm continuity. A uniformly continuous 1-parameter group has this property:

      t_i \to t \;\; \Rightarrow \; \;\|U(t_i) - U(t) \| \to 0

      where the norm here is the operator norm.

      Ironically, norm continuity is stronger than strong continuity.
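
      A one-line estimate shows why. For any vector \psi, the definition of the operator norm gives

      \|U(t_i) \psi - U(t) \psi \| \le \|U(t_i) - U(t) \| \, \| \psi \|

      so if the right-hand side goes to zero as t_i \to t, the left-hand side does too. In infinite dimensions the converse can fail.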

      If you want to learn about this, read Reed and Simon’s book Functional Analysis, which is full of the analysis that mathematical physicists need to know.

      7. Why is L^2(X) finite dimensional for a finite set X?

      Because the vector space of complex-valued functions on an n-element set is n-dimensional and L^2(X) is just the vector space of complex functions on X when X is finite.
