The Complexity Barrier

28 October, 2011

Could we grow the whole universe with all its seeming complexity starting from a little seed? How much can you do with just a little information?

People have contests about this. My programmer friend Bruce Smith points out this animation, produced using a program less than 4 kilobytes long:

As he notes:

…to be fair, the complexity of some of the OS, graphics drivers, and hardware should be included, but this is a lot less than you might think if you imagine rewriting it purely for compactness rather than for speed, and only including what this sort of program needs to produce output.

Sampo Syreeni pointed me to this video, all generated from under 64 kilobytes of x86 code:

As he points out, one trick is to use symmetry.

These fancy images produced from tiny amounts of information are examples of the ‘demoscene’. a computer art subculture where people produce demos: non-interactive audio-visual computer presentations that run in real time.

According to the Wikipedia article on the demoscene:

Recent computer hardware advancements include faster processors, more memory, faster video graphics processors, and hardware 3D acceleration. With many of the past’s challenges removed, the focus in making demos has moved from squeezing as much out of the computer as possible to making stylish, beautiful, well-designed real time artwork […]

The old tradition still lives on, though. Demo parties have competitions with varying limitations in program size or platform (different series are called compos). On a modern computer the executable size may be limited to 64 kB or 4 kB. Programs of limited size are usually called intros. In other compos the choice of platform is restricted; only old computers, like the 8-bit Commodore 64, or the 16-bit Amiga, or Atari ST, or mobile devices like handheld phones or PDAs are allowed. Such restrictions provide a challenge for coders, musicians and graphics artists and bring back the old motive of making a device do more than was intended in its original design.

What else can you do with just a little information? Bruce listed a couple more things:

• Bill Gates’ first commercial success was an implementation of a useful version of BASIC in about 4000 bytes;

• the complete genetic code of an organism can be as short as a few hundred thousand bytes, and that has to be encoded in a way that doesn’t allow for highly clever compression schemes.

So if quite complex things can be compressed into fairly little information, you can’t help but wonder: how complex can something be?

The answer: arbitrarily complex! At least that’s true if we’re talking about the Kolmogorov complexity of a string of bits: namely, the length of the shortest computer program that prints it out. Lots of long strings of bits can’t be compressed. You can’t print most of them out using short programs, since there aren’t enough short programs to go around.

Of course, we need to fix a computer language ahead of time, so this is well-defined. And we need to make sure the programs are written in binary, so the comparison is fair.

So, things can be arbitrarily complex. But here’s a more interesting question: how complex can we prove something to be?

The answer is one of the most astounding facts I know. It’s called Chaitin’s incompleteness theorem. It says, very roughly:

There’s a number L such that we can’t prove the Kolmogorov complexity of any specific string of bits is bigger than L.

Make sure you understand this. For any number,
we can prove there are infinitely many bit strings with Kolmogorov complexity bigger than that. But we can’t point to any particular bit string and prove its Kolmogorov complexity is bigger than L!

Over on Google+, Allen Knutson wrote:

That’s an incredibly disturbing theorem, like driving to the edge of the universe and finding a wall.

I call L the complexity barrier. So one question is, how big is L? It’s hard, or perhaps even impossible, to find the smallest L that does the job. But we can certainly find numbers L that work. And they’re surprisingly small!

My friend Bruce estimates that the complexity barrier is a few kilobytes.

I’d like to see a program a few kilobytes long that produces a video showing a big bang, the formation of stars and galaxies, then planets, including one where life evolves, then intelligent life, then the development of computers… and finally someone writing the very same program.

I can’t prove it’s possible… but you can’t prove it isn’t!

Let’s see why.

For starters, we need to choose some axioms for our system of math, so we know what ‘provable’ means. We need a system that’s powerful enough to prove a bunch of basic facts about arithmetic, but simple enough that a computer program can check if a proof in this system is valid.

There are lots of systems like this. Three famous ones are Peano arithmetic, Robinson arithmetic (which is less powerful) and Zermelo-Fraenkel set theory (which is more powerful).

When you have a system of math like this, Gödel’s first incompleteness theorem kicks in: if the system is consistent, it can’t be complete. In other words, there are some questions it leaves unsettled. This is why we shouldn’t be utterly shocked that while a bunch of bit strings have complexity more than L, we can’t prove this.

Gödel’s second incompleteness theorem also kicks in: if the system can prove that it’s consistent, it’s not! (If it’s not consistent, it can prove anything, so you shouldn’t trust it.) So, there’s a sense in which we can never be completely sure that our system of math is consistent. But let’s assume it is.

Given this, Chaitin’s theorem says:

There exists a constant L such that no string of bits has Kolmogorov complexity that provably exceeds L.

How can we get a number that does the job? Any number L with

U + \log_2(L) + C < L

will do. Here:

U is the length of a program where if you input a natural number i, it will search through all proofs in Peano arithmetic until it finds one that proves some bit string has Kolmogorov complexity > i. If it finds one, then it outputs this bit string. If it never finds one, it grinds on endlessly. (Of course, if i = L, it will never find one!)

C is a small overhead cost: the length of the extra 'glue' to create a bigger program that takes the number L, written out in binary, and feeds it into the program described above.

The length of L written out in binary is about \log_2(L). This bigger program thus has length

U + \log_2(L) + C

and for the proof of Chaitin’s incompleteness theorem to work, we need this to be smaller than L. Obviously we can accomplish this by making L big enough, since \log_2 L grows slower than L.

Given all the stuff I’ve told you, the proof of Chaitin’s theorem almost writes itself! You run this bigger program I just described. If there were a bit string whose Kolmogorov complexity is provably greater than L, this program would print one out. But that’s a contradiction, because this program has length less than L.

So, we just need to pick a computer language and a suitable system of math, and estimate U and, less importantly because it’s so much smaller, C. Then L will be just a bit bigger than U + C.

I picked the language C and Peano arithmetic and asked Bruce if he could guess, roughly, what answer we get for L. He replied:

I don’t think it can be done in C, since C semantics are not well-defined unless you specify a particular finite machine size. (Since C programs can do things like convert pointers to integers and back, tell you the size of any datatype, and convert data of any specified datatype to bytes and back.) On a finite machine of N bits, all programs either finish in time less than about 2^N or take forever.

But if you take “C without size-specific operations”, or a higher level language like Python, or for that matter a different sort of low-level language like a Turing machine, then that’s not an issue — you can define a precise semantics that allows it to run a program for an arbitrarily long time and allocate an arbitrary number of objects in memory which contain pointers to each other. (To stick with the spirit of the question, for whatever language you choose, you’d want to disallow use of any large external batch of information like a “standard library”, except for whatever is so basic that you think of it as part of the native language. This is not a serious handicap for this problem.)

The main things that the program ‘U‘ (I’d rather call the program itself ‘U’ than call its length ‘U‘) needs to do are:

• recognize a syntactically correct statement or proof;

• check the validity of a purported proof;

• recognize certain statements as saying or implying “The Kolmogorov complexity of n is more than i” for some n and i. (It’s not necessary to recognize all such statements, just at least one for each n and i; so it can just recognize a statement that consists of some template with specific values of n and i inserted into it at certain places.)

Assuming that U expresses the proofs it wants to check in a practical proof language (which will be more like what a practical theorem-prover like Coq uses than like what a traditional logician would recognize as “straight Peano arithmetic”, but which will not be excessively powerful in the spirit of this question), I’d estimate that the most complex part is checking proof validity, but that that can still be expressed in at most a few dozen syntactic rules, each expressible in a few lines of code. (The authors of a system like Coq, which includes code to actually do that, would know better, as long as they remember that the vast majority of their system’s actual code is not needed for this problem.)

This makes me think that even without trying to compact it much, in a reasonable language we could write U in a few hundred lines of code, or (after a bit of simple compression) a few thousand bytes. (And perhaps much less if we tried hard to compact the whole program in clever ways.)

So L will also be “a few thousand” (bytes or digits), or perhaps less, rather than some number you can never possibly count to.

Does anyone know if Chaitin or someone actually went ahead a wrote a program that showed a specific value of L would work?


Network Theory (Part 15)

26 October, 2011

Last time we saw how to get a graph whose vertices are states of a molecule and whose edges are transitions between states. We focused on two beautiful but not completely realistic examples that both give rise to the same highly symmetrical graph: the ‘Desargues graph’.

Today I’ll start with a few remarks about the Desargues graph. Then I’ll get to work showing how any graph gives:

• A Markov process, namely a random walk on the graph.

• A quantum process, where instead of having a probability to hop from vertex to vertex as time passes, we have an amplitude.

The trick is to use an operator called the ‘graph Laplacian’, a discretized version of the Laplacian which happens to be both infinitesimal stochastic and self-adjoint. As we saw in Part 12, such an operator will give rise both to a Markov process and a quantum process (that is, a one-parameter unitary group).

The most famous operator that’s both infinitesimal stochastic and self-adjoint is the Laplacian, \nabla^2. Because it’s both, the Laplacian shows up in two important equations: one in stochastic mechanics, the other in quantum mechanics.

• The heat equation:

\displaystyle{ \frac{d}{d t} \psi = \nabla^2 \psi }

describes how the probability \psi(x) of a particle being at the point x smears out as the particle randomly walks around:

The corresponding Markov process is called ‘Brownian motion’.

• The Schrödinger equation:

\displaystyle{ \frac{d}{d t} \psi = -i \nabla^2 \psi }

describes how the amplitude \psi(x) of a particle being at the point x wiggles about as the particle ‘quantumly’ walks around.

Both these equations have analogues where we replace space by a graph, and today I’ll describe them.

Drawing the Desargues graph

First I want to show you a nice way to draw the Desargues graph. For this it’s probably easiest to go back to our naive model of an ethyl cation:

Even though ethyl cations don’t really look like this, and we should be talking about some trigonal bipyramidal molecule instead, it won’t affect the math to come. Mathematically, the two problems are isomorphic! So let’s stick with this nice simple picture.

We can be a bit more abstract, though. A state of the ethyl cation is like having 5 balls, with 3 in one pile and 2 in the other. And we can focus on the first pile and forget the second, because whatever isn’t in the first pile must be in the second.

Of course a mathematician calls a pile of things a ‘set’, and calls those things ‘elements’. So let’s say we’ve got a set with 5 elements. Draw a red dot for each 2-element subset, and a blue dot for each 3-element subset. Draw an edge between a red dot and a blue dot whenever the 2-element subset is contained in the 3-element subset. We get the Desargues graph.

That’s true by definition. But I never proved that any of the pictures I showed you are correct! For example, this picture shows the Desargues graph:

but I never really proved this fact—and I won’t now, either.

To draw a picture we know is correct, it’s actually easier to start with a big graph that has vertices for all the subsets of our 5-element set. If we draw an edge whenever an n-element subset is contained in an (n+1)-element subset, the Desargues graph will be sitting inside this big graph.

Here’s what the big graph looks like:

This graph has 2^5 vertices. It’s actually a picture of a 5-dimensional hypercube! The vertices are arranged in columns. There’s

• one 0-element subset,

• five 1-element subsets,

• ten 2-element subsets,

• ten 3-element subsets,

• five 4-element subsets,

• one 5-element subset.

So, the numbers of vertices in each column go like this:

1 \quad 5 \quad 10 \quad 10 \quad 5 \quad 1

which is a row in Pascal’s triangle. We get the Desargues graph if we keep only the vertices corresponding to 2- and 3-element subsets, like this:

It’s less pretty than our earlier picture, but at least there’s no mystery to it. Also, it shows that the Desargues graph can be generalized in various ways. For example, there’s a theory of bipartite Kneser graphs H(n,k). The Desargues graph is H(5,2).

Desargues’ theorem

I can’t resist answering this question: why is it called the ‘Desargues graph’? This name comes from Desargues’ theorem, a famous result in projective geometry. Suppose you have two triangles ABC and abc, like this:

Suppose the lines Aa, Bb, and Cc all meet at a single point, the ‘center of perspectivity’. Then the point of intersection of ab and AB, the point of intersection of ac and AC, and the point of intersection of bc and BC all lie on a single line, the ‘axis of perspectivity’. The converse is true too. Quite amazing!

The Desargues configuration consists of all the actors in this drama:

• 10 points: A, B, C, a, b, c, the center of perspectivity, and the three points on the axis of perspectivity

and

• 10 lines: Aa, Bb, Cc, AB, AC, BC, ab, ac, bc and the axis of perspectivity

Given any configuration of points and lines, we can form a graph called its Levi graph by drawing a vertex for each point or line, and drawing edges to indicate which points lie on which lines. And now for the punchline: Levi graph of the Desargues configuration is the ‘Desargues-Levi graph’!—or Desargues graph, for short.

Alas, I don’t know how this is relevant to anything I’ve discussed. For now it’s just a tantalizing curiosity.

A random walk on the Desargues graph

Back to business! I’ve been telling you about the analogy between quantum mechanics and stochastic mechanics. This analogy becomes especially interesting in chemistry, which lies on the uneasy borderline between quantum and stochastic mechanics.

Fundamentally, of course, atoms and molecules are described by quantum mechanics. But sometimes chemists describe chemical reactions using stochastic mechanics instead. When can they get away with this? Apparently whenever the molecules involved are big enough and interacting with their environment enough for ‘decoherence’ to kick in. I won’t attempt to explain this now.

Let’s imagine we have a molecule of iron pentacarbonyl with—here’s the unrealistic part, but it’s not really too bad—distinguishable carbonyl groups:

Iron pentacarbonyl is liquid at room temperatures, so as time passes, each molecule will bounce around and occasionally do a maneuver called a ‘pseudorotation’:

We can approximately describe this process as a random walk on a graph whose vertices are states of our molecule, and whose edges are transitions between states—namely, pseudorotations. And as we saw last time, this graph is the Desargues graph:

Note: all the transitions are reversible here. And thanks to the enormous amount of symmetry, the rates of all these transitions must be equal.

Let’s write V for the set of vertices of the Desargues graph. A probability distribution of states of our molecule is a function

\displaystyle{ \psi : V \to [0,\infty) }

with

\displaystyle{ \sum_{x \in V} \psi(x) = 1 }

We can think of these probability distributions as living in this vector space:

L^1(V) = \{ \psi: V \to \mathbb{R} \}

I’m calling this space L^1 because of the general abstract nonsense explained in Part 12: probability distributions on any measure space live in a vector space called L^1. Today that notation is overkill, since every function on V lies in L^1. But please humor me.

The point is that we’ve got a general setup that applies here. There’s a Hamiltonian:

H : L^1(V) \to L^1(V)

describing the rate at which the molecule randomly hops from one state to another… and the probability distribution \psi \in L^1(X) evolves in time according to the equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

But what’s the Hamiltonian H? It’s very simple, because it’s equally likely for the state to hop from any vertex to any other vertex that’s connected to that one by an edge. Why? Because the problem has so much symmetry that nothing else makes sense.

So, let’s write E for the set of edges of the Desargues graph. We can think of this as a subset of V \times V by saying (x,y) \in E when x is connected to y by an edge. Then

\displaystyle{ (H \psi)(x) =  \sum_{y \,\, \textrm{such that} \,\, (x,y) \in E} \!\!\!\!\!\!\!\!\!\!\! \psi(y) \quad - \quad 3 \psi(x) }

We’re subtracting 3 \psi(x) because there are 3 edges coming out of each vertex x, so this is the rate at which the probability of staying at x decreases. We could multiply this Hamiltonian by a constant if we wanted the random walk to happen faster or slower… but let’s not.

The next step is to solve this discretized version of the heat equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

Abstractly, the solution is easy:

\psi(t) = \exp(t H) \psi(0)

But to actually compute \exp(t H), we might want to diagonalize the operator H. In this particular example, we could take advantage of the enormous symmetry of the Desargues graph. Its symmetry group includes the permutation group S_5, so we could take the vector space L^1(V) and break it up into irreducible representations of S_5. Each of these will give an eigenspace of H, so by this method we can diagonalize H. I’d sort of like to try this… but it’s a big digression, so I won’t. At least, not today!

Graph Laplacians

The Hamiltonian we just saw is an example of a ‘graph Laplacian’. We can write down such a Hamiltonian for any graph, but it gets a tiny bit more complicated when different vertices have different numbers of edges coming out of them.

The word ‘graph’ means lots of things, but right now I’m talking about simple graphs. Such a graph has a set of vertices V and a set of edges E \subseteq V \times V, such that

(x,y) \in E \implies (y,x) \in E

which says the edges are undirected, and

(x,x) \notin E

which says there are no loops. Let d(x) be the degree of the vertex x, meaning the number of edges coming out of it.

Then the graph Laplacian is this operator on L^1(V):

\displaystyle{ (H \psi)(x) =  \sum_{y \,\, \textrm{such that} \, \,(x,y) \in E} \!\!\!\!\!\!\!\!\!\!\! \Psi(y) \quad - \quad d(x) \Psi(x) }

There is a huge amount to say about graph Laplacians! If you want, you can get started here:

• Michael William Newman, The Laplacian Spectrum of Graphs, Masters Thesis, Department of Mathematics, University of Manitoba, 2000.

But for now, let’s just say that \exp(t H) is a Markov process describing a random walk on the graph, where hopping from one vertex to any neighboring vertex has unit probability per unit time. We can make the hopping faster or slower by multiplying H by a constant. And here is a good time to admit that most people use a graph Laplacian that’s the negative of ours, and write time evolution as \exp(-t H). The advantage is that then the eigenvalues of the Laplacian are \ge 0.

But what matters most is this. We can write the operator H as a matrix whose entry H_{x y} is 1 when there’s an edge from x to y and 0 otherwise, except when x = y, in which case the entry is -d(x). And then:

Puzzle 1. Show that for any finite graph, the graph Laplacian H is infinitesimal stochastic, meaning that:

\displaystyle{ \sum_{x \in V} H_{x y} = 0 }

and

x \ne y \implies  H_{x y} \ge 0

This fact implies that for any t \ge 0, the operator \exp(t H) is stochastic—just what we need for a Markov process.

But we could also use H as a Hamiltonian for a quantum system, if we wanted. Now we think of \psi(x) as the amplitude for being in the state x \in V. But now \psi is a function

\psi : V \to \mathbb{C}

with

\displaystyle{ \sum_{x \in V} |\psi(x)|^2 = 1 }

We can think of this function as living in the Hilbert space

L^2(V) = \{ \psi: V \to \mathbb{C} \}

where the inner product is

\langle \phi, \psi \rangle = \displaystyle{ \sum_{x \in V} \overline{\phi(x)} \psi(x) }

Puzzle 2. Show that for any finite graph, the graph Laplacian H: L^2(V) \to L^2(V) is self-adjoint, meaning that:

H_{x y} = \overline{H}_{y x}

This implies that for any t \in \mathbb{R}, the operator \exp(-i t H) is unitary—just what we need for one-parameter unitary group. So, we can take this version of Schrödinger’s equation:

\displaystyle{ \frac{d}{d t} \psi = -i H \psi }

and solve it:

\displaystyle{ \psi(t) = \exp(-i t H) \psi(0) }

and we’ll know that time evolution is unitary!

So, we’re in a dream world where we can do stochastic mechanics and quantum mechanics with the same Hamiltonian. I’d like to exploit this somehow, but I’m not quite sure how. Of course physicists like to use a trick called Wick rotation where they turn quantum mechanics into stochastic mechanics by replacing time by imaginary time. We can do that here. But I’d like to do something new, special to this context.

Maybe I should learn more about chemistry and graph theory. Of course, graphs show up in at least two ways: first for drawing molecules, and second for drawing states and transitions, as I’ve been doing. These books are supposed to be good:

• Danail Bonchev and D.H. Rouvray, eds., Chemical Graph Theory: Introduction and Fundamentals, Taylor and Francis, 1991.

• Nenad Trinajstic, Chemical Graph Theory, CRC Press, 1992.

• R. Bruce King, Applications of Graph Theory and Topology in Inorganic Cluster Coordination Chemistry, CRC Press, 1993.

The second is apparently the magisterial tome of the subject. The prices of these books are absurd: for example, Amazon sells the first for $300, and the second for $222. Luckily the university here should have them…


A Math Puzzle Coming From Chemistry

23 October, 2011

I posed this puzzle a while back, and nobody solved it. That’s okay—now that I think about it, I’m not sure how to solve it either!

It seems to involve group theory. But instead of working on it, solving it and telling you the answer, I’d rather dump all the clues in your lap, so we can figure it out together.

Suppose we have an ethyl cation. We’ll pretend it looks like this:

As I explained before, it actually doesn’t—not in real life. But never mind! Realism should never stand in the way of a good puzzle.

Continuing on in this unrealistic vein, we’ll pretend that the two black carbon atoms are distinguishable, and so are the five white hydrogen atoms. As you can see, 2 of the hydrogens are bonded to one carbon, and 3 to the other. We don’t care how the hydrogens are arranged, apart from which carbon each hydrogen is attached to. Given this, there are

2 \times \displaystyle{ \binom{5}{2} = 20 }

ways to arrange the hydrogens. Let’s call these arrangements states.

Now draw a dot for each of these 20 states. Draw an edge connecting two dots whenever you can get from one state to another by having a hydrogen hop from the carbon with 2 hydrogens to the carbon with 3. You’ll get this picture, called the Desargues graph:

The red dots are states where the first carbon has 2 hydrogens attached to it; the blue ones are states where the second carbon has 2 hydrogens attached to it. So, each edge goes between a red and a blue dot. And there are 3 edges coming out of each dot, since there are 3 hydrogens that can make the jump!

Now, the puzzle is to show that you can also get the Desargues graph from a different kind of molecule. Any molecule shaped like this will do:

The 2 balls on top and bottom are called axial, while the 3 around the middle are called equatorial.

There are various molecules like this. For example, phosphorus pentachloride. Let’s use that.

Like the ethyl cation, phosphorus pentachloride also has 20 states… but only if count them a certain way! We have to treat all 5 chlorines as distinguishable, but think of two arrangements of them as the same if we can rotate one to get the other. Again, I’m not claiming this is physically realistic: it’s just for the sake of the puzzle.

Phosphorus pentachloride has 6 rotational symmetries, since you can turn it around its axis 3 ways, but also flip it over. So, it has

\displaystyle{ \frac{5!}{6}  = 20}

states.

That’s good: exactly the number of dots in the Desargues graph! But how about the edges? We get these from certain transitions between states. These transitions are called pseudorotations, and they look like this:

Phosphorus pentachloride really does this! First the 2 axial guys move towards each other to become equatorial. Beware: now the equatorial ones are no longer in the horizontal plane: they’re in the plane facing us. Then 2 of the 3 equatorial guys swing out to become axial.

To get from one state to another this way, we have to pick 2 of the 3 equatorial guys to swing out and become axial. There are 3 choices here. So, we again get a graph with 20 vertices and 3 edges coming out of each vertex.

Puzzle. Is this graph the Desargues graph? If so, show it is.

I read in some chemistry papers that it is. But is it really? And if so, why? David Corfield suggested a promising strategy. He pointed out that we just need to get a 1-1 correspondence between

states of the ethyl cation and states of phosphorus pentachloride,

together with a compatible 1-1 correspondence between

transitions of the ethyl cation and transitions of phosphorus pentachloride.

And he suggested that to do this, we should think of the split of hydrogens into a bunch of 2 and a bunch of 3 as analogous to the split of chlorines into a bunch of 2 (the ‘axial’ ones) and a bunch of 3 (the ‘equatorial’ ones).

It’s a promising idea. There’s a problem, though! In the ethyl cation, a single hydrogen hops from the bunch of 3 to the bunch of 2. But in a pseudorotation, two chlorines go from the bunch of 2 to the bunch of 3… and meanwhile, two go back from the bunch of 3 to bunch of 2.

And if you think about it, there’s another problem too. In the ethyl cation, there are 2 distinguishable carbons. One of them has 3 hydrogens attached, and one doesn’t. But in phosphorus pentachloride it’s not like that. The 3 equatorial chlorines are just that: equatorial. They don’t have 2 choices about how to be that way. Or do they?

Well, there’s more to say, but this should already make it clear that getting ‘natural’ one-to-one correspondences is a bit tricky… if it’s even possible at all!

If you know some group theory, we could try solving the problem using the ideas behind Felix Klein’s ‘Erlangen program’. The group of permutations of 5 things, say S_5, acts as symmetries of either molecule. For the ethyl cation the set of states will be X  = S_5/G for some subgroup G. You can think of X as a set of structures of some sort on a 5-element set. The group S_5 acts on X, and the transitions will give an invariant binary relation on X, For phosphorus pentachloride we’ll have some set of states X' = S_5/G' for some other subgroup G', and the transitions will give an invariant relation on X'.

We could start by trying to see if G is the same as G'—or more precisely, conjugate. If they are, that’s a good sign. If not, it’s bad: it probably means there’s no ‘natural’ way to show the graph for phosphorus pentachloride is the Desargues graph.

I could say more, but I’ll stop here. In case you’re wondering, all this is just a trick to get more mathematicians interested in chemistry. A few may then go on to do useful things.


Network Theory (Part 14)

15 October, 2011

We’ve been doing a lot of hard work lately. Let’s take a break and think about a fun example from chemistry!

The ethyl cation

Suppose you start with a molecule of ethane, which has 2 carbons and 6 hydrogens arranged like this:

Then suppose you remove one hydrogen. The result is a positively charged ion, or ‘cation’. When I was a kid, I thought the opposite of a cation should be called a ‘dogion’. Alas, it’s not.

This particular cation, formed from removing one hydrogen from an ethane molecule, is called an ‘ethyl cation’. People used to think it looked like this:

They also thought a hydrogen could hop from the carbon with 3 hydrogens attached to it to the carbon with 2. So, they drew a graph with a vertex for each way the hydrogens could be arranged, and an edge for each hop. It looks really cool:

The red vertices come from arrangements where the first carbon has 2 hydrogens attached to it, and the blue vertices come from those where the second carbon has 2 hydrogens attached to it. So, each edge goes between a red vertex and a blue vertex.

This graph has 20 vertices, which are arrangements or ‘states’ of the ethyl cation. It has 30 edges, which are hops or ‘transitions’. Let’s see why those numbers are right.

First I need to explain the rules of the game. The rules say that the 2 carbon atoms are distinguishable: there’s a ‘first’ one and a ‘second’ one. The 5 hydrogen atoms are also distinguishable. But, all we care about is which carbon atom each hydrogen is bonded to: we don’t care about further details of its location. And we require that 2 of the hydrogens are bonded to one carbon, and 3 to the other.

If you’re a physicist, you may wonder why the rules work this way: after all, at a fundamental level, identical particles aren’t really distinguishable. I’m afraid I can’t give a fully convincing explanation right now: I’m just reporting the rules as they were told to me!

Given these rules, there are 2 choices of which carbon has two hydrogens attached to it. Then there are

\displaystyle{ \binom{5}{2} = \frac{5 \times 4}{2 \times 1} = 10}

choices of which two hydrogens are attached to it. This gives a total of 2 × 10 = 20 states. These are the vertices of our graph: 10 red and 10 blue.

The edges of the graph are transitions between states. Any hydrogen in the group of 3 can hop over to the group of 2. There are 3 choices for which hydrogen atom makes the jump. So, starting from any vertex in the graph there are 3 edges. This means there are 3 \times 20 / 2 = 30 edges.

Why divide by 2? Because each edge touches 2 vertices. We have to avoid double-counting them.

The Desargues graph

The idea of using this graph in chemistry goes back to this paper:

• A. T. Balaban, D. Fǎrcaşiu and R. Bǎnicǎ, Graphs of multiple 1,2-shifts in carbonium ions and related systems, Rev. Roum. Chim. 11 (1966), 1205.

This paper is famous because it was the first to use graphs in chemistry to describe molecular transitions, as opposed to using them as pictures of molecules!

But this particular graph was already famous for other reasons. It’s called the Desargues-Levi graph, or Desargues graph for short:

Desargues graph, Wikipedia.

Later I’ll say why it’s called this.

There are lots of nice ways to draw the Desargues graph. For example:

The reason why we can draw such pretty pictures is that the Desargues graph is very symmetrical. Clearly any permutation of the 5 hydrogens acts as a symmetry of the graph, and so does any permutation of the 2 carbons. This gives a symmetry group S_5 \times S_2, which has 5! \times 2! = 240 elements. And in fact this turns out to be the full symmetry group of the Desargues graph.

The Desargues graph, its symmetry group, and its applications to chemistry are discussed here:

• Milan Randic, Symmetry properties of graphs of interest in chemistry: II: Desargues-Levi graph, Int. Jour. Quantum Chem. 15 (1997), 663-682.

The ethyl cation, revisited

We can try to describe the ethyl cation using probability theory. If at any moment its state corresponds to some vertex of the Desargues graph, and it hops randomly along edges as time goes by, it will trace out a random walk on the Desargues graph. This is a nice example of a Markov process!

We could also try to describe the ethyl cation using quantum mechanics. Then, instead of having a probability of hopping along an edge, it has an amplitude of doing so. But as we’ve seen, a lot of similar math will still apply.

It should be fun to compare the two approaches. But I bet you’re wondering which approach is correct. This is a somewhat tricky question, at least for me. The answer would seem to depend on how much the ethyl cation is interacting with its environment—for example, bouncing off other molecules. When a system is interacting a lot with its environment, a probabilistic approach seems to be more appropriate. The relevant buzzword is ‘environmentally induced decoherence’.

However, there’s something much more basic I have tell you about.

After the paper by Balaban, Fǎrcaşiu and Bǎnicǎ came out, people gradually realized that the ethyl cation doesn’t really look like the drawing I showed you! It’s what chemists call ‘nonclassical’ ion. What they mean is this: its actual structure is not what you get by taking the traditional ball-and-stick model of an ethane molecule and ripping off a hydrogen. The ethyl cation really looks like this:

For more details, and pictures that you can actually rotate, see:

• Stephen Bacharach, Ethyl cation, Computational Organic Chemistry.

So, if we stubbornly insist on applying the Desargues graph to realistic chemistry, we need to find some other molecule to apply it to.

Trigonal bipyramidal molecules

Luckily, there are lots of options! They’re called trigonal bipyramidal molecules. They look like this:

The 5 balls on the outside are called ‘ligands’: they could be atoms or bunches of atoms. In chemistry, ‘ligand‘ just means something that’s stuck onto a central thing. For example, in phosphorus pentachloride the ligands are chlorine atoms, all attached to a central phosphorus atom:

It’s a colorless solid, but as you might expect, it’s pretty nasty stuff: it’s not flammable, but it reacts with water or heat to produce toxic chemicals like hydrogen chloride.

Another example is iron pentacarbonyl, where 5 carbon-oxygen ligands are attached to a central iron atom:

You can make this stuff by letting powdered iron react with carbon monoxide. It’s a straw-colored liquid with a pungent smell!

Whenever you’ve got a molecule of this shape, the ligands come in two kinds. There are the 2 ‘axial’ ones, and the 3 ‘equatorial’ ones:

And the molecule has 20 states… but only if count the states a certain way. We have to treat all 5 ligands as distinguishable, but think of two arrangements of them as the same if we can rotate one to get the other. The trigonal bipyramid has a rotational symmetry group with 6 elements. So, there are 5! / 6 = 20 states.

The transitions between states are devilishly tricky. They’re called pseudorotations, and they look like this:

If you look very carefully, you’ll see what’s going on. First the 2 axial ligands move towards each other to become equatorial.
Now the equatorial ones are no longer in the horizontal plane: they’re in the plane facing us! Then 2 of the 3 equatorial ones swing out to become axial. This fancy dance is called the Berry pseudorotation mechanism.

To get from one state to another this way, we have to pick 2 of the 3 equatorial ligands to swing out and become axial. There are 3 choices here. So, if we draw a graph with states as vertices and transitions as edges, it will have 20 vertices and 20 × 3 / 2 = 30 edges. That sounds suspiciously like the Desargues graph!

Puzzle 1. Show that the graph with states of a trigonal bipyramidal molecule as vertices and pseudorotations as edges is indeed the Desargues graph.

I think this fact was first noticed here:

• Paul C. Lauterbur and Fausto Ramirez, Pseudorotation in trigonal-bipyramidal molecules, J. Am. Chem. Soc. 90 (1968), 6722–6726.

Okay, enough for now! Next time I’ll say more about the Markov process or quantum process corresponding to a random walk on the Desargues graph. But since the Berry pseudorotation mechanism is so hard to visualize, I’ll pretend that the ethyl cation looks like this:

and I’ll use this picture to help us think about the Desargues graph.

That’s okay: everything we’ll figure out can easily be translated to apply to the real-world situation of a trigonal bipyramidal molecule. The virtue of math is that when two situations are ‘mathematically the same’, or ‘isomorphic’, we can talk about either one, and the results automatically apply to the other. This is true even if the one we talk about doesn’t actually exist in the real world!


Network Theory (Part 13)

11 October, 2011

Unlike some recent posts, this will be very short. I merely want to show you the quantum and stochastic versions of Noether’s theorem, side by side.

Having made my sacrificial offering to the math gods last time by explaining how everything generalizes when we replace our finite set X of states by an infinite set or an even more general measure space, I’ll now relax and state Noether’s theorem only for a finite set. If you’re the sort of person who finds that unsatisfactory, you can do the generalization yourself.

Two versions of Noether’s theorem

Let me write the quantum and stochastic Noether’s theorem so they look almost the same:

Theorem. Let X be a finite set. Suppose H is a self-adjoint operator on L^2(X), and let O be an observable. Then

[O,H] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value of O in the state \psi(t) does not change with t.

Theorem. Let X be a finite set. Suppose H is an infinitesimal stochastic operator on L^1(X), and let O be an observable. Then

[O,H] =0

if and only if for all states \psi(t) obeying the master equation

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

the expected values of O and O^2 in the state \psi(t) do not change with t.

This makes the big difference stick out like a sore thumb: in the quantum version we only need the expected value of O, while in the stochastic version we need the expected values of O and O^2!

Brendan Fong proved the stochastic version of Noether’s theorem in Part 11. Now let’s do the quantum version.

Proof of the quantum version

My statement of the quantum version was silly in a couple of ways. First, I spoke of the Hilbert space L^2(X) for a finite set X, but any finite-dimensional Hilbert space will do equally well. Second, I spoke of the “self-adjoint operator” H and the “observable” O, but in quantum mechanics an observable is the same thing as a self-adjoint operator!

Why did I talk in such a silly way? Because I was attempting to emphasize the similarity between quantum mechanics and stochastic mechanics. But they’re somewhat different. For example, in stochastic mechanics we have two very different concepts: infinitesimal stochastic operators, which generate symmetries, and functions on our set X, which are observables. But in quantum mechanics something wonderful happens: self-adjoint operators both generate symmetries and are observables! So, my attempt was a bit strained.

Let me state and prove a less silly quantum version of Noether’s theorem, which implies the one above:

Theorem. Suppose H and O are self-adjoint operators on a finite-dimensional Hilbert space. Then

[O,H] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value of O in the state \psi(t) does not change with t:

\displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle = 0 }

Proof. The trick is to compute the time derivative I just wrote down. Using Schrödinger’s equation, the product rule, and the fact that H is self-adjoint we get:

\begin{array}{ccl}  \displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle } &=&   \langle -i H \psi(t) , O \psi(t) \rangle + \langle \psi(t) , O (- i H \psi(t)) \rangle \\  \\  &=& i \langle \psi(t) , H O \psi(t) \rangle -i \langle \psi(t) , O H \psi(t)) \rangle \\  \\  &=& - i \langle \psi(t), [O,H] \psi(t) \rangle  \end{array}

So, if [O,H] = 0, clearly the above time derivative vanishes. Conversely, if this time derivative vanishes for all states \psi(t) obeying Schrödinger’s equation, we know

\langle \psi, [O,H] \psi \rangle = 0

for all states \psi and thus all vectors in our Hilbert space. Does this imply [O,H] = 0? Yes, because i times a commutator of a self-adjoint operators is self-adjoint, and for any self-adjoint operator A we have

\forall \psi  \; \; \langle \psi, A \psi \rangle = 0 \qquad \Rightarrow \qquad A = 0

This is a well-known fact whose proof goes like this. Assume \langle \psi, A \psi \rangle = 0 for all \psi. Then to show A = 0, it is enough to show \langle \phi, A \psi \rangle = 0 for all \phi and \psi. But we have a marvelous identity:

\begin{array}{ccl} \langle \phi, A \psi \rangle &=& \frac{1}{4} \left( \langle \phi + \psi, \, A (\phi + \psi) \rangle \; - \; \langle \psi - \phi, \, A (\psi - \phi) \rangle \right. \\ && \left. +i \langle \psi + i \phi, \, A (\psi + i \phi) \rangle \; - \; i\langle \psi - i \phi, \, A (\psi - i \phi) \rangle \right) \end{array}

and all four terms on the right vanish by our assumption.   █

The marvelous identity up there is called the polarization identity. In plain English, it says: if you know the diagonal entries of a self-adjoint matrix in every basis, you can figure out all the entries of that matrix in every basis.

Why is it called the ‘polarization identity’? I think because it shows up in optics, in the study of polarized light.

Comparison

In both the quantum and stochastic cases, the time derivative of the expected value of an observable O is expressed in terms of its commutator with the Hamiltonian. In the quantum case we have

\displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle = - i \langle \psi(t), [O,H] \psi(t) \rangle }

and for the right side to always vanish, we need [O,H] = 0 latex , thanks to the polarization identity. In the stochastic case, a perfectly analogous equation holds:

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

but now the right side can always vanish even without [O,H] = 0. We saw a counterexample in Part 11. There is nothing like the polarization identity to save us! To get [O,H] = 0 we need a supplementary hypothesis, for example the vanishing of

\displaystyle{ \frac{d}{d t} \int O^2 \psi(t) }

Okay! Starting next time we’ll change gears and look at some more examples of stochastic Petri nets and Markov processes, including some from chemistry. After some more of that, I’ll move on to networks of other sorts. There’s a really big picture here, and I’m afraid I’ve been getting caught up in the details of a tiny corner.


Network Theory (Part 12)

9 October, 2011

Last time we proved a version of Noether’s theorem for stochastic mechanics. Now I want to compare that to the more familiar quantum version.

But to do this, I need to say more about the analogy between stochastic mechanics and quantum mechanics. And whenever I try, I get pulled toward explaining some technical issues involving analysis: whether sums converge, whether derivatives exist, and so on. I’ve been trying to avoid such stuff—not because I dislike it, but because I’m afraid you might. But the more I put off discussing these issues, the more they fester and make me unhappy. In fact, that’s why it’s taken so long for me to write this post!

So, this time I will gently explore some of these issues. But don’t be scared: I’ll mainly talk about some simple big ideas. Next time I’ll discuss Noether’s theorem. I hope that by getting the technicalities out of my system, I’ll feel okay about hand-waving whenever I want.

And if you’re an expert on analysis, maybe you can help me with a question.

Stochastic mechanics versus quantum mechanics

First, we need to recall the analogy we began sketching in Part 5, and push it a bit further. The idea is that stochastic mechanics differs from quantum mechanics in two big ways:

• First, instead of complex amplitudes, stochastic mechanics uses nonnegative real probabilities. The complex numbers form a ring; the nonnegative real numbers form a mere rig, which is a ‘ring without negatives’. Rigs are much neglected in the typical math curriculum, but unjustly so: they’re almost as good as rings in many ways, and there are lots of important examples, like the natural numbers \mathbb{N} and the nonnegative real numbers, [0,\infty). For probability theory, we should learn to love rigs.

But there are, alas, situations where we need to subtract probabilities, even when the answer comes out negative: namely when we’re taking the time derivative of a probability. So sometimes we need \mathbb{R} instead of just [0,\infty).

• Second, while in quantum mechanics a state is described using a ‘wavefunction’, meaning a complex-valued function obeying

\int |\psi|^2 = 1

in stochastic mechanics it’s described using a ‘probability distribution’, meaning a nonnegative real function obeying

\int \psi = 1

So, let’s try our best to present the theories in close analogy, while respecting these two differences.

States

We’ll start with a set X whose points are states that a system can be in. Last time I assumed X was a finite set, but this post is so mathematical I might as well let my hair down and assume it’s a measure space. A measure space lets you do integrals, but a finite set is a special case, and then these integrals are just sums. So, I’ll write things like

\int f

and mean the integral of the function f over the measure space X, but if X is a finite set this just means

\sum_{x \in X} f(x)

Now, I’ve already defined the word ‘state’, but both quantum and stochastic mechanics need a more general concept of state. Let’s call these ‘quantum states’ and ‘stochastic states’:

• In quantum mechanics, the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\int | \psi |^2 = 1

We call \psi: X \to \mathbb{C} obeying this equation a quantum state.

• In stochastic mechanics, the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\int \psi = 1

We call \psi: X \to [0,\infty) obeying this equation a stochastic state.

In quantum mechanics we often use this abbreviation:

\langle \phi, \psi \rangle = \int \overline{\phi} \psi

so that a quantum state has

\langle \psi, \psi \rangle = 1

Similarly, we could introduce this notation in stochastic mechanics:

\langle \psi \rangle = \int \psi

so that a stochastic state has

\langle \psi \rangle = 1

But this notation is a bit risky, since angle brackets of this sort often stand for expectation values of observables. So, I’ve been writing \int \psi, and I’ll keep on doing this.

In quantum mechanics, \langle \phi, \psi \rangle is well-defined whenever both \phi and \psi live in the vector space

L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int |\psi|^2 < \infty \}

In stochastic mechanics, \langle \psi \rangle is well-defined whenever \psi lives in the vector space

L^1(X) =  \{ \psi: X \to \mathbb{R} \; : \; \int |\psi| < \infty \}

You’ll notice I wrote \mathbb{R} rather than [0,\infty) here. That’s because in some calculations we’ll need functions that take negative values, even though our stochastic states are nonnegative.

Observables

A state is a way our system can be. An observable is something we can measure about our system. They fit together: we can measure an observable when our system is in some state. If we repeat this we may get different answers, but there’s a nice formula for average or ‘expected’ answer.

• In quantum mechanics, an observable is a self-adjoint operator A on L^2(X). The expected value of A in the state \psi is

\langle \psi, A \psi \rangle

Here I’m assuming that we can apply A to \psi and get a new vector A \psi \in L^2(X). This is automatically true when X is a finite set, but in general we need to be more careful.

• In stochastic mechanics, an observable is a real-valued function A on X. The expected value of A in the state \psi is

\int A \psi

Here we’re using the fact that we can multiply A and \psi and get a new vector A \psi \in L^1(X), at least if A is bounded. Again, this is automatic if X is a finite set, but not otherwise.

Symmetries

Besides states and observables, we need ‘symmetries’, which are transformations that map states to states. We use these to describe how our system changes when we wait a while, for example.

• In quantum mechanics, an isometry is a linear map U: L^2(X) \to L^2(X) such that

\langle U \phi, U \psi \rangle = \langle \phi, \psi \rangle

for all \psi, \phi \in L^2(X). If U is an isometry and \psi is a quantum state, then U \psi is again a quantum state.

• In stochastic mechanics, a stochastic operator is a linear map U: L^1(X) \to L^1(X) such that

\int U \psi = \int \psi

and

\psi \ge 0 \; \; \Rightarrow \; \; U \psi \ge 0

for all \psi \in L^1(X). If U is stochastic and \psi is a stochastic state, then U \psi is again a stochastic state.

In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these, and their inverses are always isometries. There are, however, very few stochastic operators whose inverses are stochastic:

Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible, and its inverse is again an isometry.

Puzzle 2. Suppose X is a finite set. Which stochastic operators U: L^1(X) \to L^1(X) have stochastic inverses?

This is why we usually think of time evolution as being reversible quantum mechanics, but not in stochastic mechanics! In quantum mechanics we often describe time evolution using a ‘1-parameter group’, while in stochastic mechanics we describe it using a 1-parameter semigroup… meaning that we can run time forwards, but not backwards.

But let’s see how this works in detail!

Time evolution in quantum mechanics

In quantum mechanics there’s a beautiful relation between observables and symmetries, which goes like this. Suppose that for each time t we want a unitary operator U(t) :  L^2(X) \to L^2(X) that describes time evolution. Then it makes a lot of sense to demand that these operators form a 1-parameter group:

Definition. A collection of linear operators U(t) (t \in \mathbb{R}) on some vector space forms a 1-parameter group if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s,t \in \mathbb{R}.

Note that these conditions force all the operators U(t) to be invertible.

Now suppose our vector space is a Hilbert space, like L^2(X). Then we call a 1-parameter group a 1-parameter unitary group if the operators involved are all unitary.

It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice! So, we always focus on the continuous case:

Definition. A 1-parameter unitary group is strongly continuous if U(t) \psi depends continuously on t for all \psi, in this sense:

t_i \to t \;\; \Rightarrow \; \;\|U(t_i) \psi - U(t) \psi \| \to 0

Then we get a classic result proved by Marshall Stone back in the early 1930s. You may not know him, but he was so influential at the University of Chicago during this period that it’s often called the “Stone Age”. And here’s one reason why:

Stone’s Theorem. There is a one-to-one correspondence between strongly continuous 1-parameter unitary groups on a Hilbert space and self-adjoint operators on that Hilbert space, given as follows. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(-i t H)

for a unique self-adjoint operator H. Conversely, any self-adjoint operator determines a strongly continuous 1-parameter group this way. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }

When U(t) = \exp(-i t H) describes the evolution of a system in time, H is is called the Hamiltonian, and it has the physical meaning of ‘energy’. The equation I just wrote down is then called Schrödinger’s equation.

So, simply put, in quantum mechanics we have a correspondence between observables and nice one-parameter groups of symmetries. Not surprisingly, our favorite observable, energy, corresponds to our favorite symmetry: time evolution!

However, if you were paying attention, you noticed that I carefully avoided explaining how we define \exp(- i t H). I didn’t even say what a self-adjoint operator is. This is where the technicalities come in: they arise when H is unbounded, and not defined on all vectors in our Hilbert space.

Luckily, these technicalities evaporate for finite-dimensional Hilbert spaces, such as L^2(X) for a finite set X. Then we get:

Stone’s Theorem (Baby Version). Suppose we are given a finite-dimensional Hilbert space. In this case, a linear operator H on this space is self-adjoint iff it’s defined on the whole space and

\langle \phi , H \psi \rangle = \langle H \phi, \psi \rangle

for all vectors \phi, \psi. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(- i t H)

for a unique self-adjoint operator H, where

\displaystyle{ \exp(-i t H) \psi = \sum_{n = 0}^\infty \frac{(-i t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any self-adjoint operator on our space determines a strongly continuous 1-parameter group this way. For all vectors \psi in our space we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

and if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }

Time evolution in stochastic mechanics

We’ve seen that in quantum mechanics, time evolution is usually described by a 1-parameter group of operators that comes from an observable: the Hamiltonian. Stochastic mechanics is different!

First, since stochastic operators aren’t usually invertible, we typically describe time evolution by a mere ‘semigroup’:

Definition. A collection of linear operators U(t) (t \in [0,\infty)) on some vector space forms a 1-parameter semigroup if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s, t \ge 0.

Now suppose this vector space is L^1(X) for some measure space X. We want to focus on the case where the operators U(t) are stochastic and depend continuously on t in the same sense we discussed earlier.

Definition. A 1-parameter strongly continuous semigroup of stochastic operators U(t) : L^1(X) \to L^1(X) is called a Markov semigroup.

What’s the analogue of Stone’s theorem for Markov semigroups? I don’t know a fully satisfactory answer! If you know, please tell me.

Later I’ll say what I do know—I’m not completely clueless—but for now let’s look at the ‘baby’ case where X is a finite set. Then the story is neat and complete:

Theorem. Suppose we are given a finite set X. In this case, a linear operator H on L^1(X) is infinitesimal stochastic iff it’s defined on the whole space,

\int H \psi = 0

for all \psi \in L^1(X), and the matrix of H in terms of the obvious basis obeys

H_{i j} \ge 0

for all j \ne i. Given a Markov semigroup U(t) on L^1(X), we can always write

U(t) = \exp(t H)

for a unique infinitesimal stochastic operator H, where

\displaystyle{ \exp(t H) \psi = \sum_{n = 0}^\infty \frac{(t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any infinitesimal stochastic operator on our space determines a Markov semigroup this way. For all \psi \in L^1(X) we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

and if we set

\psi(t) = \exp(t H) \psi

we have the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

In short, time evolution in stochastic mechanics is a lot like time evolution in quantum mechanics, except it’s typically not invertible, and the Hamiltonian is typically not an observable.

Why not? Because we defined an observable to be a function A: X \to \mathbb{R}. We can think of this as giving an operator on L^1(X), namely the operator of multiplication by A. That’s a nice trick, which we used to good effect last time. However, at least when X is a finite set, this operator will be diagonal in the obvious basis consisting of functions that equal 1 at one point of X and zero elsewhere. So, it can only be infinitesimal stochastic if it’s zero!

Puzzle 3. If X is a finite set, show that any operator on L^1(X) that’s both diagonal and infinitesimal stochastic must be zero.

The Hille–Yosida theorem

I’ve now told you everything you really need to know… but not everything I want to say. What happens when X is not a finite set? What are Markov semigroups like then? I can’t abide letting this question go unresolved! Unfortunately I only know a partial answer.

We can get a certain distance using the Hille-Yosida theorem, which is much more general.

Definition. A Banach space is vector space with a norm such that any Cauchy sequence converges.

Examples include Hilbert spaces like L^2(X) for any measure space, but also other spaces like L^1(X) for any measure space!

Definition. If V is a Banach space, a 1-parameter semigroup of operators U(t) : V \to V is called a contraction semigroup if it’s strongly continuous and

\| U(t) \psi \| \le \| \psi \|

for all t \ge 0 and all \psi \in V.

Examples include strongly continuous 1-parameter unitary groups, but also Markov semigroups!

Puzzle 4. Show any Markov semigroup is a contraction semigroup.

The Hille–Yosida theorem generalizes Stone’s theorem to contraction semigroups. In my misspent youth, I spent a lot of time carrying around Yosida’s book Functional Analysis. Furthermore, Einar Hille was the advisor of my thesis advisor, Irving Segal. Segal generalized the Hille–Yosida theorem to nonlinear operators, and I used this generalization a lot back when I studied nonlinear partial differential equations. So, I feel compelled to tell you this theorem:

Hille-Yosida Theorem. Given a contraction semigroup U(t) we can always write

U(t) = \exp(t H)

for some densely defined operator H such that H - \lambda I has an inverse and

\displaystyle{ \| (H - \lambda I)^{-1} \psi \| \le \frac{1}{\lambda} \| \psi \| }

for all \lambda > 0 and \psi \in V. Conversely, any such operator determines a strongly continuous 1-parameter group. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = U(t) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

If you like, you can take the stuff at the end of this theorem to be what we mean by saying U(t) = \exp(t H). When U(t) = \exp(t H), we say that H generates the semigroup U(t).

But now suppose V = L^1(X). Besides the conditions in the Hille–Yosida theorem, what extra conditions on H are necessary and sufficient for it to generate a Markov semigroup? In other words, what’s a definition of ‘infinitesimal stochastic operator’ that’s suitable not only when X is a finite set, but an arbitrary measure space?

I asked this question on Mathoverflow a few months ago, and so far the answers have not been completely satisfactory.

Some people mentioned the Hille–Yosida theorem, which is surely a step in the right direction, but not the full answer.

Others discussed the special case when \exp(t H) extends to a bounded self-adjoint operator on L^2(X). When X is a finite set, this special case happens precisely when the matrix H_{i j} is symmetric: the probability of hopping from j to i equals the probability of hopping from i to j. This is a fascinating special case, not least because when H is both infinitesimal stochastic and self-adjoint, we can use it as a Hamiltonian for both stochastic mechanics and quantum mechanics! Someday I want to discuss this. However, it’s just a special case.

After grabbing people by the collar and insisting that I wanted to know the answer to the question I actually asked—not some vaguely similar question—the best answer seems to be Martin Gisser’s reference to this book:

• Zhi-Ming Ma and Michael Röckner, Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer, Berlin, 1992.

This book provides a very nice self-contained proof of the Hille-Yosida theorem. On the other hand, it does not answer my question in general, but only when the skew-symmetric part of H is dominated (in a certain sense) by the symmetric part.

So, I’m stuck on this front, but that needn’t bring the whole project to a halt. We’ll just sidestep this question.

For a good well-rounded introduction to Markov semigroups and what they’re good for, try:

• Ryszard Rudnicki, Katarzyna Pichór and Marta Tyran-Kamínska, Markov semigroups and their applications.


Chaitin’s Theorem and the Surprise Examination Paradox

6 October, 2011

If you followed the business about Edward Nelson’s claim to have proved the inconsistency of arithmetic, you might have heard people mention this paper:

• Shira Kritchman and Ran Raz, The surprise examination paradox and the second incompleteness theorem, AMS Notices 57 (December 2010), 1454–1458.

It’s got a great idea in it, which I’d like to explain in a really sloppy way so everyone can understand it. Logic is cool, but most people never get to the cool part because they can’t fight their way through the rather dry technicalities.

You all know the surprise examination paradox, right? The teacher says one day he’ll give a quiz and it will be a surprise. So the kids think “well, it can’t be on the last day then—we’d know it was coming.” And then they think “well, so it can’t be on the day before the last day, either!—we’d know it was coming.” And so on… and they convince themselves it can’t happen at all.

But then the teacher gives it the very next day, and they’re completely surprised.

People still argue a lot about how to settle this paradox. But anyway, Kritchman and Raz use a rigorous version of this paradox together with Chaitin’s incompleteness theorem to demonstrate Gödel’s second incompleteness theorem—which says, very roughly, that:

If math can prove itself consistent, it’s not.

If you’re a logician, I bet this sort of sloppy statement really annoys you. Yeah? Does it? Take a chill pill, dude: this post isn’t for you—it’s for ordinary mortals. I said I want to summarize Kritchman and Raz’s argument in a really sloppy way. If you want precision and details, read their paper.

Okay, here we go:

Chaitin’s theorem, which is already astounding, says there’s some length L such that you can’t prove any particular string of bits needs a program longer than L to print it out. At least, this is so if math is consistent. If it’s not consistent, you can prove anything!

On the other hand, there’s some finite number of programs of length ≤ L. So if you take a list of more numbers than that, say 1, 2, …, N, there’s got to be at least one that needs a program longer than L to print it out.

Okay: there’s got to be at least one. How many? Suppose just one. Then we can go through all programs of length ≤ L, find those that print all the other numbers on our list… and thus, by a process of elimination, find the culprit.

But that means we’ve proved that this culprit is a number can only be computed by a program of length > L. But Chaitin’s theorem says that’s impossible! At least, not if math is consistent.

So there can’t be just one. At least, not if math is consistent.

Okay: suppose there are just two. Well, we can pull the same trick and find out who they are! So there can’t be just two, either. At least, not if math is consistent.

We can keep playing this game until we rule out all the possibilities… and then we’re really stuck. We get a contradiction. At least, if math is consistent.

So if we could prove math is consistent, we’d know it’s not!


Follow

Get every new post delivered to your Inbox.

Join 3,095 other followers