A Course on Quantum Techniques for Stochastic Mechanics

18 September, 2012

Jacob Biamonte and I have come out with a draft of a book!

A course on quantum techniques for stochastic mechanics.

It’s based on the first 24 network theory posts on this blog. It owes a lot to everyone here, and the acknowledgements just scratch the surface of that indebtedness. At some later time I’d like to go through the posts and find the top twenty people who need to be thanked. But I’m leaving Singapore on Friday, going back to California to teach at U.C. Riverside, so I’ve been rushing to get something out before then.

If you see typos or other problems, please let us know!
We’ve reorganized the original blog articles and polished them up a bit, but we plan to do more before publishing these notes as a book.

I’m looking forward to teaching a seminar called Mathematics of the Environment when I get back to U.C. Riverside, and with luck I’ll put some notes from that on the blog here. I will also be trying to round up a team of grad students to work on network theory.

The next big topics in the network theory series will be electrical circuits and Bayesian networks. I’m beginning to see how these fit together with stochastic Petri nets in a unified framework, but I’ll need to talk and write about it to fill in all the details.

You can get a sense of what this course is about by reading this:

Foreword

This course is about a curious relation between two ways of describing situations that change randomly with the passage of time. The old way is probability theory and the new way is quantum theory.

Quantum theory is based, not on probabilities, but on amplitudes. We can use amplitudes to compute probabilities. However, the relation between them is nonlinear: we take the absolute value of an amplitude and square it to get a probability. It thus seems odd to treat amplitudes as directly analogous to probabilities. Nonetheless, if we do this, some good things happen. In particular, we can take techniques devised in quantum theory and apply them to probability theory. This gives new insights into old problems.

There is, in fact, a subject eager to be born, which is mathematically very much like quantum mechanics, but which features probabilities in the same equations where quantum mechanics features amplitudes. We call this subject stochastic mechanics.

Plan of the course

In Section 1 we introduce the basic object of study here: a ‘stochastic Petri net’. A stochastic Petri net describes in a very general way how collections of things of different kinds can randomly interact and turn into other things. If we consider large numbers of things, we obtain a simplified deterministic model called the ‘rate equation’, discussed in Section 2. More fundamental, however, is the ‘master equation’, introduced in Section 3. This describes how the probability of having various numbers of things of various kinds changes with time.

In Section 4 we consider a very simple stochastic Petri net and notice that in this case, we can solve the master equation using techniques taken from quantum mechanics. In Section 5 we sketch how to generalize this: for any stochastic Petri net, we can write down an operator called a ‘Hamiltonian’ built from ‘creation and annihilation operators’, which describes the rate of change of the probability of having various numbers of things. In Section 6 we illustrate this with an example taken from population biology. In this example the rate equation is just the logistic equation, one of the simplest models in population biology. The master equation describes reproduction and competition of organisms in a stochastic way.

In Section 7 we sketch how time evolution as described by the master equation can be written as a sum over Feynman diagrams. We do not develop this in detail, but illustrate it with a predator–prey model from population biology. In the process, we give a slicker way of writing down the Hamiltonian for any stochastic Petri net.

In Section 8 we enter into a main theme of this course: the study of equilibrium solutions of the master and rate equations. We present the Anderson–Craciun–Kurtz theorem, which shows how to get equilibrium solutions of the master equation from equilibrium solutions of the rate equation, at least if a certain technical condition holds. Brendan Fong has translated Anderson, Craciun and Kurtz’s original proof into the language of annihilation and creation operators, and we give Fong’s proof here. In this language, it turns out that the equilibrium solutions are mathematically just like ‘coherent states’ in quantum mechanics.

In Section 9 we give an example of the Anderson–Craciun–Kurtz theorem coming from a simple reversible reaction in chemistry. This example leads to a puzzle that is resolved by discovering that the presence of ‘conserved quantities’—quantities that do not change with time—lets us construct many equilibrium solutions of the rate equation other than those given by the Anderson–Craciun–Kurtz theorem.

Conserved quantities are very important in quantum mechanics, and they are related to symmetries by a result called Noether’s theorem. In Section 10 we describe a version of Noether’s theorem for stochastic mechanics, which we proved with the help of Brendan Fong. This applies, not just to systems described by stochastic Petri nets, but to a much more general class of processes called ‘Markov processes’. In the analogy to quantum mechanics, Markov processes are analogous to arbitrary quantum systems whose time evolution is given by a Hamiltonian. Stochastic Petri nets are analogous to a special case of these: the case where the Hamiltonian is built from annihilation and creation operators. In Section 11 we state the analogy between quantum mechanics and stochastic mechanics more precisely, and with more attention to mathematical rigor. This allows us to set the quantum and stochastic versions of Noether’s theorem side by side and compare them in Section 12.

In Section 13 we take a break from the heavy abstractions and look at a fun example from chemistry, in which a highly symmetrical molecule randomly hops between states. These states can be seen as vertices of a graph, with the transitions as edges. In this particular example we get a famous graph with 20 vertices and 30 edges, called the ‘Desargues graph’.

In Section 14 we note that the Hamiltonian in this example is a ‘graph Laplacian’, and, following a computation done by Greg Egan, we work out the eigenvectors and eigenvalues of this Hamiltonian explicitly. One reason graph Laplacians are interesting is that we can use them as Hamiltonians to describe time evolution in both stochastic and quantum mechanics. Operators with this special property are called ‘Dirichlet operators’, and we discuss them in Section 15. As we explain, they also describe electrical circuits made of resistors. Thus, in a peculiar way, the intersection of quantum mechanics and stochastic mechanics is the study of electrical circuits made of resistors!

In Section 16, we study the eigenvectors and eigenvalues of an arbitrary Dirichlet operator. We introduce a famous result called the Perron–Frobenius theorem for this purpose. However, we also see that the Perron–Frobenius theorem is important for understanding the equilibria of Markov processes. This becomes important later when we prove the ‘deficiency zero theorem’.

We introduce the deficiency zero theorem in Section 17. This result, proved by the chemists Feinberg, Horn and Jackson, gives equilibrium solutions for the rate equation for a large class of stochastic Petri nets. Moreover, these equilibria obey the extra condition that lets us apply the Anderson–Craciun–Kurtz theorem and obtain equilibrium solutions of the master equation as well. However, the deficiency zero theorem is best stated, not in terms of stochastic Petri nets, but in terms of another, equivalent, formalism: ‘chemical reaction networks’. So, we explain chemical reaction networks here, and use them heavily throughout the rest of the course. Because they are applicable to such a large range of problems, we call them simply ‘reaction networks’. Like stochastic Petri nets, they describe how collections of things of different kinds randomly interact and turn into other things.

In Section 18 we consider a simple example of the deficiency zero theorem taken from chemistry: a diatomic gas. In Section 19 we apply the Anderson–Craciun–Kurtz theorem to the same example.

In Section 20 we begin the final phase of the course: proving the deficiency zero theorem, or at least a portion of it. In this section we discuss the concept of ‘deficiency’, which had been introduced before, but not really explained: the definition that makes the deficiency easy to compute is not the one that says what this concept really means. In Section 21 we show how to rewrite the rate equation of a stochastic Petri net—or equivalently, of a reaction network—in terms of a Markov process. This is surprising because the rate equation is nonlinear, while the equation describing a Markov process is linear in the probabilities involved. The trick is to use a nonlinear operation called ‘matrix exponentiation’. In Section 22 we study equilibria for Markov processes. Then, finally, in Section 23, we use these equilibria to obtain equilibrium solutions of the rate equation, completing our treatment of the deficiency zero theorem.


Rolling Circles and Balls (Part 3)

11 September, 2012

In Part 1 and Part 2 we looked at the delightful curves you get by rolling one circle on another. Now let’s see what happens when you roll one circle inside another!

Four times as big

If you roll a circle inside a circle that’s 4 times as big, you get an astroid:

Puzzle 1. How many times does the rolling circle turn as it rolls all the way around?

By the way: don’t confuse an astroid with an asteroid. They both got their names because someone thought they looked like stars, but that’s where the resemblance ends!

You can get an astroid using this funny parody of the equation for a circle:

x^{2/3} + y^{2/3} = 1

Or, if you don’t like equations, you can get a quarter of an astroid by letting a ladder slide down a wall and taking a time-lapse photo!

In other words, you get a whole astroid by taking the envelope of all line segments of length 1 going from some point on the x axis to some point on the y axis!
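
If you’d like to check the equation numerically, here’s a little sketch in Python with NumPy. It uses the standard parametrization of the curve traced by a point on a circle of radius r rolling inside a circle of radius R; taking R = 4r gives the astroid, and the choice R = 1 here just matches the equation above:

import numpy as np

# Point on a circle of radius r rolling inside a circle of radius R
R, r = 1.0, 0.25
theta = np.linspace(0, 2 * np.pi, 1000)
x = (R - r) * np.cos(theta) + r * np.cos((R - r) / r * theta)
y = (R - r) * np.sin(theta) - r * np.sin((R - r) / r * theta)

# How close is x^(2/3) + y^(2/3) to 1 along the whole curve?
lhs = np.abs(x) ** (2 / 3) + np.abs(y) ** (2 / 3)
print(np.max(np.abs(lhs - 1)))   # ~0, up to rounding error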

Three times as big

If the fixed circle is just 3 times as big as the one rolling inside it, we get a deltoid:

Puzzle 2. Now how many times does the rolling circle turn as it rolls all the way around?

By the way: it looks like we’re back to naming curves after body parts… but we’re not: both this curve and the muscle called a deltoid got their names because they look like the Greek letter delta:

Puzzle 3. Did the Greek letter delta get that name because it was shaped like a river delta, or was it the other way around?

As you might almost expect by now, if you’ve been reading this whole series, there are weird relations between the deltoid and the astroid.

For example: take a deltoid and shine parallel rays of light at it from any direction. Then the envelope of these rays is an astroid!


We summarize this by saying that the astroid is a catacaustic of the deltoid. This picture is by Xah Lee, who has also made a nice movie of what happens as you rotate the light source:

• Xah Lee, Deltoid catacaustic movie.

I don’t completely understand the rays going through the deltoid, in either the picture or the movie. It looks like those rays are getting refracted, but that would be a diacaustic, not a catacaustic. I think they’re formed by extending the reflected rays to straight lines that go through the deltoid. If you didn’t do that, you wouldn’t get a whole astroid, just part of one.

Anyway, at the very least you get part of an astroid, which you can complete to a whole one. And then, as you rotate the light source, the astroid you get rolls around the deltoid in a pleasant manner! This is nicely illustrated here:

Deltoid catacaustic, Wolfram Mathworld.

You can also get a deltoid from a deltoid! Draw all the osculating circles of the deltoid—that is, circles that match the deltoid’s curvature as well as its slope at the points they touch. The centers of these circles lie on another, larger deltoid:

We summarize this by saying that the evolute of a deltoid is another deltoid.

There are also fancier ways to get deltoids if you know more math. For example, the set of traces of matrices lying in the group SU(3) forms a filled-in deltoid in the complex plane!

This raises a question which unfortunately I’m too lazy to answer myself. So, I’ll just pose it as a puzzle:

Puzzle 4. Is the set of traces of matrices lying in SU(4) a filled-in astroid? In simpler terms, consider the values of a + b + c + d that we can get from complex numbers a,b,c,d with |a| = |b| = |c| = |d| = 1 and abcd = 1. Do these values form a filled-in astroid?
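
I’m in no position to settle this here, but anyone who wants to experiment could start with a numerical sketch like this one in Python with NumPy. It samples unit complex numbers with abcd = 1 and tests whether every trace lands in the filled astroid with cusps at \pm 4 and \pm 4i—the scaling, with cusps at distance 4 (the trace of the identity), is my guess at the natural one:

import numpy as np

rng = np.random.default_rng(0)
n = 100000
phases = rng.uniform(0, 2 * np.pi, size=(n, 3))
last = -phases.sum(axis=1)                    # forces abcd = 1
traces = np.exp(1j * phases).sum(axis=1) + np.exp(1j * last)

# Test membership in the filled astroid |x|^(2/3) + |y|^(2/3) <= 4^(2/3)
x, y = traces.real, traces.imag
inside = np.abs(x) ** (2 / 3) + np.abs(y) ** (2 / 3) <= 4 ** (2 / 3) + 1e-9
print(inside.all())

Of course, sampling can only suggest containment; it says nothing about whether the traces fill the whole region.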

Twice as big

But now comes the climax of today’s story: what happens when we let a circle roll inside a circle that’s exactly twice as big?

Now you can see the rolling circle turn around once as it rolls all the way around the big one.

More excitingly, if we track a point on the rolling circle, it traces out a straight line! Can you see why? Later I’ll give a few proofs.

This gadget is called a Tusi couple. You could use it to convert a rolling motion into a vibrating one using some gears. Greg Egan made a nice animation showing how:

The Tusi couple is named after the Persian astronomer and mathematician Nasir al-Din al-Tusi, who discovered it around 1247, when he wrote a commentary on Ptolemy’s Almagest, an important astronomical text:

He wrote:

If two coplanar circles, the diameter of one of which is equal to half the diameter of the other, are taken to be internally tangent at a point, and if a point is taken on the smaller circle—and let it be at the point of tangency—and if the two circles move with simple motions in opposite direction in such a way that the motion of the smaller is twice that of the larger so the smaller completes two rotations for each rotation of the larger, then that point will be seen to move on the diameter of the larger circle that initially passes through the point of tangency, oscillating between the endpoints.

I don’t quite understand why he was interested in this, but it has something to do with using epicycles to build linear motion out of circular motion. It also has something to do with the apparent motion of planets between the Earth and the Sun.

Later Copernicus also studied the Tusi couple. He proved that the moving point really did trace out a straight line:

However, many suspect that this was not a true rediscovery: al-Tusi had also proved this, and some aspects of Copernicus’ proof seem too similar to al-Tusi’s to be coincidence:

• George Saliba, Whose science is Arabic science in Renaissance Europe?, Section 2: Arabic/Islamic science and the Renaissance science in Italy.

• I. N. Veselovsky, Copernicus and Nasir al-Din al-Tusi, Journal for the History of Astronomy 4 (1973), 128–130.

In fact, the Tusi couple goes back way before al-Tusi. It was known to Proclus back around 450 AD! Apparently he wrote about it in his Commentary on the First Book of Euclid. Proclus is mainly famous as a philosopher: people think of him as bringing neo-Platonism to its most refined heights. Given that, it’s no surprise that he also liked math. Indeed, he said:

Wherever there is number, there is beauty.

And of course this is what I’ve been trying to show you throughout this series.

Why the Tusi couple works

So, here are three proofs that a Tusi couple really does trace out a straight line. I posed this as a puzzle on Google+, and here are my favorite three answers. I like them because they’re very different. Since people have thought about Tusi couples since 450 AD, I doubt any of these proofs are original to the people I’m mentioning here! Still, they deserve credit.

The first, due to Omar Antolín Camarena, is in the style of traditional Euclidean geometry.

Let O be the center of the big circle. Let A be the position of the traced point at the instant t when it’s the point of tangency between the circles, let B be its position at some future time t′ and let C be the point of tangency at that same time t′. Let X be the center of the small circle at that future time t′.

We want to prove A, B and O lie on a line. To do this, it suffices to show that the angle AOC equals the angle BOC:

Want: ∠BOC = ∠AOC

The arc AC of the big circle has the same length as the arc BC of the small circle, since they are both the distance rolled between times t and t′. But the big circle is twice as big, so the angle BXC on the little circle must be twice the angle AOC on the big circle:

Know: ∠BXC = 2 ∠AOC

But it’s a famous fact in Euclidean geometry that the angle BXC is twice the angle BOC:

Know: ∠BXC = 2 ∠BOC

From the two equations we know, the one we want follows!

Here’s the proof of that ‘famous fact’, in case you forgot it:

The second proof is due to Greg Egan. He distilled it down to a moving picture:

It’s a ‘proof without words’, so you may need to think a while to see how it shows that the Tusi couple traces out a straight line.

The third proof is due to Boris Borcic. It uses complex numbers. Let the big circle be the unit circle in the complex plane. Then the point of contact between the rolling circle and the big one is:

e^{i t}

so the center of the rolling circle is:

\displaystyle{ \frac{e^{it}}{2} }

Since the rolling circle turns around once clockwise as it rolls around the big one, the point whose motion we’re tracking here:

is equal to:

\displaystyle{ \frac{e^{it}}{2} + \frac{e^{i(\pi - t)}}{2} =  \frac{e^{i t} - e^{-i t}}{2}  = i \sin t =  i \; \mathrm{Im}(e^{it}) }

So, this point moves up and down along a vertical line, and its height equals the height of the point of contact, e^{it}.

But the same sort of argument shows that if we track the motion of the opposite point on the rolling circle, it equals:

\displaystyle{ \frac{e^{it}}{2} - \frac{e^{i(\pi - t)}}{2} =  \frac{e^{i t} + e^{-i t}}{2}  =  \cos t =   \mathrm{Re}(e^{it}) }

So, this opposite point moves back and forth along a horizontal straight line… and its horizontal coordinate equals that of the point of contact!

You can see all this clearly in the animation Borcic made:

The point of contact, the tip of the red arrowhead, is e^{it}. The two opposite points on the rolling circle are \cos t and i \sin t. So, the rectangle here illustrates the fact that

e^{it} = \cos t + i \sin t

It’s interesting that this famous formula is hiding in the math of the Tusi couple! But it shouldn’t be surprising, because the Tusi couple is all about building linear motion out of circular motion… or conversely, decomposing a circular motion into linear motions.

Credits

Few of these pictures were made by me, and none of the animations. For most, you can see who created them by clicking on them. The animations of the astroid and deltoid as envelopes come from here:

Envelope, Math Images Project.

where they were made available under a GNU Free Documentation License. Here’s another animation from there:

This is a way of creating a deltoid as the envelope of some lines called ‘Wallace–Simson lines’. The definition of these lines is so baroque I didn’t dare tell it to you earlier. But if you’re so bored you’re actually reading these credits, you might enjoy this.

Any triangle can be circumscribed by a circle. If we take any point on this circle, say M, we can drop perpendicular lines from it to the triangle’s three sides, and get three points, say P, Q and R:

Amazingly, these points lie on a line:

This is the Wallace–Simson line of M. If we move the point M around the circle, we get lots of Wallace–Simson lines… and the envelope of these lines is a deltoid!

By the way: the Wallace–Simson line is named after William Wallace, who wrote about it, and Robert Simson, who didn’t. Don’t confuse it with the Wallace line! That was discovered by Alfred Russel Wallace.


Symmetry and the Fourth Dimension (Part 7)

7 September, 2012

Lately I’ve been showing you what happens if you start with a Platonic solid and start chopping off its corners, more and more, until you get the dual Platonic solid. There’s just one I haven’t done yet: the tetrahedron.

This is the simplest Platonic solid, so why did I wait and do it last?

Because the tetrahedron is self-dual. Remember, the Coxeter diagram of this shape looks like this:

V—3—E—3—F

And remember what this diagram means. It means the tetrahedron has

• 3 vertices and 3 edges touching each face,
• 3 edges and 3 faces touching each vertex.

When we take the dual of a solid, we

• replace vertices by faces;
• replace edges by edges;
• replace faces by vertices.

So when we do this to the tetrahedron, we get back a tetrahedron! This ‘self-duality’ is reflected in the symmetry of its Coxeter diagram. If we switch the letters V and F, we get the same thing back—drawn backwards, but that doesn’t matter.

This self-duality also means that when we take a tetrahedron and keep cutting off the corners more and more deeply, we wind up where we started. And in fact, when we reach the halfway point of this process, we start retracing our steps… going backwards!

Let’s see how it goes. Remember, we have a system of diagrams for drawing the most important solids we meet along the way.

Tetrahedron: •—3—o—3—o

We start with the tetrahedron:


Truncated tetrahedron: •—3—•—3—o

Then we get the truncated tetrahedron:


Octahedron: o—3—•—3—o

Then, halfway through, we get a shape we’ve seen before! It’s our friend the octahedron:


Like the other ‘halfway through’ shapes we’ve seen—the cuboctahedron and icosidodecahedron—every edge of the octahedron lies on a great circle’s worth of edges:


Puzzle 1. Why does it always work this way?

I don’t actually know!

Truncated tetrahedron: o—3—•—3—•

Then we get back to the truncated tetrahedron:


Tetrahedron: o—3—o—3—•

At the end, we get back where we started… the tetrahedron:


Where are we?

We’ve begun to explore the three great families of semiregular polyhedra:

• the tetrahedron family shown here,

• the cube/octahedron family shown in Part 5,

• and the dodecahedron/icosahedron family shown in Part 6.

We’ve seen that for each family, we have a Coxeter complex, which is summarized by a Coxeter diagram. By coloring the dots in this diagram either white or black, we get different polyhedra in our family.

Our goal is to explore how this works in 4 dimensions. It’s very similar, but much richer! We can still use Coxeter diagrams, but they’ll have four dots, so there will be more ways to label them, and we’ll get more shapes. And, of course, we’ll have the fun of learning to visualize 4-dimensional shapes!

But before we can explore the 4d story, there’s a hole in the story so far that I need to fill.

Puzzle 2. Can you guess what it is?

Maybe you’ll see it if you look over our results so far.

The tetrahedron family

Here are the shapes related to the tetrahedron. The list has some repeats, because the tetrahedron is its own dual! It also includes some shapes we’ll see in other families.

tetrahedron •—3—o—3—o
truncated tetrahedron •—3—•—3—o
octahedron o—3—•—3—o
truncated tetrahedron o—3—•—3—•
tetrahedron o—3—o—3—•

And here’s the Coxeter complex that runs the show:

This has one right triangle for each element in the group that acts as symmetries of all these shapes. This group has 24 elements, and it’s called the tetrahedral finite reflection group, or A3. So, we can also call this collection of polyhedra the A3 family.

The cube/octahedron family

Here are the shapes related to the cube and the octahedron:

cube •—4—o—3—o
truncated cube •—4—•—3—o
cuboctahedron o—4—•—3—o
truncated octahedron o—4—•—3—•
octahedron o—4—o—3—•

And here’s the Coxeter complex:

Again, this has one right triangle for each element in the group that acts as symmetries of all these shapes. This group has 48 elements, and it’s called the octahedral finite reflection group, or B3. So, we can call this collection of polyhedra the B3 family.

The dodecahedron/icosahedron family

And here are the shapes related to dodecahedron and icosahedron:

dodecahedron •—5—o—3—o
truncated dodecahedron •—5—•—3—o
icosidodecahedron o—5—•—3—o
truncated icosahedron o—5—•—3—•
icosahedron o—5—o—3—•

And here’s the Coxeter complex:

Yet again, this has one right triangle for each element in the group that acts as symmetries of all these shapes. This group has 120 elements, and it’s called the icosahedral finite reflection group, or H3. So, we can call this collection of polyhedra the H3 family.


Rolling Circles and Balls (Part 2)

3 September, 2012

Last time we rolled a circle on another circle the same size, and looked at the curve traced out by a point on the rolling circle:

It’s called a cardioid.

But suppose we roll a circle on another circle that’s twice as big. Then we get a nephroid:

Puzzle 1. How many times does the small circle rotate as it rolls all the way around the big one here?

By the way, the name ‘cardioid’ comes from the Greek word for ‘heart’. The name ‘nephroid’ comes from the Greek word for a less noble organ: the kidney! But the Greeks didn’t talk about cardioids or nephroids—these names were invented in modern times.

Here are my 7 favorite ways to get a nephroid:

1) The way just described: roll a circle on a circle twice as big, and track the path of a point.

2) Alternatively, take a circle that’s one and a half times as big as another, fit it around that smaller one, roll it around, and let one of its points trace out a curve. Again you get a nephroid!

3) Take a semicircle, point it upwards, shine parallel rays of light straight down at it, and let those rays reflect off it. The envelope of the reflected rays will be half of a nephroid:

This was discovered by Huygens in 1678, in his work on light. He was really big on the study of curves.

As I mentioned last time, a catacaustic is a curve formed as the envelope of rays emanating from a specified point and reflecting off a given curve. We can stretch the rules a bit and let that point be a ‘point at infinity’. Then the rays will be parallel. So, we’re seeing that the nephroid is a catacaustic of a circle.

Last time we saw the cardioid is also a catacaustic of a circle, but with light emanating from a point on the circle. It’s neat that the cardioid and nephroid both show up as catacaustics of the circle. But it’s just the beginning of the fun…

4) The nephroid is the catacaustic of the cardioid, if we let the light emanate from the cardioid’s cusp!

This was discovered by Jacques Bernoulli in 1692.

5) Let two points move around a circle, starting at the same place, but with one moving 3 times as fast as the other. At each moment connect them with a line. The envelope of these lines is a nephroid!

Last time we saw that if we replace the number 3 by 2 here, we get a cardioid. So, this is yet another way these two curves are related!

6) Draw a circle in blue and draw its diameter as a vertical line. Then draw all the circles that have their center somewhere on that blue circle, and are tangent to that vertical line. You get a nephroid:

The red circle here has the red dot as its center, and it’s tangent to a point on the vertical line. Here’s a nice animation of the process, made available by the Math Images Project under a GNU Free Documentation License:

7) Finally, here’s how to draw a nephroid starting with a nephroid! Draw all the osculating circles of the nephroid—that is, circles that match the nephroid’s curvature as well as its slope at the points they touch. The centers of these circles give another nephroid:

This trick is an example of an ‘evolute’. The evolute of a curve is the set of centers of the osculating circles of that curve. Last time we saw the evolute of a cardioid is another cardioid. Now we’re seeing the nephroid shares this property!

Apparently the same is true for all curves formed by rolling one circle on another. These curves are called epicycloids. In a sense, these are the mathematical leftovers of the theory of epicycles in astronomy.
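
In case you want to experiment with epicycloids yourself: the standard parametrization for the curve traced by a point on a circle of radius r rolling on the outside of a fixed circle of radius R is

\displaystyle{ x(\theta) = (R + r) \cos \theta - r \cos\left( \frac{R + r}{r} \theta \right), \qquad y(\theta) = (R + r) \sin \theta - r \sin\left( \frac{R + r}{r} \theta \right) }

Taking R = r gives the cardioid, and R = 2r gives the nephroid.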

It would be nice if some of the funny relations we’ve been seeing between the cardioid and the nephroid generalize to relations between the epicycloid with k cusps and the one with k+1 cusps. But I don’t know if that’s true.

It would also be nice if the epicycloids with more and more cusps were named after increasingly disgusting organs of the body. But in fact, I don’t know any special names for them once we reach k = 3.

Puzzle 2. Use one of the 7 constructions above to get an equation for the nephroid. What is the simplest equation you can find for this curve?

Next time

Most of the pictures above are from Wikicommons, but the picture of the nephroid as a catacaustic of the cardioid is from Xah Lee’s wonderful website on plane curves. As usual, you can click on the pictures and get more information.

My ultimate goal is to tell you some amazing things about what happens when you roll one ball on another that’s exactly 3 times as big. These things have nothing to do with plane curves, actually. But I’ve been taking many detours, and next time I’ll talk about some curves formed by rolling one circle inside another!

Right now, though, I need a break. I need to stop thinking about all these curves. I think I’ll get a cup of coffee.


Rolling Circles and Balls (Part 1)

31 August, 2012

For over a decade I’ve been struggling with a certain math puzzle, first with the help of James Dolan and later also with John Huerta. It’s about something amazing that happens when you roll a ball on another ball that’s exactly 3 times as big. John Huerta and I just finished a paper about it, and I’d like to explain that here.

But I’d like to ease into it slowly, so I’ll start by talking about what happens when you roll a circle on another circle that’s the exact same size:


Can you see how the rolling circle rotates twice as it rolls around the fixed circle once? Do you understand why?

The heart-shaped curve traced out by any point on the rolling circle is called a cardioid. In her latest video Vi Hart pretends to complain about parabolas while actually telling us quite a lot about them, and much else too:

Naturally, with her last name, she prefers the cardioid. She describes various ways to draw this curve: for example, by turning the hated parabola inside out. Here are my 6 favorite ways:

1) The one we’ve seen already: roll a circle on another circle the same size, and track the motion of a point on the rolling circle:


2) Take a parabola and ‘turn it inside out’, replacing each point with polar coordinates (r, θ) by a point with coordinates (1/r, θ). As long as your parabola doesn’t contain the origin, you get a cardioid:


This ‘turning inside out’ trick is called conformal inversion.
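
In polar coordinates it’s easy to check this explicitly, at least for a parabola whose focus is at the origin. Such a parabola is r = 1/(1 + \cos \theta), so conformal inversion sends it to

r = 1 + \cos \theta

which is the standard polar equation of a cardioid.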

3) Draw all circles whose centers are points on a fixed circle, and which contain a specified point on that circle:

Here’s a nice animation of this process, made available by the Math Images Project under a GNU Free Documentation License:

4) Let light rays emanate from one point on a circle and reflect off all other points on that circle. Draw all these reflected rays, and you’ll see a cardioid:

If you draw all light rays that reflect off some curve, the curve they snuggle up against (their so-called envelope) is called a catacaustic.

5) Draw 36 equally spaced points on a circle numbered 0 to 35, and draw a line between each point n and the point 2n modulo 36. You’ll see a cardioid, approximately:

But there’s nothing special about the number 36. If you take more evenly spaced points, you get a better approximation to a cardioid. You get a perfect cardioid if you connect each point (1, θ) to the point (1, 2θ) with a line, and take the envelope of these lines.

6) Finally, here’s how to draw a cardioid starting with a cardioid! Draw all the osculating circles of the cardioid—that is, circles that match the cardioid’s curvature as well as its slope at the points they touch. The centers of these circles give another cardioid:

This picture has some distracting lines on it; just look at the big and the little cardioid, and the circles. This trick is an example of an ‘evolute’. The evolute of a curve is the set of centers of the osculating circles of that curve.

All the pictures above are from Xah Lee’s wonderful website or the Wikipedia article on cardioids. Click on the picture to see where it came from and get more information.

I ❤ cardioids!

Next time we’ll see what happens when we roll a circle inside a circle that’s exactly twice as big.

Constructions on curves

We’ve seen a few constructions on curves:

• A roulette is the curve traced out by a point attached to a given curve as it rolls without slipping along a second curve.

We rolled a circle on a circle and got a cardioid, but you could roll a parabola on another parabola:

This gives a curve called the cissoid of Diocles, which in some coordinate system (not the one shown) is given by this cubic equation:

(x^2+y^2)x=2ay^2

• A catacaustic is the envelope of rays emanating from a specified point (perhaps a point at infinite distance, which produces parallel rays) and reflecting off a given curve.

We’ve obtained the cardioid as a catacaustic of the circle. Supposedly if you take the cissoid of Diocles and form its catacaustic using rays emanating from its ‘focus’, you get a cardioid! This would be a seventh way to get a cardioid, but I don’t understand it, even though it’s described on Wolfram Mathworld. I don’t even know what the ‘focus’ of a cissoid of Diocles is. Can you help?

• The evolute of a curve is the curve formed by the centers of its osculating circles.

We’ve seen that the cardioid is its own evolute. The evolute of an ellipse looks like this:

It’s called an astroid, and it’s given by an equation of this form:

a x^{2/3} + b y^{2/3} = 1

If you take the osculating circles at the black points on the ellipse above, their centers are the sharp pointy ‘cusps’ of the astroid.

Order from chaos?

Some other famous ways to construct new curves from old ones include the involute, the isoptic, and the pedal. I could describe them… but I won’t. You get the picture: there’s a zoo of curves and constructions on curves, and lots of relations between these constructions. It’s all very beautiful, but also a bit of a mess.

It seems that all these constructions, and their relations, should be studied more systematically in algebraic geometry. This may seem like a somewhat musty and old-fashioned branch of algebraic geometry, but surely there’s a way to make it new and fresh using modern math. Has someone done this?


An Entropy Challenge

29 August, 2012

If you like computer calculations, here’s a little challenge for you. Oscar Dahlsten may have solved it, but we’d love for you to check his work. It’s pretty important for the foundations of thermodynamics, but you don’t need to know any physics or even anything beyond a little algebra to tackle it! First I’ll explain it in really simple terms, then I’ll remind you a bit of why it matters.

We’re looking for two lists of nonnegative numbers, of the same length, listed in decreasing order:

p_1 \ge p_2 \ge \cdots \ge p_n \ge 0

q_1 \ge q_2 \ge \cdots \ge q_n \ge 0

that sum to 1:

p_1 + \cdots + p_n = 1

q_1 + \cdots + q_n = 1

and that obey this inequality:

\displaystyle{ \frac{1}{1 - \beta} \ln \sum_{i=1}^n p_i^\beta  \le  \frac{1}{1 - \beta} \ln \sum_{i=1}^n q_i^\beta }

for all 0 < \beta < \infty (ignoring \beta = 1), yet do not obey these inequalities:

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n.

Oscar’s proposed solution is this:

p = (0.4, 0.29, 0.29, 0.02)

q = (0.39, 0.31, 0.2, 0.1)

Can you see if this works? Is there a simpler example, like one with lists of just 3 numbers?
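
Here’s a quick numerical check of Oscar’s proposal—just a sketch in Python with NumPy, where the grid of \beta values and the tolerances are my own choices:

import numpy as np

p = np.array([0.4, 0.29, 0.29, 0.02])
q = np.array([0.39, 0.31, 0.2, 0.1])

# Majorization: every partial sum of p must be >= that of q
print(np.cumsum(p))                                    # [0.4  0.69 0.98 1.  ]
print(np.cumsum(q))                                    # [0.39 0.7  0.9  1.  ]
print(np.all(np.cumsum(p) >= np.cumsum(q) - 1e-12))    # False

# Renyi entropy H_beta, for beta != 1
def renyi(dist, beta):
    return np.log(np.sum(dist ** beta)) / (1.0 - beta)

betas = np.concatenate([np.linspace(0.01, 0.99, 99),
                        np.linspace(1.01, 50.0, 500)])
print(all(renyi(p, b) <= renyi(q, b) + 1e-12 for b in betas))

Since p_1 + p_2 = 0.69 < 0.70 = q_1 + q_2, p does not majorize q; the sampled \beta values then suggest—though of course they can’t prove—that H_\beta(p) \le H_\beta(q) holds for all \beta.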

This question came up near the end of my post More Second Laws of Thermodynamics. I phrased the question with a bit more jargon, and said a lot more about its significance. Suppose we have two probability distributions on a finite set, say p and q. We say p majorizes q if

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n, when we write both lists of numbers in decreasing order. This means p is ‘less flat’ than q, so it should have less entropy. And indeed it does: not just for ordinary entropy, but also for Rényi entropy! The Rényi entropy of p is defined by

\displaystyle{ H_\beta(p) = \frac{1}{1 - \beta} \ln \sum_{i=1}^n p_i^\beta  }

where 0 < \beta < 1 or 1 < \beta < \infty. We can also define Rényi entropy for \beta = 0, 1, \infty by taking a limit, and at \beta = 1 we get the ordinary entropy

\displaystyle{ H_1(p) = - \sum_{i = 1}^n p_i \ln (p_i) }
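
To see why the \beta \to 1 limit gives this, note that the numerator and denominator of H_\beta(p) both vanish as \beta \to 1, so L’Hôpital’s rule applies:

\displaystyle{ \lim_{\beta \to 1} \frac{\ln \sum_{i=1}^n p_i^\beta}{1 - \beta} = \lim_{\beta \to 1} \frac{\sum_{i=1}^n p_i^\beta \ln p_i \, / \sum_{i=1}^n p_i^\beta}{-1} = - \sum_{i=1}^n p_i \ln (p_i) }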

The question is whether majorization is more powerful than Rényi entropy as a tool to tell when one probability distribution is less flat than another. I know that if p majorizes q, its Rényi entropy is less than or equal to that of q for all 0 \le \beta \le \infty. Your mission, should you choose to accept it, is to show the converse is not true.


Network Theory (Part 24)

23 August, 2012

Now we’ve reached the climax of our story so far: we’re ready to prove the deficiency zero theorem. First let’s talk about it informally a bit. Then we’ll review the notation, and then—hang on to your seat!—we’ll give the proof.

The crucial trick is to relate a bunch of chemical reactions, described by a ‘reaction network’ like this:

to a simpler problem where a system randomly hops between states arranged in the same pattern:

This is sort of amazing, because we’ve thrown out lots of detail. It’s also amazing because this simpler problem is linear. In the original problem, the chance that a reaction turns a B + E into a D is proportional to the number of B’s times the number of E’s. That’s nonlinear! But in the simplified problem, the chance that your system hops from state 4 to state 3 is just proportional to the probability that it’s in state 4 to begin with. That’s linear.
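
Schematically, writing [\mathrm{B}] and [\mathrm{E}] for the numbers of B’s and E’s, and \pi_i for the probability of being in state i, the contrast is:

\displaystyle{ \frac{d}{d t} [\mathrm{D}] \; \propto \; [\mathrm{B}] [\mathrm{E}] \qquad \textrm{versus} \qquad \frac{d}{d t} \pi_3 \; \propto \; \pi_4 }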

The wonderful thing is that, at least under some conditions, we can find equilibrium solutions of our original problem starting from equilibrium solutions of the simpler problem.

Let’s roughly sketch how it works, and where we are so far. Our simplified problem is described by an equation like this:

\displaystyle{ \frac{d}{d t} \psi = H \psi }

where \psi is a function that gives the probability of being in each state, and H describes the probability per time of hopping from one state to another. We can easily understand quite a lot about the equilibrium solutions, where \psi doesn’t change at all:

H \psi = 0

because this is a linear equation. We did this in Part 23. Of course, when I say ‘easily’, that’s a relative thing: we needed to use the Perron–Frobenius theorem, which Jacob introduced in Part 20. But that’s a well-known theorem in linear algebra, and it’s easy to apply here.

In Part 22, we saw that the original problem was described by an equation like this, called the ‘rate equation’:

\displaystyle{ \frac{d x}{d t} = Y H x^Y  }

Here x is a vector whose entries describe the amount of each kind of chemical: the amount of A’s, the amount of B’s, and so on. The matrix H is the same as in the simplified problem, but Y is a matrix that says how many times each chemical shows up in each spot in our reaction network:

The key thing to notice is x^Y, where we take a vector and raise it to the power of a matrix. We explained this operation back in Part 22. It’s this operation that says how many B + E pairs we have, for example, given the number of B’s and the number of E’s. It’s this that makes the rate equation nonlinear.
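
Concretely, recalling the definition from Part 22: the component of x^Y corresponding to the ith complex is a product of powers,

\displaystyle{ (x^Y)_i = \prod_j x_j^{Y_{i j}} }

so if the ith complex is B + E, then (x^Y)_i = x_B x_E—a product of two variables, which is exactly the nonlinearity in the rate equation.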

Now, we’re looking for equilibrium solutions of the rate equation, where the rate of change is zero:

Y H x^Y = 0

But in fact we’ll do even better! We’ll find solutions of this:

H x^Y = 0

And we’ll get these by taking our solutions of this:

H \psi = 0

and adjusting them so that

\psi = x^Y

while \psi remains a solution of H \psi = 0.

But: how do we do this ‘adjusting’? That’s the crux of the whole business! That’s what we’ll do today.

Remember, \psi is a function that gives a probability for each ‘state’, or numbered box here:

The picture here consists of two pieces, called ‘connected components’: the piece containing boxes 0 and 1, and the piece containing boxes 2, 3 and 4. It turns out that we can multiply \psi by a function that’s constant on each connected component, and if we had H \psi = 0 to begin with, that will still be true afterward. The reason is that there’s no way for \psi to ‘leak across’ from one component to another. It’s like having water in separate buckets. You can increase the amount of water in one bucket, and decrease it in another, and as long as the water’s surface remains flat in each bucket, the whole situation remains in equilibrium.

That’s sort of obvious. What’s not obvious is that we can adjust \psi this way so as to ensure

\psi = x^Y

for some x.

And indeed, it’s not always true! It’s only true if our reaction network obeys a special condition. It needs to have ‘deficiency zero’. We defined this concept back in Part 21, but now we’ll finally use it for something. It turns out to be precisely the right condition to guarantee we can tweak any function on our set of states, multiplying it by a function that’s constant on each connected component, and get a new function \psi with

\psi = x^Y

When all is said and done, that is the key to the deficiency zero theorem.

Review

The battle is almost upon us—we’ve got one last chance to review our notation. We start with a stochastic reaction network:

This consists of:

• finite sets of transitions T, complexes K and species S,

• a map r: T \to (0,\infty) giving a rate constant for each transition,

• source and target maps s,t : T \to K saying where each transition starts and ends,

• a one-to-one map Y : K \to \mathbb{N}^S saying how each complex is made of species.

Then we extend s, t and Y to linear maps:

s, t : \mathbb{R}^T \to \mathbb{R}^K, \qquad Y : \mathbb{R}^K \to \mathbb{R}^S

Then we put inner products on these vector spaces as described last time, which lets us ‘turn around’ the maps s and t by taking their adjoints:

s^\dagger, t^\dagger : \mathbb{R}^K \to \mathbb{R}^T

More surprisingly, we can ‘turn around’ Y and get a nonlinear map using ‘matrix exponentiation’:

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   x^Y \end{array}

This is most easily understood by thinking of x as a row vector and Y as a matrix:

Remember, complexes are made out of species. The matrix entry Y_{i j} says how many things of the jth species there are in a complex of the ith kind. If \psi \in \mathbb{R}^K says how many complexes there are of each kind, Y \psi \in \mathbb{R}^S says how many things there are of each species. Conversely, if x \in \mathbb{R}^S says how many things there are of each species, x^Y \in \mathbb{R}^K says how many ways we can build each kind of complex from them.

So, we get these maps:

Next, the boundary operator

\partial : \mathbb{R}^T \to \mathbb{R}^K

describes how each transition causes a change in complexes:

\partial = t - s

As we saw last time, there is a Hamiltonian

H : \mathbb{R}^K \to \mathbb{R}^K

describing a Markov process on the set of complexes, given by

H = \partial s^\dagger

But the star of the show is the rate equation. This describes how the number of things of each species changes with time. We write these numbers in a list and get a vector x \in \mathbb{R}^S with nonnegative components. The rate equation says:

\displaystyle{ \frac{d x}{d t} = Y H x^Y }

We can read this as follows:

x says how many things of each species we have now.

x^Y says how many complexes of each kind we can build from these species.

s^\dagger x^Y says how many transitions of each kind can originate starting from these complexes, with each transition weighted by its rate.

H x^Y = \partial s^\dagger x^Y is the rate of change of the number of complexes of each kind, due to these transitions.

Y H x^Y is the rate of change of the number of things of each species.
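
To make these formulas concrete, here is a minimal numerical sketch in Python with NumPy. The network—species A and B, complexes A + B and 2B, with one transition each way—is a toy example of my own, not one from the course; the code just evaluates Y H x^Y transition by transition, using the fact that s^\dagger weights each transition by its rate constant:

import numpy as np

Y = np.array([[1, 1],      # complex 0 = A + B
              [0, 2]])     # complex 1 = 2B
source = np.array([0, 1])  # source complex of each transition
target = np.array([1, 0])  # target complex of each transition
rates = np.array([2.0, 1.0])

def x_to_the_Y(x):
    # (x^Y)_i = prod_j x_j^(Y_ij): ways to build each complex from species
    return np.prod(x ** Y, axis=1)

def rhs(x):
    # Y H x^Y: each transition tau contributes
    # r(tau) * (x^Y)_{s(tau)} * Y(t(tau) - s(tau))
    psi = x_to_the_Y(x)
    total = np.zeros_like(x)
    for tau in range(len(rates)):
        total += rates[tau] * psi[source[tau]] * (Y[target[tau]] - Y[source[tau]])
    return total

print(rhs(np.array([1.0, 1.0])))   # [-1.  1.]: A's turning into B's
print(rhs(np.array([1.0, 2.0])))   # [ 0.  0.]: an equilibrium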

The zero deficiency theorem

We are looking for equilibrium solutions of the rate equation, where the number of things of each species doesn’t change at all:

Y H x^Y = 0

In fact we will find complex balanced equilibrium solutions, where even the number of complexes of each kind doesn’t change:

H x^Y = 0

More precisely, we have:

Deficiency Zero Theorem (Child’s Version). Suppose we have a reaction network obeying these two conditions:

1. It is weakly reversible, meaning that whenever there’s a transition from one complex \kappa to another \kappa', there’s a directed path of transitions going back from \kappa' to \kappa.

2. It has deficiency zero, meaning \mathrm{im} \partial  \cap \mathrm{ker} Y = \{ 0 \} .

Then for any choice of rate constants there exists a complex balanced equilibrium solution of the rate equation where all species are present in nonzero amounts. In other words, there exists x \in \mathbb{R}^S with all components positive and such that:

H x^Y = 0

Proof. Because our reaction network is weakly reversible, the theorems in Part 23 show there exists \psi \in (0,\infty)^K with

H \psi = 0

This \psi may not be of the form x^Y, but we shall adjust \psi so that it becomes of this form, while still remaining a solution of H \psi = 0. To do this, we need a couple of lemmas:

Lemma 1. \mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K.

Proof. We need to use a few facts from linear algebra. If V is a finite-dimensional vector space with inner product, the orthogonal complement L^\perp of a subspace L \subseteq V consists of vectors that are orthogonal to everything in L:

L^\perp = \{ v \in V : \quad \forall w \in L \quad \langle v, w \rangle = 0 \}

We have

(L \cap M)^\perp = L^\perp + M^\perp

where L and M are subspaces of V and + denotes the sum of two subspaces: that is, the smallest subspace containing both. Also, if T: V \to W is a linear map between finite-dimensional vector spaces with inner product, we have

(\mathrm{ker} T)^\perp = \mathrm{im} T^\dagger

and

(\mathrm{im} T)^\perp = \mathrm{ker} T^\dagger

Now, because our reaction network has deficiency zero, we know that

\mathrm{im} \partial \cap \mathrm{ker} Y = \{ 0 \}

Taking the orthogonal complement of both sides, we get

(\mathrm{im} \partial \cap \mathrm{ker} Y)^\perp = \mathbb{R}^K

and using the rules we mentioned, we obtain

\mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger = \mathbb{R}^K

as desired.   █

Now, given a vector \phi in \mathbb{R}^K or \mathbb{R}^S with all positive components, we can define the logarithm of such a vector, component-wise:

(\ln \phi)_i = \ln (\phi_i)

Similarly, for any vector \phi in either of these spaces, we can define its exponential in a component-wise way:

(\exp \phi)_i = \exp(\phi_i)

These operations are inverse to each other. Moreover:

Lemma 2. The nonlinear operator

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   x^Y \end{array}

is related to the linear operator

\begin{array}{ccc} \mathbb{R}^S &\to& \mathbb{R}^K \\                               x     &\mapsto&   Y^\dagger x \end{array}

by the formula

x^Y = \exp(Y^\dagger \ln x )

which holds for all x \in (0,\infty)^S.

Proof. A straightforward calculation. By the way, this formula would look a bit nicer if we treated \ln x as a row vector and multiplied it on the right by Y: then we would have

x^Y = \exp((\ln x) Y)

The problem is that we are following the usual convention of multiplying vectors by matrices on the left, yet writing the matrix on the right in x^Y. Taking the transpose Y^\dagger of the matrix Y serves to compensate for this.   █
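
If you like, you can also check the formula in Lemma 2 numerically. Here is a tiny sketch in Python with NumPy; since the array below stores one row per complex, the matrix product Y @ np.log(x) plays the role of Y^\dagger \ln x in the convention used in this post:

import numpy as np

Y = np.array([[1, 1],    # rows = complexes, columns = species
              [0, 2]])
x = np.array([0.5, 3.0])

lhs = np.prod(x ** Y, axis=1)    # x^Y, component by component
rhs = np.exp(Y @ np.log(x))      # exp(Y^dagger ln x)
print(np.allclose(lhs, rhs))     # True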

Now, given our vector \psi \in (0,\infty)^K with H \psi = 0, we can take its logarithm and get \ln \psi \in \mathbb{R}^K. Lemma 1 says that

\mathbb{R}^K = \mathrm{ker} \partial^\dagger + \mathrm{im} Y^\dagger

so we can write

\ln \psi =  \alpha + Y^\dagger \beta

where \alpha \in \mathrm{ker} \partial^\dagger and \beta \in \mathbb{R}^S. Moreover, we can write

\beta = \ln x

for some x \in (0,\infty)^S, so that

\ln \psi = \alpha + Y^\dagger (\ln x)

Exponentiating both sides component-wise, we get

\psi  =   \exp(\alpha) \; \exp(Y^\dagger (\ln x))

where at right we are taking the component-wise product of vectors. Thanks to Lemma 2, we conclude that

\psi = \exp(\alpha) x^Y

So, we have taken \psi and almost written it in the form x^Y—but not quite! We can adjust \psi to make it be of this form:

\exp(-\alpha) \psi = x^Y

Clearly all the components of \exp(-\alpha) \psi are positive, since the same is true for both \psi and \exp(-\alpha). So, the only remaining task is to check that

H(\exp(-\alpha) \psi) = 0

We do this using two lemmas:

Lemma 3. If H \psi = 0 and \alpha \in \mathrm{ker} \partial^\dagger, then H(\exp(-\alpha) \psi) = 0.

Proof. It is enough to check that multiplication by \exp(-\alpha) commutes with the Hamiltonian H, since then

H(\exp(-\alpha) \psi) = \exp(-\alpha) H \psi = 0

Recall from Part 23 that H is the Hamiltonian of a Markov process associated to this ‘graph with rates’:

As noted here:

• John Baez and Brendan Fong, A Noether theorem for Markov processes.

multiplication by some function on K commutes with H if and only if that function is constant on each connected component of this graph. Such functions are called conserved quantities.

So, it suffices to show that \exp(-\alpha) is constant on each connected component. For this, it is enough to show that \alpha itself is constant on each connected component. But this will follow from the next lemma, since \alpha \in \mathrm{ker} \partial^\dagger.   █

Lemma 4. A function \alpha \in \mathbb{R}^K is a conserved quantity iff \partial^\dagger \alpha = 0 . In other words, \alpha is constant on each connected component of the graph s, t: T \to K iff \partial^\dagger \alpha = 0 .

Proof. Suppose \partial^\dagger \alpha = 0, or in other words, \alpha \in \mathrm{ker} \partial^\dagger, or in still other words, \alpha \in (\mathrm{im} \partial)^\perp. To show that \alpha is constant on each connected component, it suffices to show that whenever we have two complexes connected by a transition, like this:

\tau: \kappa \to \kappa'

then \alpha takes the same value at both these complexes:

\alpha_\kappa = \alpha_{\kappa'}

To see this, note

\partial \tau = t(\tau) - s(\tau) = \kappa' - \kappa

and since \alpha \in (\mathrm{im} \partial)^\perp, we conclude

\langle \alpha, \kappa' - \kappa \rangle = 0

But calculating this inner product, we see

\alpha_{\kappa'} - \alpha_{\kappa} = 0

as desired.

For the converse, we simply turn the argument around: if \alpha is constant on each connected component, we see \langle \alpha, \kappa' - \kappa \rangle = 0 whenever there is a transition \tau : \kappa \to \kappa'. It follows that \langle \alpha, \partial \tau \rangle = 0 for every transition \tau, so \alpha \in (\mathrm{im} \partial)^\perp .

And thus concludes the proof of the lemma!   █

And thus concludes the proof of the theorem!   █

And thus concludes this post!

