Time Crystals

26 September, 2012

When water freezes and forms a crystal, it creates a periodic pattern in space. Could there be something that crystallizes to form a periodic pattern in time? Frank Wilczek, who won the Nobel Prize for helping explain why quarks and gluons trapped inside a proton or neutron act like freely moving particles when you examine them very close up, dreamt up this idea and called it a time crystal:

• Frank Wilczek, Classical time crystals.

• Frank Wilczek, Quantum time crystals.

‘Time crystals’ sound like something from Greg Egan’s Orthogonal trilogy, set in a universe where there’s no fundamental distinction between time and space. But Wilczek wanted to realize these in our universe.

Of course, it’s easy to make a system that behaves in an approximately periodic way while it slowly runs down: that’s how a clock works: tick tock, tick tock, tick tock… But a system that keeps ‘ticking away’ without using up any resource or running down would be a strange new thing. There’s no telling what weird stuff we might do with it.

It’s also interesting because physicists love symmetry. In ordinary physics there are two very important symmetries: spatial translation symmetry, and time translation symmetry. Spatial translation symmetry says that if you move an experiment any amount to the left or right, it works the same way. Time translation symmetry says that if you do an experiment any amount of time earlier or later, it works the same way.

Crystals are fascinating because they ‘spontaneously break’ spatial translation symmetry. Take a liquid, cool it until it freezes, and it forms a crystal which does not look the same if you move it any amount to the right or left. It only looks the same if you move it certain discrete steps to the right or left!

The idea of a ‘time crystal’ is that it’s a system that spontaneously breaks time translation symmetry.

Given how much physicists have studied time translation symmetry and spontaneous symmetry breaking, it’s sort of shocking that nobody before 2012 wrote about this possibility. Or maybe someone did—but I haven’t heard about it.

It takes real creativity to think of an idea so radical yet so simple. But Wilczek is famously creative. For example, he came up with anyons: a new kind of particle, neither boson nor fermion, that lives in a universe where space is 2-dimensional. And now we can make those in the lab.

Unfortunately, Wilczek didn’t know how to make a time crystal. But now a team including Xiang Zhang and Tongcang Li at U.C. Berkeley has a plan for how to do it.

Actually they propose a ring-shaped system that’s periodic in time and also in space. They call it a space-time crystal:

Here we propose a space-time crystal of trapped ions and a method to realize it experimentally by confining ions in a ring-shaped trapping potential with a static magnetic field. The ions spontaneously form a spatial ring crystal due to Coulomb repulsion. This ion crystal can rotate persistently at the lowest quantum energy state in magnetic fields with fractional fluxes. The persistent rotation of trapped ions produces the temporal order, leading to the formation of a space-time crystal. We show that these space-time crystals are robust for direct experimental observation. The proposed space-time crystals of trapped ions provide a new dimension for exploring many-body physics and emerging properties of matter.

The new paper is here:

• Tongcang Li, Zhe-Xuan Gong, Zhang-Qi Yin, H. T. Quan, Xiaobo Yin, Peng Zhang, L.-M. Duan and Xiang Zhang, Space-time crystals of trapped ions.

Alas, the press release put out by Lawrence Berkeley National Laboratory is very misleading. It describes the space-time crystal as a clock that

will theoretically persist even after the rest of our universe reaches entropy, thermodynamic equilibrium or “heat-death”.

NO!

First of all, ‘reaching entropy’ doesn’t mean anything. More importantly, as time goes by and things fall apart, this space-time crystal, assuming anyone can actually make it, will also fall apart.

I know what the person talking to the reporter was trying to say: the cool thing about this setup is that it gives a system that’s truly time-periodic, not gradually using up some resource and running down like an ordinary clock. But nonphysicist readers, seeing an article entitled ‘A Clock that Will Last Forever’, may be fooled into thinking this setup is, umm, a clock that will last forever. It’s not.

If this setup were the whole universe, it might keep ticking away forever. But in fact it’s just a small, carefully crafted portion of our universe, and it interacts with the rest of our universe, so it will gradually fall apart when everything else does… or in fact much sooner: as soon as the scientists running it turn off the experiment.

Classifying space-time crystals

What could we do with space-time crystals? It’s way too early to tell, at least for me. But since I’m a mathematician, I’d be happy to classify them. Over on Google+, William Rutiser asked if there are 4d analogs of the 3d crystallographic groups. And the answer is yes! Mathematicians with too much time on their hands have classified the analogues of crystallographic groups in 4 dimensions:

• Space group: classification in small dimensions, Wikipedia.

In general these groups are called space groups (see the article for the definition). In 1 dimension there are just two, namely the symmetry groups of this:

— o — o — o — o — o — o —

and this:

— > — > — > — > — > — > —

In 2 dimensions there are 17 and they’re called wallpaper groups. In 3 dimensions there are 230 and they are called crystallographic groups. In 4 dimensions there are 4894, in 5 dimensions there are… hey, Wikipedia leaves this space blank in their table!… and in 6 dimensions there are 28,934,974.

So, there is in principle quite a large subject to study here, if people can figure out how to build a variety of space-time crystals.

There’s already a book on this, if you’re interested:

• Harold Brown, Rolf Bülow, Joachim Neubüser, Hans Wondratschek and Hans Zassenhaus, Crystallographic Groups of Four-Dimensional Space, Wiley Monographs in Crystallography, 1978.


A Course on Quantum Techniques for Stochastic Mechanics

18 September, 2012

Jacob Biamonte and I have come out with a draft of a book!

• A course on quantum techniques for stochastic mechanics.

It’s based on the first 24 network theory posts on this blog. It owes a lot to everyone here, and the acknowledgements just scratch the surface of that indebtedness. At some later time I’d like to go through the posts and find the top twenty people who need to be thanked. But I’m leaving Singapore on Friday, going back to California to teach at U.C. Riverside, so I’ve been rushing to get something out before then.

If you see typos or other problems, please let us know!
We’ve reorganized the original blog articles and polished them up a bit, but we plan to do more before publishing these notes as a book.

I’m looking forward to teaching a seminar called Mathematics of the Environment when I get back to U.C. Riverside, and with luck I’ll put some notes from that on the blog here. I will also be trying to round up a team of grad students to work on network theory.

The next big topics in the network theory series will be electrical circuits and Bayesian networks. I’m beginning to see how these fit together with stochastic Petri nets in a unified framework, but I’ll need to talk and write about it to fill in all the details.

You can get a sense of what this course is about by reading this:

Foreword

This course is about a curious relation between two ways of describing situations that change randomly with the passage of time. The old way is probability theory and the new way is quantum theory.

Quantum theory is based, not on probabilities, but on amplitudes. We can use amplitudes to compute probabilities. However, the relation between them is nonlinear: we take the absolute value of an amplitude and square it to get a probability. It thus seems odd to treat amplitudes as directly analogous to probabilities. Nonetheless, if we do this, some good things happen. In particular, we can take techniques devised in quantum theory and apply them to probability theory. This gives new insights into old problems.

There is, in fact, a subject eager to be born, which is mathematically very much like quantum mechanics, but which features probabilities in the same equations where quantum mechanics features amplitudes. We call this subject stochastic mechanics.

Plan of the course

In Section 1 we introduce the basic object of study here: a ‘stochastic Petri net’. A stochastic Petri net describes in a very general way how collections of things of different kinds can randomly interact and turn into other things. If we consider large numbers of things, we obtain a simplified deterministic model called the ‘rate equation’, discussed in Section 2. More fundamental, however, is the ‘master equation’, introduced in Section 3. This describes how the probability of having various numbers of things of various kinds changes with time.

In Section 4 we consider a very simple stochastic Petri net and notice that in this case, we can solve the master equation using techniques taken from quantum mechanics. In Section 5 we sketch how to generalize this: for any stochastic Petri net, we can write down an operator called a ‘Hamiltonian’ built from ‘creation and annihilation operators’, which describes the rate of change of the probability of having various numbers of things. In Section 6 we illustrate this with an example taken from population biology. In this example the rate equation is just the logistic equation, one of the simplest models in population biology. The master equation describes reproduction and competition of organisms in a stochastic way.
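The rate equation mentioned here is easy to play with numerically. Here is a minimal sketch (the parameter values are illustrative, not taken from the course) that integrates the logistic equation dP/dt = \alpha P - \beta P^2 with a simple Euler step:

```python
# Euler integration of the logistic rate equation dP/dt = alpha*P - beta*P^2.
# alpha (reproduction rate) and beta (competition rate) are illustrative.
alpha, beta = 2.0, 1.0
P, dt = 0.1, 0.001        # initial population and time step

for _ in range(20000):    # integrate up to t = 20
    P += dt * (alpha * P - beta * P * P)

# The population approaches the equilibrium alpha/beta = 2:
assert abs(P - alpha / beta) < 1e-3
```

Whatever the initial population, as long as it is positive the solution settles down to the equilibrium \alpha/\beta; the master equation adds stochastic fluctuations around this deterministic story.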

In Section 7 we sketch how time evolution as described by the master equation can be written as a sum over Feynman diagrams. We do not develop this in detail, but illustrate it with a predator–prey model from population biology. In the process, we give a slicker way of writing down the Hamiltonian for any stochastic Petri net.

In Section 8 we enter into a main theme of this course: the study of equilibrium solutions of the master and rate equations. We present the Anderson–Craciun–Kurtz theorem, which shows how to get equilibrium solutions of the master equation from equilibrium solutions of the rate equation, at least if a certain technical condition holds. Brendan Fong has translated Anderson, Craciun and Kurtz’s original proof into the language of annihilation and creation operators, and we give Fong’s proof here. In this language, it turns out that the equilibrium solutions are mathematically just like ‘coherent states’ in quantum mechanics.

In Section 9 we give an example of the Anderson–Craciun–Kurtz theorem coming from a simple reversible reaction in chemistry. This example leads to a puzzle that is resolved by discovering that the presence of ‘conserved quantities’—quantities that do not change with time—lets us construct many equilibrium solutions of the rate equation other than those given by the Anderson–Craciun–Kurtz theorem.

Conserved quantities are very important in quantum mechanics, and they are related to symmetries by a result called Noether’s theorem. In Section 10 we describe a version of Noether’s theorem for stochastic mechanics, which we proved with the help of Brendan Fong. This applies, not just to systems described by stochastic Petri nets, but a much more general class of processes called ‘Markov processes’. In the analogy to quantum mechanics, Markov processes are analogous to arbitrary quantum systems whose time evolution is given by a Hamiltonian. Stochastic Petri nets are analogous to a special case of these: the case where the Hamiltonian is built from annihilation and creation operators. In Section 11 we state the analogy between quantum mechanics and stochastic mechanics more precisely, and with more attention to mathematical rigor. This allows us to set the quantum and stochastic versions of Noether’s theorem side by side and compare them in Section 12.

In Section 13 we take a break from the heavy abstractions and look at a fun example from chemistry, in which a highly symmetrical molecule randomly hops between states. These states can be seen as vertices of a graph, with the transitions as edges. In this particular example we get a famous graph with 20 vertices and 30 edges, called the ‘Desargues graph’.

In Section 14 we note that the Hamiltonian in this example is a ‘graph Laplacian’, and, following a computation done by Greg Egan, we work out the eigenvectors and eigenvalues of this Hamiltonian explicitly. One reason graph Laplacians are interesting is that we can use them as Hamiltonians to describe time evolution in both stochastic and quantum mechanics. Operators with this special property are called ‘Dirichlet operators’, and we discuss them in Section 15. As we explain, they also describe electrical circuits made of resistors. Thus, in a peculiar way, the intersection of quantum mechanics and stochastic mechanics is the study of electrical circuits made of resistors!

In Section 16, we study the eigenvectors and eigenvalues of an arbitrary Dirichlet operator. We introduce a famous result called the Perron–Frobenius theorem for this purpose. However, we also see that the Perron–Frobenius theorem is important for understanding the equilibria of Markov processes. This becomes important later when we prove the ‘deficiency zero theorem’.

We introduce the deficiency zero theorem in Section 17. This result, proved by the chemists Feinberg, Horn and Jackson, gives equilibrium solutions for the rate equation for a large class of stochastic Petri nets. Moreover, these equilibria obey the extra condition that lets us apply the Anderson–Craciun–Kurtz theorem and obtain equilibrium solutions of the master equation as well. However, the deficiency zero theorem is best stated, not in terms of stochastic Petri nets, but in terms of another, equivalent, formalism: ‘chemical reaction networks’. So, we explain chemical reaction networks here, and use them heavily throughout the rest of the course. However, because they are applicable to such a large range of problems, we call them simply ‘reaction networks’. Like stochastic Petri nets, they describe how collections of things of different kinds randomly interact and turn into other things.

In Section 18 we consider a simple example of the deficiency zero theorem taken from chemistry: a diatomic gas. In Section 19 we apply the Anderson–Craciun–Kurtz theorem to the same example.

In Section 20 we begin the final phase of the course: proving the deficiency zero theorem, or at least a portion of it. In this section we discuss the concept of ‘deficiency’, which had been introduced before, but not really explained: the definition that makes the deficiency easy to compute is not the one that says what this concept really means. In Section 21 we show how to rewrite the rate equation of a stochastic Petri net—or equivalently, of a reaction network—in terms of a Markov process. This is surprising because the rate equation is nonlinear, while the equation describing a Markov process is linear in the probabilities involved. The trick is to use a nonlinear operation called ‘matrix exponentiation’. In Section 22 we study equilibria for Markov processes. Then, finally, in Section 23, we use these equilibria to obtain equilibrium solutions of the rate equation, completing our treatment of the deficiency zero theorem.


More Second Laws of Thermodynamics

24 August, 2012

Oscar Dahlsten is visiting the Centre for Quantum Technologies, so we’re continuing some conversations about entropy that we started last year, back when the Entropy Club was active. But now Jamie Vicary and Brendan Fong are involved in the conversations.

I was surprised when Oscar told me that for a large class of random processes, the usual second law of thermodynamics is just one of infinitely many laws saying that various kinds of disorder increase. I’m annoyed that nobody ever told me about this before! It’s as if they told me about conservation of energy but not conservation of schmenergy, and phlenergy, and zenergy…

So I need to tell you about this. You may not understand it, but at least I can say I tried. I don’t want you blaming me for concealing all these extra second laws of thermodynamics!

Here’s the basic idea. Not all random processes are guaranteed to make entropy increase. But a bunch of them always make probability distributions flatter in a certain precise sense. This makes the entropy of the probability distribution increase. But when you make a probability distribution flatter in this sense, a bunch of other quantities increase too! For example, besides the usual entropy, there are infinitely many other kinds of entropy, called ‘Rényi entropies’, one for each number between 0 and ∞. And a doubly stochastic operator makes all the Rényi entropies increase! This fact is a special case of Theorem 10 here:

• Tim van Erven and Peter Harremoës, Rényi divergence and majorization.

Let me state this fact precisely, and then say a word about how this is related to quantum theory and ‘the collapse of the wavefunction’.

To keep things simple let’s talk about probability distributions on a finite set, though van Erven and Harremoës generalize it all to a measure space.

How do we make precise the concept that one probability distribution is flatter than another? You know it when you see it, at least some of the time. For example, suppose I have some system in thermal equilibrium at some temperature, and the probabilities of it being in various states look like this:

Then say I triple the temperature. The probabilities flatten out:

But how can we make this concept precise in a completely general way? We can do it using the concept of ‘majorization’. If one probability distribution is less flat than another, people say it ‘majorizes’ that other one.

Here’s the definition. Say we have two probability distributions p and q on the same set. For each one, list the probabilities in decreasing order:

p_1 \ge p_2 \ge \cdots \ge p_n

q_1 \ge q_2 \ge \cdots \ge q_n

Then we say p majorizes q if

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n. So, the idea is that the biggest probabilities in the distribution p add up to more than the corresponding biggest ones in q.
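This definition is easy to check in code. Here is a minimal pure-Python sketch (the helper name `majorizes` is mine, not a standard library function):

```python
def majorizes(p, q, tol=1e-12):
    """Return True if distribution p majorizes distribution q.

    Both are sequences of probabilities summing to 1.  We sort each in
    decreasing order and compare the partial sums.
    """
    ps = sorted(p, reverse=True)
    qs = sorted(q, reverse=True)
    psum = qsum = 0.0
    for pk, qk in zip(ps, qs):
        psum += pk
        qsum += qk
        if psum < qsum - tol:
            return False
    return True

# A sharply peaked distribution majorizes a flatter one, not vice versa:
print(majorizes([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))  # True
print(majorizes([0.5, 0.3, 0.2], [0.7, 0.2, 0.1]))  # False
```

Note that majorization is only a partial order: for many pairs of distributions, neither majorizes the other.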

In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter \beta. If p is a probability distribution on a finite set, its Rényi entropy of order \beta is defined to be

\displaystyle{ H_\beta(p) = \frac{1}{1 - \beta} \ln \sum_i p_i^\beta }

where 0 \le \beta < \infty. Well, to be honest: if \beta is 0, 1, or \infty we have to define this by taking a limit where we let \beta creep up to that value. But the limit exists, and when \beta = 1 we get the usual Shannon entropy

\displaystyle{ H_1(p) = - \sum_i p_i \ln(p_i) }

As I explained a while ago, Rényi entropies are important ways of measuring biodiversity. But here’s what I learned just now, from the paper by van Erven and Harremoës:

Theorem 1. If a probability distribution p majorizes a probability distribution q, its Rényi entropies are smaller:

\displaystyle{ H_\beta(p) \le H_\beta(q) }

for all 0 \le \beta < \infty.
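We can test Theorem 1 numerically. Here is a small sketch that computes Rényi entropies, handling \beta = 1 by the Shannon formula, and checks the inequality for a pair of distributions where p majorizes q:

```python
from math import log

def renyi(p, beta):
    """Renyi entropy of order beta for a finite probability distribution p."""
    p = [x for x in p if x > 0]
    if abs(beta - 1.0) < 1e-9:                    # the Shannon limit
        return -sum(x * log(x) for x in p)
    return log(sum(x ** beta for x in p)) / (1.0 - beta)

p = [0.7, 0.2, 0.1]   # p majorizes q: 0.7 >= 0.5, 0.9 >= 0.8, 1 >= 1
q = [0.5, 0.3, 0.2]
for beta in [0.0, 0.5, 1.0, 2.0, 10.0]:
    assert renyi(p, beta) <= renyi(q, beta)       # Theorem 1 in action
```

For the uniform distribution all the Rényi entropies coincide at \ln n, which is consistent with it being majorized by everything.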

And here’s what makes this fact so nice. If you do something to a classical system in a way that might involve some randomness, we can describe your action using a stochastic matrix. An n \times n matrix T is called stochastic if whenever p \in \mathbb{R}^n is a probability distribution, so is T p. This is equivalent to saying:

• the matrix entries of T are all \ge 0, and

• each column of T sums to 1.

If T is stochastic, it’s not necessarily true that the entropy of T p is greater than or equal to that of p, not even for the Shannon entropy.

Puzzle 1. Find a counterexample.

However, entropy does increase if we use specially nice stochastic matrices called ‘doubly stochastic’ matrices. People say a matrix T is doubly stochastic if it’s stochastic and it maps the probability distribution

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

to itself. This is the most spread-out probability distribution of all: every other probability distribution majorizes this one.

Why do they call such matrices ‘doubly’ stochastic? Well, if you’ve got a stochastic matrix, each column sums to 1. But a stochastic matrix is doubly stochastic if and only if each row sums to 1 as well.

Here’s a really cool fact:

Theorem 2. If T is doubly stochastic, p majorizes T p for any probability distribution p \in \mathbb{R}^n. Conversely, if a probability distribution p majorizes a probability distribution q, then q = T p for some doubly stochastic matrix T.

Taken together, Theorems 1 and 2 say that doubly stochastic transformations increase entropy… but not just Shannon entropy! They increase all the different Rényi entropies, as well. So if time evolution is described by a doubly stochastic matrix, we get lots of ‘second laws of thermodynamics’, saying that all these different kinds of entropy increase!
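Here is a quick numerical check of the first half of Theorem 2, using a doubly stochastic matrix built as a convex combination of permutation matrices (by the Birkhoff–von Neumann theorem, every doubly stochastic matrix arises this way):

```python
def apply(T, p):
    """Multiply the matrix T (a list of rows) by the vector p."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]

def majorizes(p, q, tol=1e-12):
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    return all(sum(ps[:k + 1]) >= sum(qs[:k + 1]) - tol for k in range(len(ps)))

# A doubly stochastic matrix: half the identity plus half a cyclic permutation.
# Every row and every column sums to 1.
T = [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]

p = [0.7, 0.2, 0.1]
Tp = apply(T, p)                   # approximately [0.4, 0.45, 0.15]
assert abs(sum(Tp) - 1) < 1e-12    # still a probability distribution
assert majorizes(p, Tp)            # p majorizes Tp, as Theorem 2 predicts
```

Applying T over and over flattens the distribution further and further, driving it toward the uniform distribution that every distribution majorizes.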

Finally, what does all this have to do with quantum mechanics, and collapsing the wavefunction? There are different things to say, but this is the simplest:

Theorem 3. Given two probability distributions p, q \in \mathbb{R}^n, then p majorizes q if and only if there exists a self-adjoint matrix D with eigenvalues p_i and diagonal entries q_i.

The matrix D will be a density matrix: a self-adjoint matrix with nonnegative eigenvalues and trace equal to 1. We use such matrices to describe mixed states in quantum mechanics.

Theorem 3 gives a precise sense in which preparing a quantum system in some state, letting time evolve, and then measuring it ‘increases randomness’.

How? Well, suppose we have a quantum system whose Hilbert space is \mathbb{C}^n. If we prepare the system in a mixture of the standard basis states with probabilities p_i, we can describe it with a diagonal density matrix D_0. Then suppose we wait a while and some unitary time evolution occurs. The system is now described by a new density matrix

D = U D_0 \, U^{-1}

where U is some unitary operator. If we then do a measurement to see which of the standard basis states our system now lies in, we’ll get the different possible results with probabilities q_i, the diagonal entries of D. But the eigenvalues of D will still be the numbers p_i. So, by the theorem, p majorizes q!

So, not only Shannon entropy but also all the Rényi entropies will increase!
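Here is a sketch of this prepare–evolve–measure story for the smallest interesting case, a single qubit, with a real rotation playing the role of the unitary U (the angle is arbitrary):

```python
from math import cos, sin, pi

# Prepare a mixture of the two basis states with probabilities p = (0.9, 0.1),
# i.e. the diagonal density matrix D0 = diag(0.9, 0.1).
p = [0.9, 0.1]
theta = pi / 5                       # any rotation angle will do
c, s = cos(theta), sin(theta)

# D = U D0 U^{-1} for the rotation U = [[c, -s], [s, c]].
# The diagonal entries of D are the measurement probabilities q:
q = [p[0] * c * c + p[1] * s * s,
     p[0] * s * s + p[1] * c * c]

def majorizes(pp, qq, tol=1e-12):
    ps, qs = sorted(pp, reverse=True), sorted(qq, reverse=True)
    return all(sum(ps[:k + 1]) >= sum(qs[:k + 1]) - tol for k in range(len(ps)))

assert abs(sum(q) - 1) < 1e-12
assert majorizes(p, q)   # the eigenvalues p majorize the diagonal q
```

Since p majorizes q, Theorem 1 guarantees that every Rényi entropy of the measurement outcomes q is at least that of the prepared mixture p.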

Of course, there are some big physics questions lurking here. Like: what about the real world? In the real world, do lots of different kinds of entropy tend to increase, or just some?

Of course, there’s a huge famous old problem about how reversible time evolution can be compatible with any sort of law saying that entropy must always increase! Still, there are some arguments, going back to Boltzmann’s H-theorem, which show entropy increases under some extra conditions. So then we can ask if other kinds of entropy, like Rényi entropy, increase as well. This will be true whenever we can argue that time evolution is described by doubly stochastic matrices. Theorem 3 gives a partial answer, but there’s probably much more to say.

I don’t have much more to say right now, though. I’ll just point out that while doubly stochastic matrices map the ‘maximally smeared-out’ probability distribution

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

to itself, a lot of this theory generalizes to stochastic matrices that map exactly one other probability distribution to itself. We need to work with relative Rényi entropy instead of Rényi entropy, and so on, but I don’t think these adjustments are really a big deal. And there are nice theorems that let you know when a stochastic matrix maps exactly one probability distribution to itself, based on the Perron–Frobenius theorem.

References

I already gave you a reference for Theorem 1, namely the paper by van Erven and Harremoës, though I don’t think they were the first to prove this particular result: they generalize it quite a lot.

What about Theorem 2? It goes back at least to here:

• Barry C. Arnold, Majorization and the Lorenz Order: A Brief Introduction, Springer Lecture Notes in Statistics 43, Springer, Berlin, 1987.

The partial order on probability distributions given by majorization is also called the ‘Lorenz order’, but mainly when we consider probability distributions on infinite sets. This name presumably comes from the Lorenz curve, a measure of income inequality. This curve shows, for the bottom x% of households, what percentage y% of the total income they have.

Puzzle 2. If you’ve got two different probability distributions of incomes, and one majorizes the other, how are their Lorenz curves related?

When we generalize majorization by letting some other probability distribution take the place of

\displaystyle{ p_0 = (\frac{1}{n}, \dots, \frac{1}{n}) }

it seems people call it the ‘Markov order’. Here’s a really fascinating paper on that, which I’m just barely beginning to understand:

• A. N. Gorban, P. A. Gorban and G. Judge, Entropy: the Markov ordering approach, Entropy 12 (2010), 1145–1193.

What about Theorem 3? Apparently it goes back to here:

• A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 20 (1971), 633.

though I only know this thanks to a more recent paper:

• Michael A. Nielsen, Conditions for a class of entanglement transformations, Phys. Rev. Lett. 83 (1999), 436–439.

By the way, Nielsen’s paper contains another very nice result about majorization! Suppose you have states \psi and \phi of a 2-part quantum system. You can trace out one part and get density matrices describing mixed states of the other part, say D_\psi and D_\phi. Then Nielsen shows you can get from \psi to \phi using ‘local operations and classical communication’ if and only if D_\phi majorizes D_\psi. Note that things are going backwards here compared to how they’ve been going in the rest of this post: if we can get from \psi to \phi, then all forms of entropy go down when we go from D_\psi to D_\phi! This ‘anti-second-law’ behavior is confusing at first, but familiar to me by now.

When I first learned all this stuff, I naturally thought of the following question—maybe you did too, just now. If p, q \in \mathbb{R}^n are probability distributions and

\displaystyle{ H_\beta(p) \le H_\beta(q) }

for all 0 \le \beta < \infty, is it true that p majorizes q?

Apparently the answer must be no, because Klimesh has gone to quite a bit of work to obtain a weaker conclusion: not that p majorizes q, but that p \otimes r majorizes q \otimes r for some probability distribution r \in \mathbb{R}^m. He calls this catalytic majorization, with r serving as a ‘catalyst’:

• Matthew Klimesh, Inequalities that collectively completely characterize the catalytic majorization relation.
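Catalysis can be seen in a small concrete example, which I believe goes back to Jonathan and Plenio’s work on entanglement catalysis: neither of the two distributions below majorizes the other, but tensoring both with the catalyst r makes the comparison go through. A quick check:

```python
def majorizes(p, q, tol=1e-12):
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    return all(sum(ps[:k + 1]) >= sum(qs[:k + 1]) - tol for k in range(len(ps)))

def tensor(p, r):
    """The product distribution p tensor r."""
    return [a * b for a in p for b in r]

# Neither distribution majorizes the other:
p = [0.5, 0.25, 0.25, 0.0]
q = [0.4, 0.4, 0.1, 0.1]
assert not majorizes(p, q) and not majorizes(q, p)

# ...but with the catalyst r, the majorization goes through:
r = [0.6, 0.4]
assert majorizes(tensor(p, r), tensor(q, r))
```

The catalyst r comes out of the comparison unchanged, which is why the chemical metaphor fits.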

I thank Vlatko Vedral here at the CQT for pointing this out!

Finally, here is a good general introduction to majorization, pointed out by Vasileios Anagnostopoulos:

• T. Ando, Majorization, doubly stochastic matrices, and comparison of eigenvalues, Linear Algebra and its Applications 118 (1989), 163–248.


Network Theory (Part 20)

6 August, 2012

guest post by Jacob Biamonte

We’re in the middle of a battle: in addition to our typical man versus equation scenario, it’s a battle between two theories. For those good patrons following the network theory series, you know the two opposing forces well. It’s our old friends, at it again:

Stochastic Mechanics vs Quantum Mechanics!

Today we’re reporting live from a crossroads, and we’re facing a skirmish that gives rise to what some might consider a paradox. Let me sketch the main thesis before we get our hands dirty with the gory details.

First I need to tell you that the battle takes place at the intersection of stochastic and quantum mechanics. We recall from Part 16 that there is a class of operators called ‘Dirichlet operators’ that are valid Hamiltonians for both stochastic and quantum mechanics. In other words, you can use them to generate time evolution both for old-fashioned random processes and for quantum processes!

Staying inside this class allows the theories to fight it out on the same turf. We will be considering a special subclass of Dirichlet operators, which we call ‘irreducible Dirichlet operators’. These are the ones where starting in any state in our favorite basis of states, we have a nonzero chance of winding up in any other. When considering this subclass, we found something interesting:

Thesis. Let H be an irreducible Dirichlet operator with n eigenstates. In stochastic mechanics, there is only one valid state that is an eigenvector of H: the unique so-called ‘Perron–Frobenius state’. The other n-1 eigenvectors are forbidden states of a stochastic system: the stochastic system is either in the Perron–Frobenius state, or in a superposition of at least two eigenvectors. In quantum mechanics, all n eigenstates of H are valid states.

This might sound like a riddle, but today as we’ll prove, riddle or not, it’s a fact. If it makes sense, well that’s another issue. As John might have said, it’s like a bone kicked down from the gods up above: we can either choose to chew on it, or let it be. Today we are going to do a bit of chewing.
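Here is a tiny numerical illustration of the thesis (my own toy example, not one from the series): take H to be minus the graph Laplacian of the path graph with 3 vertices. This is an irreducible Dirichlet operator, and of its three eigenvectors only the Perron–Frobenius one has all its entries of the same sign, so it is the only one that can be rescaled into a stochastic state:

```python
# H = minus the graph Laplacian of the path graph 1 -- 2 -- 3.
# It is symmetric with nonnegative off-diagonal entries, and each column
# sums to zero, so it generates both stochastic and quantum evolution.
H = [[-1,  1,  0],
     [ 1, -2,  1],
     [ 0,  1, -1]]
assert all(sum(H[i][j] for i in range(3)) == 0 for j in range(3))

# Its three eigenpairs, checkable by hand:
eigenpairs = [(0,  [1,  1,  1]),   # Perron-Frobenius: all entries positive
              (-1, [1,  0, -1]),   # mixed signs: forbidden stochastic state
              (-3, [1, -2,  1])]   # mixed signs: forbidden stochastic state

for lam, v in eigenpairs:
    Hv = [sum(H[i][j] * v[j] for j in range(3)) for i in range(3)]
    assert Hv == [lam * x for x in v]   # check H v = lambda v
```

Only the first eigenvector can be scaled so its entries are probabilities summing to 1; in quantum mechanics all three become fine states once normalized in the \ell^2 norm.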

One of the many problems with this post is that John has a nut loose on his keyboard. It’s not broken! I’m saying he’s written enough blog posts on this stuff to turn them into a book. I’m supposed to be compiling the blog articles into a massive LaTeX file, but I wrote this instead.

Another problem is that this post somehow seems to use just about everything said before, so I’m going to have to do my best to make things self-contained. Please bear with me as I try to recap what’s been done. For those of you familiar with the series, a good portion of the background for what we’ll cover today can be found in Part 12 and Part 16.

At the intersection of two theories

As John has mentioned in his recent talks, the typical view of how quantum mechanics and probability theory come into contact looks like this:

The idea is that quantum theory generalizes classical probability theory by considering observables that don’t commute.

That’s perfectly valid, but we’ve been exploring an alternative view in this series. Here quantum theory doesn’t subsume probability theory, but they intersect:

What goes in the middle you might ask? As odd as it might sound at first, John showed in Part 16 that electrical circuits made of resistors constitute the intersection!

For example, a circuit like this:

gives rise to a Hamiltonian H that’s good both for stochastic mechanics and quantum mechanics. Indeed, he found that the power dissipated by a circuit made of resistors is related to the familiar quantum theory concept known as the expectation value of the Hamiltonian!

\textrm{power} = -2 \langle \psi, H \psi \rangle

Oh—and you might think we made a mistake and wrote our Ω (ohm) symbols upside down. We didn’t. It happens that ℧ is the symbol for a ‘mho’—a unit of conductance that’s the reciprocal of an ohm. Check out Part 16 for the details.

Stochastic mechanics versus quantum mechanics

Let’s recall how states, time evolution, symmetries and observables work in the two theories. Today we’ll fix a basis for our vector space of states, and we’ll assume it’s finite-dimensional so that all vectors have n components over either the complex numbers \mathbb{C} or the reals \mathbb{R}. In other words, we’ll treat our space as either \mathbb{C}^n or \mathbb{R}^n. In this fashion, linear operators that map such spaces to themselves will be represented as square matrices.

Vectors will be written as \psi_i where the index i runs from 1 to n, and we think of each choice of the index as a state of our system—but since we’ll be using that word in other ways too, let’s call it a configuration. It’s just a basic way our system can be.

States

Besides the configurations i = 1,\dots, n, we have more general states that tell us the probability or amplitude of finding our system in one of these configurations:

Stochastic states are n-tuples of nonnegative real numbers:

\psi_i \in [0,\infty)

The probability of finding the system in the ith configuration is defined to be \psi_i. For these probabilities to sum to one, \psi needs to be normalized like this:

\displaystyle{ \sum_i \psi_i = 1 }

or in the notation we’re using in these articles:

\displaystyle{ \langle \psi \rangle = 1 }

where we define

\displaystyle{ \langle \psi \rangle = \sum_i \psi_i }

Quantum states are n-tuples of complex numbers:

\psi_i \in \mathbb{C}

The probability of finding the system in the ith configuration is defined to be |\psi_i|^2. For these probabilities to sum to one, \psi needs to be normalized like this:

\displaystyle{ \sum_i |\psi_i|^2 = 1 }

or in other words

\langle \psi,  \psi \rangle = 1

where the inner product of two vectors \psi and \phi is defined by

\displaystyle{ \langle \psi, \phi \rangle = \sum_i \overline{\psi}_i \phi_i }

Now, the usual way to turn a quantum state \psi into a stochastic state is to take the absolute value of each number \psi_i and then square it. However, if the numbers \psi_i happen to be nonnegative, we can also turn \psi into a stochastic state simply by multiplying it by a number to ensure \langle \psi \rangle = 1.

This is very unorthodox, but it lets us evolve the same vector \psi either stochastically or quantum-mechanically, using the recipes I’ll describe next. In physics jargon these correspond to evolution in ‘real time’ and ‘imaginary time’. But don’t ask me which is which: from a quantum viewpoint stochastic mechanics uses imaginary time, but from a stochastic viewpoint it’s the other way around!
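As a quick sketch of this unorthodox trick (not in the original post; assuming NumPy is available), here's the same nonnegative vector normalized either way:

```python
import numpy as np

psi = np.array([3.0, 1.0, 0.0, 4.0])  # nonnegative, so both readings make sense

# Stochastic state: entries sum to 1.
stochastic = psi / psi.sum()

# Quantum state: squared absolute values sum to 1.
quantum = psi / np.linalg.norm(psi)

print(stochastic.sum())            # 1.0
print((np.abs(quantum)**2).sum())  # 1.0 (up to rounding)
```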

Time evolution

Time evolution works similarly in stochastic and quantum mechanics, but with a few big differences:

• In stochastic mechanics the state changes in time according to the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

which has the solution

\psi(t) = \exp(t H) \psi(0)

• In quantum mechanics the state changes in time according to Schrödinger’s equation:

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

which has the solution

\psi(t) = \exp(-i t H) \psi(0)

The operator H is called the Hamiltonian. The properties it must have depend on whether we’re doing stochastic mechanics or quantum mechanics:

• We need H to be infinitesimal stochastic for time evolution given by \exp(t H) to send stochastic states to stochastic states. In other words, we need that (i) its columns sum to zero and (ii) its off-diagonal entries are real and nonnegative:

\displaystyle{ \sum_i H_{i j}=0 }

i\neq j \Rightarrow H_{i j}\geq 0

• We need H to be self-adjoint for time evolution given by \exp(-i t H) to send quantum states to quantum states. So, we need

H = H^\dagger

where we recall that the adjoint of a matrix is the conjugate of its transpose:

\displaystyle{ (H^\dagger)_{i j} := \overline{H}_{j i} }

We are concerned with the case where the operator H generates both a valid quantum evolution and also a valid stochastic one:

H is a Dirichlet operator if it’s both self-adjoint and infinitesimal stochastic.

We will soon go further and zoom in on this intersection! But first let’s finish our review.
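Here's a small sketch of the two conditions and the two kinds of time evolution they guarantee, using NumPy and SciPy; the 2×2 matrix is just a hypothetical example, not from the original post:

```python
import numpy as np
from scipy.linalg import expm

def is_infinitesimal_stochastic(H, tol=1e-10):
    """Columns sum to zero; off-diagonal entries are nonnegative."""
    off_diag = H - np.diag(np.diag(H))
    return (off_diag >= -tol).all() and np.allclose(H.sum(axis=0), 0, atol=tol)

def is_dirichlet(H, tol=1e-10):
    """Self-adjoint AND infinitesimal stochastic."""
    return np.allclose(H, H.conj().T, atol=tol) and is_infinitesimal_stochastic(H, tol)

# A hypothetical 2x2 Dirichlet operator:
H = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])
assert is_dirichlet(H)

# exp(tH) sends stochastic states to stochastic states...
U = expm(0.7 * H)
assert np.allclose(U.sum(axis=0), 1) and (U >= 0).all()

# ...and exp(-itH) sends quantum states to quantum states (it's unitary).
V = expm(-0.7j * H)
assert np.allclose(V.conj().T @ V, np.eye(2))
```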

Symmetries

As John explained in Part 12, besides states and observables we need symmetries, which are transformations that map states to states. These include the evolution operators which we only briefly discussed in the preceding subsection.

• A linear map U that sends quantum states to quantum states is called an isometry, and isometries are characterized by this property:

U^\dagger U = 1

• A linear map U that sends stochastic states to stochastic states is called a stochastic operator, and stochastic operators are characterized by these properties:

\displaystyle{ \sum_i U_{i j} = 1 }

and

U_{i j} \geq 0

A notable difference here is that in our finite-dimensional situation, isometries are always invertible, but stochastic operators may not be! If U is an n \times n matrix that’s an isometry, U^\dagger is its inverse. So, we also have

U U^\dagger = 1

and we say U is unitary. But if U is stochastic, it may not have an inverse—and even if it does, its inverse is rarely stochastic. This explains why in stochastic mechanics time evolution is often not reversible, while in quantum mechanics it always is.
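We can see this irreversibility numerically. A sketch (my own example, assuming NumPy and SciPy): the matrix exponential of an infinitesimal stochastic operator is stochastic, but its matrix inverse is not.

```python
import numpy as np
from scipy.linalg import expm

# An infinitesimal stochastic (in fact Dirichlet) Hamiltonian:
H = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])

U = expm(H)      # stochastic: nonnegative entries, columns summing to 1
Uinv = expm(-H)  # the matrix inverse of U

assert (U >= 0).all() and np.allclose(U.sum(axis=0), 1)
assert np.allclose(U @ Uinv, np.eye(2))

# But the inverse is NOT stochastic: it has negative entries.
assert (Uinv < 0).any()
```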

Puzzle 1. Suppose U is a stochastic n \times n matrix whose inverse is stochastic. What are the possibilities for U?

It is quite hard for an operator to be a symmetry in both stochastic and quantum mechanics, especially in our finite-dimensional situation:

Puzzle 2. Suppose U is an n \times n matrix that is both stochastic and unitary. What are the possibilities for U?

Observables

‘Observables’ are real-valued quantities that can be measured, or predicted, given a specific theory.

• In quantum mechanics, an observable is given by a self-adjoint matrix O, and the expected value of the observable O in the quantum state \psi is

\displaystyle{ \langle \psi , O \psi \rangle = \sum_{i,j} \overline{\psi}_i O_{i j} \psi_j }

• In stochastic mechanics, an observable O has a value O_i in each configuration i, and the expected value of the observable O in the stochastic state \psi is

\displaystyle{ \langle O \psi \rangle = \sum_i O_i \psi_i }

We can turn an observable in stochastic mechanics into an observable in quantum mechanics by making a diagonal matrix whose diagonal entries are the numbers O_i.

From graphs to matrices

Back in Part 16, John explained how a graph with positive numbers on its edges gives rise to a Hamiltonian in both quantum and stochastic mechanics—in other words, a Dirichlet operator.

Here’s how this works. We’ll consider simple graphs: graphs without arrows on their edges, with at most one edge from one vertex to another, with no edges from a vertex to itself. And we’ll only look at graphs with finitely many vertices and edges. We’ll assume each edge is labelled by a positive number, like this:

If our graph has n vertices, we can create an n \times n matrix A where A_{i j} is the number labelling the edge from i to j, if there is such an edge, and 0 if there’s not. This matrix is symmetric, with real entries, so it’s self-adjoint. So A is a valid Hamiltonian in quantum mechanics.

How about stochastic mechanics? Remember that a Hamiltonian H in stochastic mechanics needs to be ‘infinitesimal stochastic’. So, its off-diagonal entries must be nonnegative, which is indeed true for our A, but also the sums of its columns must be zero, which is not true when our A is nonzero.

But now comes the best news you’ve heard all day: we can improve A to an infinitesimal stochastic operator in a way that is completely determined by A itself! This is done by subtracting a diagonal matrix L whose entries are the sums of the columns of A:

L_{j j} = \sum_i A_{i j}

i \ne j \Rightarrow L_{i j} = 0

It’s easy to check that

H = A - L

is still self-adjoint, but now also infinitesimal stochastic. So, it’s a Dirichlet operator: a good Hamiltonian for both stochastic and quantum mechanics!

In Part 16, we saw a bit more: every Dirichlet operator arises this way. It’s easy to see. You just take your Dirichlet operator and make a graph with one edge for each nonzero off-diagonal entry. Then you label the edge with this entry.

So, Dirichlet operators are essentially the same as finite simple graphs with edges labelled by positive numbers.
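The recipe H = A - L is easy to code up. A minimal sketch in NumPy (the function name and the edge format are my own, not from the original post):

```python
import numpy as np

def dirichlet_from_graph(n, weighted_edges):
    """Build the Dirichlet operator H = A - L of a finite simple graph
    whose edges carry positive labels, given as (i, j, weight) triples
    (vertices 0-indexed)."""
    A = np.zeros((n, n))
    for i, j, w in weighted_edges:
        A[i, j] = A[j, i] = w      # simple graph: symmetric, no loops
    L = np.diag(A.sum(axis=0))     # diagonal matrix of column sums
    return A - L

H = dirichlet_from_graph(3, [(0, 1, 2.0), (1, 2, 0.5)])

# Self-adjoint, columns sum to zero, off-diagonal entries nonnegative:
assert np.allclose(H, H.T)
assert np.allclose(H.sum(axis=0), 0)
```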

Now, a simple graph can consist of many separate ‘pieces’, called components. Then there’s no way for a particle hopping along the edges to get from one component to another, either in stochastic or quantum mechanics. So we might as well focus our attention on graphs with just one component. These graphs are called ‘connected’. In other words:

Definition. A simple graph is connected if it is nonempty and there is a path of edges connecting any vertex to any other.

Our goal today is to understand more about Dirichlet operators coming from connected graphs. For this we need to learn the Perron–Frobenius theorem. But let’s start with something easier.

Perron’s theorem

In quantum mechanics it’s good to think about observables that have positive expected values:

\langle \psi, O \psi \rangle > 0

for every quantum state \psi \in \mathbb{C}^n. These are called positive definite. But in stochastic mechanics it’s good to think about matrices that are positive in a more naive sense:

Definition. An n \times n real matrix T is positive if all its entries are positive:

T_{i j} > 0

for all 1 \le i, j \le n.

Similarly:

Definition. A vector \psi \in \mathbb{R}^n is positive if all its components are positive:

\psi_i > 0

for all 1 \le i \le n.

We’ll also define nonnegative matrices and vectors in the same way, replacing > 0 by \ge 0. A good example of a nonnegative vector is a stochastic state.

In 1907, Perron proved the following fundamental result about positive matrices:

Perron’s Theorem. Given a positive square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| < r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T \psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

In other words, if T is positive, it has a unique eigenvalue with the largest absolute value. This eigenvalue is positive. Up to a constant factor, it has a unique eigenvector. We can choose this eigenvector to be positive. And then, up to a constant factor, it’s the only nonnegative eigenvector of T.
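We can test these conclusions numerically. A sketch in NumPy, on a random positive matrix (my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((4, 4)) + 0.1        # a positive matrix

eigvals, eigvecs = np.linalg.eig(T)
k = np.argmax(np.abs(eigvals))
r = eigvals[k]

# The Perron-Frobenius eigenvalue is real, positive, and strictly
# dominant in absolute value:
assert abs(r.imag) < 1e-10 and r.real > 0
others = np.delete(np.abs(eigvals), k)
assert (others < r.real).all()

# Its eigenvector can be chosen positive:
v = eigvecs[:, k].real
v = v / v[np.argmax(np.abs(v))]     # fix the overall sign
assert (v > 0).all()
```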

From matrices to graphs

The conclusions of Perron’s theorem don’t hold for matrices that are merely nonnegative. For example, these matrices

\left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) , \qquad \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)

are nonnegative, but they violate lots of the conclusions of Perron’s theorem.

Nonetheless, in 1912 Frobenius published an impressive generalization of Perron’s result. In its strongest form, it doesn’t apply to all nonnegative matrices; only to those that are ‘irreducible’. So, let us define those.

We’ve seen how to build a matrix from a graph. Now we need to build a graph from a matrix! Suppose we have an n \times n matrix T. Then we can build a graph G_T with n vertices where there is an edge from the ith vertex to the jth vertex if and only if T_{i j} \ne 0.

But watch out: this is a different kind of graph! It’s a directed graph, meaning the edges have directions, there’s at most one edge going from any vertex to any vertex, and we do allow an edge going from a vertex to itself. There’s a stronger concept of ‘connectivity’ for these graphs:

Definition. A directed graph is strongly connected if there is a directed path of edges going from any vertex to any other vertex.

So, you have to be able to walk along edges from any vertex to any other vertex, but always following the direction of the edges! Using this idea we define irreducible matrices:

Definition. A nonnegative square matrix T is irreducible if its graph G_T is strongly connected.
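A standard way to check strong connectivity of G_T without explicitly walking the graph is to note that (T^m)_{i j} \ne 0 exactly when there's a directed path of length m from i to j, so T is irreducible precisely when every entry of (I + T)^{n-1} is positive. A sketch (assuming NumPy):

```python
import numpy as np

def is_irreducible(T, tol=0.0):
    """A nonnegative square matrix is irreducible iff its directed graph
    G_T is strongly connected, i.e. iff every entry of (I + T)^(n-1)
    is positive."""
    n = T.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + (T > tol), n - 1)
    return (M > 0).all()

# The identity is reducible; swapping two vertices is irreducible:
I2 = np.eye(2)
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(is_irreducible(I2))    # False
print(is_irreducible(swap))  # True
```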

The Perron–Frobenius theorem

Now we are ready to state:

The Perron-Frobenius Theorem. Given an irreducible nonnegative square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| \le r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T\psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

The only conclusion of this theorem that’s weaker than those of Perron’s theorem is that there may be other eigenvalues with |\lambda| = r. For example, this matrix is irreducible and nonnegative:

\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right)

Its Perron–Frobenius eigenvalue is 1, but it also has -1 as an eigenvalue. In general, Perron-Frobenius theory says quite a lot about the other eigenvalues on the circle |\lambda| = r, but we won’t need that fancy stuff here.

Perron–Frobenius theory is useful in many ways, from highbrow math to ranking football teams. We’ll need it not just today but also later in this series. There are many books and other sources of information for those who want to take a closer look at this subject. If you’re interested, you can search online or take a look at these:

• Dimitrios Noutsos, Perron Frobenius theory and some extensions, 2008. (Includes proofs of the basic theorems.)

• V. S. Sunder, Perron Frobenius theory, 18 December 2009. (Includes applications to graph theory, Markov chains and von Neumann algebras.)

• Stephen Boyd, Lecture 17: Perron Frobenius theory, Winter 2008-2009. (Includes a max-min characterization of the Perron–Frobenius eigenvalue and applications to Markov chains, economics, population growth and power control.)

I have not taken a look myself, but if anyone is interested and can read German, the original work appears here:

• Oskar Perron, Zur Theorie der Matrizen, Math. Ann. 64 (1907), 248–263.

• Georg Frobenius, Über Matrizen aus nicht negativen Elementen, S.-B. Preuss. Akad. Wiss. Berlin (1912), 456–477.

And, of course, there’s this:

• Wikipedia, Perron–Frobenius theorem.

It’s got a lot of information.

Irreducible Dirichlet operators

Now comes the payoff. We saw how to get a Dirichlet operator H from any finite simple graph with edges labelled by positive numbers. Now let’s apply Perron–Frobenius theory to prove our thesis.

Unfortunately, the matrix H is rarely nonnegative. If you remember how we built it, you’ll see its off-diagonal entries will always be nonnegative… but its diagonal entries can be negative.

Luckily, we can fix this just by adding a big enough multiple of the identity matrix to H! The result is a nonnegative matrix

T = H + c I

where c > 0 is some large number. This matrix T has the same eigenvectors as H. The off-diagonal matrix entries of T are the same as those of H, so T_{i j} is nonzero for i \ne j exactly when the graph we started with has an edge from i to j. So, for i \ne j, the graph G_T will have a directed edge going from i to j precisely when our original graph had an edge from i to j. And that means that if our original graph was connected, G_T will be strongly connected. Thus, by definition, the matrix T is irreducible!

Since T is nonnegative and irreducible, the Perron–Frobenius theorem swings into action and we conclude:

Lemma. Suppose H is the Dirichlet operator coming from a connected finite simple graph with edges labelled by positive numbers. Then the eigenvalues of H are real. Let \lambda be the largest eigenvalue. Then there is a positive vector \psi \in \mathbb{R}^n with H\psi = \lambda \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. The eigenvalues of H are real since H is self-adjoint. Notice that if r is the Perron–Frobenius eigenvalue of T = H + c I and

T \psi = r \psi

then

H \psi = (r - c)\psi

By the Perron–Frobenius theorem the number r is positive, and it has the largest absolute value of any eigenvalue of T. Thanks to the subtraction, the eigenvalue r - c may not have the largest absolute value of any eigenvalue of H. It is, however, the largest eigenvalue of H, so we take this as our \lambda. The rest follows from the Perron–Frobenius theorem.   █

But in fact we can improve this result, since the largest eigenvalue \lambda is just zero. Let’s also make up a definition, to make our result sound more slick:

Definition. A Dirichlet operator is irreducible if it comes from a connected finite simple graph with edges labelled by positive numbers.

This meshes nicely with our earlier definition of irreducibility for nonnegative matrices. Now:

Theorem. Suppose H is an irreducible Dirichlet operator. Then H has zero as its largest real eigenvalue. There is a positive vector \psi \in \mathbb{R}^n with H\psi = 0. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. Choose \lambda as in the Lemma, so that H\psi = \lambda \psi. Since \psi is positive we can normalize it to be a stochastic state:

\displaystyle{ \sum_i \psi_i = 1 }

Since H is a Dirichlet operator, \exp(t H) sends stochastic states to stochastic states, so

\displaystyle{ \sum_i (\exp(t H) \psi)_i = 1 }

for all t \ge 0. On the other hand,

\displaystyle{ \sum_i (\exp(t H)\psi)_i = \sum_i e^{t \lambda} \psi_i = e^{t \lambda} }

so we must have \lambda = 0.   █

What’s the point of all this? One point is that there’s a unique stochastic state \psi that’s an equilibrium state: since H \psi = 0, it doesn’t change with time. It’s also globally stable: since all the other eigenvalues of H are negative, all other stochastic states converge to this one as time goes forward.
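Global stability is easy to watch numerically. A sketch (my own example, assuming NumPy and SciPy): take the irreducible Dirichlet operator of the 2-vertex graph with one edge labelled 1, and evolve an arbitrary stochastic state for a long time.

```python
import numpy as np
from scipy.linalg import expm

# Irreducible Dirichlet operator of the graph with one edge labelled 1:
H = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])

psi0 = np.array([0.9, 0.1])      # an arbitrary stochastic state
psi_inf = expm(50.0 * H) @ psi0  # evolve for a long time

# It converges to the unique equilibrium state, here uniform:
print(psi_inf)                   # approximately [0.5, 0.5]
```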

An example

There are many examples of irreducible Dirichlet operators. For instance, in Part 15 we talked about graph Laplacians. The Laplacian of a connected simple graph is always irreducible. But let us try a different sort of example, coming from the picture of the resistors we saw earlier:

Let’s create a matrix A whose entry A_{i j} is the number labelling the edge from i to j if there is such an edge, and zero otherwise:

A = \left(  \begin{array}{ccccc}   0 & 2 & 1 & 0 & 1 \\   2 & 0 & 0 & 1 & 1 \\   1 & 0 & 0 & 2 & 1 \\   0 & 1 & 2 & 0 & 1 \\   1 & 1 & 1 & 1 & 0  \end{array}  \right)

Remember how the game works. The matrix A is already a valid Hamiltonian for quantum mechanics, since it’s self adjoint. However, to get a valid Hamiltonian for both stochastic and quantum mechanics—in other words, a Dirichlet operator—we subtract the diagonal matrix L whose entries are the sums of the columns of A. In this example it just so happens that the column sums are all 4, so L = 4 I, and our Dirichlet operator is

H = A - 4 I = \left(  \begin{array}{rrrrr}   -4 & 2 & 1 & 0 & 1 \\   2 & -4 & 0 & 1 & 1 \\   1 & 0 & -4 & 2 & 1 \\   0 & 1 & 2 & -4 & 1 \\   1 & 1 & 1 & 1 & -4  \end{array}  \right)

We’ve set up this example so it’s easy to see that the vector \psi = (1,1,1,1,1) has

H \psi = 0

So, this is the unique eigenvector for the eigenvalue 0. We can use Mathematica to calculate the remaining eigenvalues of H. The set of eigenvalues is

\{0, -7, -5, -5, -3 \}

As we expect from our theorem, the largest real eigenvalue is 0. By design, the eigenstate associated to this eigenvalue is

| v_0 \rangle = (1, 1, 1, 1, 1)

(This funny notation for vectors is common in quantum mechanics, so don’t worry about it.) All the other eigenvectors fail to be nonnegative, as predicted by the theorem. They are:

| v_1 \rangle = (1, -1, -1, 1, 0)
| v_2 \rangle = (-1, 0, -1, 0, 2)
| v_3 \rangle = (-1, 1, -1, 1, 0)
| v_4 \rangle = (-1, -1, 1, 1, 0)

To compare the quantum and stochastic states, consider first |v_0\rangle. This is the only eigenvector that can be normalized to a stochastic state. Remember, a stochastic state must have nonnegative components. This rules out |v_1\rangle through to |v_4\rangle as valid stochastic states, no matter how we normalize them! However, these are allowed as states in quantum mechanics, once we normalize them correctly. For a stochastic system to be in a state other than the Perron–Frobenius state, it must be a linear combination of at least two eigenstates. For instance,

\psi_a = (1-a)|v_0\rangle + a |v_1\rangle

can be normalized to give a stochastic state only if 0 \leq a \leq \frac{1}{2}.
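As a numerical check (in NumPy rather than Mathematica), we can recompute the spectrum of this 5×5 Dirichlet operator and confirm that its kernel is spanned by (1,1,1,1,1):

```python
import numpy as np

A = np.array([[0, 2, 1, 0, 1],
              [2, 0, 0, 1, 1],
              [1, 0, 0, 2, 1],
              [0, 1, 2, 0, 1],
              [1, 1, 1, 1, 0]], dtype=float)
H = A - 4 * np.eye(5)

# H is symmetric, so eigvalsh gives its (real) eigenvalues in order:
eigvals = np.linalg.eigvalsh(H)
print(eigvals)                       # [-7, -5, -5, -3, 0]

# The largest eigenvalue is 0, with eigenvector (1,1,1,1,1):
assert np.allclose(H @ np.ones(5), 0)
```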

And, it’s easy to see that it works this way for any irreducible Dirichlet operator, thanks to our theorem. So, our thesis has been proved true!

Puzzles on irreducibility

Let us conclude with a couple more puzzles. There are lots of ways to characterize irreducible nonnegative matrices; we don’t need to mention graphs. Here’s one:

Puzzle 3. Let T be a nonnegative n \times n matrix. Show that T is irreducible if and only if for each pair of indices i, j there is a natural number m with (T^m)_{i j} > 0.

You may be confused because today we explained the usual concept of irreducibility for nonnegative matrices, but also defined a concept of irreducibility for Dirichlet operators. Luckily there’s no conflict: Dirichlet operators aren’t nonnegative matrices, but if we add a big multiple of the identity to a Dirichlet operator it becomes a nonnegative matrix, and then:

Puzzle 4. Show that a Dirichlet operator H is irreducible if and only if the nonnegative operator H + c I (where c is any sufficiently large constant) is irreducible.

Irreducibility is also related to the nonexistence of interesting conserved quantities. In Part 11 we saw a version of Noether’s Theorem for stochastic mechanics. Remember that an observable O in stochastic mechanics assigns a number O_i to each configuration i = 1, \dots, n. We can make a diagonal matrix with O_i as its diagonal entries, and by abuse of language we call this O as well. Then we say O is a conserved quantity for the Hamiltonian H if the commutator [O,H] = O H - H O vanishes.

Puzzle 5. Let H be a Dirichlet operator. Show that H is irreducible if and only if every conserved quantity O for H is a constant, meaning that for some c \in \mathbb{R} we have O_i = c for all i. (Hint: examine the proof of Noether’s theorem.)

In fact this works more generally:

Puzzle 6. Let H be an infinitesimal stochastic matrix. Show that H + c I is an irreducible nonnegative matrix for all sufficiently large c if and only if every conserved quantity O for H is a constant.


Network Theory (Part 19)

18 July, 2012

joint with Jacob Biamonte

It’s time to resume the network theory series! We’re writing a little book based on some of these posts, so we want to finish up our discussion of stochastic Petri nets and chemical reaction networks. But it’s been a long time, so we’ll assume you forgot everything we’ve said before, and make this post as self-contained as possible.

Last time we started looking at a simple example: a diatomic gas.

A diatomic molecule of this gas can break apart into two atoms:

A_2 \to A + A

and conversely, two atoms can combine to form a diatomic molecule:

A + A \to A_2

We can draw both these reactions using a chemical reaction network:

where we’re writing B instead of A_2 to abstract away some detail that’s just distracting here.

Last time we looked at the rate equation for this chemical reaction network, and found equilibrium solutions of that equation. Now let’s look at the master equation, and find equilibrium solutions of that. This will serve as a review of three big theorems.

The master equation

We’ll start from scratch. The master equation is all about how atoms or molecules or rabbits or wolves or other things interact randomly and turn into other things. So, let’s write \psi_{m,n} for the probability that we have m atoms of A and n molecules of B in our container. These probabilities are functions of time, and the master equation will say how they change.

First we need to pick a rate constant for each reaction. Let’s say the rate constant for this reaction is some number \alpha > 0:

B \to A + A

while the rate constant for the reverse reaction is some number \beta > 0:

A + A \to B

Before we make it pretty using the ideas we’ve been explaining all along, the master equation says

\begin{array}{ccr} \displaystyle{ \frac{d}{d t} \psi_{m,n} (t)} &=&  \alpha (n+1) \, \psi_{m-2,n+1}(t) \\ \\   &&- \; \alpha n \, \psi_{m,n}(t) \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1}(t) \\ \\ && - \;\beta m(m-1) \, \psi_{m,n}(t) \\ \\   \end{array}

where we define \psi_{i,j} to be zero if either i or j is negative.

Yuck!

Normally we don’t show you such nasty equations. Indeed the whole point of our work has been to demonstrate that by packaging the equations in a better way, we can understand them using high-level concepts instead of mucking around with millions of scribbled symbols. But we thought we’d show you what’s secretly lying behind our beautiful abstract formalism, just once.

Each term has a meaning. For example, the third one:

\beta (m+2)(m+1)\psi_{m+2,n-1}(t)

means that the reaction A + A \to B will tend to increase the probability of there being m atoms of A and n molecules of B if we start with m+2 atoms of A and n-1 molecules of B. This reaction can happen in (m+2)(m+1) ways. And it happens at a probabilistic rate proportional to the rate constant for this reaction, \beta.

We won’t go through the rest of the terms. It’s a good exercise to do so, but there could easily be a typo in the formula, since it’s so long and messy. So let us know if you find one!
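One way to check the formula is to code it up term by term and verify that total probability is conserved. A sketch in NumPy (rate constants and the grid truncation are arbitrary choices of ours):

```python
import numpy as np

alpha, beta = 1.0, 2.0    # rate constants (arbitrary values)
M = N = 12                # truncate the (m, n) grid of configurations

def get(psi, m, n):
    """psi_{m,n}, defined to be zero for indices off the grid."""
    return psi[m, n] if 0 <= m <= M and 0 <= n <= N else 0.0

def dpsi_dt(psi):
    """Right-hand side of the master equation, term by term."""
    out = np.zeros_like(psi)
    for m in range(M + 1):
        for n in range(N + 1):
            out[m, n] = (alpha * (n + 1) * get(psi, m - 2, n + 1)
                         - alpha * n * psi[m, n]
                         + beta * (m + 2) * (m + 1) * get(psi, m + 2, n - 1)
                         - beta * m * (m - 1) * psi[m, n])
    return out

# Start with all probability on 4 atoms of A and no Bs:
psi = np.zeros((M + 1, N + 1))
psi[4, 0] = 1.0

# The gain and loss terms cancel, so total probability is conserved:
print(dpsi_dt(psi).sum())   # 0.0
```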

To simplify this mess, the key trick is to introduce a generating function that summarizes all the probabilities in a single power series:

\Psi = \sum_{m,n \ge 0} \psi_{m,n} y^m \, z^n

It’s a power series in two variables, y and z, since we have two chemical species: As and Bs.

Using this trick, the master equation looks like

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is a sum of terms, one for each reaction. This Hamiltonian is built from operators that annihilate and create As and Bs. The annihilation and creation operators for A atoms are:

\displaystyle{ a = \frac{\partial}{\partial y} , \qquad a^\dagger = y }

The annihilation operator differentiates our power series with respect to the variable y. The creation operator multiplies it by that variable. Similarly, the annihilation and creation operators for B molecules are:

\displaystyle{ b = \frac{\partial}{\partial z} , \qquad b^\dagger = z }
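In code, a power series in one variable can be stored as an array of coefficients; then the annihilation operator shifts indices down while multiplying by the exponent, and the creation operator shifts them up. A sketch of this representation (our own, assuming NumPy):

```python
import numpy as np

def annihilate(psi):
    """a = d/dy acting on the coefficients psi[m] of y^m."""
    m = np.arange(1, len(psi))
    out = np.zeros_like(psi)
    out[:-1] = m * psi[1:]   # d/dy (y^m) = m y^(m-1)
    return out

def create(psi):
    """a-dagger = multiplication by y: shift every exponent up by one."""
    out = np.zeros(len(psi) + 1)
    out[1:] = psi
    return out

# d/dy (y^2) = 2y:
psi = np.array([0.0, 0.0, 1.0])  # coefficients of 1, y, y^2
print(annihilate(psi))           # coefficients of 2y
```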

In Part 8 we explained a recipe that lets us stare at our chemical reaction network and write down this Hamiltonian:

H = \alpha ({a^\dagger}^2 b - b^\dagger b) + \beta (b^\dagger a^2 - {a^\dagger}^2 a^2)

As promised, there’s one term for each reaction. But each term is itself a sum of two: one that increases the probability that our container of chemicals will be in a new state, and another that decreases the probability that it’s in its original state. We get a total of four terms, which correspond to the four terms in our previous way of writing the master equation.

Puzzle 1. Show that this new way of writing the master equation is equivalent to the previous one.

Equilibrium solutions

Now we will look for all equilibrium solutions of the master equation: in other words, solutions that don’t change with time. So, we’re trying to solve

H \Psi = 0

Given the rather complicated form of the Hamiltonian, this seems tough. The challenge looks more concrete but even more scary if we go back to our original formulation. We’re looking for probabilities \psi_{m,n}, nonnegative numbers that sum to one, such that

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

for all m and n.

This equation is horrid! But the good news is that it’s linear, so a linear combination of solutions is again a solution. This lets us simplify the problem using a conserved quantity.

Clearly, there’s a quantity that the reactions here don’t change:

What’s that? It’s the number of As plus twice the number of Bs. After all, a B can turn into two As, or vice versa.

(Of course the secret reason is that B is a diatomic molecule made of two As. But you’d be able to follow the logic here even if you didn’t know that, just by looking at the chemical reaction network… and sometimes this more abstract approach is handy! Indeed, the way chemists first discovered that certain molecules are made of certain atoms is by seeing which reactions were possible and which weren’t.)

Suppose we start in a situation where we know for sure that the number of As plus twice the number of Bs equals some number k:

\psi_{m,n} = 0  \;  \textrm{unless} \; m+2n = k

Then we know \Psi is initially of the form

\displaystyle{ \Psi = \sum_{m+2n = k} \psi_{m,n} \, y^m z^n  }

But since the number of As plus twice the number of Bs is conserved, if \Psi obeys the master equation it will continue to be of this form!

Put a fancier way, we know that if a solution of the master equation starts in this subspace:

\displaystyle{ L_k =  \{ \Psi: \; \Psi = \sum_{m+2n = k} \psi_{m,n} y^m z^n \; \textrm{for some} \; \psi_{m,n} \}  }

it will stay in this subspace. So, because the master equation is linear, we can take any solution \Psi and write it as a linear combination of solutions \Psi_k, one in each subspace L_k.

In particular, we can do this for an equilibrium solution \Psi. And then all the solutions \Psi_k are also equilibrium solutions: they’re linearly independent, so if one of them changed with time, \Psi would too.

This means we can just look for equilibrium solutions in the subspaces L_k. If we find these, we can get all equilibrium solutions by taking linear combinations.

Once we’ve noticed that, our horrid equation makes a bit more sense:

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

Note that if the pair of subscripts m, n obey m + 2n = k, the same is true for the other pairs of subscripts here! So our equation relates the values of \psi_{m,n} for all the points (m,n) with integer coordinates lying on this line segment:

m+2n = k , \qquad m ,n \ge 0

You should be visualizing something like this:

If you think about it a minute, you’ll see that if we know \psi_{m,n} at two points on such a line, we can keep using our equation to recursively work out all the rest. So, there are at most two linearly independent equilibrium solutions of the master equation in each subspace L_k.

Why at most two? Why not two? Well, we have to be a bit careful about what happens at the ends of the line segment: remember that \psi_{m,n} is defined to be zero when m or n becomes negative. If we think very hard about this, we’ll see there’s just one linearly independent equilibrium solution of the master equation in each subspace L_k. But this is the sort of nitty-gritty calculation that’s not fun to watch someone else do, so we won’t bore you with that.
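The nitty-gritty can also be checked numerically: restrict the Hamiltonian to the configurations on one segment m + 2n = k and compute its kernel. A sketch (our own indexing, assuming NumPy; rates and k are arbitrary):

```python
import numpy as np

alpha, beta, k = 1.0, 2.0, 6   # rate constants and conserved quantity

# Configurations on the segment m + 2n = k, indexed by n:
states = [(k - 2 * n, n) for n in range(k // 2 + 1)]
idx = {s: i for i, s in enumerate(states)}

# The infinitesimal stochastic generator restricted to this segment:
G = np.zeros((len(states), len(states)))
for i, (m, n) in enumerate(states):
    if n >= 1:                   # B -> A + A happens at rate alpha * n
        G[idx[(m + 2, n - 1)], i] += alpha * n
        G[i, i] -= alpha * n
    if m >= 2:                   # A + A -> B happens at rate beta * m(m-1)
        G[idx[(m - 2, n + 1)], i] += beta * m * (m - 1)
        G[i, i] -= beta * m * (m - 1)

# The kernel of G is one-dimensional: one equilibrium per segment.
w, v = np.linalg.eig(G)
null = np.isclose(w, 0, atol=1e-10)
print(null.sum())                # 1

equilibrium = np.abs(v[:, null][:, 0].real)
equilibrium /= equilibrium.sum()
print(dict(zip(states, equilibrium)))
```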

Soon we’ll move on to a more high-level approach to this problem. But first, one remark. Our horrid equation is like a fancy version of the usual discretized form of the equation

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

namely:

\psi_{n-1} - 2 \psi_{n} + \psi_{n+1} = 0

And this makes sense, since we get

\displaystyle {\frac{d^2 \psi}{d x^2} = 0 }

by taking the heat equation:

\displaystyle \frac{\partial \psi}{\partial t} = {\frac{\partial^2 \psi}{\partial x^2} }

and assuming \psi doesn’t depend on time. So what we’re doing is a lot like looking for equilibrium solutions of the heat equation.

The heat equation describes how heat smears out as little particles of heat randomly move around. True, there don’t really exist ‘little particles of heat’, but this equation also describes the diffusion of any other kind of particles as they randomly move around undergoing Brownian motion. Similarly, our master equation describes a random walk on this line segment:

m+2n = k , \qquad m , n \ge 0

or more precisely, the points on this segment with integer coordinates. The equilibrium solutions arise when the probabilities \psi_{m,n} have diffused as much as possible.

If you think about it this way, it should be physically obvious that there’s just one linearly independent equilibrium solution of the master equation for each value of k.

There’s a general moral here, too, which we’re seeing in a special case: the master equation for a chemical reaction network really describes a bunch of random walks, one for each allowed value of the conserved quantities that can be built as linear combinations of number operators. In our case we have one such conserved quantity, but in general there may be more (or none).

Furthermore, these ‘random walks’ are what we’ve been calling Markov processes.
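To watch this diffusion happen, we can evolve the master equation numerically on a single subspace L_k. Here’s a sketch using crude forward Euler time steps: total probability stays put, and the distribution relaxes to the unique equilibrium:

```python
import numpy as np

alpha, beta, k = 1.0, 1.0, 6
size = k // 2 + 1                  # states (m, n) on the segment, indexed by n
H = np.zeros((size, size))
for n in range(size):
    m = k - 2 * n
    H[n, n] = -alpha * n - beta * m * (m - 1)
    if n + 1 < size:
        H[n, n + 1] = alpha * (n + 1)
    if n >= 1:
        H[n, n - 1] = beta * (m + 2) * (m + 1)

psi = np.zeros(size)
psi[0] = 1.0                       # start with all probability at (m, n) = (6, 0)
dt = 1e-2
for _ in range(20_000):            # forward Euler for d psi / dt = H psi
    psi = psi + dt * (H @ psi)

assert abs(psi.sum() - 1.0) < 1e-9        # columns of H sum to zero: probability is conserved
assert np.allclose(H @ psi, 0, atol=1e-8) # ...and we have settled into equilibrium
```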

Noether’s theorem

We simplified our task of finding equilibrium solutions of the master equation by finding a conserved quantity. The idea of simplifying problems using conserved quantities is fundamental to physics: this is why physicists are so enamored with quantities like energy, momentum, angular momentum and so on.

Nowadays physicists often use ‘Noether’s theorem’ to get conserved quantities from symmetries. There’s a very simple version of Noether’s theorem for quantum mechanics, but in Part 11 we saw a version for stochastic mechanics, and it’s that version that is relevant now. Here’s a paper which explains it in detail:

• John Baez and Brendan Fong, Noether’s theorem for Markov processes.

We don’t really need Noether’s theorem now, since we found the conserved quantity and exploited it without even noticing the symmetry. Nonetheless it’s interesting to see how it relates to what we’re doing.

For the reaction we’re looking at now, the idea is that the subspaces L_k are eigenspaces of an operator that commutes with the Hamiltonian H. It follows from standard math that a solution of the master equation that starts in one of these subspaces, stays in that subspace.

What is this operator? It’s built from ‘number operators’. The number operator for As is

N_A = a^\dagger a

and the number operator for Bs is

N_B = b^\dagger b

A little calculation shows

N_A \,y^m z^n = m \, y^m z^n, \quad \qquad  N_B\, y^m z^n = n \,y^m z^n

so the eigenvalue of N_A is the number of As, while the eigenvalue of N_B is the number of Bs. This is why they’re called number operators.

As a consequence, the eigenvalue of the operator N_A + 2N_B is the number of As plus twice the number of Bs:

(N_A + 2N_B) \, y^m z^n = (m + 2n) \, y^m z^n

Let’s call this operator O, since it’s so important:

O = N_A + 2N_B

If you think about it, the spaces L_k we saw a minute ago are precisely the eigenspaces of this operator:

L_k = \{ \Psi : \; O \Psi = k \Psi \}
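To make these operators concrete, here’s a tiny sketch in Python, where a linear combination of monomials y^m z^n is stored as a dictionary mapping (m, n) to its coefficient (the function names are ours, purely for illustration):

```python
# represent sums of monomials y^m z^n as {(m, n): coefficient}
def a(psi):        # annihilate an A: y^m z^n -> m y^(m-1) z^n
    return {(m - 1, n): m * c for (m, n), c in psi.items() if m > 0}

def a_dag(psi):    # create an A: y^m z^n -> y^(m+1) z^n
    return {(m + 1, n): c for (m, n), c in psi.items()}

def b(psi):        # annihilate a B: y^m z^n -> n y^m z^(n-1)
    return {(m, n - 1): n * c for (m, n), c in psi.items() if n > 0}

def b_dag(psi):    # create a B: y^m z^n -> y^m z^(n+1)
    return {(m, n + 1): c for (m, n), c in psi.items()}

def O(psi):        # O = N_A + 2 N_B, with N_A = a_dag a and N_B = b_dag b
    out = dict(a_dag(a(psi)))
    for key, c in b_dag(b(psi)).items():
        out[key] = out.get(key, 0) + 2 * c
    return out

mono = {(5, 3): 1}                       # the monomial y^5 z^3
assert a_dag(a(mono)) == {(5, 3): 5}     # N_A has eigenvalue m = 5
assert b_dag(b(mono)) == {(5, 3): 3}     # N_B has eigenvalue n = 3
assert O(mono) == {(5, 3): 11}           # O has eigenvalue m + 2n = 11
```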

As we’ve seen, solutions of the master equation that start in one of these eigenspaces will stay there. This lets us take some techniques that are very familiar in quantum mechanics and apply them to this stochastic situation.

First of all, time evolution as described by the master equation is given by the operators \exp(t H). In other words,

\displaystyle{ \frac{d}{d t} \Psi(t) } = H \Psi(t) \; \textrm{and} \;  \Psi(0) = \Phi \quad  \Rightarrow \quad \Psi(t) = \exp(t H) \Phi

But if you start in some eigenspace of O, you stay there. Thus if \Phi is an eigenvector of O, so is \exp(t H) \Phi, with the same eigenvalue. In other words,

O \Phi = k \Phi

implies

O \exp(t H) \Phi = k \exp(t H) \Phi = \exp(t H) O \Phi

But since we can choose a basis consisting of eigenvectors of O, we must have

O \exp(t H) = \exp(t H) O

or, throwing caution to the winds and differentiating:

O H = H O

So, as we’d expect from Noether’s theorem, our conserved quantity commutes with the Hamiltonian! This in turn implies that H commutes with any polynomial in O, which in turn suggests that

\exp(s O) H = H \exp(s O)

and also

\exp(s O) \exp(t H) = \exp(t H) \exp(s O)

The last equation says that O generates a 1-parameter family of ‘symmetries’: operators \exp(s O) that commute with time evolution. But what do these symmetries actually do? Since

O y^m z^n = (m + 2n) y^m z^n

we have

\exp(s O) y^m z^n = e^{s(m + 2n)}\, y^m z^n

So, this symmetry takes any probability distribution \psi_{m,n} and multiplies it by e^{s(m + 2n)}.

In other words, our symmetry multiplies the relative probability of finding our container of gas in a given state by a factor of e^s for each A atom, and by a factor of e^{2s} for each B molecule. It might not seem obvious that this operation commutes with time evolution! However, experts on chemical reaction theory are familiar with this fact.

Finally, a couple of technical points. Starting where we said “throwing caution to the winds”, our treatment has not been rigorous, since O and H are unbounded operators, and these must be handled with caution. Nonetheless, all the commutation relations we wrote down are true.

The operators \exp(s O) are unbounded for positive s. They’re bounded for negative s, so they give a one-parameter semigroup of bounded operators. But they’re not stochastic operators: even for s negative, they don’t map probability distributions to probability distributions. However, they do map any nonzero vector \Psi with \psi_{m,n} \ge 0 to a vector \exp(s O) \Psi with the same properties. So, we can just normalize this vector and get a probability distribution. The need for this normalization is why we spoke of relative probabilities.
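If you’d like a numerical sanity check of the commutation relation O H = H O (this is not a substitute for the symbolic computation in Puzzle 4 below!), you can truncate the state space, say to m ≤ 10 and n ≤ 5, an arbitrary cutoff. Every nonzero matrix entry of H connects states with the same value of m + 2n, so the truncation doesn’t spoil the relation:

```python
import numpy as np

alpha, beta, M, N = 1.3, 0.7, 10, 5     # arbitrary rates; truncate to m <= M, n <= N
dim = (M + 1) * (N + 1)
idx = lambda m, n: m * (N + 1) + n

H = np.zeros((dim, dim))
for m in range(M + 1):
    for n in range(N + 1):
        row = idx(m, n)
        H[row, row] = -alpha * n - beta * m * (m - 1)
        if m >= 2 and n + 1 <= N:
            H[row, idx(m - 2, n + 1)] = alpha * (n + 1)
        if m + 2 <= M and n >= 1:
            H[row, idx(m + 2, n - 1)] = beta * (m + 2) * (m + 1)

O = np.diag([m + 2 * n for m in range(M + 1) for n in range(N + 1)])

# every nonzero entry of H connects states with the same value of m + 2n,
# so the commutator vanishes even on this truncated state space
assert np.allclose(H @ O - O @ H, 0)
```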

The Anderson–Craciun–Kurtz theorem

Now we’ll actually find all equilibrium solutions of the master equation in closed form. To understand this final section, you really do need to remember some things we’ve discussed earlier. Last time we considered the same chemical reaction network we’re studying today, but we looked at its rate equation, which looks like this:

\displaystyle{ \frac{d}{d t} x_1 =  2 \alpha x_2 - 2 \beta x_1^2}

\displaystyle{ \frac{d}{d t} x_2 = - \alpha x_2 + \beta x_1^2 }

This describes how the number of As and Bs changes in the limit where there are lots of them and we can treat them as varying continuously, in a deterministic way. The number of As is x_1, and the number of Bs is x_2.

We saw that the quantity

x_1 + 2 x_2

is conserved, just as today we’ve seen that N_A + 2 N_B is conserved. We saw that the rate equation has one equilibrium solution for each choice of x_1 + 2 x_2. And we saw that these equilibrium solutions obey

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }
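If you’d like to watch one of these equilibria being approached, here’s a quick numerical sketch. Forward Euler is a crude integrator, but each step preserves x_1 + 2 x_2 up to rounding, because the two increments cancel:

```python
alpha, beta = 1.0, 1.0
x1, x2 = 3.0, 0.0          # start with A's only, so x1 + 2*x2 = 3
dt = 1e-3
for _ in range(50_000):    # integrate the rate equation out to t = 50
    dx1 = 2 * alpha * x2 - 2 * beta * x1**2
    dx2 = -alpha * x2 + beta * x1**2
    x1, x2 = x1 + dt * dx1, x2 + dt * dx2

assert abs(x1 + 2 * x2 - 3.0) < 1e-9          # the conserved quantity stays put
assert abs(x1**2 / x2 - alpha / beta) < 1e-6  # ...and we reach the predicted equilibrium
```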

The Anderson–Craciun–Kurtz theorem, introduced in Part 9, is a powerful result that gets equilibrium solutions of the master equation from equilibrium solutions of the rate equation. It only applies to equilibrium solutions that are ‘complex balanced’, but that’s okay:

Puzzle 2. Show that the equilibrium solutions of the rate equation for the chemical reaction network

are complex balanced.

So, given any equilibrium solution (x_1,x_2) of our rate equation, we can hit it with the Anderson–Craciun–Kurtz theorem and get an equilibrium solution of the master equation! And it looks like this:

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

In this solution, the probability distribution

\displaystyle{ \psi_{m,n} = e^{-(x_1 + x_2)} \,  \frac{x_1^m x_2^n} {m! n! } }

is a product of Poisson distributions. The factor in front is there to make the numbers \psi_{m,n} add up to one. And remember, x_1, x_2 are any nonnegative numbers with

\displaystyle{ \frac{x_1^2}{x_2} = \frac{\alpha}{\beta} }

So from all we’ve said, the above formula is an explicit closed-form solution of the horrid equation

\begin{array}{ccr} 0 &=&  \alpha (n+1) \, \psi_{m-2,n+1} \\ \\   &&- \; \alpha n \, \psi_{m,n} \\  \\  && + \beta (m+2)(m+1) \, \psi_{m+2,n-1} \\ \\ && - \;\beta m(m-1) \, \psi_{m,n} \\ \\   \end{array}

That’s pretty nice. We found some solutions without ever doing any nasty calculations.
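Skeptics can plug the product-of-Poissons formula straight into the horrid equation and check it numerically. Here’s a sketch, where the rates and the value of x_1 are arbitrary:

```python
from math import exp, factorial

alpha, beta = 2.0, 0.5
x1 = 1.7                        # any x1 > 0 works...
x2 = beta * x1**2 / alpha       # ...provided x1^2 / x2 = alpha / beta

def psi(m, n):
    """Product of Poisson distributions, extended by zero off the nonnegative quadrant."""
    if m < 0 or n < 0:
        return 0.0
    return exp(-(x1 + x2)) * x1**m * x2**n / (factorial(m) * factorial(n))

for m in range(12):
    for n in range(12):
        lhs = (alpha * (n + 1) * psi(m - 2, n + 1)
               - alpha * n * psi(m, n)
               + beta * (m + 2) * (m + 1) * psi(m + 2, n - 1)
               - beta * m * (m - 1) * psi(m, n))
        assert abs(lhs) < 1e-12   # the horrid equation holds at every (m, n)
```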

But we’ve really done better than getting some equilibrium solutions of the master equation. By restricting attention to n,m with m+2n = k, our formula for \psi_{m,n} gives an equilibrium solution that lives in the eigenspace L_k:

\displaystyle{  \Psi_k = e^{-(x_1 + x_2)} \, \sum_{m+2n =k} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

And by what we’ve said, linear combinations of these give all equilibrium solutions of the master equation.

And we got them with very little work! Despite all the fancy talk in today’s post, we essentially just took the equilibrium solutions of the rate equation and plugged them into a straightforward formula to get equilibrium solutions of the master equation. This is why the Anderson–Craciun–Kurtz theorem is so nice. And of course we’re looking at a very simple reaction network: for more complicated ones it becomes even better to use this theorem to avoid painful calculations.

We could go further. For example, we could study nonequilibrium solutions using Feynman diagrams like this:

But instead, we will leave off with two more puzzles. We introduced some symmetries, but we haven’t really explored them yet:

Puzzle 3. What do the symmetries associated to the conserved quantity O do to the equilibrium solutions of the master equation given by

\displaystyle{  \Psi = e^{-(x_1 + x_2)} \, \sum_{m,n \ge 0} \frac{x_1^m x_2^n} {m! n! } \, y^m z^n }

where (x_1,x_2) is an equilibrium solution of the rate equation? In other words, what is the significance of the one-parameter family of solutions

\exp(s O) \Psi ?

Also, we used a conceptual argument to check that H commutes with O, but it’s good to know that we can check this sort of thing directly:

Puzzle 4. Compute the commutator

[H, O] = H O - O H

and show it vanishes.


The Mathematics of Biodiversity (Part 5)

3 July, 2012

I’d be happy to get your feedback on these slides of the talk I’m giving the day after tomorrow:

• John Baez, Diversity, entropy and thermodynamics, 6 July 2012, Exploratory Conference on the Mathematics of Biodiversity, Centre de Recerca Matemàtica, Barcelona.

Abstract: As is well known, some popular measures of biodiversity are formally identical to measures of entropy developed by Shannon, Rényi and others. This fact is part of a larger analogy between thermodynamics and the mathematics of biodiversity, which we explore here. Any probability distribution can be extended to a 1-parameter family of probability distributions where the parameter has the physical meaning of ‘temperature’. This allows us to introduce thermodynamic concepts such as energy, entropy, free energy and the partition function in any situation where a probability distribution is present—for example, the probability distribution describing the relative abundances of different species in an ecosystem. The Rényi entropy of this probability distribution is closely related to the change in free energy with temperature. We give one application of thermodynamic ideas to population dynamics, coming from the work of Marc Harper: as a population approaches an ‘evolutionary optimum’, the amount of Shannon information it has ‘left to learn’ is nonincreasing. This fact is closely related to the Second Law of Thermodynamics.

This talk is rather different than the one I’d envisaged giving! There was a lot of interest in my work on Rényi entropy and thermodynamics, because Rényi entropies—and their exponentials, called the Hill numbers—are an important measure of biodiversity. So, I decided to spend a lot of time talking about that.
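In case you haven’t met these quantities, here’s a small sketch (ours, not from the slides). The Rényi entropy of order q of a probability distribution p is \log(\sum_i p_i^q)/(1-q), and the Hill number is its exponential, interpreted as an ‘effective number of species’:

```python
import math

def renyi_entropy(p, q):
    """Renyi entropy of order q (q != 1); the limit q -> 1 recovers Shannon entropy."""
    return math.log(sum(pi**q for pi in p if pi > 0)) / (1 - q)

def hill_number(p, q):
    """The 'effective number of species': the exponential of the Renyi entropy."""
    return math.exp(renyi_entropy(p, q))

uniform = [0.25] * 4
for q in (0, 0.5, 2, 3):
    assert abs(hill_number(uniform, q) - 4) < 1e-12   # 4 equally common species count as 4

skewed = [0.7, 0.1, 0.1, 0.1]
shannon = -sum(pi * math.log(pi) for pi in skewed)
assert abs(renyi_entropy(skewed, 1 + 1e-7) - shannon) < 1e-5  # q -> 1 limit
assert hill_number(skewed, 2) < hill_number(skewed, 0)  # higher q favors common species
```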


Fluid Flows and Infinite Dimensional Manifolds (Part 3)

30 May, 2012

Or: Twisting on the Borderline

guest post by Tim van Beek

flow around a symmetric airfoil, laminar to turbulent

In Part 2 of this series, I told you what ideal incompressible fluids are. Then I explained how the equation of motion for such fluids, Euler’s equation, can be seen as the equation for geodesics in \mathrm{SDiff}(M), the infinite dimensional manifold of volume preserving diffeomorphisms. Last time I promised to talk about the role of pressure in Euler’s equation. I also mentioned that Arnold used this geometric setup to put a bound on how good weather forecasts can be. I will try to fulfill both promises in the next blog post!

But last time I also mentioned that the ideal fluid has serious drawbacks as a model. This is an important topic, too, so I would like to explain this a little bit further first, in this blog post.

So, this time we will talk a little bit about how we can get viscosity, and therefore turbulence, back into the picture. This will lead us to the Navier–Stokes equation. Can ideal fluids, which solve Euler’s equation, model fluids with a very small viscosity? This depends on what happens to solutions when one lets the viscosity go to zero in the Navier–Stokes equation, so I will show you a result that answers this question in a specific context.

I’ll also throw in a few graphics that illustrate the transition from laminar flow to turbulent flow at boundaries, starting with the one above. These are all from:

• Milton Van Dyke, An Album of Fluid Motion, Parabolic Press, 12th edition, 1982.

flow around a circular cylinder, laminar to turbulent

Re-introducing Viscosity: The Navier–Stokes Equation

The motion of an incompressible, homogeneous, ideal fluid is described by Euler’s equation:

\displaystyle{ \partial_t u + u \cdot \nabla u = - \nabla p }

Ideal fluids are very nice mathematically. Nicest of all are potential flows, where the velocity vector field is the gradient of a potential. In two dimensions these can be studied using complex analysis! One could say that a whole ‘industry’ evolved around the treatment of these kinds of fluid flows. It was even taught to some extent to engineers, before computers took over. Here’s a very nice, somewhat nostalgic book to read about that:

• L. M. Milne-Thomson, Theoretical Aerodynamics, 4th edition, Dover Publications, New York, 2011. (Reprint of the 1958 edition.)
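To give a small taste of that ‘industry’ (our own sketch, not from Milne-Thomson’s book): the complex potential w(z) = U(z + R^2/z) describes uniform flow past a circular cylinder of radius R, and the velocity components come from u - iv = dw/dz. We can check that the flow is uniform far away and tangent to the cylinder:

```python
import cmath

U, R = 1.0, 1.0

def velocity(z):
    """Velocity of the potential flow past a cylinder, read off from u - i v = dw/dz."""
    w_prime = U * (1 - R**2 / z**2)
    return w_prime.real, -w_prime.imag   # (u, v)

# far upstream the flow is uniform: (U, 0)
u, v = velocity(complex(-100.0, 0.0))
assert abs(u - U) < 1e-3 and abs(v) < 1e-9

# on the cylinder surface the normal (radial) component of the velocity vanishes,
# so the cylinder is a streamline of the flow
for j in range(1, 12):
    theta = 0.5 * j
    z = R * cmath.exp(1j * theta)
    u, v = velocity(z)
    normal = (u * z.real + v * z.imag) / R
    assert abs(normal) < 1e-12
```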

The assumption of ‘incompressibility’ is not restrictive for most applications involving flows of water and air, for example. Maybe you are a little bit surprised that I mention air, because the compressibility of air is a part of everyday life, for example when you pump up a bicycle tire. It is, however, not necessary to include this property when you model air flowing at velocities significantly lower than the speed of sound in air. The rule of thumb for engineers seems to be that one needs to include compressibility for speeds around Mach 0.3 or more:

Compressible aerodynamics, Wikipedia.

However, the concept of an ‘ideal’ fluid takes viscosity out of the picture, and with it turbulence and the drag that a body immersed in the fluid feels. As I mentioned last time, this is called d’Alembert’s paradox.

The simplest way to introduce viscosity is by considering a Newtonian fluid. This is a fluid where the viscosity is a constant, and the relation between velocity differences and the resulting shear forces is strictly linear. This leads to the Navier–Stokes equation for incompressible fluids:

\displaystyle{ \partial_t u + u \cdot \nabla u - \nu \Delta u = - \nabla p }

If you think about molten plastic, or honey, you will notice that the viscosity actually depends on the temperature, and maybe also on the pressure and other parameters of the fluid. The science concerned with exploring these effects is called rheology. This is an important research topic and the reason why producers of, say, plastic sheets sometimes keep physicists around. But let’s stick to Newtonian fluids for now.

curved wall, laminar to turbulent

Sending Viscosity to Zero: Boundary Layers

Since we get Euler’s equation if we set \nu = 0 in the above equation, the question is whether, in some sense or other, solutions of the Navier–Stokes equation converge to a solution of Euler’s equation in the limit of vanishing viscosity. If you had asked me, I would have guessed: no. The mathematical reason is that we have a transition from a second order partial differential equation to a first order one. This is usually called a singular perturbation. The physical reason is that a nonzero viscosity will give rise to phenomena like turbulence and energy dissipation that cannot occur in an ideal fluid. Well, the last argument shows that we cannot expect convergence for long times if eddies are present, so there certainly is a loophole.

The precise formulation of the last statement depends on the boundary conditions one chooses. One way is this: Let u be a smooth solution of Euler’s equation in \mathbb{R}^3 with sufficiently fast decay at infinity (this is our boundary condition), then the kinetic energy E

\displaystyle{ E =  \frac{1}{2} \int \| u \|^2 \; \mathrm{d x} }

is conserved for all time. This is not the only conserved quantity for Euler’s equation, but it’s a very important one.

But now, suppose u is a smooth solution of the Navier–Stokes equation in \mathbb{R}^3 with sufficiently fast decay at infinity. In this case we have

\displaystyle{ \frac{d E}{d t} =  - \nu \int \| \nabla \times u \|^2 \mathrm{d x} }

So, the presence of viscosity turns a conserved quantity into a decaying quantity! Since the early 20th century, engineers have taken these effects into account using the idea of ‘boundary layers’ introduced by Ludwig Prandtl, as I already mentioned last time. Actually, the whole technique of singular perturbation theory was developed starting from this ansatz. It has become a mathematical technique for getting asymptotic expansions of solutions of complicated nonlinear partial differential equations.
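We can check the energy identity above on a classic exact solution of the Navier–Stokes equation, the two-dimensional Taylor–Green vortex u = (\cos x \sin y, -\sin x \cos y) e^{-2\nu t} on a periodic box. (That’s a different setting from the decay-at-infinity one above, but the identity takes the same form.) Here’s a numerical sketch:

```python
import numpy as np

nu = 0.1
N = 128
x = np.linspace(0, 2 * np.pi, N, endpoint=False)
X, Y = np.meshgrid(x, x)
dA = (2 * np.pi / N) ** 2

def energy(t):
    """Kinetic energy of the Taylor-Green vortex at time t."""
    decay = np.exp(-2 * nu * t)
    u = np.cos(X) * np.sin(Y) * decay
    v = -np.sin(X) * np.cos(Y) * decay
    return 0.5 * np.sum(u**2 + v**2) * dA

def enstrophy(t):
    """Integral of |curl u|^2; here curl u = -2 cos(x) cos(y) e^(-2 nu t)."""
    w = -2 * np.cos(X) * np.cos(Y) * np.exp(-2 * nu * t)
    return np.sum(w**2) * dA

t, h = 1.0, 1e-5
dE_dt = (energy(t + h) - energy(t - h)) / (2 * h)   # centered difference in time
assert abs(dE_dt + nu * enstrophy(t)) < 1e-6        # dE/dt = -nu * integral |curl u|^2
```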

The idea is that the concept of ‘ideal’ fluid is good except at boundaries, where effects due to viscosity cannot be ignored. This is true for a lot of fluids like air and water, which have a very low viscosity. Therefore one tries to match a solution describing an ideal fluid flow far away from the boundaries with a specific solution for a viscous fluid with prescribed boundary conditions valid in a thin layer on the boundaries. This works quite well in applications. One of the major textbooks about this topic has been around for over 60 years now and has reached its 10th edition in German. It is:

• H. Schlichting and K. Gersten: Boundary-Layer Theory, 8th edition, Springer, Berlin, 2000.

Since I am also interested in numerical models and applications in engineering, I should probably read it. (I don’t know when the 10th edition will be published in English.)

flow on a sphere, laminar to turbulent

Sending Viscosity to Zero: Convergence Results

However, this approach does not tell us under what circumstances we can expect convergence of solutions u_{\nu} to the viscous Navier–Stokes equations with viscosity \nu > 0 to a solution u_{0} of Euler’s equation with zero viscosity. That is, are there such solutions, boundary and initial conditions and a topology on an appropriate topological vector space, such that

\lim_{\nu \to 0} u_{\nu} = u_{0} \; \text{?}

I asked this question over on Mathoverflow and got some interesting answers. Obviously, a lot of brain power has gone into this question, and there are both interesting positive and negative results. As an example, let me describe a very interesting positive result. I learned about it from this book:

• Andrew J. Majda and Andrea L. Bertozzi, Vorticity and Incompressible Flow, Cambridge University Press, Cambridge, 2001.

It’s Proposition 3.2 in this book. There are three assumptions that we need to make in order for things to work out:

• First, we need to fix an interval [0, T] for the time. As mentioned above, we should not expect that we can get convergence for an unbounded time interval like [0, \infty).

• Secondly, we need to assume that solutions u_{\nu} of the Navier–Stokes equation and a solution u_0 of Euler’s equation exist and are smooth.

• Thirdly we will dodge the issue of boundary layers by assuming that the solutions exist on the whole of \mathbb{R}^3 with sufficiently fast decay. As I already mentioned above, a viscous fluid will of course show very different behavior at a boundary than an ideal (that is, nonviscous) one. Our third assumption means that there is no such boundary layer present.

We will denote the L^{\infty} norm on vector fields by \| \cdot \| and use the big O notation.

Given our three assumptions, Proposition 3.2 says that:

\displaystyle{ \sup_{0 \le t \le T} \; \| u_{\nu} - u_0 \| = O(\nu) }

It also gives the convergence of the derivatives:

\displaystyle{ \int_0^T \| \nabla (u_{\nu} - u_0) \| \; d t =  O(\nu^{1/2}) }

This result illustrates that the boundary layer ansatz may work, because the ideal fluid approximation could be a good one away from boundaries and for fluids with low viscosity like water and air.

flow around a symmetric airfoil, laminar to turbulent

So, when John von Neumann joked that “ideal water” is like “dry water”, I would have said: “Well, that is half right”.

