High-Speed Finance

8 August, 2012

 

These days, a lot of buying and selling of stocks is done by computers—it’s called algorithmic trading. Computers can do it much faster than people. Watch how they’ve been going wild!

The date is at lower left. In 2000 it took several seconds for computers to make a trade. By 2010 the time had dropped to milliseconds… or even microseconds. And around this year, market activity started becoming much more intense.

I can’t even see the Flash Crash of May 6, 2010, also known as the Crash of 2:45. The Dow Jones plummeted 9% in five minutes, then quickly bounced back. For fifteen minutes, a trillion dollars of market value vanished. Then it reappeared.

But on August 5, 2011, when the credit rating of the US got downgraded, you’ll see the activity explode! And it’s been crazy ever since.

The movie above was created by Nanex, a company that provides market data to traders. The x axis shows the time of day, from 9:30 to 16:00. The y axis… well, it’s the amount of some activity per unit time, but they don’t say what. Do you know?

The folks at Nanex have something very interesting to say about this. It’s not high frequency trading or ‘HFT’ that they’re worried about—that’s actually gone down slightly from 2008 to 2012. What’s gone up is ‘high frequency quoting’, also known as ‘quote spam’ or ‘quote stuffing’.

Over on Google+, Sergey Ten explained the idea to me:

Quote spam is a well-known tactic. It is used by a high-frequency trader to get a competitive advantage over other high-frequency traders. The trader generates high-frequency quote spam using a pseudorandom (or otherwise structured) algorithm, with his own computers coded to ignore it. His competitors don’t know the generating algorithm and have to process each quote, thus increasing their load, consuming bandwidth and suffering a slight delay in processing.

A quote is an offer to buy or sell stock at a given price. For a clear and entertaining explanation of how this works and why traders are locked into a race for speed, try:

• Chris Stucchio, A high frequency trader’s apology, Part 1, 16 April 2012. Part 2, 25 April 2012.

I don’t know a great introduction to quote spam, but this paper isn’t bad:

• Jared F. Egginton, Bonnie F. Van Ness, and Robert A. Van Ness, Quote stuffing, 15 March 2012.
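Here’s a toy sketch in Python of the asymmetry Sergey describes. Everything in it is made up (the quote IDs, the feed format, the numbers), and it’s nothing like real market infrastructure, but it shows the trick: the spammer regenerates its own pseudorandom quotes from a known seed and drops them cheaply, while a competitor has to inspect every quote on the feed.

```python
import random

def junk_ids(seed, n):
    """The spammer's quote IDs come from a seeded pseudorandom generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(n)]

SEED = 42
spam = junk_ids(SEED, 10_000)
real = [1, 2, 3]                  # IDs of the three genuine quotes (hypothetical)
feed = spam + real                # the public feed interleaves junk and signal

# Spammer: regenerates its own junk once, then drops it with O(1) lookups.
known = set(junk_ids(SEED, 10_000))
must_process_spammer = [q for q in feed if q not in known]

# Competitor: can't tell junk from signal, so it processes all 10,003 quotes.
must_process_competitor = feed

print(len(must_process_spammer), len(must_process_competitor))   # 3 10003
```

The point is just the workload ratio: the spammer’s own algorithm lets it skip its spam almost for free.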

Toward the physical limits of speed

In fact, the battle for speed is so intense that trading has run up against the speed of light.

For example, by 2013 there will be a new transatlantic cable at the bottom of the ocean, the first in a decade. Why? Just to cut the communication time between US and UK traders by 5 milliseconds. The new fiber optic line will be straighter than existing ones:

“As a rule of thumb, each 62 miles that the light has to travel takes about 1 millisecond,” Thorvardarson says. “So by straightening the route between Halifax and London, we have actually shortened the cable by 310 miles, or 5 milliseconds.”
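It’s easy to check this rule of thumb. Light in optical fiber travels at roughly c divided by the refractive index of the glass; taking 1.47 as a typical value (my assumption, not a figure from the article), 62 miles one way comes to about half a millisecond, so the quoted 1 millisecond per 62 miles makes sense as a round-trip figure:

```python
C_VACUUM = 299_792.458      # speed of light in vacuum, km/s
N_FIBER = 1.47              # assumed refractive index of optical fiber
v = C_VACUUM / N_FIBER      # signal speed in fiber, roughly 204,000 km/s

km = 62 * 1.609344          # 62 miles in kilometres

one_way_ms = km / v * 1000
round_trip_ms = 2 * one_way_ms
print(round(one_way_ms, 2), round(round_trip_ms, 2))   # 0.49 0.98

# The claimed 5 ms saving from shortening the route by 310 miles
# checks out the same way: 310 miles round trip is about 4.9 ms.
```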

Meanwhile, a London-based company called Fixnetix has developed a special computer chip that can prepare a trade in just 740 nanoseconds. But why stop at nanoseconds?

With the race for the lowest “latency” continuing, some market participants are even talking about picoseconds, or trillionths of a second. At first the realm of physics and math and then computer science, the picosecond looms as the next time barrier.

Actions that take place in nanoseconds and picoseconds in some cases run up against the sheer limitations of physics, said Mark Palmer, chief executive of Lexington, Mass.-based StreamBase Systems.

Black swans and the ultrafast machine ecology

As high-frequency trading and high-frequency quoting leave slow-paced human reaction times in the dust, markets start to behave differently. Here’s a great paper about that:

• Neil Johnson, Guannan Zhao, Eric Hunsader, Jing Meng, Amith Ravindar, Spencer Carran and Brian Tivnan, Financial black swans driven by ultrafast machine ecology.

A black swan is an unexpectedly dramatic event, like a market crash or a stock bubble that bursts. But according to this paper, such events are now happening all the time at speeds beyond our perception!

Here’s one:

It’s a price spike in the stock of a company called Super Micro Computer, Inc. On October 1st, 2010, it shot up 26% and then crashed back down. But this all happened in 25 milliseconds!

These ultrafast black swans happen at least once a day. And they happen most of all to financial institutions.

Here’s a great blog article about this stuff:

• Mark Buchanan, Approaching the singularity—in global finance, The Physics of Finance, 13 February 2012.

I won’t try to outdo Buchanan’s analysis. I’ll just quote the abstract of the original paper:

Society’s drive toward ever faster socio-technical systems, means that there is an urgent need to understand the threat from ‘black swan’ extreme events that might emerge. On 6 May 2010, it took just five minutes for a spontaneous mix of human and machine interactions in the global trading cyberspace to generate an unprecedented system-wide Flash Crash. However, little is known about what lies ahead in the crucial sub-second regime where humans become unable to respond or intervene sufficiently quickly. Here we analyze a set of 18,520 ultrafast black swan events that we have uncovered in stock-price movements between 2006 and 2011. We provide empirical evidence for, and an accompanying theory of, an abrupt system-wide transition from a mixed human-machine phase to a new all-machine phase characterized by frequent black swan events with ultrafast durations (<650ms for crashes, <950ms for spikes). Our theory quantifies the systemic fluctuations in these two distinct phases in terms of the diversity of the system's internal ecology and the amount of global information being processed. Our finding that the ten most susceptible entities are major international banks, hints at a hidden relationship between these ultrafast 'fractures' and the slow 'breaking' of the global financial system post-2006. More generally, our work provides tools to help predict and mitigate the systemic risk developing in any complex socio-technical system that attempts to operate at, or beyond, the limits of human response times.

Trans-quantitative analysts?

When you get into an arms race of trying to write algorithms whose behavior other algorithms can’t predict, the math involved gets very tricky. Over on Google+, F. Lengvel pointed out something strange. In May 2010, Christian Marks claimed that financiers were hiring experts on large ordinals—crudely speaking, big infinite numbers!—to design algorithms that were hard to outwit.

I can’t confirm his account, but I can’t resist quoting it:

In an unexpected development for the depressed market for mathematical logicians, Wall Street has begun quietly and aggressively recruiting proof theorists and recursion theorists for their expertise in applying ordinal notations and ordinal collapsing functions to high-frequency algorithmic trading. Ordinal notations, which specify sequences of ordinal numbers of ever increasing complexity, are being used by elite trading operations to parameterize families of trading strategies of breathtaking sophistication.

The monetary advantage of the current strategy is rapidly exhausted after a lifetime of approximately four seconds — an eternity for a machine, but barely enough time for a human to begin to comprehend what happened. The algorithm then switches to another trading strategy of higher ordinal rank, and uses this for a few seconds on one or more electronic exchanges, and so on, while opponent algorithms attempt the same maneuvers, risking billions of dollars in the process.

The elusive and highly coveted positions for proof theorists on Wall Street, where they are known as trans-quantitative analysts, have not been advertised, to the chagrin of executive recruiters who work on commission. Elite hedge funds and bank holding companies have been discreetly approaching mathematical logicians who have programming experience and who are familiar with arcane software such as the ordinal calculator. A few logicians were offered seven figure salaries, according to a source who was not authorized to speak on the matter.

Is this for real? I like the idea of ‘trans-quantitative analysts’: it reminds me of ‘transfinite numbers’, which is another name for infinities. But it sounds a bit like a joke, and I haven’t been able to track down any references to trans-quantitative analysts, except people talking about Christian Marks’ blog article.

I understand a bit about ordinal notations, but I don’t think this is the time to go into that—not before I’m sure this stuff is for real. Instead, I’d rather reflect on a comment of Boris Borcic over on Google+:

Last week it occurred to me that LessWrong and OvercomingBias together might play a role to explain why Singularists don’t seem to worry about High Frequency Robot Trading as a possible pathway for Singularity-like developments. I mean IMO they should, the Singularity is about machines taking control, ownership is control, HFT involves slicing ownership in time-slices too narrow for humans to know themselves owners and possibly control.

The ensuing discussion got diverted to the question of whether algorithmic trading involved ‘intelligence’, but maybe intelligence is not the point. Perhaps algorithmic traders have become successful parasites on the financial system without themselves being intelligent, simply by virtue of their speed. And the financial system, in turn, seems to be a successful parasite on our economy. Regulatory capture—the control of the regulating agencies by the industry they’re supposed to be regulating—seems almost complete. Where will this lead?


Network Theory (Part 20)

6 August, 2012

guest post by Jacob Biamonte

We’re in the middle of a battle: in addition to our typical man versus equation scenario, it’s a battle between two theories. For those good patrons following the network theory series, you know the two opposing forces well. It’s our old friends, at it again:

Stochastic Mechanics vs Quantum Mechanics!

Today we’re reporting live from a crossroads, and we’re facing a skirmish that gives rise to what some might consider a paradox. Let me sketch the main thesis before we get our hands dirty with the gory details.

First I need to tell you that the battle takes place at the intersection of stochastic and quantum mechanics. We recall from Part 16 that there is a class of operators called ‘Dirichlet operators’ that are valid Hamiltonians for both stochastic and quantum mechanics. In other words, you can use them to generate time evolution both for old-fashioned random processes and for quantum processes!

Staying inside this class allows the theories to fight it out on the same turf. We will be considering a special subclass of Dirichlet operators, which we call ‘irreducible Dirichlet operators’. These are the ones where starting in any state in our favorite basis of states, we have a nonzero chance of winding up in any other. When considering this subclass, we found something interesting:

Thesis. Let H be an irreducible Dirichlet operator with n eigenstates. In stochastic mechanics, there is only one valid state that is an eigenvector of H: the unique so-called ‘Perron–Frobenius state’. The other n-1 eigenvectors are forbidden states of a stochastic system: the stochastic system is either in the Perron–Frobenius state, or in a superposition of at least two eigenvectors. In quantum mechanics, all n eigenstates of H are valid states.

This might sound like a riddle, but today as we’ll prove, riddle or not, it’s a fact. If it makes sense, well that’s another issue. As John might have said, it’s like a bone kicked down from the gods up above: we can either choose to chew on it, or let it be. Today we are going to do a bit of chewing.

One of the many problems with this post is that John seemed to have a nut loose on his keyboard. No, it wasn’t broken: he simply wrote enough blog posts on this stuff to turn them into a book. I’m supposed to be compiling the blog articles into a massive LaTeX file, but I wrote this instead.

Another problem is that this post somehow seems to use just about everything said before, so I’m going to have to do my best to make things self-contained. Please bear with me as I try to recap what’s been done. For those of you familiar with the series, a good portion of the background for what we’ll cover today can be found in Part 12 and Part 16.

At the intersection of two theories

As John has mentioned in his recent talks, the typical view of how quantum mechanics and probability theory come into contact looks like this:

The idea is that quantum theory generalizes classical probability theory by considering observables that don’t commute.

That’s perfectly valid, but we’ve been exploring an alternative view in this series. Here quantum theory doesn’t subsume probability theory, but they intersect:

What goes in the middle you might ask? As odd as it might sound at first, John showed in Part 16 that electrical circuits made of resistors constitute the intersection!

For example, a circuit like this:

gives rise to a Hamiltonian H that’s good both for stochastic mechanics and quantum mechanics. Indeed, he found that the power dissipated by a circuit made of resistors is related to the familiar quantum theory concept known as the expectation value of the Hamiltonian!

\textrm{power} = -2 \langle \psi, H \psi \rangle

Oh—and you might think we made a mistake and wrote our Ω (ohm) symbols upside down. We didn’t. It happens that ℧ is the symbol for a ‘mho’—a unit of conductance that’s the reciprocal of an ohm. Check out Part 16 for the details.

Stochastic mechanics versus quantum mechanics

Let’s recall how states, time evolution, symmetries and observables work in the two theories. Today we’ll fix a basis for our vector space of states, and we’ll assume it’s finite-dimensional so that all vectors have n components over either the complex numbers \mathbb{C} or the reals \mathbb{R}. In other words, we’ll treat our space as either \mathbb{C}^n or \mathbb{R}^n. In this fashion, linear operators that map such spaces to themselves will be represented as square matrices.

Vectors will be written as \psi_i where the index i runs from 1 to n, and we think of each choice of the index as a state of our system—but since we’ll be using that word in other ways too, let’s call it a configuration. It’s just a basic way our system can be.

States

Besides the configurations i = 1,\dots, n, we have more general states that tell us the probability or amplitude of finding our system in one of these configurations:

Stochastic states are n-tuples of nonnegative real numbers:

\psi_i \in [0, \infty)

The probability of finding the system in the ith configuration is defined to be \psi_i. For these probabilities to sum to one, \psi needs to be normalized like this:

\displaystyle{ \sum_i \psi_i = 1 }

or in the notation we’re using in these articles:

\displaystyle{ \langle \psi \rangle = 1 }

where we define

\displaystyle{ \langle \psi \rangle = \sum_i \psi_i }

Quantum states are n-tuples of complex numbers:

\psi_i \in \mathbb{C}

The probability of finding the system in the ith configuration is defined to be |\psi_i|^2. For these probabilities to sum to one, \psi needs to be normalized like this:

\displaystyle{ \sum_i |\psi_i|^2 = 1 }

or in other words

\langle \psi,  \psi \rangle = 1

where the inner product of two vectors \psi and \phi is defined by

\displaystyle{ \langle \psi, \phi \rangle = \sum_i \overline{\psi}_i \phi_i }

Now, the usual way to turn a quantum state \psi into a stochastic state is to take the absolute value of each number \psi_i and then square it. However, if the numbers \psi_i happen to be nonnegative, we can also turn \psi into a stochastic state simply by multiplying it by a number to ensure \langle \psi \rangle = 1.

This is very unorthodox, but it lets us evolve the same vector \psi either stochastically or quantum-mechanically, using the recipes I’ll describe next. In physics jargon these correspond to evolution in ‘real time’ and ‘imaginary time’. But don’t ask me which is which: from a quantum viewpoint stochastic mechanics uses imaginary time, but from a stochastic viewpoint it’s the other way around!
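Here’s what the two normalizations look like for a nonnegative vector, in a quick Python sketch (the numbers are arbitrary):

```python
raw = [1.0, 2.0, 1.0]             # a nonnegative vector psi

# Stochastic normalization: divide by the sum, so the entries add to 1.
total = sum(raw)
stochastic = [p / total for p in raw]            # [0.25, 0.5, 0.25]

# Quantum normalization: divide by the L2 norm, so the |psi_i|^2 add to 1.
norm = sum(p * p for p in raw) ** 0.5
quantum = [p / norm for p in raw]

print(sum(stochastic))                           # 1.0
print(sum(q * q for q in quantum))               # 1.0, up to rounding
```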

Time evolution

Time evolution works similarly in stochastic and quantum mechanics, but with a few big differences:

• In stochastic mechanics the state changes in time according to the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

which has the solution

\psi(t) = \exp(t H) \psi(0)

• In quantum mechanics the state changes in time according to Schrödinger’s equation:

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

which has the solution

\psi(t) = \exp(-i t H) \psi(0)

The operator H is called the Hamiltonian. The properties it must have depend on whether we’re doing stochastic mechanics or quantum mechanics:

• We need H to be infinitesimal stochastic for time evolution given by \exp(t H) to send stochastic states to stochastic states. In other words, we need that (i) its columns sum to zero and (ii) its off-diagonal entries are real and nonnegative:

\displaystyle{ \sum_i H_{i j}=0 }

i\neq j \Rightarrow H_{i j}\geq 0

• We need H to be self-adjoint for time evolution given by \exp(-i t H) to send quantum states to quantum states. So, we need

H = H^\dagger

where we recall that the adjoint of a matrix is the conjugate of its transpose:

\displaystyle{ (H^\dagger)_{i j} := \overline{H}_{j i} }

We are concerned with the case where the operator H generates both a valid quantum evolution and also a valid stochastic one:

H is a Dirichlet operator if it’s both self-adjoint and infinitesimal stochastic.

We will soon go further and zoom in on this intersection! But first let’s finish our review.
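As a numerical sanity check, here’s a tiny Dirichlet operator (a made-up 2×2 example, not one from the series) evolved both ways. Since H is real symmetric, we can compute its matrix exponentials from an eigendecomposition:

```python
import numpy as np

# A small Dirichlet operator: real symmetric, columns sum to zero,
# off-diagonal entries nonnegative.
H = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])

def expm_sym(M):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(w)) @ V.T

t = 0.7

# Stochastic evolution: psi(t) = exp(tH) psi(0) preserves the entry sum.
psi0 = np.array([0.9, 0.1])                  # a stochastic state
psi_t = expm_sym(t * H) @ psi0
print(psi_t.sum())                           # 1.0, up to rounding

# Quantum evolution: psi(t) = exp(-itH) psi(0) preserves the L2 norm.
phi0 = np.array([0.6, 0.8])                  # a quantum state
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * t * w)) @ V.T   # the unitary exp(-itH)
phi_t = U @ phi0
print(np.linalg.norm(phi_t))                 # 1.0, up to rounding
```

Stochastic evolution preserves the sum of the entries, and quantum evolution preserves the L2 norm: exactly the two normalizations above.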

Symmetries

As John explained in Part 12, besides states and observables we need symmetries, which are transformations that map states to states. These include the evolution operators which we only briefly discussed in the preceding subsection.

• A linear map U that sends quantum states to quantum states is called an isometry, and isometries are characterized by this property:

U^\dagger U = 1

• A linear map U that sends stochastic states to stochastic states is called a stochastic operator, and stochastic operators are characterized by these properties:

\displaystyle{ \sum_i U_{i j} = 1 }

and

U_{i j} \geq 0

A notable difference here is that in our finite-dimensional situation, isometries are always invertible, but stochastic operators may not be! If U is an n \times n matrix that’s an isometry, U^\dagger is its inverse. So, we also have

U U^\dagger = 1

and we say U is unitary. But if U is stochastic, it may not have an inverse—and even if it does, its inverse is rarely stochastic. This explains why in stochastic mechanics time evolution is often not reversible, while in quantum mechanics it always is.
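Here’s a concrete 2×2 instance (my own example, not one from the series): a stochastic matrix that happens to be invertible, whose inverse has a negative entry and so fails to be stochastic.

```python
U = [[1.0, 0.5],
     [0.0, 0.5]]        # stochastic: columns sum to 1, entries >= 0

V = [[1.0, -1.0],
     [0.0,  2.0]]       # claim: V is the inverse of U

# Check U V = identity by direct multiplication.
prod = [[sum(U[i][k] * V[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)             # [[1.0, 0.0], [0.0, 1.0]]

# V has a negative entry, so it cannot be stochastic.
has_negative = any(x < 0 for row in V for x in row)
print(has_negative)     # True
```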

Puzzle 1. Suppose U is a stochastic n \times n matrix whose inverse is stochastic. What are the possibilities for U?

It is quite hard for an operator to be a symmetry in both stochastic and quantum mechanics, especially in our finite-dimensional situation:

Puzzle 2. Suppose U is an n \times n matrix that is both stochastic and unitary. What are the possibilities for U?

Observables

‘Observables’ are real-valued quantities that can be measured, or predicted, given a specific theory.

• In quantum mechanics, an observable is given by a self-adjoint matrix O, and the expected value of the observable O in the quantum state \psi is

\displaystyle{ \langle \psi , O \psi \rangle = \sum_{i,j} \overline{\psi}_i O_{i j} \psi_j }

• In stochastic mechanics, an observable O has a value O_i in each configuration i, and the expected value of the observable O in the stochastic state \psi is

\displaystyle{ \langle O \psi \rangle = \sum_i O_i \psi_i }

We can turn an observable in stochastic mechanics into an observable in quantum mechanics by making a diagonal matrix whose diagonal entries are the numbers O_i.
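To see the two kinds of expected value side by side, take a stochastic state \psi, the quantum state with amplitudes \sqrt{\psi_i}, and the diagonal matrix built from the observable. The two recipes then agree (the numbers below are arbitrary):

```python
import math

psi = [0.25, 0.5, 0.25]     # a stochastic state: nonnegative, sums to 1
O = [3.0, -1.0, 2.0]        # an observable: one real value per configuration

# Stochastic expected value: sum_i O_i psi_i
stochastic_ev = sum(o * p for o, p in zip(O, psi))

# Quantum expected value of diag(O) in the state phi_i = sqrt(psi_i):
# <phi, diag(O) phi> = sum_i O_i |phi_i|^2 = sum_i O_i psi_i
phi = [math.sqrt(p) for p in psi]
quantum_ev = sum(o * a * a for o, a in zip(O, phi))

print(abs(stochastic_ev - quantum_ev) < 1e-12)   # True: the recipes agree
```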

From graphs to matrices

Back in Part 16, John explained how a graph with positive numbers on its edges gives rise to a Hamiltonian in both quantum and stochastic mechanics—in other words, a Dirichlet operator.

Here’s how this works. We’ll consider simple graphs: graphs without arrows on their edges, with at most one edge from one vertex to another, with no edges from a vertex to itself. And we’ll only look at graphs with finitely many vertices and edges. We’ll assume each edge is labelled by a positive number, like this:

If our graph has n vertices, we can create an n \times n matrix A where A_{i j} is the number labelling the edge from i to j, if there is such an edge, and 0 if there’s not. This matrix is symmetric, with real entries, so it’s self-adjoint. So A is a valid Hamiltonian in quantum mechanics.

How about stochastic mechanics? Remember that a Hamiltonian H in stochastic mechanics needs to be ‘infinitesimal stochastic’. So, its off-diagonal entries must be nonnegative, which is indeed true for our A, but also the sums of its columns must be zero, which is not true when our A is nonzero.

But now comes the best news you’ve heard all day: we can improve A to an infinitesimal stochastic operator in a way that is completely determined by A itself! This is done by subtracting a diagonal matrix L whose entries are the sums of the columns of A:

\displaystyle{ L_{i i} = \sum_j A_{j i} }

i \ne j \Rightarrow L_{i j} = 0

It’s easy to check that

H = A - L

is still self-adjoint, but now also infinitesimal stochastic. So, it’s a Dirichlet operator: a good Hamiltonian for both stochastic and quantum mechanics!

In Part 16, we saw a bit more: every Dirichlet operator arises this way. It’s easy to see. You just take your Dirichlet operator and make a graph with one edge for each nonzero off-diagonal entry. Then you label the edge with this entry.

So, Dirichlet operators are essentially the same as finite simple graphs with edges labelled by positive numbers.
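The recipe is easy to automate. Here’s a sketch that builds A and H = A - L from a list of labelled edges and checks the Dirichlet conditions; the little example graph is made up.

```python
import numpy as np

# Edges of a labelled simple graph: (vertex, vertex, positive label).
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 0.5)]
n = 3

A = np.zeros((n, n))
for i, j, w in edges:
    A[i, j] = A[j, i] = w          # undirected edges give a symmetric matrix

L = np.diag(A.sum(axis=0))         # diagonal matrix of column sums
H = A - L                          # the Dirichlet operator

# Check: self-adjoint, columns sum to zero, off-diagonal entries nonnegative.
print(np.allclose(H, H.T))                          # True
print(np.allclose(H.sum(axis=0), 0))                # True
print(bool((H - np.diag(np.diag(H)) >= 0).all()))   # True
```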

Now, a simple graph can consist of many separate ‘pieces’, called components. Then there’s no way for a particle hopping along the edges to get from one component to another, either in stochastic or quantum mechanics. So we might as well focus our attention on graphs with just one component. These graphs are called ‘connected’. In other words:

Definition. A simple graph is connected if it is nonempty and there is a path of edges connecting any vertex to any other.

Our goal today is to understand more about Dirichlet operators coming from connected graphs. For this we need to learn the Perron–Frobenius theorem. But let’s start with something easier.

Perron’s theorem

In quantum mechanics it’s good to think about observables that have positive expected values:

\langle \psi, O \psi \rangle > 0

for every quantum state \psi \in \mathbb{C}^n. These are called positive definite. But in stochastic mechanics it’s good to think about matrices that are positive in a more naive sense:

Definition. An n \times n real matrix T is positive if all its entries are positive:

T_{i j} > 0

for all 1 \le i, j \le n.

Similarly:

Definition. A vector \psi \in \mathbb{R}^n is positive if all its components are positive:

\psi_i > 0

for all 1 \le i \le n.

We’ll also define nonnegative matrices and vectors in the same way, replacing > 0 by \ge 0. A good example of a nonnegative vector is a stochastic state.

In 1907, Perron proved the following fundamental result about positive matrices:

Perron’s Theorem. Given a positive square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| < r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T \psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

In other words, if T is positive, it has a unique eigenvalue with the largest absolute value. This eigenvalue is positive. Up to a constant factor, it has a unique eigenvector. We can choose this eigenvector to be positive. And then, up to a constant factor, it’s the only nonnegative eigenvector of T.
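Here’s a numerical illustration with NumPy; the 4×4 positive matrix and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.uniform(0.1, 1.0, size=(4, 4))   # a random positive matrix

w, V = np.linalg.eig(T)
k = int(np.argmax(np.abs(w)))             # index of the dominant eigenvalue
r = w[k]

# The Perron-Frobenius eigenvalue is real, positive, and strictly dominant.
print(abs(r.imag) < 1e-12 and r.real > 0)                     # True
print(all(abs(w[i]) < abs(r) for i in range(4) if i != k))    # True

# The corresponding eigenvector can be scaled to be positive.
v = V[:, k].real
v = v / v.sum()
print(bool((v > 0).all()))                                    # True
```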

From matrices to graphs

The conclusions of Perron’s theorem don’t hold for matrices that are merely nonnegative. For example, these matrices

\left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) , \qquad \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)

are nonnegative, but they violate lots of the conclusions of Perron’s theorem.

Nonetheless, in 1912 Frobenius published an impressive generalization of Perron’s result. In its strongest form, it doesn’t apply to all nonnegative matrices; only to those that are ‘irreducible’. So, let us define those.

We’ve seen how to build a matrix from a graph. Now we need to build a graph from a matrix! Suppose we have an n \times n matrix T. Then we can build a graph G_T with n vertices where there is an edge from the ith vertex to the jth vertex if and only if T_{i j} \ne 0.

But watch out: this is a different kind of graph! It’s a directed graph, meaning the edges have directions, there’s at most one edge going from any vertex to any vertex, and we do allow an edge going from a vertex to itself. There’s a stronger concept of ‘connectivity’ for these graphs:

Definition. A directed graph is strongly connected if there is a directed path of edges going from any vertex to any other vertex.

So, you have to be able to walk along edges from any vertex to any other vertex, but always following the direction of the edges! Using this idea we define irreducible matrices:

Definition. A nonnegative square matrix T is irreducible if its graph G_T is strongly connected.
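Irreducibility can be tested straight from the definition: build the directed graph G_T and check, with a breadth-first search from each vertex, that every vertex reaches every other. A minimal sketch:

```python
from collections import deque

def is_irreducible(T):
    """Check that the directed graph of nonzero entries is strongly connected."""
    n = len(T)
    # Edge from vertex i to vertex j whenever T[i][j] is nonzero.
    adj = [[j for j in range(n) if T[i][j] != 0] for i in range(n)]

    def reaches_all(start):
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return len(seen) == n

    return all(reaches_all(i) for i in range(n))

print(is_irreducible([[0, 1], [1, 0]]))   # True: you can walk back and forth
print(is_irreducible([[1, 1], [0, 1]]))   # False: no path from vertex 1 to 0
```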

The Perron–Frobenius theorem

Now we are ready to state:

The Perron–Frobenius Theorem. Given an irreducible nonnegative square matrix T, there is a positive real number r, called the Perron–Frobenius eigenvalue of T, such that r is an eigenvalue of T and any other eigenvalue \lambda of T has |\lambda| \le r. Moreover, there is a positive vector \psi \in \mathbb{R}^n with T\psi = r \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of T must be a scalar multiple of \psi.

The only conclusion of this theorem that’s weaker than those of Perron’s theorem is that there may be other eigenvalues with |\lambda| = r. For example, this matrix is irreducible and nonnegative:

\left( \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array} \right)

Its Perron–Frobenius eigenvalue is 1, but it also has -1 as an eigenvalue. In general, Perron–Frobenius theory says quite a lot about the other eigenvalues on the circle |\lambda| = r, but we won’t need that fancy stuff here.

Perron–Frobenius theory is useful in many ways, from highbrow math to ranking football teams. We’ll need it not just today but also later in this series. There are many books and other sources of information for those who want to take a closer look at this subject. If you’re interested, you can search online or take a look at these:

• Dimitrios Noutsos, Perron Frobenius theory and some extensions, 2008. (Includes proofs of the basic theorems.)

• V. S. Sunder, Perron Frobenius theory, 18 December 2009. (Includes applications to graph theory, Markov chains and von Neumann algebras.)

• Stephen Boyd, Lecture 17: Perron Frobenius theory, Winter 2008-2009. (Includes a max-min characterization of the Perron–Frobenius eigenvalue and applications to Markov chains, economics, population growth and power control.)

I have not taken a look myself, but if anyone is interested and can read German, the original work appears here:

• Oskar Perron, Zur Theorie der Matrizen, Math. Ann. 64 (1907), 248–263.

• Georg Frobenius, Über Matrizen aus nicht negativen Elementen, S.-B. Preuss Acad. Wiss. Berlin (1912), 456–477.

And, of course, there’s this:

• Wikipedia, Perron–Frobenius theorem.

It’s got a lot of information.

Irreducible Dirichlet operators

Now comes the payoff. We saw how to get a Dirichlet operator H from any finite simple graph with edges labelled by positive numbers. Now let’s apply Perron–Frobenius theory to prove our thesis.

Unfortunately, the matrix H is rarely nonnegative. If you remember how we built it, you’ll see its off-diagonal entries will always be nonnegative… but its diagonal entries can be negative.

Luckily, we can fix this just by adding a big enough multiple of the identity matrix to H! The result is a nonnegative matrix

T = H + c I

where c > 0 is some large number. This matrix T has the same eigenvectors as H. The off-diagonal matrix entries of T are the same as those of H, so T_{i j} is nonzero for i \ne j exactly when the graph we started with has an edge from i to j. So, for i \ne j, the graph G_T will have a directed edge going from i to j precisely when our original graph had an edge from i to j. And that means that if our original graph was connected, G_T will be strongly connected. Thus, by definition, the matrix T is irreducible!

Since T is nonnegative and irreducible, the Perron–Frobenius theorem swings into action and we conclude:

Lemma. Suppose H is the Dirichlet operator coming from a connected finite simple graph with edges labelled by positive numbers. Then the eigenvalues of H are real. Let \lambda be the largest eigenvalue. Then there is a positive vector \psi \in \mathbb{R}^n with H\psi = \lambda \psi. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. The eigenvalues of H are real since H is self-adjoint. Notice that if r is the Perron–Frobenius eigenvalue of T = H + c I and

T \psi = r \psi

then

H \psi = (r - c)\psi

By the Perron–Frobenius theorem the number r is positive, and it has the largest absolute value of any eigenvalue of T. Thanks to the subtraction, the eigenvalue r - c may not have the largest absolute value of any eigenvalue of H. It is, however, the largest eigenvalue of H, so we take this as our \lambda. The rest follows from the Perron–Frobenius theorem.   █

But in fact we can improve this result, since the largest eigenvalue \lambda is just zero. Let’s also make up a definition, to make our result sound more slick:

Definition. A Dirichlet operator is irreducible if it comes from a connected finite simple graph with edges labelled by positive numbers.

This meshes nicely with our earlier definition of irreducibility for nonnegative matrices. Now:

Theorem. Suppose H is an irreducible Dirichlet operator. Then H has zero as its largest real eigenvalue. There is a positive vector \psi \in \mathbb{R}^n with H\psi = 0. Any other vector with this property is a scalar multiple of \psi. Furthermore, any nonnegative vector that is an eigenvector of H must be a scalar multiple of \psi.

Proof. Choose \lambda as in the Lemma, so that H\psi = \lambda \psi. Since \psi is positive we can normalize it to be a stochastic state:

\displaystyle{ \sum_i \psi_i = 1 }

Since H is a Dirichlet operator, \exp(t H) sends stochastic states to stochastic states, so

\displaystyle{ \sum_i (\exp(t H) \psi)_i = 1 }

for all t \ge 0. On the other hand,

\displaystyle{ \sum_i (\exp(t H)\psi)_i = \sum_i e^{t \lambda} \psi_i = e^{t \lambda} }

so we must have \lambda = 0.   █

What’s the point of all this? One point is that there’s a unique stochastic state \psi that’s an equilibrium state: since H \psi = 0, it doesn’t change with time. It’s also globally stable: since all the other eigenvalues of H are negative, all other stochastic states converge to this one as time goes forward.

An example

There are many examples of irreducible Dirichlet operators. For instance, in Part 15 we talked about graph Laplacians. The Laplacian of a connected simple graph is always irreducible. But let us try a different sort of example, coming from the picture of the resistors we saw earlier:

Let’s create a matrix A whose entry A_{i j} is the number labelling the edge from i to j if there is such an edge, and zero otherwise:

A = \left(  \begin{array}{ccccc}   0 & 2 & 1 & 0 & 1 \\   2 & 0 & 0 & 1 & 1 \\   1 & 0 & 0 & 2 & 1 \\   0 & 1 & 2 & 0 & 1 \\   1 & 1 & 1 & 1 & 0  \end{array}  \right)

Remember how the game works. The matrix A is already a valid Hamiltonian for quantum mechanics, since it’s self-adjoint. However, to get a valid Hamiltonian for both stochastic and quantum mechanics—in other words, a Dirichlet operator—we subtract the diagonal matrix L whose entries are the sums of the columns of A. In this example it just so happens that the column sums are all 4, so L = 4 I, and our Dirichlet operator is

H = A - 4 I = \left(  \begin{array}{rrrrr}   -4 & 2 & 1 & 0 & 1 \\   2 & -4 & 0 & 1 & 1 \\   1 & 0 & -4 & 2 & 1 \\   0 & 1 & 2 & -4 & 1 \\   1 & 1 & 1 & 1 & -4  \end{array}  \right)

We’ve set up this example so it’s easy to see that the vector \psi = (1,1,1,1,1) has

H \psi = 0

So, up to scalar multiples, this is the unique eigenvector for the eigenvalue 0. We can use Mathematica to calculate the remaining eigenvalues of H. The set of eigenvalues is

\{0, -7, -5, -5, -3 \}

As we expect from our theorem, the largest real eigenvalue is 0. By design, the eigenstate associated to this eigenvalue is

| v_0 \rangle = (1, 1, 1, 1, 1)

(This funny notation for vectors is common in quantum mechanics, so don’t worry about it.) All the other eigenvectors fail to be nonnegative, as predicted by the theorem. They are:

| v_1 \rangle = (1, -1, -1, 1, 0)
| v_2 \rangle = (-1, 0, -1, 0, 2)
| v_3 \rangle = (-1, 1, -1, 1, 0)
| v_4 \rangle = (-1, -1, 1, 1, 0)
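If you don’t have Mathematica handy, these eigenpairs are easy to verify by hand, or with a few lines of pure Python. Here’s a quick sanity check (just an illustration; the matrix and vectors are copied from the example above):

```python
# The Dirichlet operator H from the example above.
H = [[-4,  2,  1,  0,  1],
     [ 2, -4,  0,  1,  1],
     [ 1,  0, -4,  2,  1],
     [ 0,  1,  2, -4,  1],
     [ 1,  1,  1,  1, -4]]

def mat_vec(M, v):
    """Multiply the matrix M by the vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def eigenvalue(M, v):
    """Assuming v is an eigenvector of M, return its eigenvalue."""
    Mv = mat_vec(M, v)
    i = next(k for k, x in enumerate(v) if x != 0)
    lam = Mv[i] / v[i]
    assert Mv == [lam * x for x in v], "not an eigenvector!"
    return lam

vectors = [[ 1,  1,  1, 1, 1],   # the Perron-Frobenius eigenvector
           [ 1, -1, -1, 1, 0],
           [-1,  0, -1, 0, 2],
           [-1,  1, -1, 1, 0],
           [-1, -1,  1, 1, 0]]

print([eigenvalue(H, v) for v in vectors])  # → [0.0, -7.0, -5.0, -5.0, -3.0]
```

As expected, the largest eigenvalue is 0 and all the rest are negative.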

To compare the quantum and stochastic states, consider first |v_0\rangle. This is the only eigenvector that can be normalized to a stochastic state. Remember, a stochastic state must have nonnegative components. This rules out |v_1\rangle through |v_4\rangle as valid stochastic states, no matter how we normalize them! However, these are allowed as states in quantum mechanics, once we normalize them correctly. For a stochastic system to be in a state other than the Perron–Frobenius state, it must be a linear combination of at least two eigenstates. For instance,

\psi_a = (1-a)|v_0\rangle + a |v_1\rangle

can be normalized to give a stochastic state only if 0 \leq a \leq \frac{1}{2}.
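Concretely, the components of \psi_a are (1, 1-2a, 1-2a, 1, 1-a), so for a between 0 and 1 nonnegativity fails exactly when a > 1/2. A short check in Python (purely illustrative):

```python
def psi(a):
    """(1-a)|v0> + a|v1> with |v0> = (1,1,1,1,1) and |v1> = (1,-1,-1,1,0)."""
    v0, v1 = [1, 1, 1, 1, 1], [1, -1, -1, 1, 0]
    return [(1 - a) * x + a * y for x, y in zip(v0, v1)]

# Over a grid of mixing parameters a in [0, 1], psi_a has all components
# nonnegative -- and hence normalizes to a stochastic state -- iff a <= 1/2.
for k in range(101):
    a = k / 100
    assert all(c >= 0 for c in psi(a)) == (a <= 0.5)
print("psi_a is a normalizable stochastic state exactly when 0 <= a <= 1/2")
```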

And, it’s easy to see that it works this way for any irreducible Dirichlet operator, thanks to our theorem. So, our thesis has been proved true!

Puzzles on irreducibility

Let us conclude with a couple more puzzles. There are lots of ways to characterize irreducible nonnegative matrices; we don’t need to mention graphs. Here’s one:

Puzzle 3. Let T be a nonnegative n \times n matrix. Show that T is irreducible if and only if for each pair of indices i, j we have (T^m)_{i j} > 0 for some natural number m.
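This criterion is easy to turn into executable form using a closely related standard fact: a nonnegative n \times n matrix T is irreducible if and only if every entry of (I + T)^{n-1} is strictly positive. Here is a pure-Python sketch (an illustration, not a solution to the puzzle):

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_irreducible(T):
    """Check irreducibility of a nonnegative n x n matrix T using the
    standard fact that T is irreducible iff (I + T)^(n-1) is entrywise
    strictly positive."""
    n = len(T)
    M = [[(1 if i == j else 0) + T[i][j] for j in range(n)] for i in range(n)]
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        P = mat_mul(P, M)
    return all(P[i][j] > 0 for i in range(n) for j in range(n))

# The adjacency matrix of the connected graph from our example is irreducible:
A = [[0, 2, 1, 0, 1],
     [2, 0, 0, 1, 1],
     [1, 0, 0, 2, 1],
     [0, 1, 2, 0, 1],
     [1, 1, 1, 1, 0]]
print(is_irreducible(A))  # True

# A block-diagonal matrix, i.e. a disconnected 'graph', is not:
B = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
print(is_irreducible(B))  # False
```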

You may be confused because today we explained the usual concept of irreducibility for nonnegative matrices, but also defined a concept of irreducibility for Dirichlet operators. Luckily there’s no conflict: Dirichlet operators aren’t nonnegative matrices, but if we add a big multiple of the identity to a Dirichlet operator it becomes a nonnegative matrix, and then:

Puzzle 4. Show that a Dirichlet operator H is irreducible if and only if the nonnegative operator H + c I (where c is any sufficiently large constant) is irreducible.

Irreducibility is also related to the nonexistence of interesting conserved quantities. In Part 11 we saw a version of Noether’s Theorem for stochastic mechanics. Remember that an observable O in stochastic mechanics assigns a number O_i to each configuration i = 1, \dots, n. We can make a diagonal matrix with O_i as its diagonal entries, and by abuse of language we call this O as well. Then we say O is a conserved quantity for the Hamiltonian H if the commutator [O,H] = O H - H O vanishes.

Puzzle 5. Let H be a Dirichlet operator. Show that H is irreducible if and only if every conserved quantity O for H is a constant, meaning that for some c \in \mathbb{R} we have O_i = c for all i. (Hint: examine the proof of Noether’s theorem.)

In fact this works more generally:

Puzzle 6. Let H be an infinitesimal stochastic matrix. Show that H + c I is an irreducible nonnegative matrix for all sufficiently large c if and only if every conserved quantity O for H is a constant.
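Puzzle 5 is also easy to explore numerically. For a diagonal observable O, the commutator has entries [O, H]_{ij} = (O_i - O_j) H_{ij}, so O is conserved precisely when it takes equal values on any two configurations linked by H. A tiny check using the example H from earlier (illustrative only, not a proof):

```python
# The Dirichlet operator H from the example above.
H = [[-4,  2,  1,  0,  1],
     [ 2, -4,  0,  1,  1],
     [ 1,  0, -4,  2,  1],
     [ 0,  1,  2, -4,  1],
     [ 1,  1,  1,  1, -4]]

def is_conserved(O, H):
    """Check [O, H] = 0 for a diagonal observable O, given as the list of its
    diagonal entries, using [O, H]_ij = (O_i - O_j) H_ij."""
    n = len(H)
    return all((O[i] - O[j]) * H[i][j] == 0 for i in range(n) for j in range(n))

print(is_conserved([3, 3, 3, 3, 3], H))  # True: constants are always conserved
print(is_conserved([1, 2, 3, 4, 5], H))  # False: H is irreducible
```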


Symmetry and the Fourth Dimension (Part 5)

3 August, 2012

Last time we saw that Platonic solids come in dual pairs, with the tetrahedron being dual to itself:

When you have a dual pair, you can start chopping off the corners of one, more and more, and keep going until you reach the other. Along the way you get some interesting shapes:

At certain points along the way, we get semiregular polyhedra, meaning that:

• all the faces are regular polygons, and

• there’s a symmetry carrying any corner to any other.

Let’s see how it goes with the cube/octahedron pair. And on the way, I’ll show you some diagrams that summarize what’s going on. I’ll explain them later, but you can try to guess the pattern. As a clue, I’ll say they’re based on the Coxeter diagram for the cube, which I explained last time:

V—4—E—3—F

Here we go!

Cube: •—4—o—3—o

First we have the cube, with all square faces:


Truncated cube: •—4—•—3—o

Then we get the truncated cube, with octagons and triangles as faces:


Cuboctahedron: o—4—•—3—o

Halfway through we get the aptly named cuboctahedron, with squares and triangles as faces:


Truncated octahedron: o—4—•—3—•

Then we get the truncated octahedron, with squares and hexagons as faces:


Octahedron: o—4—o—3—•

Then finally we get the octahedron, with triangles as faces:


Partial flags

Can you see what’s going on with the diagrams here?

cube •—4—o—3—o
truncated cube •—4—•—3—o
cuboctahedron o—4—•—3—o
truncated octahedron o—4—•—3—•
octahedron o—4—o—3—•

Clearly the black dots tend to move from left to right as we move down the chart, but there’s something much cooler and more precise going on. The black dots secretly say where the corners of the shapes are!

Let’s see how quickly I can explain this, and how quickly you can get what I’m talking about. Remember how I defined a ‘flag’ in Part 3?

No? Good, because today I’m going to call that a ‘complete flag’. So, given a Platonic solid, we’ll say a complete flag is a vertex, edge and face where the vertex lies on the edge and the edge lies on the face.

For example, here is a complete flag for a cube:

It’s the black vertex lying on the blue edge lying on the yellow face.

But the term ‘complete flag’ hints that there are also ‘partial flags’. And there are! A vertex-edge flag is a vertex and edge where the vertex lies on the edge. Here’s a vertex-edge flag for the cube:

Similarly, an edge-face flag is an edge and a face where the edge lies on the face. Here’s an edge-face flag for the cube:

You can see why they’re called partial flags: they’re different parts of a complete flag.

Now, fix the Coxeter diagram for the cube firmly in mind:

V—4—E—3—F

V for vertex, E for edge and F for face.

Then:

• a cube obviously has one corner for each vertex of the cube, so we draw it like this:

•—4—o—3—o

• a truncated cube has one corner for each vertex-edge flag of the cube, so we draw it like this:

•—4—•—3—o

• a cuboctahedron has one corner for each edge of the cube, so we draw it like this:

o—4—•—3—o

• a truncated octahedron has one corner for each edge-face flag of the cube, so we draw it like this:

o—4—•—3—•

• an octahedron has one corner for each face of the cube, so we draw it like this:

o—4—o—3—•
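These counts are easy to tally: the cube has 8 vertices, 12 edges and 6 faces, and each edge contains 2 vertices and lies on 2 faces, giving 24 vertex-edge flags and 24 edge-face flags, matching the corner counts of the five solids above. A short Python tally from the cube’s actual incidence data (just for fun):

```python
from itertools import product

# Vertices of the cube: all 0/1 triples.
vertices = list(product([0, 1], repeat=3))

# Edges: pairs of vertices differing in exactly one coordinate.
edges = [(v, w) for v in vertices for w in vertices
         if v < w and sum(a != b for a, b in zip(v, w)) == 1]

# Faces: fix one coordinate to 0 or 1.
faces = [(axis, val) for axis in range(3) for val in (0, 1)]

def on_face(vertex, face):
    axis, val = face
    return vertex[axis] == val

vertex_edge_flags = [(v, e) for e in edges for v in e]
edge_face_flags = [(e, f) for e in edges for f in faces
                   if on_face(e[0], f) and on_face(e[1], f)]

print(len(vertices))           # 8  corners of the cube
print(len(vertex_edge_flags))  # 24 corners of the truncated cube
print(len(edges))              # 12 corners of the cuboctahedron
print(len(edge_face_flags))    # 24 corners of the truncated octahedron
print(len(faces))              # 6  corners of the octahedron
```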

Alas, I don’t have the patience to draw all the pictures needed to explain this clearly; I’ll just grab the pictures I can get for free on Wikicommons. Here’s how an octahedron has a corner for each face of the cube:

And here’s how the cuboctahedron has a corner for each edge of the cube:

But these are the least interesting cases! It’s more interesting to see how the truncated cube has one corner for each vertex-edge flag of the cube. Do you see how it works? You have to imagine this truncated cube sitting inside a cube:


Then, notice that the truncated cube has 2 corners on each edge of the cube, one near each end. So, it has one corner for each vertex-edge flag of the cube!

Similarly, the truncated octahedron has one corner for each edge-face flag of the cube. But since I don’t have a great picture to help you see that, let’s use duality to change our point of view. A face of the cube corresponds to a vertex of the octahedron. So, think of this truncated octahedron as sitting inside an octahedron:


The truncated octahedron has 4 corners near each vertex of the octahedron, one on each edge touching that vertex. In short, it has one corner for each vertex-edge flag of the octahedron. So it’s got one corner for each edge-face flag of the cube!

This change of viewpoint can be justified more thoroughly:

Puzzle. The diagrams we’ve been using were based on the Coxeter diagram for the cube. What would they look like if we based them on the Coxeter diagram for the octahedron instead?

By the way, the pretty pictures of solids with brass balls at the vertices were made by Tom Ruen using Robert Webb’s Stella software. They’re available on Wikicommons, and you can find most of them by clicking on the images here and looking around on the Wikipedia articles you’ll reach that way. On this blog, I try hard to make most images take you to more information when you click on them.


Increasing the Signal-to-Noise Ratio With More Noise

30 July, 2012

guest post by Glyn Adgie and Tim van Beek

Or: Are the Milankovich Cycles Causing the Ice Ages?

The Milankovich cycles are periodic changes in how the Earth orbits the Sun. One question is: can these changes be responsible for the ice ages? At first sight it seems impossible, because the changes are simply too small. But it turns out that we can find a dynamical system where a small periodic external force is actually strengthened by random ‘noise’ in the system. This phenomenon has been dubbed ‘stochastic resonance’ and has been proposed as an explanation for the ice ages.

In this blog post we would like to provide an introduction to it. We will look at a bistable toy example. And you’ll even get to play with an online simulation that some members of the Azimuth Project created, namely Glyn Adgie, Allan Erskine and Jim Stuttard!

Equations with noise

In This Week’s Finds, back in ‘week308’, we had a look at a model with ‘noise’. A lot of systems can be described by ordinary differential equations:

\displaystyle{ \frac{d x}{d t} = f(x, t)}

If f is nice, the time evolution of the system will be a nice smooth function x(t), like the trajectory of a thrown stone. But there are situations where we have some kind of noise, a chaotic, fluctuating influence, that we would like to take into account. This could be, for example, turbulence in the air flow around a rocket. Or, in our case, short term fluctuations of the weather of the earth. If we take this into account, we get an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) + W(t) }

where W(t) is some model of noise. In all the examples above, the noise is actually a simplified way to take into account fast degrees of freedom of the system at hand. This way we do not have to explicitly model these degrees of freedom, which is often impossible. But we can still formulate a model that includes the long-term influence of these degrees of freedom.

A way to make the above mathematically precise is to write the equation in the form

d x_t = f(x, t) d t + g(x, t) d W_t

as an Ito stochastic differential equation driven by a Wiener process.

The mathematics behind this is somewhat sophisticated, but we don’t need to delve into it in this blog post. For those who are interested in it, we have gathered some material here:

Stochastic differential equation, Azimuth Library.

Physicists and engineers who model systems with ordinary and partial differential equations should think of stochastic differential equations as another tool in the toolbox. With this tool, it is possible to easily model systems that exhibit new behavior.

What is stochastic resonance?

When you’re listening to a friend across the table, the noise around you can be a serious distraction. Still, humans are famous for their ability to filter the signal (what your friend is saying) from the noise (all the other sounds: people’s chatter, clattering dishes, and so on). But could you imagine a situation where more noise makes it easier to detect the signal?

Picture a marble that sits in a double well potential:

double well potential

This graphic is taken from

• Roberto Benzi, Stochastic resonance: from climate to biology.

This review paper is also a nice introduction to the topic.

This simple model can be applied to various situations. Since we think about the ice ages, we can interpret the two metastable states at the two minima of the potential in this way: the minimum to the left at -1 corresponds to the Earth in an ice age. The minimum to the right at 1 corresponds to the Earth at warmer temperatures, as we know it today.

Let us add a small periodic force to the model. This periodic force corresponds to different solar input due to periodic changes in Earth’s orbit: the Milankovich cycles. There are actually several different cycles with different periods, but in our toy model we will consider only one force with one period.

If the force itself is too small to get the marble over the barrier between the two potential minima, we will not observe any periodic change in the position of the marble. But if we add noise to the motion, it may happen that the force together with the noise succeeds in getting the marble from one minimum to the other! This is more likely to happen when the force is pushing the right way. So, it should happen with roughly the same periodicity as the force: we should see a ‘quasi-periodic’ motion of the marble.

But if we increase the noise more and more, the influence of the external force will become unrecognizable, and we will not recognize any pattern at all.

There should be a noise strength somewhere in between that makes the Milankovich cycles as visible as possible from the Earth’s temperature record. In other words, if we think of the periodic force as a ‘signal’, there should be some amount of noise that maximizes the ‘signal-to-noise ratio’. Can we find out what it is?

In order to find out, let us make our model precise. Let’s take the time-independent potential to be

\displaystyle{ V(x) = \frac{1}{4} x^4 - \frac{1}{2} x^2 }

This potential has two local minima at x = \pm 1. Let’s take the time-dependent periodic forcing, or ‘signal’, to be

F(t) = - A \; \sin(t)

The constant A is the amplitude of the periodic forcing. The time-dependent effective potential of the system is therefore:

\displaystyle{ V(x, t) = \frac{1}{4} x^4 - \frac{1}{2} x^2 - A \; \sin(t) x }

Including noise leads to the equation of motion, which is usually called a Langevin equation by physicists and chemists:

\begin{array}{ccl} dX &=& -V'(X_t, t) \; dt + \sqrt{2D} \; dW_t \\  \\  &=& (X_t - X_t^3 + A \;  \sin(t)) \; dt  + \sqrt{2D} \; dW_t  \end{array}

For more about what a Langevin equation is, see our article on stochastic differential equations.

This model has two free parameters, A and D. As already mentioned, A is the amplitude of the forcing. D is called the diffusion coefficient and represents the strength of noise. In our case it is a constant, but in more general models it could depend on space and time.

It is possible to write programs that generate simulations, that is, numerical approximations of solutions of stochastic differential equations. For those interested in how this works, there is an addendum at the bottom of the blog post that explains it. We can use such a program to model the three cases of low, high and optimal noise level.
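Such a program can be very short. Here is a minimal sketch in Python, using the Euler scheme described in the addendum below. This is not the code behind the plots that follow, and the parameter values are just illustrative:

```python
import math
import random

def simulate(A, D, T=100.0, h=0.01, x0=1.0, seed=1):
    """Euler scheme for dX = (X - X^3 + A sin t) dt + sqrt(2D) dW."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    path = [(t, x)]
    for _ in range(round(T / h)):
        drift = x - x**3 + A * math.sin(t)
        x += drift * h + math.sqrt(2 * D) * rng.gauss(0.0, math.sqrt(h))
        t += h
        path.append((t, x))
    return path

# Count hops between the wells at x = -1 and x = +1 for two noise levels:
# with low noise the marble mostly stays put; with more noise it hops.
for D in (0.05, 0.4):
    xs = [x for _, x in simulate(A=0.2, D=D)]
    hops = sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)
    print("D =", D, "->", hops, "zero crossings")
```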

In the following graphs, green is the external force, while red is the marble position. Here is a simulation with a low level of noise:

low noise level

As you can see, within the time of the simulation there is no transition from the metastable state at 1 to the one at -1. That is, in this situation Earth stays in the warm state.

Here is a simulation with a high noise level:

high noise level

The marble jumps wildly between the states. By inspecting the graph with your eyes only, you don’t see any pattern in it, do you?

And finally, here is a simulation where the noise level seems to be about right to make the signal visible via a quasi-periodic motion:

optimal noise level

Sometimes the marble makes a transition in phase with the force, sometimes it does not, sometimes it has a phase lag. It can even happen that the marble makes a transition while the force acts in the opposite direction!

Some members of the Azimuth crew have created an online model that calculates simulations of the model, for different values of the involved constants. You can see it here:

Stochastic resonance example, Azimuth Code Project.

We especially thank Allan Erskine and Jim Stuttard.

You can change the values using the sliders under the graphic and see what happens. You can also choose different ‘random seeds’, which means that the random numbers used in the simulation will be different.

Above, we asked if one could calculate for a fixed A the level of noise D that maximizes the signal-to-noise ratio. We have defined the model system, but we still need to define ‘signal-to-noise ratio’ in order to address the question.

There is no general definition that makes sense for all kinds of systems. For our model system, let’s assume that the expectation value of x(t) has a Fourier expansion like this:

\displaystyle{ \langle x(t) \rangle = \sum_{n = - \infty}^{\infty}  M_n \exp{(i n t)} }

Then one useful definition of the signal-to-noise ratio \eta is

\displaystyle{ \eta := \frac{|M_1|^2}{A^2}}

So, can one calculate, for a fixed A, the value of D that maximizes \eta? We don’t know! Do you?

If you would like to know more, have a look at this page:

Stochastic resonance, Azimuth Library.

Also, read on for more about how we numerically solved our stochastic differential equation, and the design decisions we had to make to create a simulation that runs on your web browser.

For übernerds only

Numerically solving stochastic differential equations

Maybe you are asking yourself what is behind the online model? How does one ‘numerically solve’ a stochastic differential equation and in what sense are the graphs that you see there ‘close’ to the exact solution? No problem, we will explain it here!

First, you need C code like this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define znew ((z=36969*(z&65535)+(z>>16)) << 16)
#define wnew ((w = 18000*(w&65535)+(w>>16))&65535)

static unsigned long z = 362436069, w = 521288629;

float rnor() {
	float x, y, v;

	x = ((signed long) (znew + wnew)) * 1.167239E-9;

	if (fabs(x) < 1.17741)
		return (x);

	y = (znew + wnew) * 2.328306E-10;

	if (log(y) < 0.6931472 - 0.5 * x * x)
		return x;

	x = (x > 0) ? 0.8857913 * (2.506628 - x) : -0.8857913 * (2.506628 + x);

	if (log(1.8857913 - y) < 0.5718733 - 0.5 * (x * x))
		return (x);

	do {
		v = ((signed long) (znew + wnew)) * 4.656613E-10;

		x = -log(fabs(v)) * 0.3989423;

		y = -log((znew + wnew)* 2.328306E-10);

	} while (y + y < x * x);

	return (v > 0 ? 2.506628 + x : 2.506628 - x);
}

int main(void) {

	int index = 0;

	for(index = 0; index < 10; index++) {
		printf("%f\n", rnor());
	}
	return EXIT_SUCCESS;
}

Pretty scary, huh? Do you see what it does? We don’t think anyone stands a chance of understanding all the ‘magical constants’ in it. We pasted it here so you can show it to friends who claim to know the programming language C and understand every program in seconds, and see what they can tell you about it. We’ll explain it below.

Since this section is for übernerds, we will assume that you know stochastic processes in continuous time. Brownian motion on \mathbb{R} can be viewed as a probability distribution on the space of all continuous functions on a fixed interval, C[0, T]. This is also true for solutions of stochastic differential equations in general. A ‘numerical approximation’ to a stochastic differential equation is a discrete-time approximation that samples this probability space.

Another way to think about it is to start with numerical approximations of ordinary differential equations. When you are handed such an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) }

with an initial condition x(0) = x_0, you know that there are different discrete approximation schemes to calculate an approximate solution to it. The simplest one is the Euler scheme. You need to choose a step size h and calculate values for x recursively using the formula

x_{n + 1} = x_n + f(x_n, t_n) \; h

with t_{n+1} = t_n + h. Connect the discrete values of x_n with straight lines to get your approximating function x^h. For h \to 0, you get convergence of x^h to the exact solution x, for example in the supremum norm:

\displaystyle{  \lim_{h \to 0} \| x^h - x \|_\infty = 0 }
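As a concrete illustration (a toy example, not from the post): for dx/dt = -x with x(0) = 1 the exact solution is x(t) = e^{-t}, and the Euler scheme’s error at t = 1 shrinks roughly in proportion to h:

```python
import math

def euler(f, x0, T, h):
    """Euler scheme: x_{n+1} = x_n + f(x_n, t_n) * h."""
    x, t = x0, 0.0
    for _ in range(round(T / h)):
        x += f(x, t) * h
        t += h
    return x

# dx/dt = -x, x(0) = 1; the exact solution at T = 1 is exp(-1).
for h in (0.1, 0.01, 0.001):
    err = abs(euler(lambda x, t: -x, 1.0, 1.0, h) - math.exp(-1.0))
    print(h, err)  # the error drops by roughly a factor of 10 with h
```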

It is possible to extend some schemes from ordinary differential equations to stochastic differential equations; the extended schemes involve certain random variables in every step. The best textbook on the subject that I know is this one:

• Peter E. Kloeden and Eckhard Platen, Numerical Solution of Stochastic Differential Equations, Springer, Berlin, 2010. (ZMATH review.)

The extension of the Euler scheme is very simple, deceptively so! It is much less straightforward to determine the extensions of higher Runge–Kutta schemes, for example. You have been warned! The Euler scheme for stochastic differential equations of the form

d X_t = f(X, t) d t + g(X, t) d W_t

is simply

X_{n + 1} = X_n + f(X_n, t_n) \; h + g(X_n, t_n) \; w_n

where all the w_n are independent random variables with a normal distribution N(0, h). If you write a computer program that calculates approximations using this scheme, you need a pseudorandom number generator. This is what the code we posted above is about. Actually, it is an efficient algorithm to calculate pseudorandom numbers with a normal distribution. It’s from this paper:

• George Marsaglia and Wai Wan Tsang, The Monty Python method for generating random variables, ACM Transactions on Mathematical Software 24 (1998), 341–350.

While an approximation to an ordinary differential equation approximates a function, an approximation to a stochastic differential equation approximates a probability distribution on C[0, T]. So we need a different concept of ‘convergence’ of the approximation for h \to 0. The two most important ones are called ‘strong’ and ‘weak’ convergence.

A discrete approximation X^h to an Ito process X converges strongly at time T if

\lim_{h \to 0} \; E(|X_T^h - X_T|) = 0

It is said to be of strong order \gamma if there is a constant C such that

E(|X_T^h - X_T|) \leq C h^{\gamma}

As we see from the definition, a strong approximation needs to approximate the path of the original process X at the end of the time interval [0, T].

What about weak convergence?

A discrete approximation X^h to an Ito process X converges weakly for a given class of functions G at a time T if for every function g \in G we have

\lim_{h \to 0} \; E(g(X_T^h)) - E(g(X_T)) = 0

where we assume that all appearing expectation values exist and are finite.

It is said to be of weak order \gamma with respect to G if there is a constant C such that for every function g \in G:

|E(g(X_T^h)) - E(g(X_T))| \leq C h^{\gamma}

One speaks simply of ‘weak convergence’ when G is the set of all polynomials.

So weak convergence means that the approximation needs to approximate the probability distribution of the original process X. Weak and strong convergence are indeed different concepts. For example, the Euler scheme is of strong order 0.5 and of weak order 1.0.
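One can even measure the strong order empirically. A standard test equation (not one from this post) is geometric Brownian motion dX = \mu X dt + \sigma X dW, whose exact solution X_T = X_0 \exp((\mu - \sigma^2/2)T + \sigma W_T) can be driven by the same Brownian increments as the Euler approximation. Shrinking h should then shrink the mean error roughly like h^{0.5}. A rough sketch:

```python
import math
import random

def strong_error(h, n_paths=2000, T=1.0, mu=0.1, sigma=0.5, seed=42):
    """Mean |X_T^h - X_T| for the Euler scheme on dX = mu X dt + sigma X dW,
    where the exact solution is driven by the same Brownian path."""
    rng = random.Random(seed)
    steps = round(T / h)
    total = 0.0
    for _ in range(n_paths):
        x, w = 1.0, 0.0
        for _ in range(steps):
            dw = rng.gauss(0.0, math.sqrt(h))
            x += mu * x * h + sigma * x * dw   # Euler step
            w += dw                            # accumulate the Brownian path
        exact = math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
        total += abs(x - exact)
    return total / n_paths

# The mean error should shrink roughly like sqrt(h) as h decreases:
for h in (0.1, 0.025):
    print(h, strong_error(h))
```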

Our simple online model implements the Euler scheme to calculate approximations to our model equation. So now you know in what sense it is indeed an approximation to the solution process!

Implementation details of the interactive online model

There appear to be three main approaches to implementing an interactive online model. It is assumed that all the main browsers understand Javascript.

1) Implement the model using a server-side programme that produces graph data in response to parameter settings, and use Javascript in the browser to plot the graphs and provide interaction inputs.

2) Pre-compute the graph data for various sets of parameter values, send these graphs to the browser, and use Javascript to select a graph and plot it. The computation of the graph data can be done on the designer’s machine, using their preferred language.

3) Implement the model entirely in Javascript.

Option 1) is only practical if one can run interactive programs on the server. While there is technology to do this, it is not something that most ISPs or hosting companies provide. You might be provided with Perl and PHP, but neither of these is a language of choice for numeric coding.

Option 2) has been tried. The problem with this approach is the large amount of pre-computed data required. For example, if we have 4 independent parameters, each of which takes on 10 values, then we need 10000 graphs. Also, it is not practical to provide much finer control of each parameter, as this increases the data size even more.

Option 3) looked doable. Javascript has some nice language features, such as closures, which are a good fit for implementing numerical methods for solving ODEs. The main question then is whether a typical Javascript engine can cope with the compute load.

One problem that arose with a pure Javascript implementation is the production of the normal random deviates required for the numerical solution of an SDE. Javascript has a Math.random() function that produces uniform random deviates, and there are algorithms that could be implemented in Javascript to produce normal deviates from uniform deviates. The problems here are that there is no guarantee that all Javascript engines use the same PRNG, or that the PRNG is actually any good, and we cannot set a specific seed in Javascript. Even if the PRNG is adequate, we have the problem that different users will see different outputs for the same parameter settings. Also, reloading the page will reseed the PRNG from some unknown source, so a user will not be able to replicate a particular output.

The chosen solution was to pre-compute a sequence of random normal deviates, using a PRNG of known characteristics. This sequence is loaded from the server, and will naturally be the same for all users and sessions. This is the C++ code used to create the fixed Javascript array:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <iostream>

int main(int argc, char * argv[])
{
    boost::mt19937 engine(1234); // PRNG with seed = 1234
    boost::normal_distribution<double> rng(0.0, 1.0); // Mean 0.0, standard deviation 1.0
    std::cout << "normals = [\n";
    int nSamples = 10001;
    while(nSamples--)
    {
        std::cout << rng(engine) << ",\n";
    }
    std::cout << "];";
}

Deviates with a standard deviation other than 1.0 can be produced in Javascript by multiplying the pre-computed deviates by a suitable scaling factor.

The graphs only use 1000 samples from the Javascript array. To see the effects of different sample paths, the Javascript code selects one of ten slices from the pre-computed array.


The Noisy Channel Coding Theorem

28 July, 2012

Here’s a charming, easily readable tale of Claude Shannon and how he came up with information theory:

• Erico Guizzo, The Essential Message: Claude Shannon and the Making of Information Theory.

I hadn’t known his PhD thesis was on genetics! His master’s thesis introduced Boolean logic to circuit design. And as a kid, he once set up a telegraph line to a friend’s house half a mile away.

So, he was perfectly placed to turn information into a mathematical quantity, deeply related to entropy, and prove some now-famous theorems about it.

These theorems set limits on how much information we can transmit through a noisy channel. More excitingly, they say we can cook up coding schemes that let us come as close as we want to this limit, with an arbitrarily low probability of error.

As Erico Guizzo points out, these results are fundamental to the ‘information age’ we live in today:

Can we transmit, say, a high-resolution picture over a telephone line? How long will that take? Is there a best way to do it?

Before Shannon, engineers had no clear answers to these questions. At that time, a wild zoo of technologies was in operation, each with a life of its own—telephone, telegraph, radio, television, radar, and a number of other systems developed during the war. Shannon came up with a unifying, general theory of communication. It didn’t matter whether you transmitted signals using a copper wire, an optical fiber, or a parabolic dish. It didn’t matter if you were transmitting text, voice, or images. Shannon envisioned communication in abstract, mathematical terms; he defined what the once fuzzy concept of “information” meant for communication engineers and proposed a precise way to quantify it. According to him, the information content of any kind of message could be measured in binary digits, or just bits—a name suggested by a colleague at Bell Labs. Shannon took the bit as the fundamental unit in information theory. It was the first time that the term appeared in print.

So, I want to understand Shannon’s theorems and their proofs—especially because they clarify the relation between information and entropy, two concepts I’d like to be an expert on. It’s sort of embarrassing that I don’t already know this stuff! But I thought I’d post some preliminary remarks anyway, in case you too are trying to learn this stuff, or in case you can help me.

There are various different theorems I should learn. For example:

• The source coding theorem says it’s impossible to compress a stream of data so that the average number of bits per symbol in the compressed data is less than the Shannon entropy of the source, without some of the data almost certainly getting lost. However, you can make the number of bits per symbol arbitrarily close to the Shannon entropy with a probability of error as small as you like.

• The noisy channel coding theorem is a generalization to data sent over a noisy channel.
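The source coding theorem becomes very concrete if you play with an actual code. Here’s a small Huffman coder, sketched in Python with illustrative probabilities. For ‘dyadic’ probabilities like these the average codeword length hits the entropy exactly; in general it lands within one bit of it:

```python
import heapq
import math
from itertools import count

def huffman_lengths(probs):
    """Codeword lengths of a Huffman code for the given symbol probabilities."""
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
lengths = huffman_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(entropy, avg_len)  # both are 1.75 bits for these dyadic probabilities
```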

The proof of the noisy channel coding theorem seems not so bad—there’s a sketch of a proof in the Wikipedia article on this theorem. But many theorems have a hard lemma at their heart, and for this one it’s a result in probability theory called the asymptotic equipartition property.

You should not try to dodge the hard lemma at the heart of the theorem you’re trying to understand: there’s a reason it’s there. So what’s the asymptotic equipartition property?

Here’s a somewhat watered-down statement that gets the basic idea across. Suppose you have a method of randomly generating letters—for example, a probability distribution on the set of letters. Suppose you randomly generate a string of n letters, and compute -(1/n) times the logarithm of the probability that you got that string. Then as n \to \infty this number ‘almost surely’ approaches some number S. What’s this number S? It’s the entropy of the probability distribution you used to generate those letters!

(‘Almost surely’ is probability jargon for ‘with probability 100%’, which is not the same as ‘always’.)

This result is really cool—definitely worth understanding in its own right! It says that while many strings are possible, the ones you’re most likely to see lie in a certain ‘typical set’. The ‘typical’ strings are the ones where when you compute -(1/n) times the log of their probability, the result is close to S. How close? Well, you get to pick that.

The typical strings are not individually the most probable strings! But if you randomly generate a string, it’s very probable that it lies in the typical set. That sounds a bit paradoxical, but if you think about it, you’ll see it’s not. Think of repeatedly flipping a coin that has a 90% chance of landing heads up. The most probable single outcome is that it lands heads up every time. But the typical outcome is that it lands heads up close to 90% of the time. And, there are lots of ways this can happen. So, if you flip the coin a bunch of times, there’s a very high chance that the outcome is typical.
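This is easy to check numerically. Here’s a minimal sketch in Python (the function names are mine, just for illustration): it flips a biased coin n times, computes -(1/n) times the base-2 logarithm of the probability of the resulting string, and compares the result with the entropy.

```python
import math
import random

def entropy(p):
    """Shannon entropy (in bits) of a coin that lands heads with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def empirical_rate(p, n, rng):
    """Flip the coin n times; return -(1/n) log2 of the probability of the string."""
    log_prob = 0.0
    for _ in range(n):
        if rng.random() < p:          # heads, which has probability p
            log_prob += math.log2(p)
        else:                         # tails, which has probability 1 - p
            log_prob += math.log2(1 - p)
    return -log_prob / n

p = 0.9
print(entropy(p))                     # about 0.469 bits per flip

rng = random.Random(1)
for n in (10, 100, 10000):
    print(n, empirical_rate(p, n, rng))
```

For small n the rate jumps around, but as n grows it settles near the entropy, about 0.469 bits per flip: the asymptotic equipartition property in action.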

It’s easy to see how this result is the key to the source coding theorem. The ‘typical set’ has few elements compared to the whole set of strings with n letters. So, you can make short codes for the strings in this set, and compress your message that way, and this works almost all the time. How much you can compress your message depends on the entropy S.
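Here’s the rough counting behind that compression, with logarithms taken base 2 so that S is measured in bits. Each typical string x has probability roughly 2^{-nS}, since -(1/n) \log_2 P(x) \approx S. And the typical set T has total probability close to 1, so

|T| \cdot 2^{-nS} \approx 1, \qquad \textrm{so} \qquad |T| \approx 2^{nS}

Thus about nS bits suffice to give each typical string its own codeword: just S bits per symbol.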

So, we’re seeing the link between information and entropy!

The actual coding schemes that people use are a lot trickier than the simple scheme I’m hinting at here. When you read about them, you see scary things like this:

But presumably they’re faster to implement, hence more practical.

The first coding schemes to come really close to the Shannon limit were the turbo codes. Surprisingly, these codes were developed only in 1993! They’re used in 3G mobile communications and deep space satellite communications.

One key trick is to use not one decoder but two. These two decoders keep communicating with each other and improving their guesses about the signal they’ve received, until they agree:

This iterative process continues until the two decoders come up with the same hypothesis for the m-bit pattern of the payload, typically in 15 to 18 cycles. An analogy can be drawn between this process and that of solving cross-reference puzzles like crossword or sudoku. Consider a partially completed, possibly garbled crossword puzzle. Two puzzle solvers (decoders) are trying to solve it: one possessing only the “down” clues (parity bits), and the other possessing only the “across” clues. To start, both solvers guess the answers (hypotheses) to their own clues, noting down how confident they are in each letter (payload bit). Then, they compare notes, by exchanging answers and confidence ratings with each other, noticing where and how they differ. Based on this new knowledge, they both come up with updated answers and confidence ratings, repeating the whole process until they converge to the same solution.

This can be seen as “an instance of loopy belief propagation in Bayesian networks.”

By the way, the picture I showed you above is a flowchart of the decoding scheme for a simple turbo code. You can see the two decoders, and maybe the loop where data gets fed back to the decoders.

While I said this picture is “scary”, I actually like it because it’s an example of network theory applied to real-life problems.


Symmetry and the Fourth Dimension (Part 4)

26 July, 2012

Last time I posed a puzzle: figure out the Coxeter diagrams of the Platonic solids.

When you do this, you can’t help noticing a cool fact: if you take a Platonic solid and draw a dot in the center of each face, these dots are the vertices of another Platonic solid, called its dual. And if we do this again, we get back the same Platonic solid that we started with! These two solids have very similar Coxeter diagrams.

For example, starting with the cube, we get the octahedron:

Starting with the octahedron, we get back the cube:

These pictures were made by Alan Goodman, and if you go to his webpage you can see how all 5 Platonic solids work, and you can download his free book, which includes a good elementary introduction to group theory:

• Alan Goodman, Algebra: Abstract and Concrete, SemiSimple Press, Iowa City, 2012.

When we take the dual of a Platonic solid, or any other polyhedron, we replace:

• each vertex by a face,
• each edge by an edge,
• each face by a vertex.

So, it should not be surprising that in the Coxeter diagram, which records information about vertices, edges and faces, we just switch the letters V and F.
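As a tiny illustration (this is just my own throwaway code, not anything standard): since the rest of these diagrams is symmetric, swapping the letters V and F amounts to swapping the two numbers.

```python
def dualize(diagram):
    """Swap V and F in a Coxeter diagram like 'V—4—E—3—F'.

    Apart from the two numbers, the diagram is symmetric,
    so swapping V and F just swaps the numbers.
    """
    _, p, _, q, _ = diagram.split("—")
    return f"V—{q}—E—{p}—F"

print(dualize("V—4—E—3—F"))  # cube -> 'V—3—E—4—F', the octahedron
print(dualize("V—3—E—3—F"))  # tetrahedron -> itself
```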

Here’s the story in detail.

Tetrahedron

 

The tetrahedron is its own dual, and its Coxeter diagram

V—3—E—3—F

doesn’t change when we switch the letters V and F. Remember, this diagram means that the tetrahedron has:

• 3 vertices and 3 edges around each face,
• 3 edges and 3 faces around each vertex.

Cube and Octahedron

 

   

The dual of the cube is the octahedron, and vice versa. The Coxeter diagram of the cube is:

V—4—E—3—F

because the cube has:

• 4 vertices and 4 edges around each face,
• 3 edges and 3 faces around each vertex.

On the other hand, the Coxeter diagram of the octahedron is:

V—3—E—4—F

because it has:

• 3 vertices and 3 edges around each face,
• 4 edges and 4 faces around each vertex.

If we switch the letters V and F in one of these Coxeter diagrams, we get the other one… drawn backwards, but that doesn’t count in this game.

Dodecahedron and Icosahedron

 

   

The dual of the dodecahedron is the icosahedron, and vice versa. The Coxeter diagram of the dodecahedron is:

V—5—E—3—F

because it has:

• 5 vertices and 5 edges around each face,
• 3 edges and 3 faces around each vertex.

The Coxeter diagram of the icosahedron is:

V—3—E—5—F

because it has:

• 3 vertices and 3 edges around each face,
• 5 edges and 5 faces around each vertex.

Again, you can get from either of these two Coxeter diagrams to the other by switching V and F. That’s duality.

The numbers

But now let’s think a bit about a deeper pattern lurking around here.

Puzzle. If we take the Coxeter diagrams we’ve just seen:

V—3—E—5—F
V—3—E—4—F
V—3—E—3—F
V—4—E—3—F
V—5—E—3—F

and strip off everything but the numbers, we get these ordered pairs:

(3,5),   (3,4),   (3,3),   (4,3),   (5,3)

Why do these pairs and only these pairs give Platonic solids? I’ve listed them in a cute way just for fun, but that’s not the point.

There could be a number of perfectly correct ways to tackle this puzzle. But I have one in mind, so maybe I should give you a couple of clues to nudge you toward my way of thinking—though I’d be happy to hear other ways, too!

First, what about this pair?

(3,6)

Well, last time we looked at the corresponding Coxeter diagram:

V—3—E—6—F

and we saw it doesn’t come from a Platonic solid. Instead, it comes from this tiling of the plane:

What I’m looking for is an equation or something like that, which holds only for the pairs of numbers that give Platonic solids. And it should work for some good reason, not by coincidence!

One more clue. If your equation, or whatever it is, allows extra solutions like (2,n) or (n,2), don’t be discouraged! There are weird degenerate Platonic solids called hosohedra, with just two vertices, like this:

You can’t make the faces flat, but you can still draw it on a sphere, and in some ways that’s more important. The Coxeter diagram for this guy is:

V—2—E—3—F

And each hosohedron has a dual, called a dihedron, with just two faces, like this:

The Coxeter diagram for this is:

V—3—E—2—F

So, if your answer to the puzzle allows for hosohedra and dihedra, it’s not actually bad. As you proceed deeper and deeper into this subject, you realize more and more that hosohedra and dihedra are important, even though they’re not polyhedra in the usual sense.


Carbon Cycle Box Models

24 July, 2012

guest post by Staffan Liljegren

What?

I think the carbon cycle must be the greatest natural invention, all things considered. It’s been the basis for all organic life on Earth through eons of time. Through evolution, it gradually creates more and more biodiversity. It is important to do more research on the carbon cycle for the earth sciences, biology and in particular global warming—or more generally, climate science and environmental science, which are among the foci of the Azimuth project.

Since it is a beautiful and complex nonlinear geochemical cycle, I decided to give a rough outline of its beauty and complexity. Plants eat water and carbon dioxide with help from the sun (photosynthesis), and while doing so they produce oxygen and sugar for others to metabolize. These plants may in turn be eaten by plant-eating animals (herbivores), which may themselves be eaten by animals like us humans that eat meat, or both meat and plants (carnivores or omnivores).

Here is an overview of the cycle, where yellow arrows show release of carbon dioxide and purple arrows show uptake:

carbon cycle

Say a plant gets eaten by an animal on land. Then the animal can use its carbon while breathing in air and breathing out water and carbon dioxide. Ruminant animals like cows and sheep also produce methane, which is a greenhouse gas like carbon dioxide. When a plant or animal dies it gets eaten by others, and any remains go down into the soil and sediments. A lot of the carbon in the sediments actually transforms into carbonate rock. This happens over millions of years. Some of this carbon makes it back into the air later through volcanoes.

Where?

Carbon is not a very abundant element on this planet: it’s only 0.08% of the total mass of the Earth. Nonetheless, we all know that many forms of this element are found throughout nature: for example in diamonds, marble, oil… and living organisms. If you remember your high school chemistry you might recall that the lab experiments in organic chemistry were the fun part! The reason is that carbon easily forms compounds with other elements. So there is a tremendous global market that depends on the carbon cycle.

We humans are one fifth carbon. Other examples are trees, which we humans use for many things in our economic growth. But there are also fascinating flows inside the trees. I’ve read about these in Colin Tudge’s book The Secret Life of Trees: How They Live and Why They Matter, so I will use this book for examples about forests and trees. You may already be familiar with these, but maybe not know a lot of details about their part in the carbon cycle.


When I stood in front of a tall monkey-puzzle tree in the genus Araucaria I was just flabbergasted by its age, and by how widespread these trees were when the dinosaurs were around. But how does it manage to get water to its leaves? Colin Tudge writes that during evolution, trees invented stem-cell usage to grow their new outer layer, and developed microtechnology before we even existed as a species: the leaves pull water up through channels a few microns wide, via osmosis and transpiration, from the roots through the trunk to the leaves at speeds typically around 6 meters per hour. But if needed, they can crank it up to 40 meters per hour to get water to the top in an hour or two!

Why?

Global warming is a fact, and several remote sensing technologies have confirmed this. You can see it nicely by clicking on this—you should see a NASA animation of satellite measurements superposed on Keeling’s famous graph of CO2 measured at Mauna Loa, from 2002 to 2009. Here’s more of that graph:

Many of the greenhouse gases that contribute to increasing temperature contain carbon: carbon dioxide, methane and carbon monoxide. I will focus on carbon dioxide. Its behavior is vastly different in air and in water. In air it doesn’t react much with other chemicals, so it stays around longer in the atmosphere. In the ocean and on land, carbon dioxide reacts a lot more, so there’s an uptake of carbon in both, though in the ocean the carbon stays a lot longer, mainly due to ocean buffering. I will have a lot more to say about ocean geochemistry in upcoming blog posts.

As of 2011, carbon dioxide levels in the atmosphere are approaching 400 parts per million (ppm), and the growth rate is increasing every year. Here parts per million is measured relative to the volume of the atmosphere. David Archer says that if all the carbon dioxide were to fall as frozen carbon dioxide—’dry ice’—it would form a layer just around 10 centimeters deep. But the important thing to understand is that our human emissions have thrown the carbon cycle seriously out of balance, so we might be close to some climate tipping points.

Colin Tudge, my fellow ‘tree-hugger’, has looked at global warming and its implications for trees. Intuitively it might seem that warmer temperatures and higher levels of CO2 would benefit their growth. Indeed, the climate predictions of the Intergovernmental Panel on Climate Change assume this will happen. But there is a point where the micro-channels (stomata) start to close, due to too much photosynthesis and carbon dioxide. Taken together with higher temperatures, this can make the trees’ respiration faster than their photosynthesis, so they end up supplying more carbon dioxide to the atmosphere.

Trees are also excellent at preventing floods, since one tree can divert 500 litres of water per day through transpiration. This easily adds up to 5000 cubic metres per square kilometre, making trees very good at reducing floods and reducing our need for disaster prevention, if they are left alone to do their job.

How?

One way of understanding how the carbon cycle works is to use simple models like box models, where we treat the carbon as contained in various ‘boxes’ and look at how it moves between boxes as time passes. A box can represent the Earth, the ocean, the atmosphere or, depending on what I want to study, any other part of the carbon cycle.

I’ll just mention a few examples of flows in the carbon cycle, to give you a feeling for them: breathing, photosynthesis, erosion, emission and decay. Breathing is easy to grasp—try to stop doing it yourself for a short moment! But how is photosynthesis a flow? This wonderful process was invented by the cyanobacteria 3.5 billion years ago and it has been used by plants ever since. It takes carbon out of the atmosphere and moves it into plant tissues.

In a box model, the average time something stays in a box is called its residence time (scientists also say e-folding time or response time). The rest of the flows in my list I leave up to you to think about: which are uptakes, which are releases, and where do they occur?
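In symbols: if a box holds a mass m of carbon and the total outflow is F_{\textrm{out}}, then, at least when the box is close to a steady state, the residence time is roughly

\tau = \frac{m}{F_{\textrm{out}}}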

The basic equation in a box model is called the mass balance equation:

\dot m = \sum \textrm{sources} - \sum \textrm{sinks}

Here m is the mass of some substance in some box. The sources are what flows into that box together with any internal sources (production). The sinks are what flows out together with any internal sinks (loss and deposition).

In my initial experiments, using data for the year 2008 and a 1-dimensional global box model of CO2 in the atmosphere with fossil fuel as the only source, I got results similar to this diagram from the Global Carbon Project (in petagrams of carbon per year, which is the same as gigatonnes of carbon per year):

global carbon budget 2000 - 2010

I used the observed value from measurements at Mauna Loa. The atmospheric accumulation is 3.9 gigatonnes of carbon (GtC) per year, and the fossil fuel emission source is 8.7 GtC per year. The ocean absorbs 2.1 GtC per year, and the land acts as a sink at 2.5 GtC per year.
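To give a feeling for how such a box model runs, here’s a minimal sketch in Python. It treats the atmosphere as a single box and steps the mass balance equation forward in time, with the fluxes above held constant for simplicity (in reality they change year by year). The starting stock of roughly 750 GtC of carbon in the atmosphere is a commonly quoted rough figure, and with these constant fluxes the box gains 8.7 - 2.1 - 2.5 = 4.1 GtC per year.

```python
# One-box model of atmospheric carbon: dm/dt = sources - sinks.
# Fluxes in GtC/year, taken from the numbers quoted above and held constant.

FOSSIL_FUEL = 8.7    # source: fossil fuel emissions into the atmosphere
OCEAN_UPTAKE = 2.1   # sink: carbon absorbed by the ocean
LAND_UPTAKE = 2.5    # sink: carbon absorbed by the land

def step(mass, dt=1.0):
    """Advance the atmosphere box by dt years using the mass balance equation."""
    dm_dt = FOSSIL_FUEL - (OCEAN_UPTAKE + LAND_UPTAKE)
    return mass + dm_dt * dt

# Start from a rough figure for the atmospheric carbon stock and run 10 years.
mass = 750.0
for year in range(10):
    mass = step(mass)

print(mass)   # roughly 791 GtC, since 750 + 10 * 4.1 = 791
```

A more realistic model would make the ocean and land uptakes depend on the atmospheric concentration, and add boxes for the ocean and land so carbon is conserved overall, which is where I hope to go in later posts.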

I hope this will be the first of a series of posts! Next time I want to talk about a box model for the ocean’s role in the carbon cycle.

References

• Colin Tudge, The Secret Life of Trees: How They Live and Why They Matter, Penguin, London, 2005.

• David Archer, The Global Carbon Cycle, Princeton U. Press, Princeton, NJ, 2011.

