Network Theory (Part 11)

4 October, 2011

jointly written with Brendan Fong

Noether proved lots of theorems, but when people talk about Noether’s theorem, they always seem to mean her result linking symmetries to conserved quantities. Her original result applied to classical mechanics, but today we’d like to present a version that applies to ‘stochastic mechanics’—or in other words, Markov processes.

What’s a Markov process? We’ll say more in a minute—but in plain English, it’s a physical system where something hops around randomly from state to state, where its probability of hopping anywhere depends only on where it is now, not its past history. Markov processes include, as a special case, the stochastic Petri nets we’ve been talking about.

Our stochastic version of Noether’s theorem is copied after a well-known quantum version. It’s yet another example of how we can exploit the analogy between stochastic mechanics and quantum mechanics. But for now we’ll just present the stochastic version. Next time we’ll compare it to the quantum one.

Markov processes

We should and probably will be more general, but let’s start by considering a finite set of states, say X. To describe a Markov process we then need a matrix of real numbers H = (H_{i j})_{i, j \in X}. The idea is this: suppose right now our system is in the state i. Then the probability of being in some state j changes as time goes by—and H_{i j} is defined to be the time derivative of this probability right now.

So, if \psi_i(t) is the probability of being in the state i at time t, we want the master equation to hold:

\displaystyle{ \frac{d}{d t} \psi_i(t) = \sum_{j \in X} H_{i j} \psi_j(t) }

This motivates the definition of ‘infinitesimal stochastic’, which we recall from Part 5:

Definition. Given a finite set X, a matrix of real numbers H = (H_{i j})_{i, j \in X} is infinitesimal stochastic if

i \ne j \implies H_{i j} \ge 0

and

\displaystyle{ \sum_{i \in X} H_{i j} = 0 }

for all j \in X.

The inequality says that if we start in the state i, the probability of being found in some other state, which starts at 0, can’t go down, at least initially. The equation says that the probability of being somewhere or other doesn’t change. Together, these facts imply that

H_{i i} \le 0

That makes sense: the probability of being in the state i, which starts at 1, can’t go up, at least initially.
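By the way, these conditions are easy to check on a computer. Here’s a minimal Python sketch, using a made-up 3-state Hamiltonian (the matrix H below is ours, just for illustration):

```python
# A made-up 3-state Hamiltonian, stored as a list of rows H[i][j]:
# off-diagonal entries are nonnegative and each column sums to zero.
H = [
    [-1.0,  0.5,  0.0],
    [ 1.0, -0.5,  2.0],
    [ 0.0,  0.0, -2.0],
]

def is_infinitesimal_stochastic(H, tol=1e-12):
    n = len(H)
    off_diagonals_nonnegative = all(
        H[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
    columns_sum_to_zero = all(
        abs(sum(H[i][j] for i in range(n))) < tol for j in range(n))
    return off_diagonals_nonnegative and columns_sum_to_zero
```

Note how the two conditions force the diagonal entries to be nonpositive, as we just discussed.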

Using the magic of matrix multiplication, we can rewrite the master equation as follows:

\displaystyle{\frac{d}{d t} \psi(t) = H \psi(t) }

and we can solve it like this:

\psi(t) = \exp(t H) \psi(0)

If H is an infinitesimal stochastic operator, we will call \exp(t H) a Markov process, and H its Hamiltonian.

(Actually, most people call \exp(t H) a Markov semigroup, and reserve the term Markov process for another way of looking at the same idea. So, be careful.)
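Here’s a numerical sketch of this, assuming NumPy and a made-up Hamiltonian. We approximate \exp(t H) by a truncated Taylor series and check that it’s a stochastic matrix: nonnegative entries, with columns summing to 1. So it maps probability distributions to probability distributions.

```python
import numpy as np

# A made-up infinitesimal stochastic Hamiltonian.
H = np.array([
    [-1.0,  0.5,  0.0],
    [ 1.0, -0.5,  2.0],
    [ 0.0,  0.0, -2.0],
])

def markov_semigroup(H, t, terms=60):
    """Approximate exp(tH) by a truncated Taylor series."""
    U = np.eye(H.shape[0])
    term = np.eye(H.shape[0])
    for k in range(1, terms):
        term = term @ (t * H) / k   # (tH)^k / k!
        U = U + term
    return U

U = markov_semigroup(H, 2.0)        # exp(tH) at t = 2
psi0 = np.array([1.0, 0.0, 0.0])    # start with certainty in the first state
psi = U @ psi0                      # the solution psi(2) of the master equation
```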

Noether’s theorem is about ‘conserved quantities’, that is, observables whose expected values don’t change with time. To understand this theorem, you need to know a bit about observables. In stochastic mechanics an observable is simply a function assigning a number O_i to each state i \in X.

However, in quantum mechanics we often think of observables as matrices, so it’s nice to do that here, too. It’s easy: we just create a matrix whose diagonal entries are the values of the function O. And just to confuse you, we’ll also call this matrix O. So:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

One advantage of this trick is that it lets us ask whether an observable commutes with the Hamiltonian. Remember, the commutator of matrices is defined by

[O,H] = O H - H O

Noether’s theorem will say that [O,H] = 0 if and only if O is ‘conserved’ in some sense. What sense? First, recall that a stochastic state is just our fancy name for a probability distribution \psi on the set X. Second, the expected value of an observable O in the stochastic state \psi is defined to be

\displaystyle{ \sum_{i \in X} O_i \psi_i }

In Part 5 we introduced the notation

\displaystyle{ \int \phi = \sum_{i \in X} \phi_i }

for any function \phi on X. The reason is that later, when we generalize X from a finite set to a measure space, the sum on the right will become an integral over X. Indeed, a sum is just a special sort of integral!

Using this notation and the magic of matrix multiplication, we can write the expected value of O in the stochastic state \psi as

\int O \psi

We can calculate how this changes in time if \psi obeys the master equation… and we can write the answer using the commutator [O,H]:

Lemma. Suppose H is an infinitesimal stochastic operator and O is an observable. If \psi(t) obeys the master equation, then

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

Proof. Using the master equation we have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O \frac{d}{d t} \psi(t) = \int O H \psi(t) } \qquad (1)

But since H is infinitesimal stochastic,

\displaystyle{ \sum_{i \in X} H_{i j} = 0  }

so for any function \phi on X we have

\displaystyle{ \int H \phi = \sum_{i, j \in X} H_{i j} \phi_j = 0 }

and in particular

\int H O \psi(t) = 0 \qquad (2)

Since [O,H] = O H - H O , we conclude from (1) and (2) that

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

as desired.   █

The commutator doesn’t look like it’s doing much here, since we also have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O H \psi(t) }

which is even simpler. But the commutator will become useful when we get to Noether’s theorem!
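If you like, you can check the Lemma numerically. Here’s a sketch with a made-up Hamiltonian, observable and stochastic state; it verifies both the conclusion and equation (2):

```python
import numpy as np

# Made-up infinitesimal stochastic Hamiltonian, observable, and state.
H = np.array([
    [-1.0,  0.5,  0.0],
    [ 1.0, -0.5,  2.0],
    [ 0.0,  0.0, -2.0],
])
O = np.diag([0.0, 1.0, 2.0])       # observable as a diagonal matrix
psi = np.array([0.2, 0.5, 0.3])    # probabilities summing to 1

# d/dt of the expected value of O, computed two ways:
via_OH = (O @ H @ psi).sum()                    # ∫ O H ψ
via_commutator = ((O @ H - H @ O) @ psi).sum()  # ∫ [O,H] ψ
# They agree because ∫ H O ψ = 0: the columns of H sum to zero.
```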

Noether’s theorem

Here’s a version of Noether’s theorem for Markov processes. It says an observable commutes with the Hamiltonian iff the expected values of that observable and its square don’t change as time passes:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int O\psi(t) = 0 }

and

\displaystyle{ \frac{d}{d t} \int O^2\psi(t) = 0 }

for all \psi(t) obeying the master equation.

If you know Noether’s theorem from quantum mechanics, you might be surprised that in this version we need not only the observable but also its square to have an unchanging expected value! We’ll explain this, but first let’s prove the theorem.

Proof. The easy part is showing that if [O,H]=0 then \frac{d}{d t} \int O\psi(t) = 0 and \frac{d}{d t} \int O^2\psi(t) = 0. In fact there’s nothing special about these two powers of O; we’ll show that

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) = 0 }

for all n. The point is that since H commutes with O, it commutes with all powers of O:

[O^n, H] = 0

So, applying the Lemma to the observable O^n, we see

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) =  \int [O^n, H] \psi(t) = 0 }

The backward direction is a bit trickier. We now assume that

\displaystyle{ \frac{d}{d t} \int O\psi(t) = \frac{d}{d t} \int O^2\psi(t) = 0 }

for all solutions \psi(t) of the master equation. This implies

\int O H\psi(t) = \int O^2 H\psi(t) = 0

or since this holds for all solutions,

\displaystyle{ \sum_{i \in X} O_i H_{i j} = \sum_{i \in X} O_i^2 H_{i j} = 0 } \qquad (3)

We wish to show that [O,H]= 0.

First, recall that we can think of O as a diagonal matrix:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

So, we have

\begin{array}{ccl} [O,H]_{i j} &=& \displaystyle{ \sum_{k \in X} (O_{i k}H_{k j} - H_{i k} O_{k j}) } \\ \\ &=& O_i H_{i j} - H_{i j}O_j \\ \\ &=& (O_i-O_j)H_{i j} \end{array}

To show this is zero for each pair of elements i, j \in X, it suffices to show that when H_{i j} \ne 0, then O_j = O_i. That is, we need to show that if the system can move from state j to state i, then the observable takes the same value on these two states.

In fact, it’s enough to show that this sum is zero for any j \in X:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} }

Why? When i = j, O_j-O_i = 0, so that term in the sum vanishes. But when i \ne j, (O_j-O_i)^2 and H_{i j} are both non-negative—the latter because H is infinitesimal stochastic—so each term in the sum is non-negative. If non-negative terms sum to zero, each one must be zero. Thus for all i \ne j, we have (O_j-O_i)^2H_{i j}=0. But this means that either O_i = O_j or H_{i j} = 0, which is what we need to show.

So, let’s take that sum and expand it:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} = \sum_i (O_j^2 H_{i j}- 2O_j O_i H_{i j} +O_i^2 H_{i j}) }

which in turn equals

\displaystyle{  O_j^2\sum_i H_{i j} - 2O_j \sum_i O_i H_{i j} + \sum_i O_i^2 H_{i j} }

The three terms here are each zero: the first because H is infinitesimal stochastic, and the latter two by equation (3). So, we’re done!   █
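Here’s a numerical illustration of the easy direction (all the numbers are made up). A block-diagonal Hamiltonian never lets us hop between the two blocks, so an observable that’s constant on each block commutes with it, and the expected values of all its powers stay constant:

```python
import numpy as np

# A made-up Hamiltonian with two blocks that don't communicate.
H = np.array([
    [-1.0,  2.0,  0.0,  0.0],
    [ 1.0, -2.0,  0.0,  0.0],
    [ 0.0,  0.0, -3.0,  1.0],
    [ 0.0,  0.0,  3.0, -1.0],
])
O = np.diag([5.0, 5.0, 7.0, 7.0])     # constant on each block
psi = np.array([0.1, 0.2, 0.3, 0.4])  # a made-up stochastic state

commutes = np.allclose(O @ H, H @ O)
# d/dt of the expected value of O^n equals the sum of O^n H psi,
# which should vanish for every power n:
derivatives = [(np.linalg.matrix_power(O, n) @ H @ psi).sum()
               for n in range(1, 4)]
```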

Markov chains

So that’s the proof… but why do we need both O and its square to have an expected value that doesn’t change with time to conclude [O,H] = 0? There’s an easy counterexample if we leave out the condition involving O^2. However, the underlying idea is clearer if we work with Markov chains instead of Markov processes.

In a Markov process, time passes by continuously. In a Markov chain, time comes in discrete steps! We get a Markov process by forming \exp(t H) where H is an infinitesimal stochastic operator. We get a Markov chain by forming the powers U, U^2, U^3, \dots of a ‘stochastic operator’ U. Remember:

Definition. Given a finite set X, a matrix of real numbers U = (U_{i j})_{i, j \in X} is stochastic if

U_{i j} \ge 0

for all i, j \in X and

\displaystyle{ \sum_{i \in X} U_{i j} = 1 }

for all j \in X.

The idea is that U describes a random hop, with U_{i j} being the probability of hopping to the state i if you start at the state j. These probabilities are nonnegative and sum to 1.

Any stochastic operator gives rise to a Markov chain U, U^2, U^3, \dots . And in case it’s not clear, that’s how we’re defining a Markov chain: the sequence of powers of a stochastic operator. There are other definitions, but they’re equivalent.
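In code, the definition looks like this. Here’s a made-up 3-state stochastic operator, together with a check that every power of it, that is, every step of the Markov chain, is again stochastic:

```python
import numpy as np

# A made-up stochastic operator: each column is a probability distribution.
U = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.5, 0.3],
    [0.0, 0.3, 0.7],
])

def is_stochastic(M, tol=1e-9):
    """Entries nonnegative and every column summing to 1."""
    return bool((M >= -tol).all()) and np.allclose(M.sum(axis=0), 1.0)

# The Markov chain U, U^2, U^3, ...
powers = [np.linalg.matrix_power(U, k) for k in range(1, 6)]
```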

We can draw a Markov chain by drawing a bunch of states and arrows labelled by transition probabilities, which are the matrix elements U_{i j}:

Here is Noether’s theorem for Markov chains:

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int O U \psi = \int O \psi }

and

\displaystyle{ \int O^2 U \psi = \int O^2 \psi }

for all stochastic states \psi.

In other words, an observable commutes with U iff the expected values of that observable and its square don’t change when we evolve our state one time step using U.

You can probably prove this theorem by copying the proof for Markov processes:

Puzzle. Prove Noether’s theorem for Markov chains.

But let’s see why we need the condition on the square of the observable! That’s the intriguing part. Here’s a nice little Markov chain:

where we haven’t drawn arrows labelled by 0. So, state 1 has a 50% chance of hopping to state 0 and a 50% chance of hopping to state 2; the other two states just sit there. Now, consider the observable O with

O_i = i

It’s easy to check that the expected value of this observable doesn’t change with time:

\displaystyle{  \int O U \psi = \int O \psi }

for all \psi. The reason, in plain English, is this. Nothing at all happens if you start at states 0 or 2: you just sit there, so the expected value of O doesn’t change. If you start at state 1, the observable equals 1. You then have a 50% chance of going to a state where the observable equals 0 and a 50% chance of going to a state where it equals 2, so its expected value doesn’t change: it still equals 1.

On the other hand, we do not have [O,U] = 0 in this example, because we can hop between states where O takes different values. Furthermore,

\displaystyle{  \int O^2 U \psi \ne \int O^2 \psi }

After all, if you start at state 1, O^2 equals 1 there. You then have a 50% chance of going to a state where O^2 equals 0 and a 50% chance of going to a state where it equals 4, so its expected value changes!

So, that’s why \int O U \psi = \int O \psi for all \psi is not enough to guarantee [O,U] = 0. The same sort of counterexample works for Markov processes, too.
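This counterexample is easy to check by computer. Here’s a sketch: the matrix U encodes the hopping probabilities described above, and O is the observable with O_i = i:

```python
import numpy as np

# State 1 hops to state 0 or state 2 with probability 1/2 each;
# states 0 and 2 just sit there.
U = np.array([
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0],
])
o = np.array([0.0, 1.0, 2.0])      # the observable O_i = i, as a function
O = np.diag(o)                     # ... and as a diagonal matrix
psi = np.array([0.0, 1.0, 0.0])    # start at state 1

expected_O_before  = (o * psi).sum()
expected_O_after   = (o * (U @ psi)).sum()
expected_O2_before = (o**2 * psi).sum()
expected_O2_after  = (o**2 * (U @ psi)).sum()
commutator = O @ U - U @ O
```

The expected value of O is unchanged, but the expected value of O^2 jumps from 1 to 2, and the commutator is nonzero.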

Finally, we should add that there’s nothing terribly sacred about the square of the observable. For example, we have:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int f(O) \psi(t) = 0 }

for all smooth f: \mathbb{R} \to \mathbb{R} and all \psi(t) obeying the master equation.

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int f(O) U \psi = \int f(O) \psi }

for all smooth f: \mathbb{R} \to \mathbb{R} and all stochastic states \psi.

These make the ‘forward direction’ of Noether’s theorem stronger… and in fact, the forward direction, while easier, is probably more useful! However, if we ever use Noether’s theorem in the ‘reverse direction’, it might be easier to check a condition involving only O and its square.


The Network of Global Corporate Control

3 October, 2011

While protesters are trying to occupy Wall Street and spread their movement to other cities…

… others are trying to mathematically analyze the network of global corporate control:

• Stefania Vitali, James B. Glattfelder and Stefano Battiston, The network of global corporate control.

Here’s a little ‘directed graph’:

Very roughly, a directed graph consists of some vertices and some edges with arrows on them. Vitali, Glattfelder and Battiston built an enormous directed graph by taking 43,060 transnational corporations and seeing who owns a stake in whom:


If we zoom in on the financial sector, we can see the companies those protestors are upset about:


Zooming out again, we could check that the graph as a whole consists of many pieces. But the largest piece contains 3/4 of all the corporations studied, including all the top corporations by economic value, and accounting for 94.2% of the total operating revenue.

Within this there is a large ‘core’, containing 1347 corporations, each of which directly and/or indirectly owns shares in every other member of the core. On average, each member of the core has direct ties to 20 others. As a result, about 3/4 of the ownership of firms in the core remains in the hands of firms of the core itself. As the authors put it:

This core can be seen as an economic “super-entity” that raises new important issues both for researchers and policy makers.

If you’ve never thought much about modern global capitalism, the existence of this ‘core’ may seem shocking and scary… like an enormous invisible spiderweb wrapping around the globe, dominating us, controlling every move we make. Or maybe you can see a tremendous new business opportunity, waiting to be exploited!

But if you’ve already thought about these things, the existence of this core probably seems obvious. What’s new here is the use of certain ideas in math—graph theory, to be precise—to study it quantitatively.

So, let me say a bit more about the math! What’s a directed graph, exactly? It’s a set V and a subset E of V \times V. We call the elements of V vertices and the elements of E edges. Since an edge is an ordered pair of vertices, it has a ‘starting point’ and an ‘endpoint’—that’s why we call this kind of graph ‘directed’.

(Note that we can have an edge going from a vertex to itself, but we cannot have more than one edge going from some vertex v to some vertex v'. If you don’t like this, use some other kind of graph: there are many kinds!)

I spoke about ‘pieces’ of a directed graph, but that’s not a precise term, since there are various kinds of pieces:

• A connected component is a maximal set of vertices such that we can get from any one to any other by an undirected path, meaning a path of edges where we don’t care which way the arrows point.

• A strongly connected component is a maximal set of vertices such that we can get from any one to any other by a directed path, meaning a path of edges where at each step we walk ‘forwards’, along the arrow.

I didn’t state these definitions very precisely, but I hope you can fill in the details. Maybe an example will help! This graph has three strongly connected components, shaded in blue, but just one connected component:

So when I said this:

The graph consists of many pieces, but the largest contains 3/4 of all the corporations studied, including all the top corporations by economic value, and accounting for 94.2% of the total operating revenue.

I was really talking about the largest connected component. But when I said this:

Within this there is a large ‘core’ containing 1347 corporations, each of which directly and/or indirectly owns shares in every other member of the core.

I was really talking about a strongly connected component. When you look at random directed graphs, there often turns out to be one strongly connected component that’s a lot bigger than all the rest. This is called the core, or the giant strongly connected component.
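If you want to compute these components yourself, here’s a sketch of Kosaraju’s algorithm (a standard method, not anything from the paper above) run on a tiny made-up directed graph:

```python
# A made-up directed graph as adjacency lists.
graph = {
    'a': ['b'],
    'b': ['c'],
    'c': ['a', 'd'],   # a, b, c form a cycle: one strongly connected component
    'd': [],           # d is its own strongly connected component
}

def strongly_connected_components(graph):
    """Kosaraju's algorithm: DFS finish order, then DFS on the reversed graph."""
    order, seen = [], set()

    def dfs(v):
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                dfs(w)
        order.append(v)   # record v once all its descendants are done

    for v in graph:
        if v not in seen:
            dfs(v)

    reversed_graph = {v: [] for v in graph}
    for v, ws in graph.items():
        for w in ws:
            reversed_graph[w].append(v)

    components, assigned = [], set()
    for v in reversed(order):
        if v not in assigned:
            comp, stack = set(), [v]
            while stack:
                u = stack.pop()
                if u in assigned:
                    continue
                assigned.add(u)
                comp.add(u)
                stack.extend(w for w in reversed_graph[u] if w not in assigned)
            components.append(comp)
    return components
```

Running this on the whole graph with undirected edges instead would give the connected components; here there’s just one, even though there are two strongly connected components.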

In fact there’s a whole study of random directed graphs, which is relevant not only to corporations, but also to webpages! Webpages link to other webpages, giving a directed graph. (True, one webpage can link to another more than once, but we can either ignore that subtlety or use a different concept of graph that handles this.)

And it turns out that for various types of random directed graphs, we tend to get a so-called ‘bowtie structure’, like this:

In the middle you see the core, or giant strongly connected component, labelled SCC. (Yes, that’s where Exxon sits, like a spider in the middle of the web!)

Connected to this by paths going in, we have the left half of the bowtie, labelled IN. Connected to the core by paths going out, we have the right half of the bowtie, labelled OUT.

There are also usually some IN-tendrils going out of the IN region, and some OUT-tendrils going into the OUT region.

There may also be tubes going from IN to OUT while avoiding the core.

All this is one connected component: the largest one. But finally, not shown here, there may be a bunch of other smaller connected components. Presumably if these are large enough they have a similar structure.

Now: can we use this knowledge to do something good? Or is it all too obvious so far? After all, we’re just saying the network of global corporate control is a fairly ordinary sort of random directed graph. Maybe we need to go beyond this, and think about ways in which it’s not ordinary. In fact, I should reread the paper with that in mind.

Or… well, maybe you have some ideas.

(By the way, I don’t think ‘overthrowing’ the network of global corporate control is a feasible or even desirable project. I’m not espousing any sort of revolutionary ideology, and I’m not interested in discussing politics here. I’m more interested in understanding the world and looking for some leverage points where we can gently nudge things in slightly better directions. If there were a way to do this by taking advantage of the power of corporations, that would be cool.)


Network Theory (Part 10)

16 September, 2011

Last time Brendan showed us a proof of the Anderson-Craciun-Kurtz theorem on equilibrium states. Today we’ll look at an example that illustrates this theorem. This example brings up an interesting ‘paradox’—or at least a puzzle. Resolving this will get us ready to think about a version of Noether’s theorem relating conserved quantities and symmetries. Next time Brendan will state and prove a version of Noether’s theorem that applies to stochastic Petri nets.

A reversible reaction

In chemistry a type of atom, molecule, or ion is called a chemical species, or species for short. Since we’re applying our ideas to both chemistry and biology, it’s nice that ‘species’ is also used for a type of organism in biology. This stochastic Petri net describes the simplest reversible reaction of all, involving two species:

We have species 1 turning into species 2 with rate constant \beta, and species 2 turning back into species 1 with rate constant \gamma. So, the rate equation is:

\begin{array}{ccc} \displaystyle{ \frac{d x_1}{d t} } &=& -\beta x_1 + \gamma x_2  \\  \qquad \mathbf{} \\ \displaystyle{ \frac{d x_2}{d t} } &=&  \beta x_1 - \gamma x_2  \end{array}

where x_1 and x_2 are the amounts of species 1 and 2, respectively.

Equilibrium solutions of the rate equation

Let’s look for equilibrium solutions of the rate equation, meaning solutions where the amount of each species doesn’t change with time. Equilibrium occurs when each species is getting created at the same rate at which it’s getting destroyed.

So, let’s see when

\displaystyle{ \frac{d x_1}{d t} = \frac{d x_2}{d t} = 0 }

Clearly this happens precisely when

\beta x_1 = \gamma x_2

This says the rate at which 1’s are turning into 2’s equals the rate at which 2’s are turning back into 1’s. That makes perfect sense.
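We can watch the rate equation settle into such an equilibrium numerically. Here’s a sketch using Euler’s method, with made-up rate constants and initial amounts; note that the total x_1 + x_2 is conserved along the way:

```python
# Euler integration of the rate equation for the reversible reaction.
beta, gamma = 2.0, 1.0       # made-up rate constants
x1, x2 = 3.0, 0.0            # made-up initial amounts
dt = 0.001
for _ in range(20000):       # integrate out to t = 20
    flow = -beta * x1 + gamma * x2   # dx1/dt; dx2/dt is its negative
    x1 += dt * flow
    x2 -= dt * flow
# The solution approaches the equilibrium where beta * x1 == gamma * x2.
```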

Complex balanced equilibrium

In general, a chemical reaction involves a bunch of species turning into a bunch of species. Since ‘bunch’ is not a very dignified term, a bunch of species is usually called a complex. We saw last time that it’s very interesting to study a strong version of equilibrium: complex balanced equilibrium, in which each complex is being created at the same rate at which it’s getting destroyed.

However, in the Petri net we’re studying today, all the complexes being produced or destroyed consist of a single species. In this situation, any equilibrium solution is automatically complex balanced. This is great, because it means we can apply the Anderson-Craciun-Kurtz theorem from last time! This says how to get from a complex balanced equilibrium solution of the rate equation to an equilibrium solution of the master equation.

First remember what the master equation says. Let \psi_{n_1, n_2}(t) be the probability that we have n_1 things of species 1 and n_2 things of species 2 at time t. We summarize all this information in a formal power series:

\Psi(t) = \sum_{n_1, n_2 = 0}^\infty \psi_{n_1, n_2}(t) z_1^{n_1} z_2^{n_2}

Then the master equation says

\frac{d}{d t} \Psi(t) = H \Psi (t)

where following the general rules laid down in Part 8,

\begin{array}{ccl} H &=& \beta (a_2^\dagger - a_1^\dagger) a_1 + \gamma (a_1^\dagger - a_2^\dagger )a_2 \end{array}

This may look scary, but the annihilation operator a_i and the creation operator a_i^\dagger are just funny ways of writing the partial derivative \partial/\partial z_i and multiplication by z_i, so

\begin{array}{ccl} H &=& \displaystyle{ \beta (z_2 - z_1) \frac{\partial}{\partial z_1} + \gamma (z_1 - z_2) \frac{\partial}{\partial z_2} } \end{array}

or if you prefer,

\begin{array}{ccl} H &=& \displaystyle{ (z_2 - z_1) \, (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) } \end{array}

The first term describes species 1 turning into species 2. The second describes species 2 turning back into species 1.

Now, the Anderson-Craciun-Kurtz theorem says that whenever (x_1,x_2) is a complex balanced solution of the rate equation, this recipe gives an equilibrium solution of the master equation:

\displaystyle{ \Psi = \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

In other words: whenever \beta x_1 = \gamma x_2, we have

H\Psi = 0

Let’s check this! For starters, the constant in the denominator of \Psi doesn’t matter here, since H is linear. It’s just a normalizing constant, put in to make sure that our probabilities \psi_{n_1, n_2} sum to 1. So, we just need to check that

\displaystyle{ (z_2 - z_1) (\beta \frac{\partial}{\partial z_1} - \gamma \frac{\partial}{\partial z_2}) e^{x_1 z_1 + x_2 z_2} = 0 }

If we do the derivatives on the left hand side, it’s clear we want

\displaystyle{ (z_2 - z_1) (\beta x_1 - \gamma x_2) e^{x_1 z_1 + x_2 z_2} = 0 }

And this is indeed true when \beta x_1 = \gamma x_2 .

So, the theorem works as advertised. And now we can work out the probability \psi_{n_1,n_2} of having n_1 things of species 1 and n_2 of species 2 in our equilibrium state \Psi. To do this, we just expand the function \Psi as a power series and look at the coefficient of z_1^{n_1} z_2^{n_2}. We have

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} } = \displaystyle{ \frac{1}{e^{x_1} e^{x_2}} \sum_{n_1,n_2 = 0}^\infty \frac{(x_1 z_1)^{n_1}}{n_1!} \frac{(x_2 z_2)^{n_2}}{n_2!} }

so we get

\psi_{n_1,n_2} = \displaystyle{ \frac{1}{e^{x_1}} \frac{x_1^{n_1}}{n_1!} \; \cdot \; \frac{1}{e^{x_2}} \frac{x_2^{n_2}}{n_2!} }

This is just a product of two independent Poisson distributions!
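We can also check this directly. Unpacking H\Psi = 0 into components, the master equation says how each probability \psi_{n_1,n_2} changes, and for the product of Poisson distributions this time derivative vanishes. Here’s a sketch, with made-up rate constants obeying \beta x_1 = \gamma x_2:

```python
import math

# Made-up rate constants, and a complex balanced equilibrium of the
# rate equation: beta * x1 == gamma * x2.
beta, gamma = 2.0, 1.0
x1, x2 = 1.0, 2.0

def poisson(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)

def psi(n1, n2):
    """Product of independent Poisson distributions (0 outside N^2)."""
    if n1 < 0 or n2 < 0:
        return 0.0
    return poisson(x1, n1) * poisson(x2, n2)

def dpsi_dt(n1, n2):
    """The master equation for 1 <-> 2, written in components."""
    return (beta  * ((n1 + 1) * psi(n1 + 1, n2 - 1) - n1 * psi(n1, n2))
          + gamma * ((n2 + 1) * psi(n1 - 1, n2 + 1) - n2 * psi(n1, n2)))
```

Each term of dpsi_dt cancels pairwise precisely because \beta x_1 = \gamma x_2, so every probability sits still in time.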

In case you forget: a Poisson distribution gives the probability of k events occurring in some interval of time, if these events occur with a fixed average rate and independently of the time since the last event. If the expected number of events is \lambda, the Poisson distribution is

\displaystyle{ \frac{1}{e^\lambda} \frac{\lambda^k}{k!} }

and it looks like this for various values of \lambda:

It looks almost like a Gaussian when \lambda is large, but when \lambda is small it becomes very lopsided.

Anyway: we’ve seen that in our equilibrium state, the number of things of species i = 1,2 is given by a Poisson distribution with mean x_i. That’s very nice and simple… but the amazing thing is that these distributions are independent.

Mathematically, this means we just multiply them to get the probability of finding n_1 things of species 1 and n_2 of species 2. But it also means that knowing how many things there are of one species says nothing about the number of the other.

But something seems odd here. One transition in our Petri net consumes a 1 and produces a 2, while the other consumes a 2 and produces a 1. The total number of particles in the system never changes. The more 1’s there are, the fewer 2’s there should be. But we just said knowing how many 1’s we have tells us nothing about how many 2’s we have!

At first this seems like a paradox. Have we made a mistake? Not exactly. But we’re neglecting something.

Conserved quantities

Namely: the equilibrium solutions of the master equation we’ve found so far are not the only ones! There are other solutions that fit our intuitions better.

Suppose we take any of our equilibrium solutions \Psi and change it like this: set the probability \psi_{n_1,n_2} equal to 0 unless

n_1 + n_2 = n

but otherwise leave it unchanged. Of course the probabilities no longer sum to 1, but we can rescale them so they do.

The result is a new equilibrium solution, say \Psi_n. Why? Because, as we’ve already seen, no transitions will carry us from one value of n_1 + n_2 to another. And in this new solution, the number of 1’s is clearly not independent from the number of 2’s. The bigger one is, the smaller the other is.

Puzzle 1. Show that this new solution \Psi_n depends only on n and the ratio x_1/x_2 = \gamma/\beta, not on anything more about the values of x_1 and x_2 in the original solution:

\Psi = \displaystyle{ \frac{e^{x_1 z_1 + x_2 z_2}}{e^{x_1 + x_2}} }

Puzzle 2. What is this new solution like when \beta = \gamma? (This particular choice makes the problem symmetrical when we interchange species 1 and 2.)

What’s happening here is that this particular stochastic Petri net has a ‘conserved quantity’: the total number of things never changes with time. So, we can take any equilibrium solution of the master equation and—in the language of quantum mechanics—‘project down to the subspace’ where this conserved quantity takes a definite value, and get a new equilibrium solution. In the language of probability theory, we say it a bit differently: we’re ‘conditioning on’ the conserved quantity taking a definite value. But the idea is the same.

This important feature of conserved quantities suggests that we should try to invent a new version of Noether’s theorem. This theorem links conserved quantities and symmetries of the Hamiltonian.

There are already a couple versions of Noether’s theorem for classical mechanics, and for quantum mechanics… but now we want a version for stochastic mechanics. And indeed one exists, and it’s relevant to what we’re doing here. We’ll show it to you next time.


Network Theory (Part 9)

13 September, 2011

jointly written with Brendan Fong

Last time we reviewed the rate equation and the master equation. Both of them describe processes where things of various kinds can react and turn into other things. But:

• In the rate equation, we assume the number of things varies continuously and is known precisely.

• In the master equation, we assume the number of things varies discretely and is known only probabilistically.

This should remind you of the difference between classical mechanics and quantum mechanics. But the master equation is not quantum, it’s stochastic: it involves probabilities, but there’s no uncertainty principle going on.

Still, a lot of the math is similar.

Now, given an equilibrium solution to the rate equation—one that doesn’t change with time—we’ll try to find a solution to the master equation with the same property. We won’t always succeed—but we often can! The theorem saying how was proved here:

• D. F. Anderson, G. Craciun and T. G. Kurtz, Product-form stationary distributions for deficiency zero chemical reaction networks.

To emphasize the analogy to quantum mechanics, we’ll translate their proof into the language of annihilation and creation operators. In particular, our equilibrium solution of the master equation is just like what people call a ‘coherent state’ in quantum mechanics.

So, if you know about quantum mechanics and coherent states, you should be happy. But if you don’t, fear not!—we’re not assuming you do.

The rate equation

To construct our equilibrium solution of the master equation, we need a special type of solution to our rate equation. We call this type a ‘complex balanced solution’. This means that not only is the net rate of production of each species zero, but the net rate of production of each possible bunch of species is zero.

Before we make this more precise, let’s remind ourselves of the basic setup.

We’ll consider a stochastic Petri net with a finite set S of species and a finite set T of transitions. For convenience let’s take S = \{1,\dots, k\}, so our species are numbered from 1 to k. Then each transition \tau has an input vector m(\tau) \in \mathbb{N}^k and output vector n(\tau) \in \mathbb{N}^k. These say how many things of each species go in, and how many go out. Each transition also has rate constant r(\tau) \in [0,\infty), which says how rapidly it happens.

The rate equation concerns a vector x(t) \in [0,\infty)^k whose ith component is the number of things of the ith species at time t. Note: we’re assuming this number of things varies continuously and is known precisely! This should remind you of classical mechanics. So, we’ll call x(t), or indeed any vector in [0,\infty)^k, a classical state.

The rate equation says how the classical state x(t) changes with time:

\displaystyle{  \frac{d x}{d t} = \sum_{\tau \in T} r(\tau)\, (n(\tau)-m(\tau)) \, x^{m(\tau)} }

You may wonder what x^{m(\tau)} means: after all, we’re taking a vector to a vector power! It’s just an abbreviation, which we’ve seen plenty of times before. If x \in \mathbb{R}^k is a list of numbers and m \in \mathbb{N}^k is a list of natural numbers, we define

x^m = x_1^{m_1} \cdots x_k^{m_k}

We’ll also use this notation when x is a list of operators.
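In code, this vector power is just a finite product. A minimal sketch (the function name is ours):

```python
def vector_power(x, m):
    """x^m = x_1^{m_1} * ... * x_k^{m_k} for equal-length lists x and m."""
    result = 1.0
    for x_i, m_i in zip(x, m):
        result *= x_i ** m_i
    return result
```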

Complex balance

The vectors m(\tau) and n(\tau) are examples of what chemists call complexes. A complex is a bunch of things of each species. For example, if the set S consists of three species, the complex (1,0,5) is a bunch consisting of one thing of the first species, none of the second species, and five of the third species.

For our Petri net, the set of complexes is the set \mathbb{N}^k, and the complexes of particular interest are the input complex m(\tau) and the output complex n(\tau) of each transition \tau.

We say a classical state c \in [0,\infty)^k is complex balanced if for all complexes \kappa \in \mathbb{N}^k we have

\displaystyle{ \sum_{\{\tau : m(\tau) = \kappa\}} r(\tau) c^{m(\tau)} =\sum_{\{\tau : n(\tau) = \kappa\}} r(\tau) c^{m(\tau)}  }

The left hand side of this equation, which sums over the transitions with input complex \kappa, gives the rate of consumption of the complex \kappa. The right hand side, which sums over the transitions with output complex \kappa, gives the rate of production of \kappa. So, this equation requires that the net rate of production of the complex \kappa is zero in the classical state c.

Puzzle. Show that if a classical state c is complex balanced, and we set x(t) = c for all t, then x(t) is a solution of the rate equation.

Since x(t) doesn’t change with time here, we call it an equilibrium solution of the rate equation. Since x(t) = c is complex balanced, we call it a complex balanced equilibrium solution.
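To see the puzzle's claim in action, here is a minimal numerical check on a hypothetical two-species Petri net with transitions A → B and B → A; the species, rate constants and the state c below are our own choices, purely for illustration.

```python
# Right-hand side of the rate equation dx/dt = sum_t r (n - m) x^m,
# for a list of transitions given as (rate, input vector, output vector).

def rate_rhs(x, transitions):
    """Evaluate sum over transitions of r * (n - m) * x^m."""
    k = len(x)
    rhs = [0.0] * k
    for r, m, n in transitions:
        xm = 1.0
        for xi, mi in zip(x, m):
            xm *= xi ** mi
        for i in range(k):
            rhs[i] += r * (n[i] - m[i]) * xm
    return rhs

transitions = [(2.0, (1, 0), (0, 1)),   # A -> B at rate 2
               (1.0, (0, 1), (1, 0))]   # B -> A at rate 1

# c is complex balanced: complex (1,0) is consumed at rate 2*1 and
# produced at rate 1*2, and similarly for complex (0,1).
c = (1.0, 2.0)

print(rate_rhs(c, transitions))  # [0.0, 0.0]: an equilibrium solution
```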

The master equation

We’ve seen that any complex balanced classical state gives an equilibrium solution of the rate equation. The Anderson–Craciun–Kurtz theorem says that it also gives an equilibrium solution of the master equation.

The master equation concerns a formal power series

\displaystyle{ \Psi(t) = \sum_{n \in \mathbb{N}^k} \psi_n(t) z^n }

where

z^n = z_1^{n_1} \cdots z_k^{n_k}

and

\psi_n(t) = \psi_{n_1, \dots,n_k}(t)

is the probability that at time t we have n_1 things of the first species, n_2 of the second species, and so on.

Note: now we’re assuming this number of things varies discretely and is known only probabilistically! So, we’ll call \Psi(t), or indeed any formal power series where the coefficients are probabilities summing to 1, a stochastic state. Earlier we just called it a ‘state’, but that would get confusing now: we’ve got classical states and stochastic states, and we’re trying to relate them.

The master equation says how the stochastic state \Psi(t) changes with time:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

where the Hamiltonian H is:

\displaystyle{ H = \sum_{\tau \in T} r(\tau) \, \left({a^\dagger}^{n(\tau)} - {a^\dagger}^{m(\tau)} \right) \, a^{m(\tau)}  }

The notation here is designed to neatly summarize some big products of annihilation and creation operators. For any vector n \in \mathbb{N}^k, we have

a^n = a_1^{n_1} \cdots  a_k^{n_k}

and

\displaystyle{  {a^\dagger}^n = {a_1^\dagger }^{n_1} \cdots  {a_k^\dagger}^{n_k} }

Coherent states

Now suppose c \in [0,\infty)^k is a complex balanced equilibrium solution of the rate equation. We want to get an equilibrium solution of the master equation. How do we do it?

For any c \in [0,\infty)^k there is a stochastic state called a coherent state, defined by

\displaystyle{ \Psi_c = \frac{e^{c z}}{e^c} }

Here we are using some very terse abbreviations. Namely, we are defining

e^{c} = e^{c_1} \cdots e^{c_k}

and

e^{c z} = e^{c_1 z_1} \cdots e^{c_k z_k}

Equivalently,

\displaystyle{ e^{c z} = \sum_{n \in \mathbb{N}^k} \frac{c^n}{n!}z^n }

where c^n and z^n are defined as products in our usual way, and

n! = n_1! \, \cdots \, n_k!

Either way, if you unravel the abbreviations, here’s what you get:

\displaystyle{  \Psi_c = e^{-(c_1 + \cdots + c_k)} \, \sum_{n \in \mathbb{N}^k} \frac{c_1^{n_1} \cdots c_k^{n_k}} {n_1! \, \cdots \, n_k! } \, z_1^{n_1} \cdots z_k^{n_k} }

Maybe now you see why we like the abbreviations.

The name ‘coherent state’ comes from quantum mechanics. In quantum mechanics, we think of a coherent state \Psi_c as the ‘quantum state’ that best approximates the classical state c. But we’re not doing quantum mechanics now, we’re doing probability theory. \Psi_c isn’t a ‘quantum state’, it’s a stochastic state.

In probability theory, people like Poisson distributions. In the state \Psi_c, the probability of having n_i things of the ith species is equal to

\displaystyle{  e^{-c_i} \, \frac{c_i^{n_i}}{n_i!} }

This is precisely the definition of a Poisson distribution with mean equal to c_i. We can multiply a bunch of factors like this, one for each species, to get

\displaystyle{ e^{-c} \frac{c^n}{n!} }

This is the probability of having n_1 things of the first species, n_2 things of the second, and so on, in the state \Psi_c. So, the state \Psi_c is a product of independent Poisson distributions. In particular, knowing how many things there are of one species says nothing at all about how many things there are of any other species!
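To make this concrete, here is a small numerical check that the coefficients of \Psi_c really form a probability distribution, i.e. that they sum to 1; the two species and the value of c are arbitrary choices of ours, and the series is truncated at N things per species.

```python
from math import exp, factorial

# Coefficient of z^n in the coherent state Psi_c: a product of
# independent Poisson distributions, one per species.

def coherent_coeff(c, n):
    """Probability of n = (n_1, ..., n_k) things in the state Psi_c."""
    p = 1.0
    for ci, ni in zip(c, n):
        p *= exp(-ci) * ci ** ni / factorial(ni)
    return p

c = (1.0, 2.0)   # an arbitrary classical state
N = 30           # truncation: at most N - 1 things of each species

total = sum(coherent_coeff(c, (n1, n2))
            for n1 in range(N) for n2 in range(N))
print(total)  # very close to 1: the coefficients are probabilities
```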

It is remarkable that such a simple state can give an equilibrium solution of the master equation, even for very complicated stochastic Petri nets. But it’s true—at least if c is complex balanced.

The Anderson–Craciun–Kurtz theorem

Now we’re ready to state and prove the big result:

Theorem (Anderson–Craciun–Kurtz). Suppose c \in [0,\infty)^k is a complex balanced equilibrium solution of the rate equation. Then H \Psi_c = 0.

It follows that \Psi_c is an equilibrium solution of the master equation. In other words, if we take \Psi(t) = \Psi_c for all times t, the master equation holds:

\displaystyle{ \frac{d}{d t} \Psi(t) = H \Psi(t) }

since both sides are zero.

Proof. To prove the Anderson–Craciun–Kurtz theorem, we just need to show that H \Psi_c = 0. Since \Psi_c is a constant times e^{c z}, it suffices to show H e^{c z} = 0. Remember that

\displaystyle{ H e^{c z} =  \sum_{\tau \in T} r(\tau) \left( {a^\dagger}^{n(\tau)} -{a^\dagger}^{m(\tau)} \right) \, a^{m(\tau)} \, e^{c z} }

Since the annihilation operator a_i is given by differentiation with respect to z_i, while the creation operator a^\dagger_i is just multiplying by z_i, we have:

\displaystyle{ H e^{c z} = \sum_{\tau \in T} r(\tau) \, c^{m(\tau)} \left( z^{n(\tau)} - z^{m(\tau)} \right) e^{c z} }

Expanding out e^{c z} we get:

\displaystyle{ H e^{c z} = \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\left(z^{n(\tau)}\frac{c^i}{i!}z^i - z^{m(\tau)}\frac{c^i}{i!}z^i\right) }

Shifting indices and defining negative powers to be zero:

\displaystyle{ H e^{c z}  = \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\left(\frac{c^{i-n(\tau)}}{(i-n(\tau))!}z^i - \frac{c^{i-m(\tau)}}{(i-m(\tau))!}z^i\right) }

So, to show H e^{c z} = 0, we need to show this:

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\frac{c^{i-n(\tau)}}{(i-n(\tau))!}z^i =\sum_{i \in \mathbb{N}^k} \sum_{\tau \in T} r(\tau)c^{m(\tau)}\frac{c^{i-m(\tau)}}{(i-m(\tau))!}z^i }

We do this by splitting the sum over T according to output and then input complexes, making use of the complex balanced condition:

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \sum_{\{\tau : n(\tau)=\kappa\}}  r(\tau)c^{m(\tau)}\frac{c^{i-n(\tau)}}{(i-n(\tau))!} \, z^i  = }

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \frac{c^{i-\kappa}}{(i-\kappa)!}\, z^i \sum_{\{\tau : n(\tau) = \kappa\}}  r(\tau)c^{m(\tau)} = }

\displaystyle{\sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \frac{c^{i-\kappa}}{(i-\kappa)!}\, z^i \sum_{\{\tau : m(\tau) = \kappa\}}  r(\tau)c^{m(\tau)}  = }

\displaystyle{ \sum_{i \in \mathbb{N}^k} \sum_{\kappa \in \mathbb{N}^k} \sum_{\{\tau : m(\tau) = \kappa\}}  r(\tau)c^{m(\tau)}\frac{c^{i-m(\tau)}}{(i-m(\tau))!}\, z^i }

This completes the proof! It’s just algebra, but it seems a bit magical, so we’re trying to understand it better.

I hope you see how amazing this result is. If you know quantum mechanics and coherent states you’ll understand what I mean. A coherent state is the ‘best quantum approximation’ to a classical state, but we don’t expect this quantum state to be exactly time-independent when the corresponding classical state is, except in very special cases, like when the Hamiltonian is quadratic in the creation and annihilation operators. Here we are getting a result like that much more generally… but only given the ‘complex balanced’ condition.

An example

We’ve already seen one example of the theorem, back in Part 7. We had this stochastic Petri net:

We saw that the rate equation is just the logistic equation, familiar from population biology. The equilibrium solution is complex balanced, because pairs of amoebas are getting created at the same rate as they’re getting destroyed, and single amoebas are getting created at the same rate as they’re getting destroyed.

So, the Anderson–Craciun–Kurtz theorem guarantees that there’s an equilibrium solution of the master equation where the number of amoebas is distributed according to a Poisson distribution. And, we actually checked that this was true!
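One can also redo that check numerically. The sketch below truncates the power series and applies the Hamiltonian for a net with fission A → 2A and competition 2A → A, coefficient by coefficient; the rate constants are our own arbitrary choices, and the equilibrium c = α/β of the logistic equation dx/dt = αx − βx² is complex balanced.

```python
from math import exp, factorial

def annihilate(psi):
    """a: since a = d/dz, coefficient n of a Psi is (n+1) psi_{n+1}."""
    return [(n + 1) * psi[n + 1] for n in range(len(psi) - 1)]

def create(psi):
    """a-dagger: multiplication by z shifts coefficients up by one."""
    return [0.0] + list(psi)

def add(p, q):
    length = max(len(p), len(q))
    p = p + [0.0] * (length - len(p))
    q = q + [0.0] * (length - len(q))
    return [pi + qi for pi, qi in zip(p, q)]

def scale(s, p):
    return [s * pi for pi in p]

alpha, beta = 2.0, 1.0   # hypothetical rates for fission, competition
c = alpha / beta         # complex balanced equilibrium of the logistic equation
N = 60                   # truncate the power series at z^(N-1)
psi = [exp(-c) * c ** n / factorial(n) for n in range(N)]  # coherent state

# H = alpha (adag^2 - adag) a  +  beta (adag - adag^2) a^2
fission = scale(alpha, add(create(create(annihilate(psi))),
                           scale(-1.0, create(annihilate(psi)))))
compete = scale(beta, add(create(annihilate(annihilate(psi))),
                          scale(-1.0, create(create(annihilate(annihilate(psi)))))))
Hpsi = add(fission, compete)

print(max(abs(x) for x in Hpsi[:40]))  # essentially zero, up to rounding
```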

Next time we’ll look at another example.


Fool’s Gold

12 September, 2011

My favorite Platonic solids are the regular dodecahedron, with 12 faces and 20 corners:

and the regular icosahedron, with 12 corners and 20 faces:

They are close relatives, with all the same symmetries… but what excites me most is that they have 5-fold symmetry. It’s a theorem that no crystal can have 5-fold symmetry. So, we might wonder whether these shapes occur in nature… and if they don’t, how people dreamt up these shapes in the first place.

It’s widely believed that the Pythagoreans dreamt up the regular dodecahedron after seeing crystals of iron pyrite—the mineral also known as ‘fool’s gold’. Nobody has any proof of this. However, there were a lot of Pythagoreans in Sicily back around 500 BC, and also a lot of pyrite. And, it’s fairly common for pyrite to form crystals like this:


This crystal is a ‘pyritohedron’. It looks similar to regular dodecahedron—but it’s not! At the molecular level, iron pyrite has little crystal cells with cubic symmetry. But these cubes can form a pyritohedron:


(By the way, you can click on any of these pictures for more information.)

You’ll notice that the front face of this pyritohedron is like a staircase with steps that go up 2 cubes for each step forwards. In other words, it’s a staircase with slope 2. That’s the key to the math here! By definition, the pyritohedron has faces formed by planes at right angles to these 12 vectors:

\begin{array}{cccc} (2,1,0) &  (2,-1,0) &  (-2,1,0) &  (-2,-1,0) \\ (1,0,2) &  (-1,0,2) &  (1,0,-2) &  (-1,0,-2) \\ (0,2,1) &  (0,2,-1) &  (0,-2,1) &  (0,-2,-1) \\ \end{array}

On the other hand, a regular dodecahedron has faces formed by the planes at right angles to some very similar vectors, where the number 2 has been replaced by this number, called the golden ratio:

\displaystyle {\Phi = \frac{\sqrt{5} + 1}{2}}

Namely, these vectors:

\begin{array}{cccc} (\Phi,1,0) &  (\Phi,-1,0) &  (-\Phi,1,0) &  (-\Phi,-1,0) \\ (1,0,\Phi) &  (-1,0,\Phi) &  (1,0,-\Phi) &  (-1,0,-\Phi) \\ (0,\Phi,1) &  (0,\Phi,-1) &  (0,-\Phi,1) &  (0,-\Phi,-1) \\ \end{array}

Since

\Phi \approx 1.618...

the golden ratio is not terribly far from 2. So, the pyritohedron is a passable attempt at a regular dodecahedron. Perhaps it was even good enough to trick the Pythagoreans into inventing the real thing.
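One rough way to quantify ‘passable attempt’ is to compare the angle between normals of two adjacent faces in each shape; this little check is our own illustration, not anything from crystallography.

```python
from math import acos, degrees

def norm(w):
    return sum(wi * wi for wi in w) ** 0.5

def angle(u, v):
    """Angle in degrees between vectors u and v."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return degrees(acos(dot / (norm(u) * norm(v))))

Phi = (5 ** 0.5 + 1) / 2

# Normals of two adjacent faces in each shape:
print(angle((2, 1, 0), (1, 0, 2)))      # pyritohedron: about 66.4 degrees
print(angle((Phi, 1, 0), (1, 0, Phi)))  # dodecahedron: about 63.4 degrees
```

A three-degree difference in face angles is visible if you look closely, but easy to miss on a rough crystal.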

If so, we can say: fool’s gold made a fool’s golden ratio good enough to fool the Greeks.

At this point I can’t resist a digression. You get the Fibonacci numbers by starting with two 1’s and then generating a list of numbers where each is the sum of the previous two:

1, 1, 2, 3, 5, 8, 13, …

The ratios of consecutive Fibonacci numbers get closer and closer to the golden ratio. For example:

\begin{array}{ccl}  1/1 &=& 1 \\ 2/1 &=& 2  \\ 3/2 &=& 1.5   \\  5/3 &=& 1.6666...   \\ 8/5 &=& 1.6 \\ \end{array}

and so on. So, in theory, we could use these ratios to make cubical crystals that come closer and closer to a regular dodecahedron!
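In code, the convergence of these ratios is easy to watch; this snippet is just an illustration.

```python
# Consecutive Fibonacci ratios approaching the golden ratio.
a, b = 1, 1
for _ in range(20):
    a, b = b, a + b        # each number is the sum of the previous two

print(b / a)               # close to 1.6180339887...
print((5 ** 0.5 + 1) / 2)  # the golden ratio, for comparison
```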

And in fact, pyrite doesn’t just form the 2/1 pyritohedron I showed you earlier. Sometimes it forms a 3/2 pyritohedron! This is noticeably better. The 2/1 version looks like this:

while the 3/2 version looks like this:

Has anyone ever seen a 5/3 pyritohedron? That would be even better. It would be quite hard to distinguish by eye from a true regular dodecahedron. Unfortunately, I don’t think iron pyrite forms such subtle crystals.

Okay. End of digression. But there’s another trick we can play!

These 12 vectors I mentioned:

\begin{array}{cccc} (\Phi,1,0) &  (\Phi,-1,0) &  (-\Phi,1,0) &  (-\Phi,-1,0) \\ (1,0,\Phi) &  (-1,0,\Phi) &  (1,0,-\Phi) &  (-1,0,-\Phi) \\ (0,\Phi,1) &  (0,\Phi,-1) &  (0,-\Phi,1) &  (0,-\Phi,-1) \\ \end{array}

besides being at right angles to the faces of the dodecahedron, are also the corners of the icosahedron!

And if we use the number 2 here instead of the number \Phi, we get the vertices of a so-called pseudoicosahedron. Again, this can be made out of cubes:



However, nobody seems to think the Greeks ever saw a crystal shaped like a pseudoicosahedron! The icosahedron is first mentioned in Book XIII of Euclid’s Elements, which speaks of:

the five so-called Platonic figures which, however, do not belong to Plato, three of the five being due to the Pythagoreans, namely the cube, the pyramid, and the dodecahedron, while the octahedron and the icosahedron are due to Theaetetus.

So, maybe Theaetetus discovered the icosahedron. Indeed, Benno Artmann has argued that this shape was the first mathematical object that was a pure creation of human thought, not inspired by anything people saw!

That idea is controversial. It leads to some fascinating puzzles, like: did the Scots make stone balls shaped like Platonic solids back in 2000 BC? For more on these puzzles, try this:

• John Baez, Who discovered the icosahedron?

But right now I want to head in another direction. It turns out iron pyrite can form a crystal shaped like a pseudoicosahedron! And as Johan Kjellman pointed out to me, one of these crystals was recently auctioned off… for only 47 dollars!

It’s beautiful:

So: did the Greeks ever see one of these? Alas, we may never know.

For more on these ideas, see:

• John Baez, My favorite numbers: 5.

• John Baez, Tales of the dodecahedron: from Pythagoras to Plato to Poincaré.

• John Baez, This Week’s Finds in Mathematical Physics, "week241" and "week283".

• Ian O. Angell and Moreton Moore, Projections of cubic crystals, International Union of Crystallography.

To wrap up, I should admit that icosahedra and dodecahedra show up in many other places in nature—but probably too small for the ancient Greeks to see. Here are some sea creatures magnified 50 times:

And here’s a virus:


The gray bar on top is 10 nanometers long, while the bar on bottom is just 5 nanometers long.

The mathematics of viruses with 5-fold symmetry is fascinating. Just today, I learned of Reidun Twarock’s recent discoveries in this area:

• Reidun Twarock, Mathematical virology: a novel approach to the structure and assembly of viruses, Phil. Trans. R. Soc. A 364 (2006), 3357-3373.

Most viruses with 5-fold symmetry have protein shells in patterns based on the same math as geodesic domes:

 

But some more unusual viruses, like polyomavirus and simian virus 40, use more sophisticated patterns made of two kinds of tiles:

They still have 5-fold symmetry, but these patterns are spherical versions of Penrose tilings! A Penrose tiling is a nonrepeating pattern, typically with approximate 5-fold symmetry, made out of two kinds of tiles:

To understand these more unusual viruses, Twarock needed to use some very clever math:

• Thomas Keef and Reidun Twarock, Affine extensions of the icosahedral group with applications to the three-dimensional organisation of simple viruses, J. Math. Biol. 59 (2009), 287-313.

But that’s another story for another day!



Mathematics of Planet Earth at Banff

7 September, 2011

A while back, I mentioned that 2013 will be a special year for programs on the Mathematics of Planet Earth. I also mentioned that the Centre de Recerca Matematica in Barcelona is inviting mathematicians to organize conferences and workshops on this theme.

They’re also inviting mathematicians to organize workshops on this theme at the Banff International Research Station for Mathematical Innovation and Discovery, or BIRS. This is a famous and beautiful research center in the Canadian Rockies.

The deadline is coming up on September 30th, and I want to apply. If you’d like to join me, please drop me a note, either here on this blog or by email!

I’m open to all sorts of ideas, and I’d love help from biologists or climate scientists. If you don’t give me a better idea, I’ll probably do an application on network theory. It might look a bit like this:

Diagrammatic languages for describing complex networks made of interacting parts are used throughout ecology, biology, climate science, engineering, and many other fields. Examples include Systems Biology Graphical Notation, Petri nets in computer science, stochastic Petri nets and chemical reaction networks in chemistry and biochemistry, bond graphs in electrical, chemical and mechanical engineering, Bayesian networks in probabilistic reasoning, box models in climate science, and Harold Odum’s Energy Systems Language for systems ecology. Often these diagrammatic languages are invented by practitioners in a given field without reference to previous work in other fields. Recently mathematicians have set up the theoretical infrastructure needed to formalize, rigorously relate, and in some cases unify these various languages. Doing this will help interdisciplinary work of the sort that is becoming important in theoretical ecology, climate science and ‘the mathematics of planet Earth’. The goal of this workshop is to bring together experts on various diagrammatic languages and mathematicians who study the general theory of diagrammatic reasoning.

If you’d be interested in coming to a workshop on this subject, let me know. Banff provides accommodation, full board, and research facilities—but not, I believe, travel funds! So, “interested in coming” means “interested enough to pay for your own flight”.

Banff does “full workshops” with 42 people for 5 days, and “half workshops” with 20 people for 5 days. Part of why I’m asking you to express your interest is to gauge which seems more appropriate.

Here’s what they say:

With a growing global population competing for the same global resources, an increased frequency and intensity of dramatic climatic events, and evidence pointing to more long-term patterns of general climate change, the pressure to comprehend nature and its trends is greater than ever. Leaders in politics, sociology and economics have begun to seriously take note of issues which before were confined to the natural sciences alone, and mathematical modeling is at the heart of much of the research undertaken. The year 2013 has thus been earmarked by mathematical sciences institutes around the world as a time for a special emphasis on the study of the “Mathematics of Planet Earth” (MPE 13). This theme is to be interpreted as broadly as possible, in the aim of creating new partnerships with related disciplines and casting new light on the many ways in which the mathematical sciences can help to comprehend and tackle some of the world’s most pressing problems.

The Banff International Research Station (BIRS) is a full partner in this important initiative, as the goals of MPE 13 are completely in line with the station’s commitment to pursuing excellence in a broad range of mathematical sciences and applications. BIRS has already planned to host five workshops in 2012 which deal with the themes of MPE 13:

“Emergent Behavior in Multi-particle Systems with Non-local Interactions” (January 22-27).

“Frontiers in the Detection and Attribution of Climate Change” (May 29–June 1).

“Tissue Growth and Morphogenesis: from Genetics to Mechanics and Back” (July 22-27).

“Model Reduction in Continuum Thermodynamics: Modeling, Analysis and Computation” (September 16-21).

“Thin Liquid Films and Fluid Interfaces: Models, Experiments and Applications” (December 9-14).

BIRS also invites interested applicants to use the opportunities of its 2013 program and submit proposals in line of the MPE 2013 theme, in conjunction with BIRS’ regular format for programming. Proposals should be made using the BIRS online submission process.


Hierarchical Organization and Biological Evolution (Part 1)

29 August, 2011

guest post by Cameron Smith

Consider these quotes:

My thesis has been that one path to the construction of a non-trivial theory of complex systems is by way of a theory of hierarchy. Empirically, a large proportion of the complex systems we observe in nature exhibit hierarchic structure. On theoretical grounds we could expect complex systems to be hierarchies in a world in which complexity had to evolve from simplicity. – Herbert Simon, 1962

(Many of the concepts that) have dominated scientific thinking for three hundred years, are based upon the understanding that at smaller and smaller scales—both in space and in time—physical systems become simple, smooth and without detail. A more careful articulation of these ideas would note that the fine scale structure of planets, materials and atoms is not without detail. However, for many problems, such detail becomes irrelevant at the larger scale. Since the details (become) irrelevant (at such larger scales), formulating theories in a way that assumes that the detail does not exist yields the same results as (theories that do not make this assumption). – Yaneer Bar-Yam

Thoughts like these lead me to believe that, as a whole, we humans need to reassess some of our approaches to understanding. I’m not opposed to reductionism, but I think it would be useful to try to characterize those situations that might require something more than an exclusively reductionist approach. One way to do that is to break down some barriers that we’ve constructed between disciplines. So I’m here on Azimuth trying to help out this process.

Indeed, Azimuth is just one of many endeavors people are beginning to work on that might just lead to the unification of humanity into a superorganism. Regardless of the external reality, a fear of climate change could have a unifying effect. And, if we humans are simply a set of constituents of the superorganism that is Earth’s biosphere, it appears we are its only candidate germ line. So, assuming we’d like our descendants to have a chance at existence in the universe, we need to figure out either how to keep this superorganism alive or help it reproduce.

We each have to recognize our own individual limitations of time, commitment, and brainpower. So, I’m trying to limit my work to the study of biological evolution rather than conjuring up a ‘pet theory of everything’. However, I’m also trying not to let those disciplinary and institutional barriers limit the tools I find valuable, or the people I interact with.

So, the more I’ve thought about the complexity of evolution (for now let’s just say ‘complexity’ = ‘anything humans don’t yet understand’), the more I’ve been driven to search for new languages. And in that search, I’ve been driven toward pure mathematics, where there are many exciting languages lurking around. Perhaps one of these languages has already obviated the need to invent new ideas to understand biological evolution… or perhaps an altogether new language needs to be constructed.

The prospects of a general theory of evolution point to the same intellectual challenge that we see in the quote above from Bar-Yam: assuming we’d like to be able to consistently manipulate the universe, when can we neglect details and when can’t we?

Consider the level of organization concept. Since different details of a system can be effectively ignored at different scales, our scientific theories have themselves become ‘stratified’:

• G. L. Farre, The energetic structure of observation: a philosophical disquisition, American Behavioral Scientist 40 (May 1997), 717-728.

In other words, science tends to be organized in ‘layers’. These layers have come to be conceived of as levels of organization, and each scientific theory tends to address only one of these levels (click the image to see the flash animation that ascends through many scales or levels):

It might be useful to work explicitly on connecting theories that tell us about particular levels of organization in order to attempt to develop some theories that transcend levels of organization. One type of insight that could be gained from this approach is an understanding of the mutual development of bottom-up ostensibly mechanistic models of simple systems and top-down initially phenomenological models of complex ones.

Simon has written an interesting discussion of the quasi-continuum that ranges from simple systems to complex ones:

• H. A. Simon, The architecture of complexity, Proceedings of the American Philosophical Society 106 (1962), 467–482.

But if we take an ideological perspective on science that says “let’s unify everything!” (scientific monism), a significant challenge is the development of a language able to unify our descriptions of simple and complex systems. Such a language might help communication among scientists who work with complex systems that apparently involve multiple levels of organization. Something like category theory may provide the nucleus of the framework necessary to formally address this challenge. But, in order to head in that direction, I’ll try out a few examples in a series of posts, albeit from the somewhat limited perspective of a biologist, from which some patterns might begin to surface.

In this introductory post, I’ll try to set a basis for thinking about this tension between simple and complex systems without wading through any treatises on ‘complexity’. It will be remarkably imprecise, but I’ll try to describe the ways in which I think it provides a useful metaphor for thinking about how we humans have dealt with this simple ↔ complex tension in science.

Another tack that I think could accomplish a similar goal, perhaps in a clearer way, would be to discuss fractals, power laws and maybe even renormalization. I might try that out in a later post if I get a little help from my new Azimuth friends, but I don’t think I’m qualified yet to do it alone.

Simple and complex systems

What is the organizational structure of the products of evolutionary processes? Herbert Simon provides a perspective that I find intuitive in his parable of two watchmakers.

He argues that the systems containing modules that don’t instantaneously fall apart (‘stable intermediates’) and can be assembled hierarchically take less time to evolve complexity than systems that lack stable intermediates. Given a particular set of internal and environmental constraints that can only be satisfied by some relatively complex system, a hierarchically organized one will be capable of meeting those constraints with the fewest resources and in the least time (i.e. most efficiently). The constraints any system is subject to determine the types of structures that can form. If hierarchical organization is an unavoidable outcome of evolutionary processes, it should be possible to characterize the causes that lead to its emergence.

Simon describes a property that some complex systems have in common, which he refers to as ‘near decomposability’:

• H. A. Simon, Near decomposability and the speed of evolution, Industrial and Corporate Change 11 (June 2002), 587-599.

A system is nearly decomposable if it’s made of parts that interact rather weakly with each other; these parts in turn being made of smaller parts with the same property.

For example, suppose we have a system modelled by a first-order linear differential equation. To be concrete, consider the fictitious building imagined by Simon: the Mellon Institute, with 12 rooms. Suppose the temperature of the ith room at time t is T_i(t). Of course most real systems seem to be nonlinear, but for the sake of this metaphor we can imagine that the temperatures of these rooms interact in a linear way, like this:

\displaystyle{ \frac{d}{d t}T_i(t) = \sum_{j}a_{ij}\left(T_{j}(t)-T_{i}(t)\right),}

where a_{ij} are some numbers. Suppose also that the matrix a_{ij} looks like this:

For the sake of the metaphor I’m trudging through here, let’s also assume

a\;\gg\;\epsilon_1\;\gg\;\epsilon_2

Then our system is nearly decomposable. Why? It has three ‘layers’, with two cells at the top level, each divided into two subcells, and each of these subdivided into three sub-subcells. The numbers of the rows and columns designate the cells: cells 1–6 and 7–12 constitute the two top-level subsystems, and cells 1–3, 4–6, 7–9 and 10–12 the four second-level subsystems. The interactions within the latter subsystems have intensity a, those within the former two subsystems intensity \epsilon_1, and those between components of the largest subsystems intensity \epsilon_2. This is why Simon states that this matrix is in near-diagonal form. Another, probably more common, terminology for this would be near block diagonal form. This terminology is a bit sloppy, but it basically means that we have a square matrix whose diagonal entries are square matrices and all other elements are approximately zero. That ‘approximately’ is what differentiates near block diagonal matrices from honest block diagonal matrices, whose off-diagonal matrix elements are precisely zero.

This is a trivial system, but it illustrates that the near decomposability of the coefficient matrix allows these equations to be solved in a near hierarchical fashion. As an approximation, rather than simulating all the equations at once (e.g. all twelve in this example) one can take a recursive approach and solve the four systems of three equations (each of the blocks containing a’s), and average the results to produce initial conditions for two systems of two equations with coefficients:

\begin{array}{cccc} \epsilon_1 & \epsilon_1 & \epsilon_2 & \epsilon_2 \\         \epsilon_1 & \epsilon_1 & \epsilon_2 & \epsilon_2\\     \epsilon_2 & \epsilon_2 & \epsilon_1 & \epsilon_1 \\     \epsilon_2 & \epsilon_2 & \epsilon_1 & \epsilon_1        \end{array}

and then average those results to produce initial conditions for a single system of two equations with coefficients:

\begin{array}{cc} \epsilon_2 & \epsilon_2 \\     \epsilon_2 & \epsilon_2    \end{array}

This example of simplification indicates that the study of a nearly decomposable system can be reduced to a series of smaller modules, which can be simulated in less computational time, if the error introduced in this approximation is tolerable. The degree to which this method saves time depends on the relationship between the size of the whole system and the size and number of hierarchical levels. However, as an example, given that the time complexity of solving a dense system of n linear equations is O(n^3), the hierarchical decomposition would lead to an algorithm with time complexity

\displaystyle{ O\left(\left(\frac{n}{L}\right)^3\right)}

per module, where L is the number of levels in the decomposition. (For example, L=4 in the Mellon Institute, assuming the individual rooms are the lowest level.)
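To see near decomposability in action, here is a toy four-room version of Simon's building, with strong coupling a inside each two-room block and weak coupling ε between blocks; the layout and all constants are our own simplification, not Simon's twelve-room matrix.

```python
# Four rooms in two blocks: {1,2} and {3,4}. Within-block coupling a is
# much stronger than between-block coupling eps, so rooms in a block
# equilibrate fast while the blocks as wholes equilibrate slowly.

a, eps = 1.0, 0.01
A = [[0.0, a,   eps, eps],
     [a,   0.0, eps, eps],
     [eps, eps, 0.0, a],
     [eps, eps, a,   0.0]]

T = [10.0, 0.0, 5.0, 1.0]   # initial temperatures
dt = 0.01

for _ in range(500):         # integrate dT_i/dt = sum_j a_ij (T_j - T_i)
    T = [Ti + dt * sum(A[i][j] * (Tj - Ti) for j, Tj in enumerate(T))
         for i, Ti in enumerate(T)]

print(abs(T[0] - T[1]))      # within-block difference: essentially gone
print(abs((T[0] + T[1]) / 2 - (T[2] + T[3]) / 2))  # blocks still differ
```

This is exactly the separation of time scales that justifies solving the blocks first and then coupling their averages.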

All of this deserves to be made much more precise. However, there are some potential metaphorical consequences for the evolution of complex systems:

If we begin with a population of systems of comparable complexity, some of which are nearly decomposable and some of which are not, the nearly decomposable systems will, on average, increase their fitness through evolutionary processes much faster than the remaining systems, and will soon come to dominate the entire population. Notice that the claim is not that more complex systems will evolve more rapidly than less complex systems, but that, at any level of complexity, nearly decomposable systems will evolve much faster than systems of comparable complexity that are not nearly decomposable. – Herbert Simon, 2002

The point I’d like to make is that in this system, the idea of switching back and forth between simple and complex perspectives is made explicit: we get a sort of conceptual parallax:

In this simple case, the approximation that Simon suggests works well; however, for some other systems, it wouldn’t work at all. If we aren’t careful, we might even become victims of the Dunning-Kruger effect. In other words: if we don’t understand a system well from the start, we may overestimate how well we understand the limitations inherent to the simplifications we employ in studying it.

But if we at least recognize the potential of falling victim to the Dunning-Kruger effect, we can vigilantly guard against it in trying to understand, for example, the currently paradoxical tension between ‘groups’ and ‘individuals’ that lies at the heart of evolutionary theory… and probably also the caricatures of evolution that breed social controversy.

Keeping this in mind, my starting point in the next post in this series will be to provide some examples of hierarchical organization in biological systems. I’ll also set the stage for a discussion of evolution viewed as a dynamic process involving structural and functional transitions in hierarchical organization—or for the physicists out there, something like phase transitions!

