Hexagonal Hyperbolic Honeycombs

14 May, 2014

This post is just for fun.

Roice Nelson likes geometry, and he makes plastic models of interesting objects using a 3d printer. He recently created some great pictures of ‘hexagonal hyperbolic honeycombs’. With his permission, I wrote about them on my blog Visual Insight. Here I’ve combined those posts into a single more polished article.

But the pictures are the star of the show. They deserve to be bigger than the 450-pixel width of this blog, so please click on them and see the full-sized versions!

The {6,3,3} honeycomb

 

This is the {6,3,3} honeycomb.

How do you build this structure? Take 4 rods and glue them together so their free ends lie at the corners of a regular tetrahedron. Make lots of copies of this thing. Then stick them together so that as you go around from one intersection to the next, following the rods, the shortest possible loop is always a hexagon!

This is impossible in ordinary flat 3-dimensional space. But you can succeed if you work in hyperbolic space, a non-Euclidean space where the angles of a triangle add up to less than 180°. The result is the {6,3,3} honeycomb, shown here.

Of course, this picture has been projected onto your flat computer screen. This distorts the rods, so they look curved. But they’re actually straight… inside curved space.

The {6,3,3} honeycomb is an example of a ‘hyperbolic honeycomb’. In general, a 3-dimensional honeycomb is a way of filling 3d space with polyhedra. It’s the 3-dimensional analogue of a tiling of the plane. Besides honeycombs in 3d Euclidean space, we can also have honeycombs in 3d hyperbolic space. The {6,3,3} honeycomb is one of these.

But actually, when I said a honeycomb is a way of filling 3d space with polyhedra, I was lying slightly. It’s often true—but not in this example!

For comparison, in the {5,3,4} honeycomb, space really is filled with polyhedra:

You can see a lot of pentagons, and if you look carefully you’ll see these pentagons are faces of dodecahedra:

In the honeycomb, these dodecahedra fill hyperbolic space.

But in the {6,3,3} honeycomb, all the hexagons lie on infinite sheets. You can see one near the middle of this picture:

These sheets of hexagons are not polyhedra in the usual sense, because they have infinitely many polygonal faces! So, the {6,3,3} honeycomb is called a paracompact honeycomb.

But what does the symbol {6,3,3} mean?

It’s an example of a Schläfli symbol. It’s defined in a recursive way. The symbol for the hexagon is {6}. The symbol for the hexagonal tiling of the plane is {6,3} because 3 hexagons meet at each vertex. Finally, the hexagonal tiling honeycomb has symbol {6,3,3}, because 3 hexagonal tilings meet at each edge of this honeycomb.

So, we can build a honeycomb if we know its Schläfli symbol. And there’s a lot of information in this symbol.

For example, just as the {6,3} inside {6,3,3} describes the hexagonal tilings inside the {6,3,3} honeycomb, the {3,3} describes the vertex figure of this honeycomb: that is, the way the edges meet at each vertex. {3,3} is the Schläfli symbol for the regular tetrahedron, and in the {6,3,3} honeycomb each vertex has 4 edges coming out, just like the edges going from the center of a tetrahedron to its corners!

The {6,3,4} honeycomb

 

This is the {6,3,4} honeycomb.

How do you build this structure? Make 3 intersecting rods at right angles to each other. Make lots of copies of this thing. Then stick them together so that as you go around from one intersection to the next, following the rods, the shortest possible loop is always a hexagon!

This is impossible in ordinary flat 3-dimensional space. You can only succeed if the shortest possible loop is a square. Then you get the familiar cubic honeycomb, also called the the {4,3,4} honeycomb:


To get hexagons instead of squares, space needs to be curved! You can succeed if you work in hyperbolic space, where it’s possible to create a hexagon whose internal angles are all 90°. In ordinary flat space, only a square can have all its internal angles be 90°.

Here’s the tricky part: the hexagons in the {6,3,4} honeycomb form infinite sheets where 3 hexagons meet at each corner. You can see one of these sheets near the center of the picture. The corners of the hexagons in one sheet lie on a flat plane in hyperbolic space, called a horosphere.

That seems to make sense, because in flat space hexagons can have all their internal angles be 120°… so three can meet snugly at a corner. But I just said these hexagons have 90° internal angles!

Puzzle 1. What’s going on? Can you resolve the apparent contradiction?

The Schläfli symbol of this honeycomb is {6,3,4}, and we can see why using ideas I’ve already explained. It’s made of hexagonal tilings of the plane, which have Schläfli symbol {6,3} because 3 hexagons meet at each vertex. On the other hand, the vertex figure of this honeycomb is an octahedron: if you look at the picture you can can see that each vertex has 6 edges coming out, just like the edges going from the center of an octahedron to its corners. The octahedron has Schläfli symbol {3,4}, since it has 4 triangles meeting at each corner. Take {6,3} and {3,4} and glue them together and you get {6,3,4}!

We can learn something from this. Since this honeycomb has Schläfli symbol {6,3,4}, it has 4 hexagonal tilings meeting at each edge! That’s a bit hard to see from the picture.

All the honeycombs I’ve been showing you are ‘regular’. This is the most symmetrical kind of honeycomb. A flag in a honeycomb is a vertex lying on an edge lying on a face lying on a cell (which could be a polyhedron or an infinite sheet of polygons). A honeycomb is regular if there’s a symmetry sending any flag to any other flag.

The {6,3,3} and {6,3,4} honeycombs are also ‘paracompact’. Remember, this means they have infinite cells, which in this case are the hexagonal tilings {6,3}. There are 15 regular honeycombs in 3d hyperbolic space, of which 11 are paracompact. For a complete list of regular paracompact honeycombs, see:

Regular paracompact honeycombs, Wikipedia.

The {6,3,5} honeycomb

 

This is the {6,3,5} honeycomb. It’s built from sheets of regular hexagons, and 5 of these sheets meet along each edge of the honeycomb. That explains the Schläfli symbol {6,3,5}.

If you look very carefully, you’ll see 12 edges coming out of each vertex here, grouped in 6 opposite pairs. These edges go out from the vertex to its 12 neighbors, which are arranged like the corners of a regular icosahedron!

In other words, the vertex figure of this honeycomb is an icosahedron. And even if you can’t see this in the picture, you can deduce that it’s true, because {3,5} is the Schläfli symbol for the regular icosahedron, and it’s sitting inside {6,3,5}, at the end.

But now for a puzzle. This is for people who like probability theory:

Puzzle 2. Say you start at one vertex in this picture, a place where edges meet. Say you randomly choose an edge and walk down it to the next vertex… each edge being equally likely. Say you keep doing this. This is the most obvious random walk you can do on the {6,3,5} honeycomb. Is the probability that eventually you get back where you started equal to 1? Or is it less than 1?

If that’s too hard, try the same sort of question with the usual cubical honeycomb in ordinary flat 3d space. Or the square lattice on the plane!

In one dimension, where you just take steps back and forth on the integers, with equal chances of going left or right each time, you have a 100% chance of eventually getting back where you started. But the story works differently in different dimensions—and it also depends on whether space is flat, spherical or hyperbolic.

The {6,3,6} honeycomb

 

This is the {6,3,6} honeycomb. It has a lot of sheets of regular hexagons, and 6 sheets meet along each edge of the honeycomb.

The {6,3,6} honeycomb has a special property: it’s ‘self-dual’. The tetrahedron is a simpler example of a self-dual shape. If we draw a vertex in the middle of each face of the tetrahedron, and draw an edge crossing each edge, we get a new shape with a face for each vertex of the tetrahedron… but this new shape is again a tetrahedron!

If we do a similar thing one dimension up for the {6,3,6} honeycomb, this amounts to creating a new honeycomb with:

• one vertex for each infinite sheet of hexagons in the original honeycomb;

• one edge for each hexagon in the original honeycomb;

• one hexagon for each edge in the original honeycomb;

• one infinite sheet of hexagons for each vertex in the original honeycomb.

But this new honeycomb turns out to be another {6,3,6} honeycomb!

This is hard to visualize, at least for me, but it implies something cool. Just as each sheet of hexagons has infinitely many hexagons on it, each vertex has infinitely many edges going through it.

This self-duality comes from the symmetry of the Schläfli symbol {6,3,6}: if you reverse it, you get the same thing!

Okay. I’ve showed you regular hyperbolic honeycombs where 3, 4, 5, or 6 sheets of hexagons meet along each edge. Sometimes in math patterns go on forever, but sometimes they end—just like life itself. And indeed, we’ve reached the end of something here! You can’t build a regular honeycomb in hyperbolic space with 7 sheets of hexagons meeting at each edge.

Puzzle 3. What do you get if you try?

I’m not sure, but it’s related to a pattern we’ve been seeing. The hexagonal hyperbolic honeycombs I’ve shown you are the ‘big brothers’ of the tetrahedron, the octahedron, the icosahedron and the triangular tiling of the plane! Here’s how it goes:

• You can build a tetrahedron where 3 triangles meet at each corner:

For this reason, the Schläfli symbol of the tetrahedron is {3,3}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of a tetrahedron… and these edges form hexagons. This is the {6,3,3} honeycomb.

• You can build an octahedron where 4 triangles meet at each corner:

The Schläfli symbol of the octahedron is {3,4}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of an octahedron… and these edges form hexagons. This is the {6,3,4} honeycomb.

• You can build an icosahedron where 5 triangles meet at each corner:

The Schläfli symbol of the icosahedron is called {3,5}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of an icosahedron… and these edges form hexagons. This is the {6,3,5} honeycomb.

• You can build a tiling of a flat plane where 6 triangles meet at each corner:

This triangular tiling is also called {3,6}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of a triangular tiling… and these edges form hexagons. This is the {6,3,6} honeycomb.

The last one is a bit weird! The triangular tiling has infinitely many corners, so in the picture here, there are infinitely many edges coming out of each vertex.

But what happens when we get to {6,3,7}? That’s the puzzle.

Coxeter groups

I’ve been telling you about Schläfli symbols, but these are closely related to another kind of code, which is deeper and in many ways better. It’s called a Coxeter diagram. The Coxeter diagram of the {6,3,3} honeycomb is

●—6—o—3—o—3—o

What does this mean? It looks a lot like the Schläfli symbol, and that’s no coincidence, but there’s more to it.

The symmetry group of the {6,3,3} honeycomb is a discrete subgroup of the symmetry group of hyperbolic space. This discrete group has generators and relations summarized by the unmarked Coxeter diagram:

o—6—o—3—o—3—o

This diagram says there are four generators s_1, \dots, s_4 obeying relations encoded in the edges of the diagram:

(s_1 s_2)^6 = 1
(s_2 s_3)^3 = 1
(s_3 s_4)^3 = 1

together with relations

s_i^2 = 1

and

s_i s_j = s_j s_i \; \textrm{ if } \; |i - j| > 1

Marking the Coxeter diagram in different ways lets us describe many honeycombs with the same symmetry group as the hexagonal tiling honeycomb—in fact, 24 – 1 = 15 of them, since there are 4 dots in the Coxeter diagram! For the theory of how this works, illustrated by some simpler examples, try this old post of mine:

Symmetry and the Fourth Dimension (Part 9).

or indeed the whole series. The series is far from done; I have a pile of half-written episodes that I need to finish up and publish. This post should, logically, come after all those… but life is not fully governed by logic.

Similar remarks apply to all the hexagonal hyperbolic honeycombs I’ve shown you today:


{6,3,3} honeycomb


●—6—o—3—o—3—o
3 hexagonal tilings meeting at each edge
vertex figure: tetrahedron

 

{6,3,4} honeycomb


●—6—o—3—o—4—o
4 hexagonal tilings meeting at each edge
vertex figure: octahedron

 

{6,3,5} honeycomb


●—6—o—3—o—5—o
5 hexagonal tilings meeting at each edge
vertex figure: icosahedron

 

{6,3,6} honeycomb


●—6—o—3—o—6—o
6 hexagonal tilings meeting at each edge
vertex figure: hexagonal tiling

Finally, one more puzzle, for people who like algebra and number theory:

Puzzle 4. The symmetry group of 3d hyperbolic space, not counting reflections, is \mathrm{PSL}(2,\mathbb{C}). Can you explicitly describe the subgroups that preserve the four hexagonal hyperbolic honeycombs?

For the case of {6,3,3}, Martin Weissman gave an answer on G+:

Well, it’s \mathrm{PSL}_2(\mathbb{Z}[e^{2 \pi i / 3}]), of course!

Since he’s an expert on arithmetic Coxeter groups, this must be about right! Theorem 10.2 in this paper he showed me:

• Norman W. Johnson and Asia Ivic Weiss, Quadratic integers and Coxeter Groups, Canad. J. Math. Vol. 51 (1999), 1307–1336.

is a bit more precise. It gives a nice description of the even part of the Coxeter group discussed in this article, that is, the part generated by products of pairs of reflections. To get this group, we start with 2 × 2 matrices with entries in the Eisenstein integers: the integers with a cube root of -1 adjoined. We look at the matrices where the absolute value of the determinant is 1, and then we ‘projectivize’ it, modding out by its center. That does the job!

They call the even part of the Coxeter group [3,3,6]+, and they call the group it’s isomorphic to \mathrm{P\overline{S}L}_2(\mathbb{E}), where \mathbb{E} is their notation for the Eisenstein integers, also called \mathbb{Z}[e^{2 \pi i / 3}]. The weird little line over the \mathrm{S} is a notation of theirs: \mathrm{SL}_2 stands for 2 × 2 matrices with determinant 1, but \mathrm{\overline{S}L}_2 is their notation for 2 × 2 matrices whose determinant has absolute value 1.

Can you say more about this case? What about the other cases?


Noether’s Theorem: Quantum vs Stochastic

3 May, 2014

guest post by Ville Bergholm

In 1915 Emmy Noether discovered an important connection between the symmetries of a system and its conserved quantities. Her result has become a staple of modern physics and is known as Noether’s theorem.

Photo of Emmy Noether

The theorem and its generalizations have found particularly wide use in quantum theory. Those of you following the Network Theory series here on Azimuth might recall Part 11 where John Baez and Brendan Fong proved a version of Noether’s theorem for stochastic systems. Their result is now published here:

• John Baez and Brendan Fong, A Noether theorem for stochastic mechanics, J. Math. Phys. 54:013301 (2013).

One goal of the network theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. John and Brendan proved their stochastic version of Noether’s theorem by exploiting ‘stochastic mechanics’ which was formulated in the network theory series to mathematically resemble quantum theory. Their result, which we will outline below, was different than what would be expected in quantum theory, so it is interesting to try to figure out why.

Recently Jacob Biamonte, Mauro Faccin and myself have been working to try to get to the bottom of these differences. What we’ve done is prove a version of Noether’s theorem for Dirichlet operators. As you may recall from Parts 16 and 20 of the network theory series, these are the operators that generate both stochastic and quantum processes. In the language of the series, they lie in the intersection of stochastic and quantum mechanics. So, they are a subclass of the infinitesimal stochastic operators considered in John and Brendan’s work.

The extra structure of Dirichlet operators—compared with the wider class of infinitesimal stochastic operators—provided a handle for us to dig a little deeper into understanding the intersection of these two theories. By the end of this article, astute readers will be able to prove that Dirichlet operators generate doubly stochastic processes.

Before we get into the details of our proof, let’s recall first how conservation laws work in quantum mechanics, and then contrast this with what John and Brendan discovered for stochastic systems. (For a more detailed comparison between the stochastic and quantum versions of the theorem, see Part 13 of the network theory series.)

The quantum case

I’ll assume you’re familiar with quantum theory, but let’s start with a few reminders.

In standard quantum theory, when we have a closed system with n states, the unitary time evolution of a state |\psi(t)\rangle is generated by a self-adjoint n \times n matrix H called the Hamiltonian. In other words, |\psi(t)\rangle satisfies Schrödinger’s equation:

i \hbar \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle.

The state of a system starting off at time zero in the state |\psi_0 \rangle and evolving for a time t is then given by

|\psi(t) \rangle = e^{-i t H}|\psi_0 \rangle.

The observable properties of a quantum system are associated with self-adjoint operators. In the state |\psi \rangle, the expected value of the observable associated to a self-adjoint operator O is

\langle O \rangle_{\psi} = \langle \psi | O | \psi \rangle

This expected value is constant in time for all states if and only if O commutes with the Hamiltonian H:

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t.

In this case we say O is a ‘conserved quantity’. The fact that we have two equivalent conditions for this is a quantum version of Noether’s theorem!

The stochastic case

In stochastic mechanics, the story changes a bit. Now a state |\psi(t)\rangle is a probability distribution: a vector with n nonnegative components that sum to 1. Schrödinger’s equation gets replaced by the master equation:

\displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle

If we start with a probability distribution |\psi_0 \rangle at time zero and evolve it according to this equation, at any later time have

|\psi(t)\rangle = e^{t H} |\psi_0 \rangle.

We want this always be a probability distribution. To ensure that this is so, the Hamiltonian H must be infinitesimal stochastic: that is, a real-valued n \times n matrix where the off-diagonal entries are nonnegative and the entries of each column sum to zero. It no longer needs to be self-adjoint!

When H is infinitesimal stochastic, the operators e^{t H} map the set of probability distributions to itself whenever t \ge 0, and we call this family of operators a continuous-time Markov process, or more precisely a Markov semigroup.

In stochastic mechanics, we say an observable O is a real diagonal n \times n matrix, and its expected value is given by

\langle O\rangle_{\psi} = \langle \hat{O} | \psi \rangle

where \hat{O} is the vector built from the diagonal entries of O. More concretely,

\langle O\rangle_{\psi} = \displaystyle{ \sum_i O_{i i} \psi_i }

where \psi_i is the ith component of the vector |\psi\rangle.

Here is a version of Noether’s theorem for stochastic mechanics:

Noether’s Theorem for Markov Processes (Baez–Fong). Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0

and

\displaystyle{\frac{d}{d t}}\langle O^2 \rangle_{\psi(t)} = 0

for all t \ge 0 and all \psi(t) obeying the master equation.   █

So, just as in quantum mechanics, whenever [O,H]=0 the expected value of O will be conserved:

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

for any \psi_0 and all t \ge 0. However, John and Brendan saw that—unlike in quantum mechanics—you need more than just the expectation value of the observable O to be constant to obtain the equation [O,H]=0. You really need both

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

together with

\displaystyle{\frac{d}{d t}} \langle O^2\rangle_{\psi(t)} = 0

for all initial data \psi_0 to be sure that [O,H]=0.

So it’s a bit subtle, but symmetries and conserved quantities have a rather different relationship than they do in quantum theory.

A Noether theorem for Dirichlet operators

But what if the infinitesimal generator of our Markov semigroup is also self-adjoint? In other words, what if H is both an infinitesimal stochastic matrix but also its own transpose: H = H^\top? Then it’s called a Dirichlet operator… and we found that in this case, we get a stochastic version of Noether’s theorem that more closely resembles the usual quantum one:

Noether’s Theorem for Dirichlet Operators. If H is a Dirichlet operator and O is an observable, then

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

Proof. The \Rightarrow direction is easy to show, and it follows from John and Brendan’s theorem. The point is to show the \Leftarrow direction. Since H is self-adjoint, we may use a spectral decomposition:

H = \displaystyle{ \sum_k E_k |\phi_k \rangle \langle \phi_k |}

where \phi_k are an orthonormal basis of eigenvectors, and E_k are the corresponding eigenvalues. We then have:

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = \langle \hat{O} | H e^{t H} |\psi_0 \rangle = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

\iff \quad \langle \hat{O}| H e^{t H} = 0 \quad \forall t \ge 0

\iff \quad \sum_k \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} \langle \phi_k| = 0 \quad \forall t \ge 0

\iff \quad \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} = 0 \quad \forall t \ge 0

\iff \quad |\hat{O} \rangle \in \mathrm{Span}\{|\phi_k \rangle \, : \; E_k = 0\} = \ker \: H,

where the third equivalence is due to the vectors |\phi_k \rangle being linearly independent. For any infinitesimal stochastic operator H the corresponding transition graph consists of m connected components iff we can reorder (permute) the states of the system such that H becomes block-diagonal with m blocks. Now it is easy to see that the kernel of H is spanned by m eigenvectors, one for each block. Since H is also symmetric, the elements of each such vector can be chosen to be ones within the block and zeros outside it. Consequently

|\hat{O} \rangle \in \ker \: H

implies that we can choose the basis of eigenvectors of O to be the vectors |\phi_k \rangle, which implies

[O, H] = 0

Alternatively,

|\hat{O} \rangle \in \ker \, H

implies that

|\hat{O^2} \rangle \in \ker \: H \; \iff \; \cdots \; \iff \; \displaystyle{\frac{d}{d t}} \langle O^2 \rangle_{\psi(t)} = 0 \; \forall \: |\psi_0 \rangle, \forall t \ge 0,

where we have used the above sequence of equivalences backwards. Now, using John and Brendan’s original proof, we can obtain [O, H] = 0.   █

In summary, by restricting ourselves to the intersection of quantum and stochastic generators, we have found a version of Noether’s theorem for stochastic mechanics that looks formally just like the quantum version! However, this simplification comes at a cost. We find that the only observables O whose expected value remains constant with time are those of the very restricted type described above, where the observable has the same value in every state in a connected component.

Puzzles

Suppose we have a graph whose graph Laplacian matrix H generates a Markov semigroup as follows:

U(t) = e^{t H}

Puzzle 1. Suppose that also H = H^\top, so that H is a Dirichlet operator and hence i H generates a 1-parameter unitary group. Show that the indegree and outdegree of any node of our graph must be equal. Graphs with this property are called balanced.

Puzzle 2. Suppose that U(t) = e^{t H} is doubly stochastic Markov semigroup, meaning that for all t \ge 0 each row and each column of U(t) sums to 1:

\displaystyle{ \sum_i U(t)_{i j} = \sum_j U(t)_{i j} = 1 }

and all the matrix entries are nonnegative. Show that the Hamiltonian H obeys

\displaystyle{\sum_i H_{i j} = \sum_j H_{i j} = 0 }

and all the off-diagonal entries of H are nonnegative. Show the converse is also true.

Puzzle 3. Prove that any doubly stochastic Markov semigroup U(t) is of the form e^{t H} where H is the graph Laplacian of a balanced graph.

Puzzle 4. Let O(t) be a possibly time-dependent observable, and write \langle O(t) \rangle_{\psi(t)} for its expected value with respect to some initial state \psi_0 evolving according to the master equation. Show that

\displaystyle{ \frac{d}{d t}\langle O(t)\rangle_{\psi(t)} = \left\langle [O(t), H] \right\rangle_{\psi(t)} + \left\langle \frac{\partial O(t)}{\partial t}\right\rangle_{\psi(t)} }

This is a stochastic version of the Ehrenfest theorem.


Programming with Chemical Reaction Networks

23 March, 2014

 

There will be a 5-day workshop on Programming with Chemical Reaction Networks: Mathematical Foundation at BIRS from Sunday, June 8 to Friday June 13, 2014 It’s being organized by

Anne Condon (University of British Columbia)
David Doty (California Institute of Technology)
Chris Thachuk (University of Oxford).

BIRS is the Banff International Research Station, in the mountains west of Calgary, in Alberta, Canada.

Description

Here’s the workshop proposal on the BIRS website. It’s a pretty interesting proposal, especially if you’ve already read Luca Cardelli’s description of computing with chemical reaction networks, at the end of our series of posts on chemical reaction networks. The references include a lot of cool papers, so I’ve created links to those to help you get ahold of them.

This workshop will explore three of the most important research themes concerning stochastic chemical reaction networks (CRNs). Below we motivate each theme and highlight key questions that the workshop will address. Our main objective is to bring together distinct research communities in order to consider new problems that could not be fully appreciated in isolation. It is also our aim to determine commonalities between different disciplines and bodies of research. For example, research into population protocols, vector addition systems, and Petri networks provide a rich body of theoretical results that may already address contemporary problems arising in the study of CRNs.

Computational power of CRNs

Before designing robust and practical systems, it is useful to know the limits to computing with a chemical soup. Some interesting theoretical results are already known for stochastic chemical reaction networks. The computational power of CRNs depend upon a number of factors, including: (i) is the computation deterministic, or probabilistic, and (ii) does the CRN have an initial context — certain species, independent of the input, that are initially present in some exact, constant count.

In general, CRNs with a constant number of species (independent of the input length) are capable of Turing universal computation [17], if the input is represented by the exact (unary) count of one molecular species, some small probability of error is permitted and an initial context in the form of a single-copy leader molecule is used.

Could the same result hold in the absence of an initial context? In a surprising result based on the distributed computing model of population protocols, it has been shown that if a computation must be error-free, then deterministic computation with CRNs having an initial context is limited to computing semilinear predicates [1], later extended to functions outputting natural numbers encoded by molecular counts [5].

Furthermore, any semilinear predicate or function can be computed by that class of CRNs in expected time polylogarithmic in the input length. Building on this result, it was recently shown that by incurring an expected time linear in the input length, the same result holds for “leaderless” CRNs [8] — CRNs with no initial context. Can this result be improved to sub-linear expected time? Which class of functions can be computed deterministically by a CRN without an initial context in expected time polylogarithmic in the input length?

While (restricted) CRNs are Turing-universal, current results use space proportional to the computation time. Using a non-uniform construction, where the number of species is proportional to the input length and each initial species is present in some constant count, it is known that any S(n) space-bounded computation can be computed by a logically-reversible tagged CRN, within a reaction volume of size poly(S(n)) [18]. Tagged CRNs were introduced to model explicitly the fuel molecules in physical realizations of CRNs such as DNA strand displacement systems [6] that are necessary to supply matter and energy for implementing reactions such as X → X + Y that violate conservation of mass and/or energy.

Thus, for space-bounded computation, there exist CRNs that are time-efficient or are space-efficient. Does there exist time- and space-efficient CRNs to compute any space-bounded function?

Designing and verifying robust CRNs

While CRNs provide a concise model of chemistry, their physical realizations are often more complicated and more granular. How can one be sure they accurately implement the intended network behaviour? Probabilistic model checking has already been employed to find and correct inconsistencies between CRNs and their DNA strand displacement system (DSD) implementations [9]. However, at present, model checking of arbitrary CRNs is only capable of verifying the correctness of very small systems. Indeed, verification of these types of systems is a difficult problem: probabilistic state reachability is undecidable [17, 20] and general state reachability is EXPSPACE-hard [4].

How can larger systems be verified? A deeper understanding of CRN behaviour may simplify the process of model checking. As a motivating example, there has been recent progress towards verifying that certain DSD implementations correctly simulate underlying CRNs [16, 7, 10]. This is an important step to ensuring correctness, prior to experiments. However, DSDs can also suffer from other errors when implementing CRNs, such as spurious hybridization or strand displacement. Can DSDs and more generally CRNs be designed to be robust to such predictable errors? Can error correcting codes and redundant circuit designs used in traditional computing be leveraged in these chemical computers? Many other problems arise when implementing CRNs. Currently, unique types of fuel molecules must be designed for every reaction type. This complicates the engineering process significantly. Can a universal type of fuel be designed to smartly implement any reaction?

Energy efficient computing with CRNs

Rolf Landauer showed that logically irreversible computation — computation as modeled by a standard Turing machine — dissipates an amount of energy proportional to the number of bits of information lost, such as previous state information, and therefore cannot be energy efficient [11]. However, Charles Bennett showed that, in principle, energy efficient computation is possible, by proposing a universal Turing machine to perform logically-reversible computation and identified nucleic acids (RNA/DNA) as a potential medium to realize logically-reversible computation in a physical system [2].

There have been examples of logically-reversible DNA strand displacement systems — a physical realization of CRNs — that are, in theory, capable of complex computation [12, 19]. Are these systems energy efficient in a physical sense? How can this argument be made formally to satisfy both the computer science and the physics communities? Is a physical experiment feasible, or are these results merely theoretical footnotes?

References

[1] D. Angluin, J. Aspnes, and D. Eisenstat. Stably computable predicates are semilinear. In PODC, pages 292–299, 2006.

[2] C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and Development, 17 (6):525–532, 1973.

[3] L. Cardelli and A. Csikasz-Nagy. The cell cycle switch computes approximate majority. Scientific Reports, 2, 2012.

[4] E. Cardoza, R. Lipton, A. R. Meyer. Exponential space complete problems for Petri nets and commutative semigroups (Preliminary Report). Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pages 507–54, 1976.

[5] H. L. Chen, D. Doty, and D. Soloveichik. Deterministic function computation with chemical reaction networks. DNA Computing and Molecular Programming, pages 25–42, 2012.

[6] A. Condon, A. J. Hu, J. Manuch, and C. Thachuk. Less haste, less waste: on recycling and its limits in strand displacement systems. Journal of the Royal Society: Interface Focus, 2 (4):512–521, 2012.

[7] Q. Dong. A bisimulation approach to verification of molecular implementations of formal chemical reaction network. Master’s thesis. SUNY Stony Brook, 2012.

[8] D. Doty and M. Hajiaghayi. Leaderless deterministic chemical reaction networks. In Proceedings of the 19th International Meeting on DNA Computing and Molecular Programming, 2013.

[9] M. R. Lakin, D. Parker, L. Cardelli, M. Kwiatkowska, and A. Phillips. Design and analysis of DNA strand displacement devices using probabilistic model checking. Journal of The Royal Society Interface, 2012.

[10] M. R. Lakin, D. Stefanovic and A. Phillips. Modular Verification of Two-domain DNA Strand Displacement Networks via Serializability Analysis. In Proceedings of the 19th Annual conference on DNA computing, 2013.

[11] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of research and development, 5 (3):183–191, 1961.

[12] L. Qian, D. Soloveichik, and E. Winfree. Efficient Turing-universal computation with DNA polymers (extended abstract) . In Proceedings of the 16th Annual conference on DNA computing, pages 123–140, 2010.

[13] L. Qian and E. Winfree. Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332 (6034):1196–1201, 2011.

[14] L. Qian, E. Winfree, and J. Bruck. Neural network computation with DNA strand displacement cascades. Nature, 475 (7356):368–372, 2011.

[15] G. Seelig, D. Soloveichik, D.Y. Zhang, and E. Winfree. Enzyme-free nucleic acid logic circuits. Science, 314 (5805):1585–1588, 2006.

[16] S. W. Shin. Compiling and verifying DNA-based chemical reaction network implementations. Master’s thesis. California Insitute of Technology, 2011.

[17] D. Soloveichik, M. Cook, E. Winfree, and J. Bruck. Computation with finite stochastic chemical reaction networks. Natural Computing, 7 (4):615–633, 2008.

[18] C. Thachuk. Space and energy efficient molecular programming. PhD thesis, University of British Columbia, 2012.

[19] C. Thachuk and A. Condon. Space and energy efficient computation with DNA strand displacement systems. In Proceedings of the 18th Annual International Conference on DNA computing and Molecular Programming, 2012.

[20] G. Zavattaro and L. Cardelli. Termination Problems in Chemical Kinetics. In Proceedings of the 2008 Conference on Concurrency Theory, pages 477–491, 2008.


Networks of Dynamical Systems

18 March, 2014

guest post by Eugene Lerman

Hi, I’m Eugene Lerman. I met John back in the mid 1980s when John and I were grad students at MIT. John was doing mathematical physics and I was studying symplectic geometry. We never talked about networks. Now I teach in the math department at the University of Illinois at Urbana, and we occasionally talk about networks on his blog.

A few years ago a friend of mine who studies locomotion in humans and other primates asked me if I knew of any math that could be useful to him.

I remember coming across an expository paper on ‘coupled cell networks’:

• Martin Golubitsky and Ian Stewart, Nonlinear dynamics of networks: the groupoid formalism, Bull. Amer. Math. Soc. 43 (2006), 305–364.

In this paper, Golubitsky and Stewart used the study of animal gaits and models for the hypothetical neural networks called ‘central pattern generators’ that give rise to these gaits to motivate the study of networks of ordinary differential equations with symmetry. In particular they were interested in ‘synchrony’. When a horse trots, or canters, or gallops, its limbs move in an appropriate pattern, with different pairs of legs moving in synchrony:


They explained that synchrony (and the patterns) could arise when the differential equations have finite group symmetries. They also proposed several systems of symmetric ordinary differential equations that could generate the appropriate patterns.

Later on Golubitsky and Stewart noticed that there are systems of ODEs that have no group symmetries but whose solutions nonetheless exhibit certain synchrony. They found an explanation: these ODEs were ‘groupoid invariant’. I thought that it would be fun to understand what ‘groupoid invariant’ meant and why such invariance leads to synchrony.

I talked my colleague Lee DeVille into joining me on this adventure. At the time Lee had just arrived at Urbana after a postdoc at NYU. After a few years of thinking about these networks Lee and I realized that strictly speaking one doesn’t really need groupoids for these synchrony results and it’s better to think of the social life of networks instead. Here is what we figured out—a full and much too precise story is here:

• Eugene Lerman and Lee DeVille, Dynamics on networks of manifolds.

Let’s start with an example of a class of ODEs with a mysterious property:

Example. Consider this ordinary differential equation for a function \vec{x} : \mathbb{R} \to {\mathbb{R}}^3

\begin{array}{rcl}  \dot{x}_1&=& f(x_1,x_2)\\  \dot{x}_2&=& f(x_2,x_1)\\  \dot{x}_3&=& f(x_3, x_2)  \end{array}

for some function f:{\mathbb{R}}^2 \to {\mathbb{R}}. It is easy to see that a function x(t) solving

\displaystyle{  \dot{x} = f(x,x)  }

gives a solution of these equations if we set

\vec{x}(t) = (x(t),x(t),x(t))

You can think of the differential equations in this example as describing the dynamics of a complex system built out of three interacting subsystems. Then any solution of the form

\vec{x}(t) = (x(t),x(t),x(t))

may be thought of as a synchronization: the three subsystems are evolving in lockstep.

One can also view the result geometrically: the diagonal

\displaystyle{  \Delta = \{(x_1,x_2, x_3)\in {\mathbb{R}}^3 \mid x_1 =x_2 = x_3\}  }

is an invariant subsystem of the continuous-time dynamical system defined by the differential equations. Remarkably enough, such a subsystem exists for any choice of a function f.

Where does such a synchronization or invariant subsystem come from? There is no apparent symmetry of {\mathbb{R}}^3 that preserves the differential equations and fixes the diagonal \Delta, and thus could account for this invariant subsystem. It turns out that what matters is the structure of the mutual dependencies of the three subsystems making up the big system. That is, the evolution of x_1 depends only on x_1 and x_2, the evolution of x_2 depends only on x_2 and x_3, and the evolution of x_3 depends only on x_3 and x_2.

These dependencies can be conveniently pictured as a directed graph:

The graph G has no symmetries. Nonetheless, the existence of the invariant subsystem living on the diagonal \Delta can be deduced from certain properties of the graph G. The key is the existence of a surjective map of graphs

\varphi :G\to G'

from our graph G to a graph G' with exactly one node, call it a, and one arrow. It is also crucial that \varphi has the following lifting property: there is a unique way to lift the one and only arrow of G' to an arrow of G once we specify the target node of the lift.

We now formally define the notion of a ‘network of phase spaces’ and of a continuous-time dynamical system on such a network. Equivalently, we define a ‘network of continuous-time dynamical systems’. We start with a directed graph

G=\{G_1\rightrightarrows G_0\}

Here G_1 is the set of edges, G_0 is the set of nodes, and the two arrows assign to an edge its source and target, respectively. To each node we attach a phase space (or more formally a manifold, perhaps with boundary or corners). Here ‘attach’ means that we choose a function {\mathcal P}:G_0 \to {\mathsf{PhaseSpace}}; it assigns to each node a\in G_0 a phase space {\mathcal P}(a).

In our running example, to each node of the graph G we attach the real line {\mathbb{R}}. (If we think of the set G_0 as a discrete category and {\mathsf{PhaseSpace}} as a category of manifolds and smooth maps, then {\mathcal P} is simply a functor.)

Thus a network of phase spaces is a pair (G,{\mathcal P}), where G is a directed graph and {\mathcal P} is an assignment of phase spaces to the nodes of G.

We think of the collection \{{\mathcal P}(a)\}_{a\in G_0} as the collection of phase spaces of the subsystems constituting the network (G, {\mathcal P}). We refer to {\mathcal P} as a phase space function. Since the state of the network should be determined completely and uniquely by the states of its subsystems, it is reasonable to take the total phase space of the network to be the product

\displaystyle{  {\mathbb{P}}(G, {\mathcal P}):= \bigsqcap_{a\in G_0} {\mathcal P}(a).  }

In the example the total phase space of the network (G,{\mathcal P}) is {\mathbb{R}}^3, while the phase space of the network (G', {\mathcal P}') is the real line {\mathbb{R}}.

Finally we need to interpret the arrows. An arrow b\xrightarrow{\gamma}a in a graph G should encode the fact that the dynamics of the subsystem associated to the node a depends on the states of the subsystem associated to the node b. To make this precise requires the notion of an ‘open system’, or ‘control system’, which we define below. It also requires a way to associate an open system to the set of arrows coming into a node/vertex a.

To a first approximation an open system (or control system, I use the two terms interchangeably) is a system of ODEs depending on parameters. I like to think of a control system geometrically: a control system on a phase space M controlled by the the phase space U is a map

F: U\times M \to TM

where TM is the tangent bundle of the space M, so that for all (u,m)\in U\times M, F(u,m) is a vector tangent to M at the point m. Given phase spaces U and M the set of all corresponding control systems forms a vector space which we denote by

\displaystyle{ \mathsf{Ctrl}(U\times M \to M)}

(More generally one can talk about the space of control systems associated with a surjective submersion Q\to M. For us, submersions of the form U\times M \to M are general enough.)

To encode the incoming arrows, we introduce the input tree I(a) (this is a very short tree, a corolla if you like). The input tree of a node a of a graph G is a directed graph whose arrows are precisely the arrows of G coming into the vertex a, but any two parallel arrows of G with target a will have disjoint sources in I(a). In the example the input tree I of the one node of a of G' is the tree

There is always a map of graphs \xi:I(a) \to G. For instance for the input tree in the example we just discussed, \xi is the map

Consequently if (G,{\mathcal P}) is a network and I(a) is an input tree of a node of G, then (I(a), {\mathcal P}\circ \xi) is also a network. This allows us to talk about the phase space {\mathbb{P}} I(a) of an input tree. In our running example,

{\mathbb{P}} I(a) = {\mathbb{R}}^2

Given a network (G,{\mathcal P}), there is a vector space \mathsf{Ctrl}({\mathbb{P}} I(a)\to {\mathbb{P}} a) of open systems associated with every node a of G.

In our running example, the vector space associated to the one node a of (G',{\mathcal P}') is

\mathsf{Ctrl}({\mathbb{R}}^2, {\mathbb{R}})  \simeq C^\infty({\mathbb{R}}^2, {\mathbb{R}})

In the same example, the network (G,{\mathcal P}) has three nodes and we associate the same vector space C^\infty({\mathbb{R}}^2, {\mathbb{R}}) to each one of them.

We then construct an interconnection map

\displaystyle{  {\mathcal{I}}: \bigsqcap_{a\in G_0} \mathsf{Ctrl}({\mathbb{P}} I(a)\to {\mathbb{P}} a) \to \Gamma (T{\mathbb{P}}(G, {\mathcal P})) }

from the product of spaces of all control systems to the space of vector fields

\Gamma (T{\mathbb{P}} (G, {\mathcal P}))

on the total phase space of the network. (We use the standard notation to denote the tangent bundle of a manifold R by TR and the space of vector fields on R by \Gamma (TR)). In our running example the interconnection map for the network (G',{\mathcal P}') is the map

\displaystyle{  {\mathcal{I}}: C^\infty({\mathbb{R}}^2, {\mathbb{R}}) \to C^\infty({\mathbb{R}}, {\mathbb{R}}), \quad f(x,u) \mapsto f(x,x).  }

The interconnection map for the network (G,{\mathcal P}) is the map

\displaystyle{  {\mathcal{I}}: C^\infty({\mathbb{R}}^2, {\mathbb{R}})^3 \to C^\infty({\mathbb{R}}^3, {\mathbb{R}}^3)}

given by

\displaystyle{  ({\mathscr{I}}(f_1,f_2, f_3))\,(x_1,x_2, x_3) = (f_1(x_1,x_2), f_2(x_2,x_1),  f_3(x_3,x_2)).  }

To summarize: a dynamical system on a network of phase spaces is the data (G, {\mathcal P}, (w_a)_{a\in G_0} ) where G=\{G_1\rightrightarrows G_0\} is a directed graph, {\mathcal P}:{\mathcal P}:G_0\to {\mathsf{PhaseSpace}} is a phase space function and (w_a)_{a\in G_0} is a collection of open systems compatible with the graph and the phase space function. The corresponding vector field on the total space of the network is obtained by interconnecting the open systems.

Dynamical systems on networks can be made into the objects of a category. Carrying this out gives us a way to associate maps of dynamical systems to combinatorial data.

The first step is to form the category of networks of phase spaces, which we call {\mathsf{Graph}}/{\mathsf{PhaseSpace}}. In this category, by definition, a morphism from a network (G,{\mathcal P}) to a network (G', {\mathcal P}') is a map of directed graphs \varphi:G\to G' which is compatible with the phase space functions:

\displaystyle{  {\mathcal P}'\circ \varphi = {\mathcal P}.  }

Using the universal properties of products it is easy to show that a map of networks \varphi: (G,{\mathcal P})\to (G',{\mathcal P}') defines a map {\mathbb{P}}\varphi of total phase spaces in the opposite direction:

\displaystyle{  {\mathbb{P}} \varphi: {\mathbb{P}} G' \to {\mathbb{P}} G.  }

In the category theory language the total phase space assignment extends to a contravariant functor

\displaystyle{ {\mathbb{P}}:  {({\mathsf{Graph}}/{\mathsf{Region}})}^{\mbox{\sf {\tiny {op}}}} \to  {\mathsf{Region}}.  }

We call this functor the total phase space functor. In our running example, the map

{\mathbb{P}}\varphi:{\mathbb{R}} = {\mathbb{P}}(G',{\mathcal P}') \to  {\mathbb{R}}^3 = {\mathbb{P}} (G,{\mathcal P})

is given by

\displaystyle{  {\mathbb{P}} \varphi (x) = (x,x,x).  }

Continuous-time dynamical systems also form a category, which we denote by \mathsf{DS}. The objects of this category are pairs consisting of a phase space and a vector field on this phase space. A morphism in this category is a smooth map of phase spaces that intertwines the two vector fields. That is:

\displaystyle{  \mathrm{Hom}_\mathsf{DS} ((M,X), (N,Y))   = \{f:M\to N \mid Df \circ X = Y\circ f\}  }

for any pair of objects (M,X), (N,Y) in \mathsf{DS}.

In general, morphisms in this category are difficult to determine explicitly. For example if (M, X) = ((a,b), \frac{d}{dt}) then a morphism from (M,X) to some dynamical system (N,Y) is simply a piece of an integral curve of the vector field Y defined on an interval (a,b). And if (M, X) = (S^1, \frac{d}{d\theta}) is the constant vector field on the circle then a morphism from (M,X) to (N,Y) is a periodic orbit of Y. Proving that a given dynamical system has a periodic orbit is usually hard.

Consequently, given a map of networks

\varphi:(G,{\mathcal P})\to (G',{\mathcal P}')

and a collection of open systems

\{w'_{a'}\}_{a'\in G'_0}

on (G',{\mathcal P}') we expect it to be very difficult if not impossible to find a collection of open systems \{w_a\}_{a\in G_0} so that

\displaystyle{  {\mathbb{P}} \varphi: ({\mathbb{P}} G', {\mathscr{I}}' (w'))\to ({\mathbb{P}} G, {\mathscr{I}} (w))  }

is a map of dynamical systems.

It is therefore somewhat surprising that there is a class of maps of graphs for which the above problem has an easy solution! The graph maps of this class are known by several different names. Following Boldi and Vigna we refer to them as graph fibrations. Note that despite what the name suggests, graph fibrations in general are not required to be surjective on nodes or edges. For example, the inclusion

is a graph fibration. We say that a map of networks

\varphi:(G,{\mathcal P})\to (G',{\mathcal P}')

is a fibration of networks if \varphi:G\to G' is a graph fibration. With some work one can show that a fibration of networks induces a pullback map

\displaystyle{  \varphi^*: \bigsqcap_{b\in G_0'} \mathsf{Ctrl}({\mathbb{P}} I(b)\to {\mathbb{P} b) \to  \bigsqcap_{a\in G_0} \mathsf{Ctrl}({\mathbb{P}}} I(a)\to {\mathbb{P}} a)  }

on the sets of tuples of the associated open systems. The main result of Dynamics on networks of manifolds is a proof that for a fibration of networks \varphi:(G,{\mathcal P})\to (G',{\mathcal P}') the maps \varphi^*, {\mathbb{P}} \varphi and the two interconnection maps {\mathcal{I}} and {\mathcal{I}}' are compatible:

Theorem. Let \varphi:(G,{\mathcal P})\to (G',{\mathcal P}') be a fibration of networks of manifolds. Then the pullback map

\displaystyle{ \varphi^*: \mathsf{Ctrl}(G',{\mathcal P}')\to \mathsf{Ctrl}(G,{\mathcal P})  }

is compatible with the interconnection maps

\displaystyle{  {\mathcal{I}}': \mathsf{Ctrl}(G',{\mathcal P}')) \to \Gamma (T{\mathbb{P}} G') }

and

\displaystyle{  {\mathcal{I}}:  (\mathsf{Ctrl}(G,{\mathcal P})) \to \Gamma (T{\mathbb{P}} G)  }

Namely for any collection w'\in \mathsf{Ctrl}(G',{\mathcal P}') of open systems on the network (G', {\mathcal P}') the diagram

commutes. In other words,

\displaystyle{ {\mathbb{P}} \varphi: ({\mathbb{P}}  (G',{\mathcal P}'), {\mathscr{I}}' (w'))\to ({\mathbb{P}} (G,  {\mathcal P}), {\mathscr{I}} (\varphi^* w')) }

is a map of continuous-time dynamical systems, a morphism in \mathsf{DS}.

At this point the pure mathematician in me is quite happy: I have a theorem! Better yet, the theorem solves the puzzle at the beginning of this post. But if you have read this far, you may well be wondering: “Ok, you told us about your theorem. Very nice. Now what?”

There is plenty to do. On the practical side of things, the continuous-time dynamical systems are much too limited for contemporary engineers. Most of the engineers I know care a lot more about hybrid systems. These kinds of systems exhibit both continuous time behavior and abrupt transitions, hence the name. For example, anti-lock breaks on a car is just that kind of a system — if a sensor detects that the car is skidding and the foot break is pressed, it starts pulsing the breaks. This is not your father’s continuous time dynamical system! Hybrid dynamical systems are very hard to understand. They have been all the rage in the engineering literature for the last 10-15 years. Sadly, most pure mathematicians never heard of them. It would be interesting to extend the theorem I am bragging about to hybrid systems.

Here is another problem that may be worth thinking about: how much of the theorem holds up to numerical simulation? My feeling is that any explicit numerical integration method will behave well. Implicit methods I am not sure about.

And finally a more general issue. John has been talking about networks quite a bit on this blog. But his networks and my networks look like very different mathematical structures. What do they have in common besides nodes and arrows? What mathematical structure are they glimpses of?


Network Theory II

12 March, 2014

 

Chemists are secretly doing applied category theory! When chemists list a bunch of chemical reactions like

C + O₂ → CO₂

they are secretly describing a ‘category’.

That shouldn’t be surprising. A category is simply a collection of things called objects together with things called morphisms going from one object to another, often written

f: x → y

The rules of a category say:

1) we can compose a morphism f: x → y and another morphism g: y → z to get an arrow gf: x → z,

2) (hg)f = h(gf), so we don’t need to bother with parentheses when composing arrows,

3) every object x has an identity morphism 1ₓ: x → x that obeys 1ₓ f = f and f 1ₓ = f.

Whenever we have a bunch of things (objects) and processes (arrows) that take one thing to another, we’re likely to have a category. In chemistry, the objects are bunches of molecules and the arrows are chemical reactions. But we can ‘add’ bunches of molecules and also add reactions, so we have something more than a mere category: we have something called a symmetric monoidal category.

My talk here, part of a series, is an explanation of this viewpoint and how we can use it to take ideas from elementary particle physics and apply them to chemistry! For more details try this free book:

• John Baez and Jacob Biamonte, A Course on Quantum Techniques for Stochastic Mechanics.

as well as this paper on the Anderson–Craciun–Kurtz theorem (discussed in my talk):

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!


Network Theory I

2 March, 2014

 

Here’s a video of a talk I gave last Tuesday—part of a series. You can see the slides here:

Network Theory I: electrical circuits and signal-flow graphs.

Click on items in blue, or pictures, for more information.

One reason I’m glad I gave this talk is because afterwards Jamie Vicary pointed out some very interesting consequences of the relations among signal-flow diagrams listed in my talk. It turns out they imply equations familiar from the theory of complementarity in categorical quantum mechanics!

This is the kind of mathematical surprise that makes life worthwhile for me. It seemed utterly shocking at first, but I think I’ve figured out why it happens. Now is not the time to explain… but I’ll have to do it soon, both here and in the paper that Jason Eberle are writing about control theory.

For now, besides the slides, the best place to read more about this program is here:

• Brendan Fong, A compositional approach to control theory.


Markov Models of Social Change (Part 1)

24 February, 2014

guest post by Alastair Jamieson-Lane

The world is complex, and making choices in a complex world is sometimes difficult.

As any leader knows, decisions must often be made with incomplete information. To make matters worse, the experts and scientists who are meant to advise on these important matters are also doing so with incomplete information—usually limited to only one or two specialist fields. When decisions need to be made that are dependent on multiple real-world systems, and your various advisors find it difficult to communicate, this can be problematic!

The generally accepted approach is to listen to whichever advisor tells you the things you want to hear.

When such an approach fails (for whatever mysterious and inexplicable reason) it might be prudent to consider such approaches as Bayesian inference, analysis of competing hypotheses or cross-impact balance analysis.

Because these methods require experts to formalize their opinions in an explicit, discipline neutral manner, we avoid many of the problems mentioned above. Also, if everything goes horribly wrong, you can blame the algorithm, and send the rioting public down to the local university to complain there.

In this blog article I will describe cross-impact balance analysis and a recent extension to this method, explaining its use, as well as some basic mathematical underpinnings. No familiarity with cross-impact balance analysis will be required.

Wait—who is this guy?

Since this is my first time writing a blog post here, I hear introductions are in order.

Hi. I’m Alastair.

I am currently a Master’s student at the University of British Columbia, studying mathematics. In particular, I’m aiming to use evolutionary game theory to study academic publishing and hiring practices… and from there hopefully move on to studying governments (we’ll see how the PhD goes). I figure that both those systems seem important to solving the problems we’ve built for ourselves, and both may be under increasing pressure in coming years.

But that’s not what I’m here for today! Today I’m here to tell the story of cross-impact balance analysis, a tool I was introduced to at the complex systems summer school in Santa Fe.

The story

Suppose (for example) that the local oracle has foretold that burning the forests will anger the nature gods

… and that if you do not put restrictions in place, your crops will wither and die.

Well, that doesn’t sound very good.

The merchant’s guild claims that such restrictions will cause all trade to grind to a halt.

Your most trusted generals point out that weakened trade will leave you vulnerable to invasion from all neighboring kingdoms.

The sailors guild adds that the wrath of Poseidon might make nautical trade more difficult.

The alchemists propose alternative sources of heat…

… while the druids propose special crops as a way of resisting the wrath of the gods…

… and so on.

Given this complex web of interaction, it might be a good time to consult the philosophers.

Overview of CIB

This brings us to the question of what CIB (Cross-Impact Balance) analysis is, and how to use it.

At its heart, CIB analysis demands this: first, you must consider what aspects of the world you are interested in studying. This could be environmental or economic status, military expenditure, or the laws governing genetic modification. These we refer to as “descriptors”. For each “descriptor” we must create a list of possible “states”.

For example, if the descriptor we are interested in were “global temperature change” our states might be “+5 degree”, “+4 degrees” and so on down to “-2 degrees”.

The states of a descriptor are not meant to be all-encompassing, or offer complete detail, and they need not be numerical. For example, the descriptor “Agricultural policy” might have such states as “Permaculture subsidy”, “Genetic engineering”, “Intensive farming” or “No policy”.

For each of these states, we ask our panel of experts whether such a state would increase or decrease the tendency for some other descriptor to be in a particular state.

For example, we might ask: “On a scale from -3 to 3, how much does the agricultural policy of Intensive farming increase the probability that we will see global temperature increases of +2 degrees?”

By combining the opinions of a variety of experts in each field, and weighting based on certainty and expertise, we are able to construct matrices, much like the one below:

The above matrix is a description of my ant farm. The health of my colony is determined by the population, income, and education levels of my ants. For a less ant focused version of the above, please refer to:

• Elisabeth A. Lloyd and Vanessa J. Schweizer, Objectivity and a comparison of methodological scenario approaches for climate change research, Synthese (2013).

For any possible combination of descriptor states (referred to as a scenario) we can calculate the total impact on all possible descriptors. In the current scenario we have low population, high income and medium education (see highlighted rows).

Because the current scenario has high ant income, this strongly influences us to have low population (+3) and prevents a jump to high population (-3). This combined with the non-influence from education (zeros) leads to low population being the most favoured state for our population descriptor. Thus we expect no change. We say this is “consistent”.

Education however sees a different story. Here we have a strong influence towards high education levels (summing the column gives a total of 13). Thus our current state (medium education) is inconsistent, and we would expect the abundance of ant wealth to lead to an improvements in the ant schooling system.

Classical CIB analysis acts as a way to classify which hypothetical situations are consistent, and which are not.

Now, it is all well and good to claim that some scenarios are stable, but the real use of such a tool is in predicting (and influencing) the future.

By applying a deterministic rule that determines how inconsistencies are resolved, we can produce a “succession rule”. The most straight-forward example is to replace all descriptor states with whichever state is most favoured by the current scenario. In the example above we would switch to “Low population, medium income, high education”. A generation later we would switch back to “Low population, High income, medium education”, soon finding ourselves trapped in a loop.

All such rules will always lead to either a loop or a “sink”: a self consistent scenario which is succeeded only by itself.

So, how can we use this? How will this help us deal with the wrath of the gods (or ant farms)?

Firstly: we can identify loops and consistent scenarios which we believe are most favourable. It’s all well and good imagining some future utopia, but if it is inconsistent with itself, and will immediately lead to a slide into less favourable scenarios then we should not aim for it, we should find that most favourable realistic scenario and aim for that one.

Secondly: We can examine all our consistent scenarios, and determine whose “basin of attraction” we find ourselves in: that is, which scenario are we likely to end up in.

Thirdly: Suppose we could change our influence matrix slightly? How would we change it to favour scenarios we most prefer? If you don’t like the rules, change the game—or at the very least find out WHAT we would need to change to have the best effect.

Concerns and caveats

So… what are the problems we might encounter? What are the drawbacks?

Well, first of all, we note that the real world does not tend to reach any form of eternal static scenario or perfect cycle. The fact that our model does might be regarded as reason for suspicion.

Secondly, although the classical method contains succession analysis, this analysis is not necessarily intended as a completely literal “prediction” of events. It gives a rough idea of the basins of attraction of our cycles and consistent scenarios, but is also somewhat arbitrary. What succession rule is most appropriate? Do all descriptors update simultaneously? Or only the one with the most “pressure”? Are our descriptors given in order of malleability, and only the fastest changing descriptor will change?

Thirdly, in collapsing our description of the world down into a finite number of states we are ignoring many tiny details. Most of these details are not important, but in assuming that our succession rules are deterministic, we imply that these details have no impact whatsoever.

If we instead treat succession as a somewhat random process, the first two of these problems can be solved, and the third somewhat reduced.

Stochastic succession

In the classical CIB succession analysis, some rule is selected which deterministically decides which scenario follows from the present. Stochastic succession analysis instead tells us the probability that a given scenario will lead to another.

The simplest example of a stochastic succession rule is to simply select a single descriptor at random each time step, and only consider updates that might happen to that descriptor. This we refer to as dice succession. This (in some ways) represents hidden information: two systems that might look identical on the surface from the point of view of our very blockish CIB analysis might be different enough underneath to lead to different outcomes. If we have a shaky agricultural system, but a large amount of up-and-coming research, then which of these two factors becomes important first is down to the luck of the draw. Rather than attempt to model this fine detail, we instead merely accept it and incorporate this uncertainty into our model.

Even this most simplistic change leads to dramatics effects on our system. Most importantly, almost all cycles vanish from our results, as forks in the road allow us to diverge from the path of the cycle.

We can take stochastic succession further and consider more exotic rules for our transitions, ones that allow any transition to take place, not merely those that are most favored. For example:

P(x,y) = A e^{I_x(y)/T}

Here x is our current scenario, y is some possible future scenario, and I_x(y) is the total impact score of y from the perspective of x. A is a simple normalizing constant, and T is our system’s temperature. High temperature systems are dominated by random noise, while low temperature systems are dominated by the influences described by our experts.

Impact score is calculated by summing the impact of each state of our current scenario, on each state of our target scenario. For example, for the above, suppose we want to find I_x(y) when x is the given scenario “Low population, High income, medium education” and y was the scenario “Medium population, medium income, High education”. We consider all numbers that are in rows which were states of x and in columns that are states of y. This would give:

I_x(y)= (0+0+0) + (-2 +0 +10) +(6+7+0) = 21

Here each bracket refers to the sum of a particular column.
More generically we can write the formula as:

\displaystyle{ I_x(y)= \sum_{i \subset x, \;j \subset y} M_{i,j} }

Here M_{i,j} refers to an entry in our cross-impact balance matrix, i and j are both states, and i \subset x reads as “i is a state of x”.

We refer to this function for computing transition probabilities as the Boltzmann succession law, due to its similarity to the Boltzmann distribution found in physics. We use it merely as an example, and by no means wish to imply that we expect the transitions for our true system to act in a precisely Boltzmann-like manner. Alternative functions can, and should, be experimented with. The Boltzmann succession law is however an effective example and has a number of nice properties: P(x,y) is always positive, unchanged by adding a constant to every element of the cross-impact balance matrix, contains adjustable parameters, and unbounded above.

The Boltzmann succession rule is what I will refer to as fully stochastic: it allows transitions even against our experts’ judgement (with low probability). This is in contrast to dice succession which picks a direction at random, but still contains scenarios from which our system can not escape.

Effects of stochastic succession

‘Partially stochastic’ processes such as the dice rule have very limited effect on the long term behavior of the model. Aside from removing most cycles, they behave almost exactly like our deterministic succession rules. So, let us instead discuss the more interesting fully stochastic succession rules.

In the fully stochastic system we can ask “after a very long time, what is the probability we will be in scenario x?”

By asking this question we can get some idea of the relative importance of all our future scenarios and states.

For example, if the scenario “high population, low education, low income” has a 40% probability in the long term, while most other scenarios have a probability of 0.2%, we can see that this scenario is crucial to the understanding of our system. Often scenarios already identified by deterministic succession analysis are the ones with the greatest long term probability—but by looking at long term probability we also gain information about the relative importance of each scenario.

In addition, we can encounter scenarios which are themselves inconsistent, but form cycles and/or clusters of interconnected scenarios. We can also notice scenarios that while technically ‘consistent’ in the deterministic rules are only barely so, and have limited weight due to a limited basin of attraction. We might identify scenarios that seem familiar in the real world, but are apparently highly unlikely in our analysis, indicating either that we should expect change… or perhaps suggesting a missing descriptor or a cross-impact in need of tweaking.

Armed with such a model, we can investigate what we can do to increase the short term and long term likelihood of desirable scenarios, and decrease the likelihood of undesirable scenarios.

Some further reading

As a last note, here are a few freely available resources that may prove useful. For a more formal introduction to CIB, try:

• Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

• Wolfgang Weimer-Jehle, Properties of cross-impact balance analysis.

You can find free software for doing a classical CIB analysis here:

• ZIRIUS, ScenarioWizard.

ZIRIUS is the Research Center for Interdisciplinary Risk and Innovation Studies of the University of Stuttgart.

Here are some examples of CIB in action:

• Gerhard Fuchs, Ulrich Fahl, Andreas Pyka, Udo Staber, Stefan Voegele and Wolfgang Weimer-Jehle, Generating innovation scenarios using the cross-impact methodology, Department of Economics, University of Bremen, Discussion-Papers Series No. 007-2008.

• Ortwin Renn, Alexander Jager, Jurgen Deuschle and Wolfgang Weimer-Jehle, A normative-functional concept of sustainability and its indicators, International Journal of Global Environmental Issues, 9 (2008), 291–317.

Finally, this page contains a more complete list of articles, both practical and theoretical:

• ZIRIUS, Cross-impact balance analysis: publications.


Follow

Get every new post delivered to your Inbox.

Join 3,095 other followers