The Logic of Real and Complex Numbers

8 September, 2014

I’ve always liked logic. I studied it a bunch in high school and college. Nowadays it’s a kind of hobby. I turn to it for relief sometimes when I become frustrated trying to figure out what I can do about global warming. Lately I’ve been digging a bit deeper into the logic behind the real and complex numbers. And I’m teaching a graduate course on real analysis this fall, so I actually have a slight excuse for doing this.

There’s something about logic that’s both fascinated and terrified me ever since I was a kid: it’s how we can’t fully pin down infinite structures, like the real or complex number systems, using a language with finitely many symbols and a theory with finitely many axioms.

It’s terrifying that we don’t fully know what we’re talking about when we’re talking about numbers! But it’s fascinating that we can understand a lot about the limitations.

There are many different things to say about this, depending on what features of these number systems we want to describe, and what kind of logic we want to use.

Maybe I should start with the natural numbers, since that story is more famous. This can also serve as a lightning review of some basic concepts which I’ll pretend you already vaguely know: first-order versus second-order logic, proofs versus models, and so on. If you don’t know these, you can either fake it or read some of the many links in this article!

Natural numbers

When Peano originally described the natural numbers he did so using axioms phrased in second-order logic. In first-order logic we can quantify over variables: for example, we can say

\forall x \; (P(x)) \; \Rightarrow \; P(y) \;

which means that if the predicate P holds for all x it holds for any variable y. In second-order logic we can also quantify over predicates: for example, we can say

\forall P \; (P(x) \Leftrightarrow P(y)) \; \Leftrightarrow \; x = y

which says that x = y if and only if for every predicate P, P(x) is true precisely when P(y) is true. Leibniz used this principle, called the identity of indiscernibles, to define equality… and this is a nice example of the greater power of second-order logic. In first-order logic we typically include equality as part of the language and add axioms describing its properties, like

\forall x \; \forall y \; (x = y \Leftrightarrow y = x)

In second-order logic we can define equality and prove these properties starting from the properties we already have for \Leftrightarrow.

Anyway, in his axioms for the natural numbers, Peano used second-order logic to formulate the principle of mathematical induction in this sort of way:

\forall P \; \big[ \big( P(0) \; \& \; \forall n \; (P(n) \Rightarrow P(n+1)) \big) \; \; \Rightarrow \; \; \forall n \; P(n) \big]

This says that if you’ve got any predicate that’s true for 0 and is true for n+1 whenever it’s true for n, then it’s true for all natural numbers.

In 1888, Dedekind showed that Peano’s original axioms for the natural numbers are categorical, meaning all their models are isomorphic.

The concept of ‘model’ involves set theory. In a model you pick a set S for your variables to range over, pick a subset of S for each predicate—namely the subset where that predicate is true —and so on, in such a way that all the axioms in that theory are satisfied. If two models are isomorphic, they’re the same for all practical purposes.

So, in simple rough terms, a categorical theory is one that gives a full description of the mathematical structure it’s talking about.

This makes Dedekind’s result sound like great news. It sounds like Peano’s original second-order axioms for arithmetic completely describe the natural numbers.

However, there’s an important wrinkle. There are many inherently undetermined things about set theory! So in fact, a categorical theory only gives a full description of the mathematical structure it’s talking about relative to a choice of what sets are like.

So, Dedekind’s result just shoves everything mysterious and undetermined about the natural numbers under the carpet: they become mysterious and undetermined things about set theory. This became clear much later, thanks to Gödel and others. And in the process, it became clear that second-order logic is a bit problematic compared to first-order logic.

You see, first-order logic has a set of deduction rules that are:

• sound: Every provable sentence holds in every model.

• semantically complete: Every sentence that holds in every model is provable.

• effective: There is an algorithm that can correctly decide whether any given sequence of symbols is a proof.

Second-order logic does not! It’s ‘too powerful’ to also have all three of these nice properties.

So, these days people often work with a first-order version of Peano’s axioms for arithmetic. Instead of writing down a single axiom for mathematical induction:

\forall P \; \big[ \big( P(0) \; \& \; \forall n \; (P(n) \Rightarrow P(n+1)) \big) \; \; \Rightarrow \;\; \forall n \; P(n) \big]

we write down an axiom schema—an infinite list of axioms—with one axiom like this:

\phi(0) \; \& \; \forall n \; (\phi(n) \Rightarrow \phi(n+1)) \; \; \Rightarrow \;\; \forall n \; (\phi(n))

for each formula \phi that we can actually write down using the language of arithmetic.

This first-order version of Peano arithmetic is not categorical: it has lots of nonisomorphic models. People often pretend there’s one ‘best’ model: they call it the ‘standard’ natural numbers, and call all the others ‘nonstandard’. But there’s something a bit fishy about this.

Indeed, Gödel’s first incompleteness theorem says there are many statements about natural numbers that can neither be proved nor disproved starting from Peano’s axioms. It follows that for any such statement we can find a model of the Peano axioms in which that statement holds, and also a model in which it does not.

Furthermore, this remains true even if we add any list of extra axioms to Peano arithmetic, as long as there’s some algorithm that can list all these axioms.

So, I’d prefer to say there are many different ‘versions’ of the natural numbers, just as there are many different groups.

We can study these different versions, and it’s a fascinating subject:

• Wikipedia, Nonstandard models of arithmetic.

However, I want to talk about the situation for other number systems!

The real numbers

The situation is better for the real numbers—at least if we are willing to think about them in a ‘purely algebraic’ way, leaving most analysis behind.

To do this, we can use the theory of a ‘real closed field’. This is a list of axioms, formulated in first-order logic, which describe how +, \times, 0, 1 and \le work for the real numbers. You can think of these axioms as consisting of three parts:

• the field axioms: the usual algebraic identities involving +, \times, 0 and 1 together with laws saying that everything has an additive inverse and everything except 0 has a multiplicative inverse.

• the formally real field axioms, saying that -1 is not a sum of squares: one axiom for each number of squares in the sum. This implies that we can equip the field with a concept of \le that makes it into an ordered field, but not necessarily in a unique way.

• the real closed field axioms, which say that for any number x, either x or -x has a square root, and every polynomial of odd degree has a root. Among other things this implies our field can be made into an ordered field in a unique way. To do this, we say x \le y if and only if y - x has a square root.

Tarski showed this theory is complete: any first-order sentence involving only the operations +, \times, 0, 1 and the relation \le can either be proved or disproved starting from the above axioms.

Nonetheless, the theory of real closed fields is not categorical: besides the real numbers, there are many other models! These models are all elementarily equivalent: any sentence involving just +, \times, 0, 1, \le and first-order logic that holds in one model holds in all the rest. But these models are not all isomorphic: we can’t get a bijection between them that preserves +, \times, 0, 1 and \le.

Indeed, only finite-sized mathematical structures can be ‘nailed down’ up to isomorphism by theories in first-order logic. You see, the Löwenheim–Skolem theorem says that if a first-order theory in a countable language has an infinite model, it has at least one model of each infinite cardinality. So, if we’re trying to use this kind of theory to describe an infinitely big mathematical structure, the most we can hope for is that after we specify its cardinality, the axioms completely determine it.

However, the real closed field axioms aren’t even this good. For starters, they have infinitely many nonisomorphic countable models. Here are a few:

• the algebraic real numbers: these are the real numbers that obey polynomial equations with integer coefficients.

• the computable real numbers: these are the real numbers that can be computed to arbitrary precision by a computer program.

• the arithmetical real numbers: these are the numbers definable in the language of arithmetic. More precisely, a real number x is arithmetical if there is a formula \phi in the language of first-order Peano arithmetic, with two free variables, such that

\displaystyle{ \forall m \; \forall n \; (\frac{m}{n} \le x \; \; \Leftrightarrow \; \; \phi(n,m)) }

Every computable real number is arithmetical, but not vice versa: just because you can define a real number in the above way does not mean you can actually compute it to arbitrary precision!

And indeed, there are other even bigger countable real closed fields, consisting of real numbers that are definable using more powerful methods, like second-order Peano arithmetic.

We can also get countable real closed fields using tricks like this: take the algebraic real numbers and throw in the number \pi along with just enough other numbers to get a real closed field again. Or, we could throw in both \pi and e. This probably gives a bigger real closed field—but nobody knows, because for all we know, \pi could equal e plus some rational number! Everyone believes this is false, but nobody has proved it.

There are also lots of nonisomorphic uncountable real closed fields, including ones that include the usual real numbers.

For example, we can take the real numbers and throw in an element \infty that is bigger than 1, 2, 3, \dots, and then do what it takes to get another real closed field. This involves throwing in elements like

-\infty, \; \infty + 1, \; 1/\infty, \; \infty^2, \; \sqrt{\infty}, \; \sqrt{\infty^2 + 17} , \dots

and so on. So, we get lots of infinities and infinitesimals.

It gets a bit confusing here, trying to figure out what equals what. But there’s another real closed field containing an infinite element that seems easier to manage. It’s called the field of real Puiseux series. These are series of the form

\sum_{i = k}^\infty a_i z^{i/n}

where k is any integer, perhaps negative, n is any positive integer, and the coefficients a_i are real.

What’s z? It’s just a formal variable. But the real Puiseux series are a real closed field, and z acts like 1/\infty: it’s positive, but smaller than any positive real number.
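To see why z behaves like an infinitesimal, here is a minimal Python sketch. It assumes we only look at series with finitely many nonzero terms, stored as a dictionary from exponents to coefficients, and it uses the fact that a nonzero Puiseux series is positive exactly when the coefficient of its lowest power of z is positive. The names is_positive and subtract are made up for this illustration.

```python
from fractions import Fraction

def is_positive(series):
    """Sign test for a truncated real Puiseux series.

    `series` maps exponents (Fractions) to real coefficients.
    Assumption: only finitely many terms are stored. The series is
    positive when its lowest-exponent nonzero coefficient is positive.
    """
    terms = {e: c for e, c in series.items() if c != 0}
    if not terms:
        return False  # the zero series is not positive
    lowest_exponent = min(terms)
    return terms[lowest_exponent] > 0

def subtract(a, b):
    """Return the series a - b, combining coefficients term by term."""
    result = dict(a)
    for exponent, coeff in b.items():
        result[exponent] = result.get(exponent, 0) - coeff
    return result

z = {Fraction(1): 1.0}            # the formal variable z
sqrt_z = {Fraction(1, 2): 1.0}    # z^(1/2)

def constant(r):
    """The real number r viewed as a series with only a z^0 term."""
    return {Fraction(0): r}

# z is positive, yet r - z is positive for each positive real r we try.
z_is_positive = is_positive(z)
z_below_reals = all(is_positive(subtract(constant(r), z))
                    for r in (1.0, 1e-9, 1e-300))
# z^(1/2) is also infinitesimal, but bigger than z.
sqrt_z_above_z = is_positive(subtract(sqrt_z, z))
```

The comparison never looks at the size of r, only at which exponent comes first, which is the sense in which z sits below every positive real number.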

With considerably more work, we can make up a real closed field that:

• contains the real numbers,

• contains an element \infty bigger than 1,2,3, \dots, and

• obeys the transfer principle, which says that a first-order statement phrased in the usual language of set theory holds for the real numbers if and only if it holds for this other number system.

Any real closed field with these properties is called a system of hyperreal numbers. In the 1960s, the logician Abraham Robinson used them to make Leibniz’s old idea of infinitesimals in calculus fully rigorous. The resulting theory is called nonstandard analysis.

So, I hope you see there’s an exciting—or perhaps appalling—diversity of real closed fields. But don’t forget: they’re all elementarily equivalent. If a sentence involving just +, \times, 0, 1, \le and first-order logic holds in any one of these real closed fields, it holds in all of them!

You might wonder what second-order logic has to say about this.

Here the situation looks very different. In second-order logic we can do analysis, because we can quantify over predicates, which allows us to talk about subsets of real numbers. And in second-order logic we can write down a theory of real numbers that’s categorical! It’s called the theory of a Dedekind-complete ordered field. Again, we can group the axioms in three bunches:

• the field axioms: the usual algebraic identities involving +, \times, 0 and 1 together with laws saying that everything has an additive inverse and everything except 0 has a multiplicative inverse.

• the ordered field axiom, saying there is a total ordering \le such that x \le y and x' \le y' implies x + x' \le y + y' and x,y \ge 0 implies x y \ge 0.

• the Dedekind completeness axiom, which says that every nonempty subset with an upper bound has a least upper bound. But instead of talking about subsets, we talk about the predicates that hold on those subsets, so we say “for all predicates P such that…”

Because they’re categorical, people often use these axioms to define the real numbers. But because they’re second-order, the problem of many nonisomorphic models has really just been swept under the rug. If we use second-order logic, we won’t have a concept of ‘proof’ that’s sound, semantically complete and effective. And if we use first-order axioms for set theory to explicitly talk about subsets instead of predicates, then our set theory will have many models! Each model will have a version of the real numbers in it that’s unique up to isomorphism… but the versions in different models will be really different.

In fact, there’s a precise sense in which the ‘standard real numbers’ in one model of set theory can be the ‘hyperreals’ in another. This was first shown by Abraham Robinson.

The complex numbers

I mentioned that when we’re studying an infinite mathematical structure using first-order logic, the best we can hope for is to have one model of each size (up to isomorphism). The real numbers are far from being this nice… but the complex numbers come much closer!

More precisely, say \kappa is some cardinal. A first-order theory describing structure on a single set is called \kappa-categorical if it has a unique model of cardinality \kappa, up to isomorphism. And in 1965, a logician named Michael Morley showed that if a list of axioms is \kappa-categorical for some uncountable \kappa, it’s \kappa-categorical for every uncountable \kappa. I haven’t worked my way through the proof, which seems to be full of interesting ideas. But such theories are called uncountably categorical.

A great example is the ‘purely algebraic’ theory of the complex numbers. By this I mean we only write down axioms involving +, \times, 0 and 1. We don’t include anything about \le this time, nor anything about complex conjugation. You see, if we start talking about complex conjugation we can pick out the real numbers inside the complex numbers, and then we’re more or less back to the story we had for real numbers.

This theory is called the theory of an algebraically closed field of characteristic zero. Yet again, the axioms come in three bunches:

• the field axioms.

• the characteristic zero axioms: these are an infinite list of axioms saying that

1 \ne 0, \quad 1+1 \ne 0, \quad 1+1+1 \ne 0, \dots

• the algebraically closed axioms: these say that every non-constant polynomial has a root.
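The characteristic zero axioms form an infinite list, but a very tame one: a short program can print them one after another. Here is a rough Python sketch of such a listing procedure. The function name and the plain-text rendering of the axioms as strings are choices made just for this illustration.

```python
from itertools import count, islice

def characteristic_zero_axioms():
    """Yield the axioms 1 != 0, 1+1 != 0, 1+1+1 != 0, ... forever."""
    for n in count(1):
        # n copies of "1" joined by "+", followed by "!= 0"
        yield "+".join(["1"] * n) + " != 0"

# The list is infinite, so we only ever look at an initial piece of it.
first_three = list(islice(characteristic_zero_axioms(), 3))
print(first_three)  # ['1 != 0', '1+1 != 0', '1+1+1 != 0']
```

The algebraically closed axioms could be listed in a similar way, with one axiom for each degree of polynomial.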

Pretty much any mathematician worth their salt knows that the complex numbers are a model of these axioms, whose cardinality is that of the continuum. There are lots of different countable models: the algebraic complex numbers, the computable complex numbers, and so on. But because the above theory is uncountably categorical, there is exactly one algebraically closed field of characteristic zero of each uncountable cardinality… up to isomorphism.

This implies some interesting things.

For example, we can take the complex numbers, throw in an extra element, and let it freely generate a bigger algebraically closed field. It’s ‘bigger’ in the sense that it contains the complex numbers as a proper subset, indeed a subfield. But since it has the same cardinality as the complex numbers, it’s isomorphic to the complex numbers!

And then, because this ‘bigger’ field is isomorphic to the complex numbers, we can turn this argument around. We can take the complex numbers, remove a lot of carefully chosen elements, and get a subfield that’s isomorphic to the complex numbers.

Or, if we like, we can take the complex numbers, adjoin a really huge set of extra elements, and let them freely generate an algebraically closed field of characteristic zero. The cardinality of this field can be as big as we want. It will be determined up to isomorphism by its cardinality.

One piece of good news is that thanks to a result of Tarski, the theory of an algebraically closed field of characteristic zero is complete, and thus, all its models are elementarily equivalent. In other words, all the same first-order sentences written in the language of +, \times, 0 and 1 hold in every model.

But here’s a piece of strange news.

As I already mentioned, the theory of a real closed field is not uncountably categorical. This implies something really weird. Besides the ‘usual’ real numbers \mathbb{R} we can choose another real closed field \mathbb{R}', not isomorphic to \mathbb{R}, with the same cardinality. We can build the complex numbers \mathbb{C} using pairs of real numbers. We can use the same trick to build a field \mathbb{C}' using pairs of guys in \mathbb{R}'. But it’s easy to check that this funny field \mathbb{C}' is algebraically closed and of characteristic zero. Since it has the same cardinality as \mathbb{C}, it must be isomorphic to \mathbb{C}.

In short, different ‘versions’ of the real numbers can give rise to the same version of the complex numbers!

References

So, I hope you see that the logical foundations of the real and complex number systems are quite slippery… yet with work, we can understand a lot about this slipperiness.

Besides the references I’ve given, I just want to mention two more. First, here’s a free introductory calculus textbook based on nonstandard analysis:

• H. Jerome Keisler, Elementary Calculus: an Infinitesimal Approach, available as a website or in PDF.

And here’s an expository paper that digs deeper into uncountably categorical theories:

• Nick Ramsey, Morley’s categoricity theorem.


Exploring Climate Data (Part 1)

1 August, 2014

joint with Dara O Shayda

Emboldened by our experiments in El Niño analysis and prediction, people in the Azimuth Code Project have been starting to analyze weather and climate data. A lot of this work is exploratory, with no big conclusions. But it’s still interesting! So, let’s try some blog articles where we present this work.

This one will be about the air pressure on the island of Tahiti and in a city called Darwin in Australia: how they’re correlated, and how each one varies. This article will also be a quick introduction to some basic statistics, as well as ‘continuous wavelet transforms’.

Darwin, Tahiti and El Niños

The El Niño Southern Oscillation is often studied using the air pressure in Darwin, Australia versus the air pressure in Tahiti. When there’s an El Niño, it gets stormy in the eastern Pacific so the air pressures tend to be lower in Tahiti and higher in Darwin. When there’s a La Niña, it’s the other way around:



The Southern Oscillation Index or SOI is a normalized version of the monthly mean air pressure anomaly in Tahiti minus that in Darwin. Here anomaly means we subtract off the mean, and normalized means that we divide by the standard deviation.

So, the SOI tends to be negative when there’s an El Niño. On the other hand, when there’s an El Niño the Niño 3.4 index tends to be positive—this says it’s hotter than usual in a certain patch of the Pacific.

Here you can see how this works:



When the Niño 3.4 index is positive, the SOI tends to be negative, and vice versa!

It might be fun to explore precisely how well correlated they are. You can get the data to do that by clicking on the links above.

But here’s another question: how similar are the air pressure anomalies in Darwin and in Tahiti? Do we really need to take their difference, or are they so strongly anticorrelated that either one would be enough to detect an El Niño?

You can get the data to answer such questions here:

Southern Oscillation Index based upon annual standardization, Climate Analysis Section, NCAR/UCAR. This includes links to monthly sea level pressure anomalies in Darwin and Tahiti, in either ASCII format (click the second two links) or netCDF format (click the first one and read the explanation).

In fact this website has some nice graphs already made, which I might as well show you! Here’s the SOI and also the sum of the air pressure anomalies in Darwin and Tahiti, normalized in some way:



If the sum were zero, the air pressure anomalies in Darwin and Tahiti would contain the same information and life would be simple. But it’s not!

How similar in character are the air pressure anomalies in Darwin and Tahiti? There are many ways to study this question. Dara tackled it by taking the air pressure anomaly data from 1866 to 2012 and computing some ‘continuous wavelet transforms’ of these air pressure anomalies. This is a good excuse for explaining how a continuous wavelet transform works.

Very basic statistics

It helps to start with some very basic statistics. Suppose you have a list of numbers

x = (x_1, \dots, x_n)

You probably know how to take their mean, or average. People often write this with angle brackets:

\displaystyle{ \langle x \rangle = \frac{1}{n} \sum_{i = 1}^n x_i }

You can also calculate the mean of their squares:

\displaystyle{  \langle x^2 \rangle = \frac{1}{n} \sum_{i = 1}^n x_i^2 }

If you were naive you might think \langle x^2 \rangle = \langle x \rangle^2, but in fact we have:

\langle x^2 \rangle \ge \langle x \rangle^2

and they’re equal only if all the x_i are the same. The point is that if the numbers x_i are spread out, the squares of the big ones (positive or negative) contribute more to the average of the squares than if we had averaged them out before squaring. The difference

\langle x^2 \rangle - \langle x \rangle^2

is called the variance; it says how spread out our numbers are. The square root of the variance is the standard deviation:

\sigma_x = \sqrt{\langle x^2 \rangle - \langle x \rangle^2 }

and this has the slight advantage that if you multiply all the numbers x_i by some constant c, the standard deviation gets multiplied by |c|. (The variance gets multiplied by c^2.)
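If you like to check such things numerically, here is a small Python sketch of the mean, variance and standard deviation exactly as defined above (dividing by n, not n - 1). The list of numbers and the constant c are made up purely for illustration.

```python
import math

def mean(values):
    """Arithmetic mean <x> of a non-empty list of numbers."""
    return sum(values) / len(values)

def variance(values):
    """Variance as in the text: <x^2> - <x>^2."""
    return mean([v * v for v in values]) - mean(values) ** 2

def std_dev(values):
    """Standard deviation: the square root of the variance."""
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(variance(values), 0.0))

x = [1.0, 2.0, 4.0, 7.0]          # made-up illustrative numbers
c = -3.0
scaled = [c * v for v in x]       # multiply every number by c

variance_ratio = variance(scaled) / variance(x)   # should be close to c^2
std_ratio = std_dev(scaled) / std_dev(x)          # should be close to |c|
print(variance(x), variance_ratio, std_ratio)
```

For long lists of large numbers this formula can lose precision to rounding, so library routines usually subtract the mean before squaring. For a quick check like this one it does not matter.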

We can generalize the variance to a situation where we have two lists of numbers:

x = (x_1, \dots, x_n)

y = (y_1, \dots, y_n)

Namely, we can form the covariance

\langle x y \rangle - \langle x \rangle \langle y \rangle

This reduces to the variance when x = y. It measures how much x and y vary together — ‘hand in hand’, as it were. A bit more precisely: if x_i is greater than its mean value mainly for i such that y_i is greater than its mean value, the covariance is positive. On the other hand, if x_i tends to be greater than average when y_i is smaller than average — like with the air pressures at Darwin and Tahiti — the covariance will be negative.

For example, if

x = (1,-1), \quad y = (1,-1)

then they ‘vary hand in hand’, and the covariance

\langle x y \rangle - \langle x \rangle \langle y \rangle = 1 - 0 = 1

is positive. But if

x = (1,-1), \quad y = (-1,1)

then one is positive when the other is negative, so the covariance

\langle x y \rangle - \langle x \rangle \langle y \rangle = -1 - 0 = -1

is negative.

Of course the covariance will get bigger if we multiply both x and y by some big number. If we don’t want this effect, we can normalize the covariance and get the correlation:

\displaystyle{ \frac{ \langle x y \rangle - \langle x \rangle \langle y \rangle }{\sigma_x \sigma_y} }

which will always be between -1 and 1.
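Here is a short Python sketch of the covariance and correlation formulas above, applied to the two tiny examples from the text. The helper names are chosen just for this illustration, and the sketch assumes neither list is constant, since otherwise a standard deviation is zero and the correlation is undefined.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def covariance(x, y):
    """<xy> - <x><y> for two lists of the same length."""
    if len(x) != len(y):
        raise ValueError("lists must have the same length")
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    sigma_x = math.sqrt(covariance(x, x))  # the variance is covariance(x, x)
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)

hand_in_hand = covariance([1, -1], [1, -1])   # 1.0
opposite = covariance([1, -1], [-1, 1])       # -1.0

# Multiplying y by 10 multiplies the covariance by 10,
# but leaves the correlation at -1.
scaled_cov = covariance([1, -1], [-10, 10])   # -10.0
scaled_corr = correlation([1, -1], [-10, 10]) # -1.0
```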

For example, if we compute the correlation between the air pressure anomalies at Darwin and Tahiti, measured monthly from 1866 to 2012, we get -0.253727. This indicates that when one goes up, the other tends to go down. But since we’re not getting -1, it means they’re not completely locked into a linear relationship where one is some negative number times the other.

Okay, we’re almost ready for continuous wavelet transforms! Here is the main thing we need to know. If the mean of either x or y is zero, the formula for covariance simplifies a lot, to

\displaystyle{  \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }

So, this quantity says how much the numbers x_i ‘vary hand in hand’ with the numbers y_i, in the special case when one (or both) has mean zero.

We can do something similar if x, y : \mathbb{R} \to \mathbb{R} are functions of time defined for all real numbers t. The sum becomes an integral, and we have to give up on dividing by n. We get:

\displaystyle{  \int_{-\infty}^\infty x(t) y(t)\; d t }

This is called the inner product of the functions x and y, and often it’s written \langle x, y \rangle, but it’s a lot like the covariance.

Continuous wavelet transforms

What are continuous wavelet transforms, and why should we care?

People have lots of tricks for studying ‘signals’, like series of numbers x_i or functions x : \mathbb{R} \to \mathbb{R}. One method is to ‘transform’ the signal in a way that reveals useful information. The Fourier transform decomposes a signal into sines and cosines of different frequencies. This lets us see how much power the signal has at different frequencies, but it doesn’t reveal how the power at different frequencies changes with time. For that we should use something else, like the Gabor transform explained by Blake Pollard in a previous post.

Sines and cosines are great, but we might want to look for other patterns in a signal. A ‘continuous wavelet transform’ lets us scan a signal for appearances of a given pattern at different times and also at different time scales: a pattern could go by quickly, or in a stretched out slow way.

To implement the continuous wavelet transform, we need a signal and a pattern to look for. The signal could be a function x : \mathbb{R} \to \mathbb{R}. The pattern would then be another function y: \mathbb{R} \to \mathbb{R}, usually called a wavelet.

Here’s an example of a wavelet:


If we’re in a relaxed mood, we could call any function that looks like a bump with wiggles in it a wavelet. There are lots of famous wavelets, but this particular one is the fourth derivative of a certain Gaussian. Mathematica calls this particular wavelet DGaussianWavelet[4], and you can look up the formula under ‘Details’ on their webpage.

However, the exact formula doesn’t matter at all now! If we call this wavelet y, all that matters is that it’s a bump with wiggles on it, and that its mean value is 0, or more precisely:

\displaystyle{ \int_{-\infty}^\infty y(t) \; d t = 0 }

As we saw in the last section, this fact lets us take our function x and the wavelet y and see how much they ‘vary hand in hand’ simply by computing their inner product:

\displaystyle{ \langle x , y \rangle = \int_{-\infty}^\infty x(t) y(t)\; d t }

Loosely speaking, this measures the ‘amount of y-shaped wiggle in the function x’. It’s amazing how hard it is to say something in plain English that perfectly captures the meaning of a simple formula like the above one—so take the quoted phrase with a huge grain of salt. But it gives a rough intuition.

Our wavelet y happens to be centered at t  = 0. However, we might be interested in y-shaped wiggles that are centered not at zero but at some other number s. We could detect these by shifting the function y before taking its inner product with x:

\displaystyle{ \int_{-\infty}^\infty x(t) y(t-s)\; d t }

We could also be interested in measuring the amount of some stretched-out or squashed version of a y-shaped wiggle in the function x. Again we could do this by changing y before taking its inner product with x:

\displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t}{P}\right) \; d t }

When P is big, we get a stretched-out version of y. People sometimes call P the period, since the period of the wiggles in y will be proportional to this (though usually not equal to it).

Finally, we can combine these ideas, and compute

\displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt }

This is a function of the shift s and period P which says how much of the s-shifted, P-stretched wavelet y is lurking in the function x. It’s a version of the continuous wavelet transform!

Mathematica implements this idea for time series, meaning lists of numbers x = (x_1,\dots,x_n) instead of functions x : \mathbb{R} \to \mathbb{R}. The idea is that we think of the numbers as samples of a function x:

x_1 = x(\Delta t)

x_2 = x(2 \Delta t)

and so on, where \Delta t is some time step, and replace the integral above by a suitable sum. Mathematica has a function ContinuousWaveletTransform that does this, giving

\displaystyle{  w(s,P) = \frac{1}{\sqrt{P}} \sum_{i = 1}^n x_i \; y\left(\frac{i \Delta t - s}{P}\right) }

The factor of 1/\sqrt{P} in front is a useful extra trick: it’s the right way to compensate for the fact that when you stretch out your wavelet y by a factor of P, it gets bigger. So, when we’re doing integrals, we should define the continuous wavelet transform of x with respect to the wavelet y by:

\displaystyle{ w(s,P) = \frac{1}{\sqrt{P}} \int_{-\infty}^\infty x(t) y(\frac{t- s}{P})\; dt }
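To make the discrete formula for w(s,P) concrete, here is a rough Python sketch. It is not Mathematica’s implementation. As the wavelet it uses the fourth derivative of the Gaussian exp(-t^2/2), which works out to (t^4 - 6 t^2 + 3) exp(-t^2/2); this is an assumption, and Mathematica’s DGaussianWavelet[4] may differ from it in sign and normalization. The toy signal, the time step and the function names are also made up for illustration.

```python
import math

def wavelet(t):
    """Fourth derivative of exp(-t^2/2): (t^4 - 6 t^2 + 3) exp(-t^2/2).

    Assumption: an unnormalized stand-in for the wavelet y in the text.
    """
    return (t**4 - 6.0 * t**2 + 3.0) * math.exp(-0.5 * t * t)

def cwt(samples, shift, period, dt=1.0):
    """w(s, P) = (1/sqrt(P)) * sum_i x_i * y((i*dt - s)/P), i = 1..n."""
    if period <= 0:
        raise ValueError("period must be positive")
    total = 0.0
    for i, x_i in enumerate(samples, start=1):
        total += x_i * wavelet((i * dt - shift) / period)
    return total / math.sqrt(period)

# Toy signal: a copy of the wavelet centered at t = 100 and stretched by 8.
signal = [wavelet((i - 100) / 8.0) for i in range(1, 201)]

# The transform is large where shift and period match the hidden pattern,
# and much smaller when we look in the wrong place.
matched = cwt(signal, shift=100.0, period=8.0)
off_center = cwt(signal, shift=40.0, period=8.0)
print(matched, off_center)
```

Evaluating cwt on a whole grid of shifts and periods gives the array of numbers that a scalogram displays in color.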

The results

Dara Shayda started with the air pressure anomaly at Darwin and Tahiti, measured monthly from 1866 to 2012. Taking DGaussianWavelet[4] as his wavelet, he computed the continuous wavelet transform w(s,P) as above. To show us the answer, he created a scalogram:


This is a 2-dimensional color plot showing roughly how big the continuous wavelet transform w(s,P) is for different shifts s and periods P. Blue means it’s very small, green means it’s bigger, yellow means even bigger and red means very large.

Tahiti gave this:


You’ll notice that the patterns at Darwin and Tahiti are similar in character, but notably different in detail. For example, the red spots, where our chosen wavelet shows up strongly with a period of roughly 100 months, occur at different times.

Puzzle 1. What is the meaning of the ‘spikes’ in these scalograms? What sort of signal would give a spike of this sort?

Puzzle 2. Do a Gabor transform, also known as a ‘windowed Fourier transform’, of the same data. Blake Pollard explained the Gabor transform in his article Milankovitch vs the Ice Ages. This is a way to see how much a signal wiggles at a given frequency at a given time: we multiply the signal by a shifted Gaussian and then take its Fourier transform.

Puzzle 3. Read about continuous wavelet transforms. If we want to reconstruct our signal x from its continuous wavelet transform, why should we use a wavelet y with

\displaystyle{\int_{-\infty}^\infty y(t) \; d t = 0 ? }

In fact we want a somewhat stronger condition, which is implied by the above equation when the Fourier transform of y is smooth and integrable:

Continuous wavelet transform, Wikipedia.

Another way to understand correlations

David Tweed mentioned another approach from signal processing to understanding the quantity

\displaystyle{  \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }

If we’ve got two lists of data x and y that we want to compare to see if they behave similarly, the first thing we ought to do is multiplicatively scale each one so they’re of comparable magnitude. There are various possibilities for assigning a scale, but a reasonable one is to ensure they have equal ‘energy’

\displaystyle{  \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i^2 }

(This can be achieved by dividing each list by its standard deviation, which is equivalent to what was done in the main derivation above.) Once we’ve done that then it’s clear that looking at

\displaystyle{  \sum_{i=1}^n (x_i-y_i)^2 }

gives small values when they have a very good match and progressively bigger values as they become less similar. Observe that

\begin{array}{ccl}  \displaystyle{\sum_{i=1}^n (x_i-y_i)^2 }  &=& \displaystyle{ \sum_{i=1}^n (x_i^2 - 2 x_i y_i + y_i^2) }\\  &=& \displaystyle{ \sum_{i=1}^n x_i^2 - 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 }  \end{array}

Since we’ve scaled things so that \sum_{i=1}^n x_i^2 and \sum_{i=1}^n y_i^2 are constants, we can see that when \sum_{i=1}^n x_i y_i becomes bigger,

\displaystyle{ \sum_{i=1}^n (x_i-y_i)^2 }

becomes smaller. So,

\displaystyle{\sum_{i=1}^n x_i y_i}

serves as a measure of how close the lists are, under these assumptions.
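Here is a small numerical check of this argument, as a sketch rather than anything from the original analysis. The three lists are made-up sample data, and normalizing each list to unit energy is just one convenient way of giving them equal energy. With that choice, the expansion above reduces to the squared distance being 2 minus twice the sum of products.

```python
import math

def normalize_energy(values):
    """Scale a list so that the sum of its squares equals 1 (assumes it is not all zeros)."""
    scale = math.sqrt(sum(v * v for v in values))
    return [v / scale for v in values]

def squared_distance(x, y):
    """Sum of (x_i - y_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def dot(x, y):
    """Sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

# Made-up sample data, not taken from the Darwin or Tahiti records.
x = normalize_energy([1.0, 2.0, 3.0, 2.0, 1.0])
y = normalize_energy([2.0, 3.0, 7.0, 5.0, 1.0])
z = normalize_energy([3.0, -1.0, 0.5, -2.0, 4.0])

# With unit energy, sum (x_i - y_i)^2 = 2 - 2 * sum x_i y_i,
# so both of these gaps should vanish up to rounding error.
identity_gap_xy = squared_distance(x, y) - (2.0 - 2.0 * dot(x, y))
identity_gap_xz = squared_distance(x, z) - (2.0 - 2.0 * dot(x, z))
```

So whichever of y and z has the larger sum of products with x is also the one closer to x in squared distance.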


Chemical Reaction Network Talks

26 June, 2014

A while ago I blogged about David Soloveichik’s talk at this workshop:

Programming with Chemical Reaction Networks: Mathematical Foundations, Banff International Research Station, 8-13 June 2014.

Now the slides for his talk are available:

• David Soloveichik, U.C. San Francisco, The computational power of chemical reaction networks.

And now I’d like to tell you about three more talks!

The first two are about ways one chemical reaction network can simulate another. This is important for a couple of reasons. First, in biology, a bunch of different chemical reactions can ‘accomplish the same task’, and we’d like to make this idea precise. That’s what Luca Cardelli spoke about. Second, people trying to do computation with chemistry are starting to simulate quite general reactions using DNA! That’s what Seung Woo Shin spoke about.

Luca Cardelli

Luca Cardelli was at Oxford when I was visiting this spring, but unfortunately I didn’t meet him! He works for Microsoft on the interface of biology and computation. At Banff, he talked about ways one chemical reaction network can simulate another. His slides are here:

• Luca Cardelli, Morphisms of reaction networks.

He has a paper that gives a more detailed explanation of these ideas:

• Luca Cardelli, Morphisms of reaction networks that couple structure to function.

Here is my own disorganized explanation… with lots of informative but confusing digressions. A population protocol is a chemical reaction network with only 2-in, 2-out reactions. For example, this paper presents a population protocol that does ‘approximate majority detection’:

• Dana Angluin, James Aspnes, and David Eisenstat, A simple population protocol for fast robust approximate majority, Distributed Computing 21 (2008), 87–102.

What’s the idea? We start with two kinds of molecules, say x’s and y’s, and we want to see which one is in the majority, so we run these chemical reactions:

x + y \to x + b

x + y \to y + b

x + b \to 2x

y + b \to 2y

See? All the reactions have 2 molecules going in and 2 going out. The b molecules act as ‘undecided voters’ who become either an x or a y, depending on who they meet first.

If we start with about n molecules, in O(n \log n) time these reactions are very likely to convert all x’s and y’s to whatever kind of molecule was in the majority initially… at least if the gap in the number of x’s and y’s is big enough.
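To get a feel for this, here is a rough simulation of the four reactions above. It is only a sketch under my own assumptions: at each step two molecules are picked uniformly at random to meet, and the function name, the seed and the step cap are all hypothetical choices, not anything from the paper.

```python
import random

def approximate_majority(num_x, num_y, seed=0, max_steps=1_000_000):
    """Let random pairs of molecules meet until only x's or only y's remain."""
    rng = random.Random(seed)
    species = ["x", "y", "b"]
    counts = {"x": num_x, "y": num_y, "b": 0}
    # Unordered pair of inputs -> list of possible pairs of outputs.
    rules = {
        ("x", "y"): [("x", "b"), ("y", "b")],  # x + y -> x + b  or  x + y -> y + b
        ("x", "b"): [("x", "x")],              # x + b -> 2x
        ("y", "b"): [("y", "y")],              # y + b -> 2y
    }
    total = num_x + num_y
    steps = 0
    while steps < max_steps and counts["x"] != total and counts["y"] != total:
        # Pick two distinct molecules, each weighted by how many are present.
        first = rng.choices(species, weights=[counts[s] for s in species])[0]
        counts[first] -= 1
        second = rng.choices(species, weights=[counts[s] for s in species])[0]
        counts[second] -= 1
        outcomes = rules.get((first, second)) or rules.get((second, first))
        # If the pair does not react, both molecules go back unchanged.
        products = rng.choice(outcomes) if outcomes else (first, second)
        for s in products:
            counts[s] += 1
        steps += 1
    return counts, steps

# Hypothetical starting population: 70 x's against 30 y's.
final_counts, steps_taken = approximate_majority(70, 30, seed=1)
```

Since every reaction is 2-in, 2-out, the total number of molecules never changes, and the run stops once everyone is an x or everyone is a y. Which side wins depends on the random meetings, so try a few seeds and a few initial gaps.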

Here’s another population protocol that also does the job:

x + y \to 2b

x + b \to 2x

y + b \to 2y

And here’s a proof that one of these algorithms actually works—most of the time, when the initial difference in populations is big enough:

• Etienne Perron, Dinkar Vasudevan, and Milan Vojnović, Using three states for binary consensus on complete graphs, Technical Report, MSR-TR-2008-114, Microsoft, September 2008.

If we use a discrete-time formalism to describe the dynamics, the proof seems to get harder. See the paper by Angluin, Aspnes, and Eisenstat for the only known proof!

Anyway, Luca Cardelli is interested in chemical reaction networks actually found in biology. This approximate majority algorithm is seen quite clearly in one biological system: a certain ‘epigenetic switch’. However, it is usually ‘obfuscated’ or ‘encrypted’: hidden in a bigger, more complicated chemical reaction network. For example, see:

• Luca Cardelli and Attila Csikász-Nagy, The cell cycle switch is approximate majority obfuscated, Scientific Reports 2 (2012).

This got him interested in developing a theory of morphisms between reaction networks, which could answer questions like: When can one CRN emulate another? But these questions turn out to be much easier if we use the rate equation than with the master equation. So, he asks: when can one CRN give the same rate equation as another?

He found a sufficient condition that’s ‘purely syntactic’: you can tell if it holds by looking at the reaction networks, regardless of the rate constants.

Here’s the idea. We say one network emulates another if for any rate constants of the second, we can find rate constants for the first that make its rate equation have solutions exactly mimicking those of the second, but where several species in the first correspond to one in the second.

For this to make sense, we assume there is a map sending:

• species to species
• reactions to reactions

In a chemical reaction network homomorphism, the map on reactions is determined by the map on species in the obvious way. For example, if species A is sent to f(A) and species B is sent to f(B) then the reaction

2A + B \to 3B

is sent to the reaction

2f(A) + f(B) \to 3 f(B)

In this situation, to make the first network emulate the second, we need to set equal the initial concentrations of all species in the inverse image of a given species.
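To make ‘the obvious way’ concrete, here is a tiny sketch. Representing a reaction as a pair of multisets is my own choice, and the species names P, Q and X are hypothetical. The second map merges A and B into one species, which is the situation where several species in the first network correspond to one in the second.

```python
from collections import Counter

def image_of_complex(complex_, f):
    """Apply a species map f to a complex such as 2A + B."""
    result = Counter()
    for species, coefficient in complex_.items():
        result[f[species]] += coefficient
    return result

def image_of_reaction(reaction, f):
    """Send a reaction (inputs, outputs) to (f(inputs), f(outputs))."""
    inputs, outputs = reaction
    return image_of_complex(inputs, f), image_of_complex(outputs, f)

# The reaction 2A + B -> 3B from the text.
reaction = (Counter({"A": 2, "B": 1}), Counter({"B": 3}))

# Hypothetical map sending A and B to distinct species P and Q.
f = {"A": "P", "B": "Q"}
image = image_of_reaction(reaction, f)    # 2P + Q -> 3Q

# Hypothetical map merging A and B into a single species X.
g = {"A": "X", "B": "X"}
merged = image_of_reaction(reaction, g)   # 3X -> 3X
```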

A reactant homomorphism from one chemical reaction network to another is more general: it sends species to species, and for any reaction in the first chemical reaction network with input

A + B + C + \cdots

there’s a reaction in the second with input

f(A) + f(B) + f(C) + \cdots

(Reactant is another name for input.)
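The reactant condition is easy to test mechanically. Here is a sketch following my reading of the definition above. The two small networks and the species map are hypothetical examples of mine, not taken from Cardelli's paper. Only the inputs of the reactions are compared.

```python
from collections import Counter

def is_reactant_homomorphism(f, first, second):
    """Check that every input of `first`, pushed through f, is the input of some reaction in `second`."""
    def push(inputs):
        image = Counter()
        for species, coefficient in inputs.items():
            image[f[species]] += coefficient
        return image
    second_inputs = [inputs for inputs, _ in second]
    return all(push(inputs) in second_inputs for inputs, _ in first)

# Hypothetical networks, each a list of (inputs, outputs) pairs.
first = [
    (Counter({"A1": 1, "B": 1}), Counter({"C": 1})),    # A1 + B -> C
    (Counter({"A2": 1, "B": 1}), Counter({"A1": 1})),   # A2 + B -> A1
]
second = [
    (Counter({"A": 1, "B": 1}), Counter({"C": 1})),     # A + B -> C
]
f = {"A1": "A", "A2": "A", "B": "B", "C": "C"}

ok = is_reactant_homomorphism(f, first, second)

# Adding C -> B to the first network breaks the condition,
# since no reaction in the second network has input C.
extended = first + [(Counter({"C": 1}), Counter({"B": 1}))]
not_ok = is_reactant_homomorphism(f, extended, second)
```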

A stoichiomorphism is a kind of morphism that takes rate constants into account. See Cardelli’s paper for the definition.

The main theorem: if there is a stoichiomorphism from one chemical reaction network to another that’s also a reactant homomorphism, then the first emulates the second.

For a better explanation, read his paper! Here’s a cool picture from his paper showing a bunch of chemical reaction networks including the approximate majority network (labelled AM), many of which show up in biology, and morphisms between them:


Click to enlarge! These chemical reaction networks are drawn in a special style: as influence networks, consisting of ‘gates’ where one process activates or deactivates another. Each gate is a chemical reaction network of a certain form, schematically like this:

\mathrm{off} \leftrightarrow \mathrm{intermediate} \leftrightarrow \mathrm{on}

where the forward reactions are catalyzed by one chemical and the reverse reactions are catalyzed by another. A gate is like a switch that can be turned on or off.

While listening to this talk, I thought the way in which one CRN emulates another in Cardelli’s formalism looks awfully similar to the way one dynamical system emulates another in Eugene Lerman’s formalism:

• Eugene Lerman, Networks of dynamical systems, Azimuth, 18 March 2014.

The following picture from Cardelli’s paper shows that one of his morphisms of reaction networks is like a ‘covering map’. This reminds me a lot of what’s happening in Lerman’s work.


Again, click to enlarge!

Seung Woo Shin

Seung Woo Shin was actually Brendan Fong’s roommate at the National University of Singapore while Brendan was working with me on chemical reaction networks. Apparently they never talked about their work!

Shin spoke about some other concepts of ‘morphism’ between chemical reaction networks. These other concepts do not involve reaction rates, just which chemicals can turn into which. You can see his slides here:

• Seung Woo Shin, Verifying CRN implementations.

and read his thesis for more details:

• Seung Woo Shin, Compiling and verifying DNA-based chemical reaction network implementations, Master’s thesis, Caltech, 2012.

Abstract: One goal of molecular programming and synthetic biology is to build chemical circuits that can control chemical processes at the molecular level. Remarkably, it has been shown that synthesized DNA molecules can be used to construct complex chemical circuits that operate without any enzyme or cellular component. However, designing DNA molecules at the individual nucleotide base level is often difficult and laborious, and thus chemical reaction networks (CRNs) have been proposed as a higher-level programming language. So far, several general-purpose schemes have been described for designing synthetic DNA molecules that simulate the behavior of arbitrary CRNs, and many more are being actively investigated.

Here, we solve two problems related to this topic. First, we present a general-purpose CRN-to-DNA compiler that can apply user-defined compilation schemes for translating formal CRNs to domain-level specifications for DNA molecules. In doing so, we develop a language in which such schemes can be concisely and precisely described. This compiler can greatly reduce the amount of tedious manual labor faced by researchers working in the field. Second, we present a general method for the formal verification of the correctness of such compilation. We first show that this problem reduces to testing a notion of behavioral equivalence between two CRNs, and then we construct a mathematical formalism in which that notion can be precisely defined. Finally, we provide algorithms for testing that notion. This verification process can be thought of as an equivalent of model checking in molecular computation, and we hope that the generality of our verification techniques will eventually allow us to apply them not only to DNA-based CRN implementations but to a wider class of molecular programs.

His thesis built on this earlier paper:

• David Soloveichik, Georg Seelig and Erik Winfree, DNA as a universal substrate for chemical kinetics, Proceedings of the National Academy of Sciences (2010).

I think this work is fascinating and deeply related to category theory, so I talked to Shin and Winfree about it, and this is what we came up with:

CRN equivalences: progress report.

This is one of several reports on progress people at the workshop made on various open problems.

David Anderson

Brendan Fong and I wrote about David Anderson’s work in Part 9 of the network theory series. It’s so impressive that I expected him to be older… older than me, I guess. He’s not!

In his tutorial, he gave an overview of chemical reaction networks with an emphasis on the deficiency zero theorem. Since many people were puzzled by the ‘deficiency’ concept, they asked lots of questions. But I’ve already explained that idea in Part 21. So, I’ll just mention a couple of cool theorems he told us about!

Theorem (Horn and Jackson). If a reaction network has a complex balanced equilibrium, then:

1. It has no equilibria that are not complex balanced.

2. The reaction network must be weakly reversible.

3. Every stoichiometric compatibility class contains precisely one complex balanced equilibrium.

I should have known this, since this work is classic. But I don’t think I knew that the existence of one complex balanced equilibrium implied all this stuff!

He also mentioned this paper:

• Guy Shinar and Martin Feinberg, Structural sources of robustness in biochemical reaction networks, Science (2010).

which contains this amazing theorem:

Theorem (Shinar and Feinberg). Suppose there is a chemical reaction network such that:

1. its deficiency equals one;

2. it has a positive steady state;

3. it has two “non-terminal complexes” that differ only in one species S. (“Non-terminal” is a concept that’s easier to explain with a picture of a reaction network).

Then the species S is absolutely robust: with any initial conditions, the rate equation will approach an equilibrium where the concentration of S approaches a specific fixed value c, independent of the initial conditions!

However, things work very differently if we treat the system stochastically, using the master equation:

• David F. Anderson, German A. Enciso and Matthew D. Johnston, Stochastic analysis of biochemical reaction networks with absolute concentration robustness.

More

A lot more happened at this workshop! There was a huge amount of discussion of the leader election problem, which is about how to cook up chemical reactions that create a ‘leader’: a single molecule of some sort.

Leader election: the problem, and references.

Leader election: progress report.

As I explained before, David Soloveichik talked about various forms of digital computation with chemical reaction networks. David Doty talked about the very important flip side of the coin: analog computation.

• David Doty, Rate-independent computation by real-valued chemistry.

There were also great talks by Lulu Qian and Erik Winfree, which I won’t try to summarize. Qian does a lot of work in the lab making things actually happen, so if you’re a practical sort this is the talk to look at:

• Lulu Qian, Implementing complex CRNs with modular DNA components.

All in all, a very stimulating workshop. The diversity of things one can ask about chemical reaction networks is quite exciting!


The Computational Power of Chemical Reaction Networks

10 June, 2014

I’m at this workshop:

Programming with Chemical Reaction Networks: Mathematical Foundations, Banff International Research Station, 8-13 June 2014.

Luca Cardelli wrote about computation with chemical reactions in Part 26 of the network theory series here on this blog. So, it’s nice to meet him and many other researchers, learn more, and try to solve some problems together!

The first tutorial was this:

• David Soloveichik, U.C. San Francisco, The computational power of chemical reaction networks.

David works at the Center for Systems and Synthetic Biology, and their website says:

David did his graduate work with Erik Winfree at Caltech, focusing on algorithmic self-assembly and on synthetic networks of nucleic-acid interactions based on strand displacement cascades. He is interested in “molecular programming”: the systematic design of complex molecular systems based on the principles of computer science and distributed computing. More generally, he is trying to create a theoretical foundation of chemical computation applicable to both synthetic and natural systems.

According to his webpage, Soloveichik’s research interests are:

Wet-lab: the rational design of molecular interactions for synthetic biology, nanotechnology, and bioengineering. The goal is to engineer autonomous molecular systems that can sense, compute, and perform various actions. Using nucleic-acid “strand displacement cascades” as the molecular primitive, we are able to attain freedom of design that hasn’t been previously possible.

Theory: The theoretical foundation of chemical computation. Once we have a way to program molecular interactions, what programming language shall we use? How molecules can process information and carry out computation is still not well-understood; however, a formal connection to models of concurrent computation may allow systematic and scalable design, rigorous analysis and verification. Further, computational principles may elucidate the design of biological regulatory networks.

Here are my notes on his tutorial.

Motivation

We’ve got people here from different backgrounds:

• computational complexity theory
• wetlab / experimental science
• pure and applied mathematics
• software verification

CRNs (chemical reaction networks) show up in:

• chemistry
• population biology
• sensor networks
• math:
    ○ vector addition systems
    ○ Petri nets
    ○ commutative semigroups
    ○ bounded context-free languages
    ○ uniform recurrence equations

Why use them for computation? People want to go beyond the von Neumann architecture for computation. People also want to understand how cells process information. However, with a few exceptions, the computational perspective in this talk has not yet proved relevant in biology. So, there is a lot left to learn.

The model

The model of computation here will be the master equation for a chemical reaction network… since this has been explained starting in Part 4 of the network theory series, I won’t review it!

Can all chemical reaction networks, even those without any conservation laws, be realized by actual chemical systems?

Though this is a subtle question, one answer is “yes, using strand displacement cascades”. This is a trick for getting DNA to simulate other chemical reactions. It’s been carried out in the lab! See this paper and many subsequent ones:

• Soloveichik, Seelig and Winfree, DNA as a universal substrate for chemical kinetics.

Abstract: Molecular programming aims to systematically engineer molecular and chemical systems of autonomous function and ever-increasing complexity. A key goal is to develop embedded control circuitry within a chemical system to direct molecular events. Here we show that systems of DNA molecules can be constructed that closely approximate the dynamic behavior of arbitrary systems of coupled chemical reactions. By using strand displacement reactions as a primitive, we construct reaction cascades with effectively unimolecular and bimolecular kinetics. Our construction allows individual reactions to be coupled in arbitrary ways such that reactants can participate in multiple reactions simultaneously, reproducing the desired dynamical properties. Thus arbitrary systems of chemical equations can be compiled into real chemical systems. We illustrate our method on the Lotka–Volterra oscillator, a limit-cycle oscillator, a chaotic system, and systems implementing feedback digital logic and algorithmic behavior.

However, even working with the master equation for a CRN, there are various things we might mean by having it compute something:

• uniform vs non-uniform: is a single CRN supposed to handle all inputs, or do we allow adding extra reactions for larger inputs? It’s a bit like Turing machines vs Boolean circuits.

• deterministic vs probabilistic: is the correct output guaranteed or merely likely?

• halting vs stabilizing: does the CRN ‘know’ when it has finished, or not? In the ‘halting’ case the CRN irreversibly produces some molecules that signal that the computation is done. In the ‘stabilizing’ case, it eventually stabilizes to the right answer, but we may not know how long to wait.

These distinctions dramatically affect the computational power. In the case of uniform computation:

• deterministic and halting: this has finite computational power.

• deterministic and stabilizing: this can decide semilinear predicates.

• probabilistic and halting: this is Turing-universal.

• probabilistic and stabilizing: this can decide \Delta_2^0 predicates, which are more general than computable ones. (Indeed, if we use Turing machines but don’t require them to signal when they’ve halted, the resulting infinitely long computations can ‘compute’ stuff that’s not computable in the usual sense.)

Deterministic stabilizing computations

Let’s look at the deterministic stabilizing computations in a bit more detail. We’ll look at decision problems. We have a subset S \subseteq \mathbb{N}^d, and we want to answer this question: is the vector X \in \mathbb{N}^d in the set S?

To do this, we represent the vector as a bunch of molecules: X_1 of the first kind, X_2 of the second kind, and so on. We call this an input. We may also include a fixed collection of additional molecules in our input, to help the reactions run.

Then we choose a chemical reaction network, and we let it run on our input. The answer to our question will be encoded in some molecules called Y and N. If X is in S, we want our chemical reaction to produce Y molecules. If it’s not, we want our reaction to produce N’s.

To make this more precise, we need to define what counts as an output. If we’ve got a bunch of molecules that

• contains Y but not N: then the output is YES.

• contains N but not Y: then the output is NO.

Otherwise the output is undefined.

Output-stable states are states with YES or NO output such that all states reachable from them via our chemical reactions give the same output. We say an output-stable state is correct if this output is the correct answer to the question: is X in S?

Our chemical reaction network gives a deterministic stabilizing computation if for any input, and choosing any state reachable from that input, we can do further chemical reactions to reach a correct output-stable state.

In other words: starting from our input, and letting the chemical reactions run any way they want, we will eventually stabilize at an output that gives the right answer to the question “is X in S?”
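For small inputs these definitions can be checked by brute force. Here is a sketch that assumes only finitely many states are reachable, which need not hold for a general network. The one-reaction network at the end is a hypothetical toy of mine: it starts with some A's and one Y, uses the reaction A + Y → A + N, and is meant to decide whether the number of A's is zero.

```python
from collections import deque

def output(state):
    """Return 'YES', 'NO', or None when the output is undefined."""
    has_y = state.get("Y", 0) > 0
    has_n = state.get("N", 0) > 0
    if has_y and not has_n:
        return "YES"
    if has_n and not has_y:
        return "NO"
    return None

def freeze(state):
    """Canonical hashable form of a state, ignoring species with count 0."""
    return tuple(sorted((s, c) for s, c in state.items() if c > 0))

def successors(state, reactions):
    """All states obtained from `state` by firing one reaction."""
    result = []
    for inputs, outputs in reactions:
        if all(state.get(s, 0) >= c for s, c in inputs.items()):
            new = dict(state)
            for s, c in inputs.items():
                new[s] -= c
            for s, c in outputs.items():
                new[s] = new.get(s, 0) + c
            result.append(new)
    return result

def reachable(start, reactions):
    """Breadth-first search over states reachable from `start`, including `start`."""
    seen = {freeze(start): start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors(state, reactions):
            key = freeze(nxt)
            if key not in seen:
                seen[key] = nxt
                queue.append(nxt)
    return list(seen.values())

def is_correct_output_stable(state, reactions, answer):
    """Every state reachable from `state` gives the output `answer`."""
    return all(output(s) == answer for s in reachable(state, reactions))

def stably_decides(start, reactions, answer):
    """From every reachable state, some correct output-stable state is reachable."""
    return all(
        any(is_correct_output_stable(t, reactions, answer)
            for t in reachable(s, reactions))
        for s in reachable(start, reactions)
    )

# Hypothetical toy network: A + Y -> A + N.
toy = [({"A": 1, "Y": 1}, {"A": 1, "N": 1})]
```

For example, `stably_decides({"A": 3, "Y": 1}, toy, "NO")` checks the toy network on an input with three A's, where the right answer to ‘is the number of A's zero?’ is NO.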

Examples

This sounds a bit complicated, but it’s really not. Let’s look at some examples!

Example 1. Suppose you want to check two numbers and see if one is greater than or equal to another. Here

S = \{(X_1,X_2) : X_2 \ge X_1 \}

How can you decide if a pair of numbers (X_1,X_2) is in this set?

You start with X_1 molecules of type A, X_2 molecules of type B, and one molecule of type Y. Then you use a chemical reaction network with these reactions:

A + Y \to N
B + N \to Y

If you let these reactions run, the Y switches to an N each time the reactions destroy an A. But the N switches back to a Y each time the reactions destroy a B.

When no more reactions are possible, we are left with either one Y or one N, which is the correct answer to your question!
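Here is a quick sketch of this example in code. I'm taking the reactions in the direction the description gives: destroying an A turns the Y into an N, and destroying a B turns the N back into a Y. Since there is always exactly one Y or N around, at most one reaction can fire at any moment, so no randomness is needed. The function name is my own.

```python
def compare(x1, x2):
    """Run the network on x1 A's, x2 B's and one Y; return the surviving token."""
    a, b, token = x1, x2, "Y"
    while True:
        if token == "Y" and a > 0:
            a, token = a - 1, "N"    # an A is destroyed, Y becomes N
        elif token == "N" and b > 0:
            b, token = b - 1, "Y"    # a B is destroyed, N becomes Y
        else:
            return token             # no reaction can fire any more

# Tabulate the answers for all small inputs.
answers = {(x1, x2): compare(x1, x2) for x1 in range(6) for x2 in range(6)}
```

The final token is Y exactly when X_2 \ge X_1, as you can check for small inputs in the table `answers`.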

Example 2. Suppose you want to check two numbers and see if one is equal to another. Here

S = \{(X_1,X_2) : X_2 = X_1 \}

How can you decide if a pair of numbers (X_1,X_2) is in here?

This is a bit harder! As before, you start with X_1 molecules of type A, X_2 molecules of type B, and one molecule of type Y. Then you use a chemical reaction network with these reactions:

A + B \to Y
Y + N \to Y
A + Y \to A + N
B + Y \to B + N

The first reaction lets an A and a B cancel out, producing a Y. If you only run this reaction, you’ll eventually be left with either a bunch of A\mathrm{s} or a bunch of B\mathrm{s} or nothing but Y\mathrm{s}.

If you have Y\mathrm{s}, your numbers were equal. The other reactions deal with the cases where you have A\mathrm{s} or B\mathrm{s} left over. But the key thing to check is that no matter what order we run the reactions, we’ll eventually get the right answer! In the end, you’ll have either Y\mathrm{s} or N\mathrm{s}, not both, and this will provide the yes-or-no answer to the question of whether X_1 = X_2.
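If you'd like to experiment with the order of reactions, here is a sketch that fires the four reactions above in a random order until none can fire. The uniform random choice among enabled reactions, the seed and the step cap are my own assumptions; the claim in the text is about every possible order, which a random run can only sample.

```python
import random

REACTIONS = [
    ({"A": 1, "B": 1}, {"Y": 1}),            # A + B -> Y
    ({"Y": 1, "N": 1}, {"Y": 1}),            # Y + N -> Y
    ({"A": 1, "Y": 1}, {"A": 1, "N": 1}),    # A + Y -> A + N
    ({"B": 1, "Y": 1}, {"B": 1, "N": 1}),    # B + Y -> B + N
]

def run_equality_check(x1, x2, seed=0, max_steps=100_000):
    """Start with x1 A's, x2 B's and one Y; fire random enabled reactions."""
    rng = random.Random(seed)
    state = {"A": x1, "B": x2, "Y": 1, "N": 0}
    for _ in range(max_steps):
        enabled = [r for r in REACTIONS
                   if all(state[s] >= c for s, c in r[0].items())]
        if not enabled:
            break  # nothing can react any more
        inputs, outputs = rng.choice(enabled)
        for s, c in inputs.items():
            state[s] -= c
        for s, c in outputs.items():
            state[s] += c
    if state["Y"] > 0 and state["N"] == 0:
        return "YES"
    if state["N"] > 0 and state["Y"] == 0:
        return "NO"
    return None  # output undefined (not expected once the run has stopped)

# A few sample runs with different seeds.
samples = [run_equality_check(3, 3, seed=s) for s in range(3)]
```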

What deterministic stabilizing computations can do

We’ve looked at some examples of deterministic stabilizing computations. The big question is: what kind of questions can they answer?

More precisely, for what subsets S \subseteq \mathbb{N}^d can we build a deterministic stabilizing computation that ends with output YES if the input X lies in S and with output NO otherwise?

The answer is: the ‘semilinear’ subsets!

• Dana Angluin, James Aspnes and David Eisenstat, Stably computable predicates are semilinear.

A set S \subseteq \mathbb{N}^d is linear if it’s of the form

\{u_0 + n_1 u_1 + \cdots + n_p u_p : n_i \in \mathbb{N}  \}

for some fixed vectors of natural numbers u_i \in \mathbb{N}^d.

A set S \subseteq \mathbb{N}^d is semilinear if it’s a finite union of linear sets.
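These definitions are concrete enough to test by brute force. Here is a sketch of a membership test, written by me for illustration and not how a chemical reaction network would decide anything. The example sets are my own encodings: \{X_2 \ge X_1\} is linear with u_0 = (0,0), u_1 = (1,1), u_2 = (0,1), and \{X_1 \ne X_2\} is a union of two linear sets.

```python
def in_linear_set(x, u0, generators):
    """Is x = u0 + n_1 u_1 + ... + n_p u_p for some natural numbers n_i?"""
    remainder = tuple(a - b for a, b in zip(x, u0))
    if any(r < 0 for r in remainder):
        return False
    # Zero vectors contribute nothing; dropping them keeps the search finite.
    gens = [u for u in generators if any(u)]

    def search(rest, index):
        if index == len(gens):
            return not any(rest)          # success only if nothing is left over
        u = gens[index]
        while True:
            if search(rest, index + 1):
                return True
            rest = tuple(r - c for r, c in zip(rest, u))  # use one more copy of u
            if any(r < 0 for r in rest):
                return False

    return search(remainder, 0)

def in_semilinear_set(x, linear_sets):
    """A semilinear set is given as a finite list of (u0, generators) pairs."""
    return any(in_linear_set(x, u0, gens) for u0, gens in linear_sets)

# {(X_1, X_2) : X_2 >= X_1} as a single linear set.
at_least = ((0, 0), [(1, 1), (0, 1)])

# {(X_1, X_2) : X_1 != X_2} as a union of two linear sets.
not_equal = [
    ((0, 1), [(1, 1), (0, 1)]),   # X_2 > X_1
    ((1, 0), [(1, 1), (1, 0)]),   # X_1 > X_2
]
```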

How did Angluin, Aspnes and Eisenstat prove their theorem? Apparently the easy part is showing that membership in any semilinear set can be decided by a chemical reaction network. David sketched the proof of the converse. I won’t go into it, but it used a very nice fact:

Dickson’s Lemma. Any subset of \mathbb{N}^d has a finite set of minimal elements, where we define x \le y if x_i \le y_i for all i.

For example, the region above and to the right of the hyperbola here has five minimal elements:

If you know some algebra, Dickson’s lemma should remind you of the Hilbert basis theorem, saying (for example) that every ideal in a ring of multivariable polynomials over a field is finitely generated. And in fact, Paul Gordan used Dickson’s Lemma in 1899 to help give a proof of Hilbert’s basis theorem.

It’s very neat to see how this lemma applies to chemical reaction networks! You can see how it works in Angluin, Aspnes and Eisenstat’s paper. But they call it “Higman’s lemma” for some reason.
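Dickson’s lemma is fun to explore numerically. Here is a sketch that finds the minimal elements of a region by brute force. The region x y \ge 9 is a hypothetical example of my own, not necessarily the hyperbola in the picture, and the cut-off at 12 is just there to make the search finite. It does no harm here, since every minimal element has coordinates at most 9.

```python
from itertools import product

def minimal_elements(points):
    """Return the points that no other point lies below in every coordinate."""
    points = set(points)

    def strictly_below(p, q):
        # p <= q in every coordinate, and p is a different point from q
        return p != q and all(a <= b for a, b in zip(p, q))

    return sorted(q for q in points
                  if not any(strictly_below(p, q) for p in points))

# Hypothetical region: lattice points with x * y >= 9, inside a 13 x 13 box.
box = range(0, 13)
region = [(x, y) for x, y in product(box, box) if x * y >= 9]
minimal = minimal_elements(region)
```

For this region, `minimal` has five elements: (1,9), (2,5), (3,3), (5,2) and (9,1).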

References

Here are some of David Soloveichik’s recent talks:

• An introduction to strand displacement cascades for the Foresight Institute Conference (Palo Alto, Jan 2013): An artificial “biochemistry” with DNA.

• Paper presented at DNA Computing and Molecular Programming 18 (Aarhus, Denmark, Aug 2012): Deterministic function computation with chemical reaction networks.

• Tutorial talk for DNA Computing and Molecular Programming 17 (Pasadena, Aug 2011): The programming language of chemical kinetics, and how to discipline your DNA molecules using strand displacement cascades.

• High-level introduction to algorithmic self-assembly and stochastic chemical reaction networks as computer-theoretic models: Computer-theoretic abstractions for molecular programming.

• On algorithmic behavior in chemical reaction networks and implementing arbitrary chemical reaction networks with DNA: programming well-mixed chemical kinetics.


Hexagonal Hyperbolic Honeycombs

14 May, 2014

This post is just for fun.

Roice Nelson likes geometry, and he makes plastic models of interesting objects using a 3d printer. He recently created some great pictures of ‘hexagonal hyperbolic honeycombs’. With his permission, I wrote about them on my blog Visual Insight. Here I’ve combined those posts into a single more polished article.

But the pictures are the star of the show. They deserve to be bigger than the 450-pixel width of this blog, so please click on them and see the full-sized versions!

The {6,3,3} honeycomb

 

This is the {6,3,3} honeycomb.

How do you build this structure? Take 4 rods and glue them together so their free ends lie at the corners of a regular tetrahedron. Make lots of copies of this thing. Then stick them together so that as you go around from one intersection to the next, following the rods, the shortest possible loop is always a hexagon!

This is impossible in ordinary flat 3-dimensional space. But you can succeed if you work in hyperbolic space, a non-Euclidean space where the angles of a triangle add up to less than 180°. The result is the {6,3,3} honeycomb, shown here.

Of course, this picture has been projected onto your flat computer screen. This distorts the rods, so they look curved. But they’re actually straight… inside curved space.

The {6,3,3} honeycomb is an example of a ‘hyperbolic honeycomb’. In general, a 3-dimensional honeycomb is a way of filling 3d space with polyhedra. It’s the 3-dimensional analogue of a tiling of the plane. Besides honeycombs in 3d Euclidean space, we can also have honeycombs in 3d hyperbolic space. The {6,3,3} honeycomb is one of these.

But actually, when I said a honeycomb is a way of filling 3d space with polyhedra, I was lying slightly. It’s often true—but not in this example!

For comparison, in the {5,3,4} honeycomb, space really is filled with polyhedra:

You can see a lot of pentagons, and if you look carefully you’ll see these pentagons are faces of dodecahedra:

In the honeycomb, these dodecahedra fill hyperbolic space.

But in the {6,3,3} honeycomb, all the hexagons lie on infinite sheets. You can see one near the middle of this picture:

These sheets of hexagons are not polyhedra in the usual sense, because they have infinitely many polygonal faces! So, the {6,3,3} honeycomb is called a paracompact honeycomb.

But what does the symbol {6,3,3} mean?

It’s an example of a Schläfli symbol. It’s defined in a recursive way. The symbol for the hexagon is {6}. The symbol for the hexagonal tiling of the plane is {6,3} because 3 hexagons meet at each vertex. Finally, the hexagonal tiling honeycomb has symbol {6,3,3}, because 3 hexagonal tilings meet at each edge of this honeycomb.

So, we can build a honeycomb if we know its Schläfli symbol. And there’s a lot of information in this symbol.

For example, just as the {6,3} inside {6,3,3} describes the hexagonal tilings inside the {6,3,3} honeycomb, the {3,3} describes the vertex figure of this honeycomb: that is, the way the edges meet at each vertex. {3,3} is the Schläfli symbol for the regular tetrahedron, and in the {6,3,3} honeycomb each vertex has 4 edges coming out, just like the edges going from the center of a tetrahedron to its corners!

The {6,3,4} honeycomb

 

This is the {6,3,4} honeycomb.

How do you build this structure? Make 3 intersecting rods at right angles to each other. Make lots of copies of this thing. Then stick them together so that as you go around from one intersection to the next, following the rods, the shortest possible loop is always a hexagon!

This is impossible in ordinary flat 3-dimensional space. You can only succeed if the shortest possible loop is a square. Then you get the familiar cubic honeycomb, also called the {4,3,4} honeycomb:


To get hexagons instead of squares, space needs to be curved! You can succeed if you work in hyperbolic space, where it’s possible to create a hexagon whose internal angles are all 90°. In ordinary flat space, only a square can have all its internal angles be 90°.

Here’s the tricky part: the hexagons in the {6,3,4} honeycomb form infinite sheets where 3 hexagons meet at each corner. You can see one of these sheets near the center of the picture. The corners of the hexagons in one sheet lie on a flat plane in hyperbolic space, called a horosphere.

That seems to make sense, because in flat space hexagons can have all their internal angles be 120°… so three can meet snugly at a corner. But I just said these hexagons have 90° internal angles!

Puzzle 1. What’s going on? Can you resolve the apparent contradiction?

The Schläfli symbol of this honeycomb is {6,3,4}, and we can see why using ideas I’ve already explained. It’s made of hexagonal tilings of the plane, which have Schläfli symbol {6,3} because 3 hexagons meet at each vertex. On the other hand, the vertex figure of this honeycomb is an octahedron: if you look at the picture you can see that each vertex has 6 edges coming out, just like the edges going from the center of an octahedron to its corners. The octahedron has Schläfli symbol {3,4}, since it has 4 triangles meeting at each corner. Take {6,3} and {3,4} and glue them together and you get {6,3,4}!

We can learn something from this. Since this honeycomb has Schläfli symbol {6,3,4}, it has 4 hexagonal tilings meeting at each edge! That’s a bit hard to see from the picture.

All the honeycombs I’ve been showing you are ‘regular’. This is the most symmetrical kind of honeycomb. A flag in a honeycomb is a vertex lying on an edge lying on a face lying on a cell (which could be a polyhedron or an infinite sheet of polygons). A honeycomb is regular if there’s a symmetry sending any flag to any other flag.

The {6,3,3} and {6,3,4} honeycombs are also ‘paracompact’. Remember, this means they have infinite cells, which in this case are the hexagonal tilings {6,3}. There are 15 regular honeycombs in 3d hyperbolic space, of which 11 are paracompact. For a complete list of regular paracompact honeycombs, see:

Regular paracompact honeycombs, Wikipedia.

The {6,3,5} honeycomb

 

This is the {6,3,5} honeycomb. It’s built from sheets of regular hexagons, and 5 of these sheets meet along each edge of the honeycomb. That explains the Schläfli symbol {6,3,5}.

If you look very carefully, you’ll see 12 edges coming out of each vertex here, grouped in 6 opposite pairs. These edges go out from the vertex to its 12 neighbors, which are arranged like the corners of a regular icosahedron!

In other words, the vertex figure of this honeycomb is an icosahedron. And even if you can’t see this in the picture, you can deduce that it’s true, because {3,5} is the Schläfli symbol for the regular icosahedron, and it’s sitting inside {6,3,5}, at the end.

But now for a puzzle. This is for people who like probability theory:

Puzzle 2. Say you start at one vertex in this picture, a place where edges meet. Say you randomly choose an edge and walk down it to the next vertex… each edge being equally likely. Say you keep doing this. This is the most obvious random walk you can do on the {6,3,5} honeycomb. Is the probability that eventually you get back where you started equal to 1? Or is it less than 1?

If that’s too hard, try the same sort of question with the usual cubical honeycomb in ordinary flat 3d space. Or the square lattice on the plane!

In one dimension, where you just take steps back and forth on the integers, with equal chances of going left or right each time, you have a 100% chance of eventually getting back where you started. But the story works differently in different dimensions—and it also depends on whether space is flat, spherical or hyperbolic.
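For the flat cases, a quick Monte Carlo experiment can give a feel for the answer before you try to prove anything. Here is a rough sketch. The function name `estimate_return_probability`, the number of walks and the step cutoff are all choices made up for this illustration, and because each walk is cut off after finitely many steps the result can only underestimate the true chance of ever returning.

```python
import random

def estimate_return_probability(dim, walks=2000, max_steps=500, seed=0):
    """Estimate the chance that a simple random walk on the integer
    lattice Z^dim returns to its starting point within max_steps steps.

    Because the walk is cut off after max_steps, this underestimates
    the true probability of ever returning.
    """
    rng = random.Random(seed)
    returned = 0
    for _ in range(walks):
        position = [0] * dim
        for _ in range(max_steps):
            axis = rng.randrange(dim)              # pick a coordinate direction
            position[axis] += rng.choice((-1, 1))  # step forward or back along it
            if not any(position):                  # back at the origin
                returned += 1
                break
    return returned / walks

if __name__ == "__main__":
    for dim in (1, 2, 3):
        print(dim, estimate_return_probability(dim))
```

Choosing an axis uniformly and then a sign uniformly makes each of the 2 × dim edges at a vertex equally likely. Watching how the estimates change as you raise `max_steps` is a good way to guess the answer in each dimension.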

The {6,3,6} honeycomb

 

This is the {6,3,6} honeycomb. It has a lot of sheets of regular hexagons, and 6 sheets meet along each edge of the honeycomb.

The {6,3,6} honeycomb has a special property: it’s ‘self-dual’. The tetrahedron is a simpler example of a self-dual shape. If we draw a vertex in the middle of each face of the tetrahedron, and draw an edge crossing each edge, we get a new shape with a face for each vertex of the tetrahedron… but this new shape is again a tetrahedron!

If we do a similar thing one dimension up for the {6,3,6} honeycomb, this amounts to creating a new honeycomb with:

• one vertex for each infinite sheet of hexagons in the original honeycomb;

• one edge for each hexagon in the original honeycomb;

• one hexagon for each edge in the original honeycomb;

• one infinite sheet of hexagons for each vertex in the original honeycomb.

But this new honeycomb turns out to be another {6,3,6} honeycomb!

This is hard to visualize, at least for me, but it implies something cool. Just as each sheet of hexagons has infinitely many hexagons on it, each vertex has infinitely many edges going through it.

This self-duality comes from the symmetry of the Schläfli symbol {6,3,6}: if you reverse it, you get the same thing!

Okay. I’ve shown you regular hyperbolic honeycombs where 3, 4, 5, or 6 sheets of hexagons meet along each edge. Sometimes in math patterns go on forever, but sometimes they end, just like life itself. And indeed, we’ve reached the end of something here! You can’t build a regular honeycomb in hyperbolic space with 7 sheets of hexagons meeting at each edge.

Puzzle 3. What do you get if you try?

I’m not sure, but it’s related to a pattern we’ve been seeing. The hexagonal hyperbolic honeycombs I’ve shown you are the ‘big brothers’ of the tetrahedron, the octahedron, the icosahedron and the triangular tiling of the plane! Here’s how it goes:

• You can build a tetrahedron where 3 triangles meet at each corner:

For this reason, the Schläfli symbol of the tetrahedron is {3,3}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of a tetrahedron… and these edges form hexagons. This is the {6,3,3} honeycomb.

• You can build an octahedron where 4 triangles meet at each corner:

The Schläfli symbol of the octahedron is {3,4}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of an octahedron… and these edges form hexagons. This is the {6,3,4} honeycomb.

• You can build an icosahedron where 5 triangles meet at each corner:

The Schläfli symbol of the icosahedron is {3,5}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of an icosahedron… and these edges form hexagons. This is the {6,3,5} honeycomb.

• You can build a tiling of a flat plane where 6 triangles meet at each corner:

This triangular tiling is also called {3,6}. You can build a hyperbolic honeycomb where the edges coming out of any vertex go out to the corners of a triangular tiling… and these edges form hexagons. This is the {6,3,6} honeycomb.

The last one is a bit weird! The triangular tiling has infinitely many corners, so in the picture here, there are infinitely many edges coming out of each vertex.

But what happens when we get to {6,3,7}? That’s the puzzle.
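One way to see where the pattern changes is to ask what kind of geometry the would-be vertex figure {3,n} lives in. A regular tiling {p,q} is spherical, flat or hyperbolic according to whether 1/p + 1/q is greater than, equal to, or less than 1/2. Here is a small sketch of that test. The function name `tiling_geometry` is made up for this illustration.

```python
from fractions import Fraction

def tiling_geometry(p, q):
    """Classify the regular tiling {p,q} by comparing 1/p + 1/q with 1/2."""
    total = Fraction(1, p) + Fraction(1, q)
    half = Fraction(1, 2)
    if total > half:
        return "spherical"   # a Platonic solid
    if total == half:
        return "flat"        # a tiling of the Euclidean plane
    return "hyperbolic"      # a tiling of the hyperbolic plane

if __name__ == "__main__":
    for n in range(3, 8):
        # the vertex figure of a {6,3,n} honeycomb would be {3,n}
        print(f"{{3,{n}}}: {tiling_geometry(3, n)}")
```

So {3,3}, {3,4} and {3,5} come out spherical, {3,6} comes out flat, and {3,7} comes out hyperbolic. This does not answer the puzzle, but it does show that something new has to happen at n = 7.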

Coxeter groups

I’ve been telling you about Schläfli symbols, but these are closely related to another kind of code, which is deeper and in many ways better. It’s called a Coxeter diagram. The Coxeter diagram of the {6,3,3} honeycomb is

●—6—o—3—o—3—o

What does this mean? It looks a lot like the Schläfli symbol, and that’s no coincidence, but there’s more to it.

The symmetry group of the {6,3,3} honeycomb is a discrete subgroup of the symmetry group of hyperbolic space. This discrete group has generators and relations summarized by the unmarked Coxeter diagram:

o—6—o—3—o—3—o

This diagram says there are four generators s_1, \dots, s_4 obeying relations encoded in the edges of the diagram:

(s_1 s_2)^6 = 1
(s_2 s_3)^3 = 1
(s_3 s_4)^3 = 1

together with relations

s_i^2 = 1

and

s_i s_j = s_j s_i \; \textrm{ if } \; |i - j| > 1
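To make the bookkeeping concrete, here is a sketch that turns the edge labels of a linear Coxeter diagram into the matrix of exponents m_{ij}, where (s_i s_j)^{m_{ij}} = 1. The function name `coxeter_matrix` and the list-of-lists representation are choices made for this illustration, and the code numbers the generators from 0 rather than 1.

```python
def coxeter_matrix(edge_labels):
    """Build the Coxeter matrix of a linear diagram.

    edge_labels[k] is the label on the edge joining node k and node k+1
    (0-indexed), so [6, 3, 3] describes o-6-o-3-o-3-o.
    Entry m[i][j] is the exponent in the relation (s_i s_j)^m[i][j] = 1.
    """
    n = len(edge_labels) + 1
    m = [[2] * n for _ in range(n)]   # non-adjacent generators commute
    for i in range(n):
        m[i][i] = 1                   # s_i^2 = 1
    for k, label in enumerate(edge_labels):
        m[k][k + 1] = m[k + 1][k] = label
    return m

if __name__ == "__main__":
    for row in coxeter_matrix([6, 3, 3]):
        print(row)
```

The entries equal to 2 encode the commutation relations: since each s_i squares to 1, saying (s_i s_j)^2 = 1 is the same as saying s_i s_j = s_j s_i.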

Marking the Coxeter diagram in different ways lets us describe many honeycombs with the same symmetry group as the hexagonal tiling honeycomb. In fact, there are 2^4 - 1 = 15 of them, since there are 4 dots in the Coxeter diagram! For the theory of how this works, illustrated by some simpler examples, try this old post of mine:

Symmetry and the Fourth Dimension (Part 9).

or indeed the whole series. The series is far from done; I have a pile of half-written episodes that I need to finish up and publish. This post should, logically, come after all those… but life is not fully governed by logic.
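Going back to the count of markings for a moment: each of the 4 dots is either marked or not, and at least one must be marked, which is where 15 comes from. Here is a sketch that lists them. The function name `marked_diagrams` is made up, and the code draws edges with plain hyphens instead of the longer dashes used in the diagrams in this post.

```python
from itertools import product

def marked_diagrams(edge_labels):
    """List every way to mark at least one node of a linear Coxeter diagram.

    A marked node is drawn as ● and an unmarked one as o.
    """
    n = len(edge_labels) + 1
    diagrams = []
    for marks in product((False, True), repeat=n):
        if not any(marks):
            continue                  # at least one node must be marked
        nodes = ["●" if marked else "o" for marked in marks]
        text = nodes[0]
        for label, node in zip(edge_labels, nodes[1:]):
            text += f"-{label}-{node}"
        diagrams.append(text)
    return diagrams

if __name__ == "__main__":
    for diagram in marked_diagrams([6, 3, 3]):
        print(diagram)
```

The hexagonal tiling honeycomb {6,3,3} is the one where only the first dot is marked.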

Similar remarks apply to all the hexagonal hyperbolic honeycombs I’ve shown you today:


{6,3,3} honeycomb


●—6—o—3—o—3—o
3 hexagonal tilings meeting at each edge
vertex figure: tetrahedron

 

{6,3,4} honeycomb


●—6—o—3—o—4—o
4 hexagonal tilings meeting at each edge
vertex figure: octahedron

 

{6,3,5} honeycomb


●—6—o—3—o—5—o
5 hexagonal tilings meeting at each edge
vertex figure: icosahedron

 

{6,3,6} honeycomb


●—6—o—3—o—6—o
6 hexagonal tilings meeting at each edge
vertex figure: triangular tiling

Finally, one more puzzle, for people who like algebra and number theory:

Puzzle 4. The symmetry group of 3d hyperbolic space, not counting reflections, is \mathrm{PSL}(2,\mathbb{C}). Can you explicitly describe the subgroups that preserve the four hexagonal hyperbolic honeycombs?

For the case of {6,3,3}, Martin Weissman gave an answer on G+:

Well, it’s \mathrm{PSL}_2(\mathbb{Z}[e^{2 \pi i / 3}]), of course!

Since he’s an expert on arithmetic Coxeter groups, this must be about right! Theorem 10.2 in this paper he showed me:

• Norman W. Johnson and Asia Ivic Weiss, Quadratic integers and Coxeter Groups, Canad. J. Math. Vol. 51 (1999), 1307–1336.

is a bit more precise. It gives a nice description of the even part of the Coxeter group discussed in this article, that is, the part generated by products of pairs of reflections. To get this group, we start with 2 × 2 matrices with entries in the Eisenstein integers: the integers with a cube root of -1 adjoined. We look at the matrices where the absolute value of the determinant is 1, and then we ‘projectivize’ it, modding out by its center. That does the job!

They call the even part of the Coxeter group [3,3,6]+, and they call the group it’s isomorphic to \mathrm{P\overline{S}L}_2(\mathbb{E}), where \mathbb{E} is their notation for the Eisenstein integers, also called \mathbb{Z}[e^{2 \pi i / 3}]. The weird little line over the \mathrm{S} is a notation of theirs: \mathrm{SL}_2 stands for 2 × 2 matrices with determinant 1, but \mathrm{\overline{S}L}_2 is their notation for 2 × 2 matrices whose determinant has absolute value 1.

Can you say more about this case? What about the other cases?


Noether’s Theorem: Quantum vs Stochastic

3 May, 2014

guest post by Ville Bergholm

In 1915 Emmy Noether discovered an important connection between the symmetries of a system and its conserved quantities. Her result has become a staple of modern physics and is known as Noether’s theorem.

Photo of Emmy Noether

The theorem and its generalizations have found particularly wide use in quantum theory. Those of you following the Network Theory series here on Azimuth might recall Part 11 where John Baez and Brendan Fong proved a version of Noether’s theorem for stochastic systems. Their result is now published here:

• John Baez and Brendan Fong, A Noether theorem for stochastic mechanics, J. Math. Phys. 54:013301 (2013).

One goal of the network theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. John and Brendan proved their stochastic version of Noether’s theorem by exploiting ‘stochastic mechanics’, which was formulated in the network theory series to mathematically resemble quantum theory. Their result, which we will outline below, was different from what would be expected in quantum theory, so it is interesting to try to figure out why.

Recently Jacob Biamonte, Mauro Faccin and I have been working to get to the bottom of these differences. What we’ve done is prove a version of Noether’s theorem for Dirichlet operators. As you may recall from Parts 16 and 20 of the network theory series, these are the operators that generate both stochastic and quantum processes. In the language of the series, they lie in the intersection of stochastic and quantum mechanics. So, they are a subclass of the infinitesimal stochastic operators considered in John and Brendan’s work.

The extra structure of Dirichlet operators—compared with the wider class of infinitesimal stochastic operators—provided a handle for us to dig a little deeper into understanding the intersection of these two theories. By the end of this article, astute readers will be able to prove that Dirichlet operators generate doubly stochastic processes.

Before we get into the details of our proof, let’s recall first how conservation laws work in quantum mechanics, and then contrast this with what John and Brendan discovered for stochastic systems. (For a more detailed comparison between the stochastic and quantum versions of the theorem, see Part 13 of the network theory series.)

The quantum case

I’ll assume you’re familiar with quantum theory, but let’s start with a few reminders.

In standard quantum theory, when we have a closed system with n states, the unitary time evolution of a state |\psi(t)\rangle is generated by a self-adjoint n \times n matrix H called the Hamiltonian. In other words, working in units where \hbar = 1, |\psi(t)\rangle satisfies Schrödinger’s equation:

i \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle.

The state of a system starting off at time zero in the state |\psi_0 \rangle and evolving for a time t is then given by

|\psi(t) \rangle = e^{-i t H}|\psi_0 \rangle.

The observable properties of a quantum system are associated with self-adjoint operators. In the state |\psi \rangle, the expected value of the observable associated to a self-adjoint operator O is

\langle O \rangle_{\psi} = \langle \psi | O | \psi \rangle

This expected value is constant in time for all states if and only if O commutes with the Hamiltonian H:

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t.

In this case we say O is a ‘conserved quantity’. The fact that we have two equivalent conditions for this is a quantum version of Noether’s theorem!
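To see the equivalence in action, note that differentiating \langle \psi(t) | O | \psi(t) \rangle with |\psi(t)\rangle = e^{-i t H}|\psi_0\rangle gives \frac{d}{dt}\langle O \rangle = i \langle [H, O] \rangle. Here is a minimal sketch of that formula for small matrices written as lists of rows. All the function names are made up for this illustration, and it takes \hbar = 1 to match the evolution operator e^{-i t H} above.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def expectation(op, psi):
    """Return <psi| op |psi> / <psi|psi>, so psi need not be normalized."""
    n = len(psi)
    num = sum(psi[i].conjugate() * op[i][j] * psi[j]
              for i in range(n) for j in range(n))
    norm = sum(abs(x) ** 2 for x in psi)
    return num / norm

def rate_of_change(obs, ham, psi):
    """d/dt <obs> in the state psi, using d/dt <O> = i <[H, O]>."""
    return (1j * expectation(commutator(ham, obs), psi)).real

# A two-state example: the Hamiltonian is the Pauli matrix sigma_z.
sigma_z = [[1, 0], [0, -1]]
sigma_x = [[0, 1], [1, 0]]
psi = [1, 1j]

if __name__ == "__main__":
    print(rate_of_change(sigma_z, sigma_z, psi))  # sigma_z commutes with H
    print(rate_of_change(sigma_x, sigma_z, psi))  # sigma_x does not
```

With H = sigma_z the observable sigma_z is conserved, while the expected value of sigma_x in this state changes at rate -2.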

The stochastic case

In stochastic mechanics, the story changes a bit. Now a state |\psi(t)\rangle is a probability distribution: a vector with n nonnegative components that sum to 1. Schrödinger’s equation gets replaced by the master equation:

\displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle

If we start with a probability distribution |\psi_0 \rangle at time zero and evolve it according to this equation, at any later time we have

|\psi(t)\rangle = e^{t H} |\psi_0 \rangle.

We want this to always be a probability distribution. To ensure that this is so, the Hamiltonian H must be infinitesimal stochastic: that is, a real-valued n \times n matrix where the off-diagonal entries are nonnegative and the entries of each column sum to zero. It no longer needs to be self-adjoint!

When H is infinitesimal stochastic, the operators e^{t H} map the set of probability distributions to itself whenever t \ge 0, and we call this family of operators a continuous-time Markov process, or more precisely a Markov semigroup.
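Here is a small sketch that checks the two defining conditions and then evolves a distribution by summing the power series for e^{t H}. The names `is_infinitesimal_stochastic` and `evolve` and the 2 × 2 example matrix are made up for this illustration, and a truncated power series is only a reasonable approximation when t H is small.

```python
def is_infinitesimal_stochastic(h, tol=1e-12):
    """Check that off-diagonal entries are >= 0 and each column sums to zero."""
    n = len(h)
    off_diagonal_ok = all(h[i][j] >= -tol
                          for i in range(n) for j in range(n) if i != j)
    columns_ok = all(abs(sum(h[i][j] for i in range(n))) <= tol
                     for j in range(n))
    return off_diagonal_ok and columns_ok

def evolve(h, psi0, t, terms=60):
    """Approximate e^{tH} psi0 by summing the first `terms` terms of the
    power series sum_k (tH)^k psi0 / k!."""
    n = len(psi0)
    term = list(psi0)          # current term (tH)^k psi0 / k!
    result = list(psi0)
    for k in range(1, terms):
        term = [t / k * sum(h[i][j] * term[j] for j in range(n))
                for i in range(n)]
        result = [r + x for r, x in zip(result, term)]
    return result

# Hypothetical example: state 0 jumps to state 1 at rate 1,
# and state 1 jumps to state 0 at rate 2.
H = [[-1.0, 2.0],
     [1.0, -2.0]]

if __name__ == "__main__":
    print(is_infinitesimal_stochastic(H))
    print(evolve(H, [1.0, 0.0], 1.0))
```

The evolved vector still has nonnegative entries summing to 1, up to rounding, which is what the conditions on H are designed to guarantee.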

In stochastic mechanics, we say an observable O is a real diagonal n \times n matrix, and its expected value is given by

\langle O\rangle_{\psi} = \langle \hat{O} | \psi \rangle

where \hat{O} is the vector built from the diagonal entries of O. More concretely,

\langle O\rangle_{\psi} = \displaystyle{ \sum_i O_{i i} \psi_i }

where \psi_i is the ith component of the vector |\psi\rangle.

Here is a version of Noether’s theorem for stochastic mechanics:

Noether’s Theorem for Markov Processes (Baez–Fong). Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0

and

\displaystyle{\frac{d}{d t}}\langle O^2 \rangle_{\psi(t)} = 0

for all t \ge 0 and all \psi(t) obeying the master equation.   █

So, just as in quantum mechanics, whenever [O,H]=0 the expected value of O will be conserved:

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

for any \psi_0 and all t \ge 0. However, John and Brendan saw that—unlike in quantum mechanics—you need more than just the expectation value of the observable O to be constant to obtain the equation [O,H]=0. You really need both

\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

together with

\displaystyle{\frac{d}{d t}} \langle O^2\rangle_{\psi(t)} = 0

for all initial data \psi_0 to be sure that [O,H]=0.

So it’s a bit subtle, but symmetries and conserved quantities have a rather different relationship than they do in quantum theory.
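A tiny example shows why the second condition is needed. Take three states in a row, where a particle in the middle state hops left or right at equal rates and then stays put. The mean position never changes, but the spread does. The sketch below checks this using the fact that \frac{d}{dt}\langle O \rangle = \langle \hat{O} | H e^{t H} | \psi_0 \rangle, so the expected value is conserved for all initial data exactly when the row vector \langle \hat{O} | H vanishes. The function names and the example matrix are made up for this illustration.

```python
def observable_rate(o_diag, h):
    """Return the row vector <O|H, whose j-th entry is sum_i O_i H_ij.

    d/dt <O> vanishes for every initial distribution and all t >= 0
    exactly when this vector is zero.
    """
    n = len(o_diag)
    return [sum(o_diag[i] * h[i][j] for i in range(n)) for j in range(n)]

def commutator_with_diagonal(o_diag, h):
    """Return [O, H] for diagonal O: the (i, j) entry is (O_i - O_j) H_ij."""
    n = len(o_diag)
    return [[(o_diag[i] - o_diag[j]) * h[i][j] for j in range(n)]
            for i in range(n)]

# State 1 jumps to state 0 or state 2 at equal rates; states 0 and 2 are stuck.
H = [[0, 1, 0],
     [0, -2, 0],
     [0, 1, 0]]
O = [0, 1, 2]                       # diagonal entries of the observable
O_squared = [x * x for x in O]

if __name__ == "__main__":
    print(observable_rate(O, H))            # zero: <O> is conserved
    print(observable_rate(O_squared, H))    # nonzero: <O^2> is not
    print(commutator_with_diagonal(O, H))   # nonzero: O does not commute with H
```

Here the expected value of O is conserved but that of O^2 is not, and correspondingly [O,H] is not zero.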

A Noether theorem for Dirichlet operators

But what if the infinitesimal generator of our Markov semigroup is also self-adjoint? In other words, what if H is both an infinitesimal stochastic matrix and its own transpose: H = H^\top? Then it’s called a Dirichlet operator… and we found that in this case, we get a stochastic version of Noether’s theorem that more closely resembles the usual quantum one:

Noether’s Theorem for Dirichlet Operators. If H is a Dirichlet operator and O is an observable, then

[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

Proof. The \Rightarrow direction is easy to show, and it follows from John and Brendan’s theorem. The point is to show the \Leftarrow direction. Since H is self-adjoint, we may use a spectral decomposition:

H = \displaystyle{ \sum_k E_k |\phi_k \rangle \langle \phi_k |}

where \phi_k are an orthonormal basis of eigenvectors, and E_k are the corresponding eigenvalues. We then have:

\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = \langle \hat{O} | H e^{t H} |\psi_0 \rangle = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

\iff \quad \langle \hat{O}| H e^{t H} = 0 \quad \forall t \ge 0

\iff \quad \sum_k \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} \langle \phi_k| = 0 \quad \forall t \ge 0

\iff \quad \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} = 0 \quad \forall k, \forall t \ge 0

\iff \quad |\hat{O} \rangle \in \mathrm{Span}\{|\phi_k \rangle \, : \; E_k = 0\} = \ker \: H,

where the third equivalence is due to the vectors |\phi_k \rangle being linearly independent. For any infinitesimal stochastic operator H the corresponding transition graph consists of m connected components iff we can reorder (permute) the states of the system such that H becomes block-diagonal with m blocks. Now it is easy to see that the kernel of H is spanned by m eigenvectors, one for each block. Since H is also symmetric, the elements of each such vector can be chosen to be ones within the block and zeros outside it. Consequently

|\hat{O} \rangle \in \ker \: H

implies that we can choose the basis of eigenvectors of O to be the vectors |\phi_k \rangle, which implies

[O, H] = 0

Alternatively,

|\hat{O} \rangle \in \ker \, H

implies that

|\hat{O^2} \rangle \in \ker \: H \; \iff \; \cdots \; \iff \; \displaystyle{\frac{d}{d t}} \langle O^2 \rangle_{\psi(t)} = 0 \; \forall \: |\psi_0 \rangle, \forall t \ge 0,

where we have used the above sequence of equivalences backwards. Now, using John and Brendan’s original proof, we can obtain [O, H] = 0.   █

In summary, by restricting ourselves to the intersection of quantum and stochastic generators, we have found a version of Noether’s theorem for stochastic mechanics that looks formally just like the quantum version! However, this simplification comes at a cost. We find that the only observables O whose expected value remains constant with time are those of the very restricted type described above, where the observable has the same value in every state in a connected component.
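To illustrate this restriction, here is a sketch that finds the connected components of the transition graph of a small Dirichlet operator and compares two tests on an observable: whether its expected value is conserved, and whether it is constant on each component. The function names and the 3 × 3 example are made up for this illustration, and the exact equality tests assume integer (or otherwise exact) matrix entries.

```python
def is_dirichlet(h):
    """Symmetric, off-diagonal entries >= 0, and every column sums to zero."""
    n = len(h)
    symmetric = all(h[i][j] == h[j][i] for i in range(n) for j in range(n))
    off_diag = all(h[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
    columns = all(sum(h[i][j] for i in range(n)) == 0 for j in range(n))
    return symmetric and off_diag and columns

def components(h):
    """Connected components of the transition graph (edge i-j when H_ij != 0)."""
    n = len(h)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j != i and h[i][j] != 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        result.append(sorted(comp))
    return result

def expectation_is_conserved(o_diag, h):
    """True when <O|H = 0, i.e. d/dt <O> = 0 for all initial states."""
    n = len(h)
    return all(sum(o_diag[i] * h[i][j] for i in range(n)) == 0
               for j in range(n))

def constant_on_components(o_diag, h):
    """True when O takes a single value on each connected component."""
    return all(len({o_diag[i] for i in comp}) == 1 for comp in components(h))

# States 0 and 1 exchange at rate 1; state 2 is isolated.
H = [[-1, 1, 0],
     [1, -1, 0],
     [0, 0, 0]]

if __name__ == "__main__":
    print(is_dirichlet(H), components(H))
    for o in ([5, 5, 7], [1, 2, 7]):
        print(o, expectation_is_conserved(o, H), constant_on_components(o, H))
```

In this example the two tests agree: the observable with diagonal entries 5, 5, 7 passes both, while the one with entries 1, 2, 7 fails both.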

Puzzles

Suppose we have a graph whose graph Laplacian matrix H generates a Markov semigroup as follows:

U(t) = e^{t H}

Puzzle 1. Suppose that also H = H^\top, so that H is a Dirichlet operator and hence i H generates a 1-parameter unitary group. Show that the indegree and outdegree of any node of our graph must be equal. Graphs with this property are called balanced.

Puzzle 2. Suppose that U(t) = e^{t H} is a doubly stochastic Markov semigroup, meaning that for all t \ge 0 each row and each column of U(t) sums to 1:

\displaystyle{ \sum_i U(t)_{i j} = \sum_j U(t)_{i j} = 1 }

and all the matrix entries are nonnegative. Show that the Hamiltonian H obeys

\displaystyle{\sum_i H_{i j} = \sum_j H_{i j} = 0 }

and all the off-diagonal entries of H are nonnegative. Show the converse is also true.

Puzzle 3. Prove that any doubly stochastic Markov semigroup U(t) is of the form e^{t H} where H is the graph Laplacian of a balanced graph.

Puzzle 4. Let O(t) be a possibly time-dependent observable, and write \langle O(t) \rangle_{\psi(t)} for its expected value with respect to some initial state \psi_0 evolving according to the master equation. Show that

\displaystyle{ \frac{d}{d t}\langle O(t)\rangle_{\psi(t)} = \left\langle [O(t), H] \right\rangle_{\psi(t)} + \left\langle \frac{\partial O(t)}{\partial t}\right\rangle_{\psi(t)} }

This is a stochastic version of the Ehrenfest theorem.


Programming with Chemical Reaction Networks

23 March, 2014

 

There will be a 5-day workshop on Programming with Chemical Reaction Networks: Mathematical Foundation at BIRS from Sunday, June 8 to Friday, June 13, 2014. It’s being organized by

Anne Condon (University of British Columbia)
David Doty (California Institute of Technology)
Chris Thachuk (University of Oxford).

BIRS is the Banff International Research Station, in the mountains west of Calgary, in Alberta, Canada.

Description

Here’s the workshop proposal on the BIRS website. It’s a pretty interesting proposal, especially if you’ve already read Luca Cardelli’s description of computing with chemical reaction networks, at the end of our series of posts on chemical reaction networks. The references include a lot of cool papers, so I’ve created links to those to help you get ahold of them.

This workshop will explore three of the most important research themes concerning stochastic chemical reaction networks (CRNs). Below we motivate each theme and highlight key questions that the workshop will address. Our main objective is to bring together distinct research communities in order to consider new problems that could not be fully appreciated in isolation. It is also our aim to determine commonalities between different disciplines and bodies of research. For example, research into population protocols, vector addition systems, and Petri networks provides a rich body of theoretical results that may already address contemporary problems arising in the study of CRNs.

Computational power of CRNs

Before designing robust and practical systems, it is useful to know the limits to computing with a chemical soup. Some interesting theoretical results are already known for stochastic chemical reaction networks. The computational power of CRNs depends upon a number of factors, including: (i) is the computation deterministic or probabilistic, and (ii) does the CRN have an initial context, that is, certain species, independent of the input, that are initially present in some exact, constant count.

In general, CRNs with a constant number of species (independent of the input length) are capable of Turing universal computation [17], if the input is represented by the exact (unary) count of one molecular species, some small probability of error is permitted and an initial context in the form of a single-copy leader molecule is used.

Could the same result hold in the absence of an initial context? In a surprising result based on the distributed computing model of population protocols, it has been shown that if a computation must be error-free, then deterministic computation with CRNs having an initial context is limited to computing semilinear predicates [1], later extended to functions outputting natural numbers encoded by molecular counts [5].

Furthermore, any semilinear predicate or function can be computed by that class of CRNs in expected time polylogarithmic in the input length. Building on this result, it was recently shown that by incurring an expected time linear in the input length, the same result holds for “leaderless” CRNs [8] — CRNs with no initial context. Can this result be improved to sub-linear expected time? Which class of functions can be computed deterministically by a CRN without an initial context in expected time polylogarithmic in the input length?
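As a toy illustration of what error-free function computation by a CRN looks like, consider the single reaction X1 + X2 → Y. Whatever order the reaction events happen in, the system halts with min(x1, x2) copies of Y. The little simulator below is only a sketch written for this post, not code from any of the cited papers: the name `run_crn` and the dictionary encoding of reactions are made up, and it ignores reaction rates entirely, firing any applicable reaction at random until none applies.

```python
import random

def run_crn(reactions, counts, seed=0):
    """Fire applicable reactions in random order until none can fire.

    reactions: list of (reactants, products), each a dict species -> count.
    counts: dict species -> initial molecule count.
    Rates are ignored, and the loop never ends for a CRN that can run
    forever, so this is only meant for small terminating examples.
    """
    rng = random.Random(seed)
    state = dict(counts)
    while True:
        enabled = [(r, p) for r, p in reactions
                   if all(state.get(s, 0) >= k for s, k in r.items())]
        if not enabled:
            return state
        reactants, products = rng.choice(enabled)
        for s, k in reactants.items():
            state[s] -= k                      # consume the reactants
        for s, k in products.items():
            state[s] = state.get(s, 0) + k     # create the products

# X1 + X2 -> Y computes the minimum of the two input counts.
MIN_CRN = [({"X1": 1, "X2": 1}, {"Y": 1})]

if __name__ == "__main__":
    print(run_crn(MIN_CRN, {"X1": 5, "X2": 3}))
```

Starting from 5 copies of X1 and 3 of X2, the run stops with 3 copies of Y, 2 leftover copies of X1 and no X2.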

While (restricted) CRNs are Turing-universal, current results use space proportional to the computation time. Using a non-uniform construction, where the number of species is proportional to the input length and each initial species is present in some constant count, it is known that any S(n) space-bounded computation can be computed by a logically-reversible tagged CRN, within a reaction volume of size poly(S(n)) [18]. Tagged CRNs were introduced to model explicitly the fuel molecules in physical realizations of CRNs such as DNA strand displacement systems [6] that are necessary to supply matter and energy for implementing reactions such as X → X + Y that violate conservation of mass and/or energy.

Thus, for space-bounded computation, there exist CRNs that are time-efficient or are space-efficient. Do there exist time- and space-efficient CRNs to compute any space-bounded function?

Designing and verifying robust CRNs

While CRNs provide a concise model of chemistry, their physical realizations are often more complicated and more granular. How can one be sure they accurately implement the intended network behaviour? Probabilistic model checking has already been employed to find and correct inconsistencies between CRNs and their DNA strand displacement system (DSD) implementations [9]. However, at present, model checking of arbitrary CRNs is only capable of verifying the correctness of very small systems. Indeed, verification of these types of systems is a difficult problem: probabilistic state reachability is undecidable [17, 20] and general state reachability is EXPSPACE-hard [4].

How can larger systems be verified? A deeper understanding of CRN behaviour may simplify the process of model checking. As a motivating example, there has been recent progress towards verifying that certain DSD implementations correctly simulate underlying CRNs [16, 7, 10]. This is an important step to ensuring correctness, prior to experiments. However, DSDs can also suffer from other errors when implementing CRNs, such as spurious hybridization or strand displacement. Can DSDs and more generally CRNs be designed to be robust to such predictable errors? Can error correcting codes and redundant circuit designs used in traditional computing be leveraged in these chemical computers? Many other problems arise when implementing CRNs. Currently, unique types of fuel molecules must be designed for every reaction type. This complicates the engineering process significantly. Can a universal type of fuel be designed to smartly implement any reaction?

Energy efficient computing with CRNs

Rolf Landauer showed that logically irreversible computation (computation as modeled by a standard Turing machine) dissipates an amount of energy proportional to the number of bits of information lost, such as previous state information, and therefore cannot be energy efficient [11]. However, Charles Bennett showed that, in principle, energy efficient computation is possible: he proposed a universal Turing machine that performs logically-reversible computation, and identified nucleic acids (RNA/DNA) as a potential medium to realize logically-reversible computation in a physical system [2].

There have been examples of logically-reversible DNA strand displacement systems — a physical realization of CRNs — that are, in theory, capable of complex computation [12, 19]. Are these systems energy efficient in a physical sense? How can this argument be made formally to satisfy both the computer science and the physics communities? Is a physical experiment feasible, or are these results merely theoretical footnotes?

References

[1] D. Angluin, J. Aspnes, and D. Eisenstat. Stably computable predicates are semilinear. In PODC, pages 292–299, 2006.

[2] C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and Development, 17 (6):525–532, 1973.

[3] L. Cardelli and A. Csikasz-Nagy. The cell cycle switch computes approximate majority. Scientific Reports, 2, 2012.

[4] E. Cardoza, R. Lipton, A. R. Meyer. Exponential space complete problems for Petri nets and commutative semigroups (Preliminary Report). Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pages 50–54, 1976.

[5] H. L. Chen, D. Doty, and D. Soloveichik. Deterministic function computation with chemical reaction networks. DNA Computing and Molecular Programming, pages 25–42, 2012.

[6] A. Condon, A. J. Hu, J. Manuch, and C. Thachuk. Less haste, less waste: on recycling and its limits in strand displacement systems. Journal of the Royal Society: Interface Focus, 2 (4):512–521, 2012.

[7] Q. Dong. A bisimulation approach to verification of molecular implementations of formal chemical reaction network. Master’s thesis. SUNY Stony Brook, 2012.

[8] D. Doty and M. Hajiaghayi. Leaderless deterministic chemical reaction networks. In Proceedings of the 19th International Meeting on DNA Computing and Molecular Programming, 2013.

[9] M. R. Lakin, D. Parker, L. Cardelli, M. Kwiatkowska, and A. Phillips. Design and analysis of DNA strand displacement devices using probabilistic model checking. Journal of The Royal Society Interface, 2012.

[10] M. R. Lakin, D. Stefanovic and A. Phillips. Modular Verification of Two-domain DNA Strand Displacement Networks via Serializability Analysis. In Proceedings of the 19th Annual conference on DNA computing, 2013.

[11] R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of research and development, 5 (3):183–191, 1961.

[12] L. Qian, D. Soloveichik, and E. Winfree. Efficient Turing-universal computation with DNA polymers (extended abstract). In Proceedings of the 16th Annual conference on DNA computing, pages 123–140, 2010.

[13] L. Qian and E. Winfree. Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332 (6034):1196–1201, 2011.

[14] L. Qian, E. Winfree, and J. Bruck. Neural network computation with DNA strand displacement cascades. Nature, 475 (7356):368–372, 2011.

[15] G. Seelig, D. Soloveichik, D.Y. Zhang, and E. Winfree. Enzyme-free nucleic acid logic circuits. Science, 314 (5805):1585–1588, 2006.

[16] S. W. Shin. Compiling and verifying DNA-based chemical reaction network implementations. Master’s thesis. California Institute of Technology, 2011.

[17] D. Soloveichik, M. Cook, E. Winfree, and J. Bruck. Computation with finite stochastic chemical reaction networks. Natural Computing, 7 (4):615–633, 2008.

[18] C. Thachuk. Space and energy efficient molecular programming. PhD thesis, University of British Columbia, 2012.

[19] C. Thachuk and A. Condon. Space and energy efficient computation with DNA strand displacement systems. In Proceedings of the 18th Annual International Conference on DNA computing and Molecular Programming, 2012.

[20] G. Zavattaro and L. Cardelli. Termination Problems in Chemical Kinetics. In Proceedings of the 2008 Conference on Concurrency Theory, pages 477–491, 2008.

