## Network Theory Seminar (Part 2)

16 October, 2014

This time I explain more about how ‘cospans’ represent gadgets with two ends, an input end and an output end:

I describe how to glue such gadgets together by composing cospans. We compose cospans using a category-theoretic construction called a ‘pushout’, so I also explain pushouts. At the end, I explain how this gives us a category where the morphisms are electrical circuits made of resistors, and sketch what we’ll do next: study the behavior of these circuits.

These lecture notes provide extra details:

## Network Theory (Part 31)

13 October, 2014

Last time we came up with a category of labelled graphs and described circuits as ‘cospans’ in this category.

Cospans may sound scary, but they’re not. A cospan is just a diagram consisting of an object with two morphisms going into it:

We can talk about cospans in any category. A cospan is an abstract way of thinking about a ‘chunk of stuff’ $\Gamma$ with two ‘ends’ $I$ and $O.$ It could be any sort of stuff: a set, a graph, an electrical circuit, a network of any kind, or even a piece of matter (in some mathematical theory of matter).

We call the object $\Gamma$ the apex of the cospan and call the morphisms $i: I \to \Gamma, o : O \to \Gamma$ the legs of the cospan. We sometimes call the objects $I$ and $O$ the feet of the cospan. We call $I$ the input and $O$ the output. We say the cospan goes from $I$ to $O,$ though the direction is just a convention: we can flip a cospan and get a cospan going the other way!

If you’re wondering about the name ‘cospan’, it’s because a span is a diagram like this:

Since a ‘span’ is another name for a bridge, and this looks like a bridge from $I$ to $O,$ category theorists called it a span! And category theorists use the prefix ‘co-‘ when they turn all the arrows around. Spans came first historically, and we will use those too at times. But now let’s think about how to compose cospans.

Composing cospans is supposed to be like gluing together chunks of stuff by attaching the output of the first to the input of the second. So, we say two cospans are composable if the output of the first equals the input of the second, like this:

We then compose them by forming a new cospan going all the way from $X$ to $Z$:

The new object $\Gamma +_Y \Gamma'$ and the new morphisms $i'', o''$ are built using a process called a ‘pushout’ which I’ll explain in a minute. The result is cospan from $X$ to $Z,$ called the composite of the cospans we started with. Here it is:

So how does a pushout work? It’s a general construction that you can define in any category, though it only exists if the category is somewhat nice. (Ours always will be.) You start with a diagram like this:

and you want to get a commuting diamond like this:

which is in some sense ‘the best’ given the diagram we started with. For example, suppose we’re in the category of sets and $Y$ is a set included in both $\Gamma$ and $\Gamma'.$ Then we’d like $A$ to be the union of $\Gamma$ and $\Gamma.$ There are other choices of $A$ that would give a commuting diamond, but the union is the best. Something similar is happening when we compose circuits, but instead of the category of sets we’re using the category of labelled graphs we discussed last time.

How do we make precise the idea that $A$ is ‘the best’? We consider any other potential solution to this problem, that is, some other commuting diamond:

Then $A$ is ‘the best’ if there exists a unique morphism $q$ from $A$ to the ‘competitor’ $Q$ making the whole combined diagram commute:

This property is called a universal property: instead of saying that $A$ is the ‘best’, grownups say it is universal.

When $A$ has this universal property we call it the pushout of the original diagram, and we may write it as $\Gamma +_Y \Gamma'.$ Actually we should call the whole diagram

the pushout, or a pushout square, because the morphisms $i'', o''$ matter too. The universal property is not really a property just of $A,$ but of the whole pushout square. But often we’ll be sloppy and call just the object $A$ the pushout.

Puzzle 1. Suppose we have a diagram in the category of sets

where $Y = \Gamma \cap \Gamma'$ and the maps $i, o'$ are the inclusions of this intersection in the sets $\Gamma$ and $\Gamma'.$ Prove that $A = \Gamma \cup \Gamma'$ is the pushout, or more precisely the diagram

is a pushout square, where $i'', o''$ are the inclusions of $\Gamma$ and $\Gamma$ in the union $A = \Gamma \cup \Gamma'.$

More generally, a pushout in the category of sets is a way of gluing together sets $\Gamma$ and $\Gamma'$ with some ‘overlap’ given by the maps

And this works for labelled graphs, too!

Puzzle 2. Suppose we have two circuits of resistors that are composable, like this:

and this:

These give cospans in the category $L\mathrm{Graph}$ where

$L = (0,\infty)$

(Remember from last time that $L\mathrm{Graph}$ is the category of graphs with edges labelled by elements of some set $L.$) Show that if we compose these cospans we get a cospan corresponding to this circuit:

If you’re a mathematician you might find it easier to solve this kind of problem in general, which requires pondering how pushouts work in $L\mathrm{Graph}.$ Alternatively, you might find it easier to think about this particular example: then you can just check that the answer we want has the desired property of a pushout!

If this stuff seems complicated, well, just know that category theory is a very general, powerful tool and I’m teaching you just the microscopic fragment of it that we need right now. Category theory ultimately seems very simple: I can’t really think of any math that’s simpler! It only seem complicated when it’s unfamiliar and you have a fragmentary view of it.

So where are we? We know that circuits made of resistors are a special case of cospans. We know how to compose cospans. So, we know how to compose circuits… and in the last puzzle, we saw this does just what we want.

The advantage of this rather highbrow approach is that a huge amount is known about composing cospans! In particular, suppose we have any category $C$ where pushouts exist: that is, where we can always complete any diagram like this:

to a pushout square. Then we can form a category $\mathrm{Cospan}(C)$ where:

• an object is an object of $C$

• an morphisms from an object $I \in C$ to an object $O \in C$ is an equivalence classes of spans from $I$ to $O:$

• we compose cospans in the manner just described.

Why did I say ‘equivalence class’? It’s because the pushout is not usually unique. It’s unique only up to isomorphism. So, composing cospans would be ill-defined unless we work with some kind of equivalence class of cospans.

To be precise, suppose we have two cospans from $I$ to $O$:

Then a map of cospans from one to the other is a commuting diagram like this:

We say that this is an isomorphism of cospans if $f$ is an isomorphism.

This gives our equivalence relation on cospans! It’s an old famous theorem in category theory—so famous that it’s hard to find a reference for the proof—that whenever $C$ is a category with pushouts, there’s a category $\mathrm{Cospan}(C)$ where:

• an object is an object of $C$

• an morphisms from an object $I \in C$ to an object $O \in C$ is an isomorphism class of spans from $I$ to $O.$

• we compose isomorphism classes of cospans by picking representatives, composing them and then taking the isomorphism class.

This takes some work to prove, but it’s true, so this is how we get our category of circuits!

Next time we’ll do something with this category. Namely, we’ll cook up a category of ‘behaviors’. The behavior of a circuit made of resistors just says which currents and potentials its terminals can have. If we put a circuit in a metaphorical ‘black box’ and refuse to peek inside, all we can see is its behavior.

Then we’ll cook up a functor from the category of circuits to the category of behaviors. We’ll call this the ‘black box functor’. Saying that it’s a functor mainly means that

$\blacksquare(f g) = \blacksquare(f) \blacksquare(g)$

Here $f$ and $g$ are circuits that we can compose, and $f g$ is their composite. The black square is the black box functor, so $\blacksquare(fg)$ is the behavior of the circuit $f g.$ There’s a way to compose behaviors, too, and the equation above says that the behavior of the composite circuit is the composite of their behaviors!

This is very important, because it says we can figure out what a big circuit does if we know what its pieces do. And this is one of the grand themes of network theory: understanding big complicated networks by understanding their pieces. We may not always be able to do this, in practice! But it’s something we’re always concerned with.

## Network Theory Seminar (Part 1)

11 October, 2014

Check out this video! I start with a quick overview of network theory, and then begin building a category where the morphisms are electrical circuits. These lecture notes provide extra details:

With luck, this video will be the first of a series. I’m giving a seminar on network theory at U.C. Riverside this fall. I’ll start by sketching the results here:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

But this is a big paper, and I also want to talk about other papers, so I certainly won’t explain everything in here—just enough to help you get started! If you have questions, don’t be shy about asking them.

I thank Blake Pollard for filming this seminar, and Muhammad “Siddiq” Siddiqui-Ali for providing the videocamera and technical support.

## Network Theory (Part 30)

3 October, 2014

The network theory series is back! You may have thought it died out, but in fact it’s just getting started. Over the last year my grad students have made huge strides in working out the math of networks. Now it’s time to explain what they’ve done.

In the last three episodes I explained how electrical circuits made of resistors, inductors and capacitors are a great metaphor for many kinds of complex systems made of interacting parts. And it’s not just a metaphor; there are mathematically rigorous analogies—in other words, isomorphisms—between such electrical circuits and various other kinds of ‘passive linear networks’.

I showed you a chart of these analogies last time:

 displacement:    $q$ flow:      $\dot q$ momentum:      $p$ effort:           $\dot p$ Mechanics: translation position velocity momentum force Mechanics: rotation angle angular velocity angular momentum torque Electronics charge current flux linkage voltage Hydraulics volume flow pressure momentum pressure Thermal Physics entropy entropy flow temperature momentum temperature Chemistry moles molar flow chemical momentum chemical potential

But what do I mean by a ‘passive linear network’? Let me explain this very roughly at first, since we’ll be painfully precise later on.

Right now by ‘network’ I mean a graph with gizmos called ‘components’ on the edges. For example:

In a network there is some kind of ‘flow’ running along each edge, and also some kind of ‘effort’ across that edge. For example, in electronics the flow is electrical current and the effort is voltage. The chart shows the meaning of flow and effort in other examples.

‘Passivity’ means roughly that none of the components put out energy that didn’t earlier come in. For example, resistors lose energy (which goes into heat, which we’re not counting). Capacitors can store energy and later release it. So, resistors and capacitors are passive—and so are inductors. But batteries and current sources actually put out energy, so we won’t allow them in our networks yet. For now, we’re just studying how passive components respond to a source of flow or effort.

For some subtleties that show up when you try to make the concept of passivity precise, try:

Passivity (engineering), Wikipedia.

Finally, ‘linearity’ means that the flow along each edge of our network is linearly related to the effort across that edge. Here are the key examples:

• For electrical resistors, linearity is captured by Ohm’s law. If an edge $e$ in our network is labelled by a resistor of resistance $R,$ usually drawn like this:

then Ohm’s law says:

$V = R I$

where $V$ is the voltage across that edge and $I$ is the current along that edge.

• If our edge $e$ is labelled by an inductor of inductance $L$:

we have

$\displaystyle{ V = L \frac{d I}{d t} }$

Here we need to think of the voltage and current as functions of time.

• If our edge $e$ is labelled by a capacitor of capacitance $C$:

we write the equation

$\displaystyle{ I = C \frac{d V}{d t} }$

where again we think of the voltage and current as functions of time.

Both linearity and passivity are simplifying assumptions that we eventually want to drop. If we include batteries or current sources, we’re dropping passivity. And if include transistors, we’re dropping linearity. Obviously both these are important!

However, there is a lot to do even with these simplifying assumptions. And now it’s time to get started!

In what follows, I will not use the terms ‘flow’ and ‘effort’, which are chosen to be neutral and subject-independent. Instead, I’ll use the vocabulary from electronics, e.g. ‘current’ and ‘voltage’. The reason is that we’ve all heard of resistors, capacitors, Ohm’s law and Kirchhoff’s laws, and while these have analogues in every row of the chart, it seems pointless to make up weird new ‘neutral’ terms for all these concepts.

But don’t be fooled by the jargon! We’re not merely studying electrical circuits. We’ll be studying passive linear networks in full generality… with the help of category theory.

### Linear passive networks as morphisms

To get going, let’s think about circuits made of resistors. We can do this without harm, because we’ll later include capacitors and inductors using a simple effortless trick. Namely, we’ll generalize the ‘resistance’ of a resistor, which is a real number, to something called ‘impedance’, which is an element of some larger field. Everything will be so abstract that replacing resistances with impedances will be as easy as snapping our fingers.

Right now I want to define a category where the morphisms are circuits made of resistors. Any morphism will go from some ‘inputs’ to some ‘outputs’, like this:

So a morphism is roughly a graph with edges labelled by numbers called ‘resistances’, with some special nodes called ‘inputs’ and some special nodes called ‘outputs’.

What can do with morphisms? Compose them! So, suppose we have a second morphism whose inputs match the outputs of the first:

Then we can compose them, attaching the outputs of the first to the inputs of the second. We get this morphism as the result:

So, composing morphisms is a way to build big electrical circuits—or other ‘linear passive networks’—out of little ones.

This seems pretty simple, but let’s try to formalize it and see why we have a category. In fact it takes a bit of thought. To check that we get a category, we need to check that composition is associative:

$(fg)h = f(gh)$

and that each object $x$ has an identity morphism $1_x : x \to x$ that does what an identity should:

$f 1_x = f$

$1_x g = g$

All these equations seem obviously true in our example… until you try to prove them.

You might think an identity morphism should be a bunch of straight pieces of wire—a bunch of edges each with an input node and an output node—but that doesn’t quite work, since sticking an extra edge onto a graph gives a new graph with an extra edge!

Also, we are composing circuits by ‘sticking them together’. This process is formalized in category theory using a pushout, and pushouts are only defined ‘up to canonical isomorphism’. The very simplest example is the disjoint union of two sets. We all know what it means, but if you examine it carefully, you’ll see it’s only defined up to canonical isomorphism, because it involves a choice of how we make the two sets disjoint, and this choice is somewhat arbitrary.

All this means the category we’re after is a bit subtler than you might at first expect; in fact, it’s most naturally thought of as a bicategory, meaning roughly that all the equations above hold only ‘up to canonical isomorphism’.

So, we proceed like this.

First we define a concept of ‘labelled graph’, where (for now) only the edges are labelled. We do this because we want our circuits to have edges labelled by ‘resistances’, which are real numbers. But we do it in greater generality because later we’ll want the edges to be labelled by ‘impedances’, which are elements of some other field. And since we’re studying electrical circuits just as examples of networks, later still we will probably want graphs whose edges are labelled in still other ways.

So:

Definition. A graph consists a finite set $E$ of edges, a finite set $N$ of nodes, and two functions

$s,t : E \to N$

Thus each edge $e$ will have some node $s(e)$ as its source and some node $t(e)$ as its target:

Definition. Given a set $L,$ we define an $L$-labelled graph to be a graph together with a function $r : E \to L.$ This assigns to each edge $e \in E$ its label $r(e) \in L.$ We call $L$ the label set.

We use the letter $r$ because for circuits of resistors we will take the label set to be

$L = (0,\infty) \subset \mathbb{R}$

the positive real numbers, and $r(e)$ will be the resistance of the edge $e.$ For circuits that also contain inductors and capacitors we will take the label set to be the positive elements of some larger field… but more about that later!

Now we want to give our $L$-labelled graph a set of nodes called ‘inputs’ and a set of nodes called ‘outputs’. You might think the set of inputs should be disjoint from the set of outputs, but that’s a tactical error! It turns out an identity morphism in our category should have the inputs being exactly the same as the outputs… and no edges at all:

To handle this nicely, we need to make a category of $L$-labelled graphs. This works in the obvious way, if you’re used to this stuff. A morphism from one $L$-labelled graph to another sends edges to edges, nodes to nodes, and preserves everything in sight:

Definition. Given $L$-graphs $\Gamma = (E,N,s,t,r)$ and $\Gamma' = (E',N',s',t',r'),$ a morphism of $L$-labelled graphs from $\Gamma$ to $\Gamma'$ is a pair of functions

$\epsilon: E \to E'$

$\nu : N \to N'$

such that the following diagrams commute:

There is a category $L\mathrm{Graph}$ where the objects are $L$-labelled graphs and the morphisms are as we’ve just defined them.

Warning: the morphisms in $L\mathrm{Graph}$ are not the morphisms of the kind we really want, the ones that look like this:

They are just a step along the way. A morphism of the kind we really want is a diagram like this in $L\mathrm{Graph}$:

where $\Gamma$ is an $L$-labelled graph and $I, O$ are $L$-labelled graphs with no edges!

You see, if $I$ and $O$ have no edges, all they have is nodes. We call the nodes of $I$ the inputs and those of $O$ the outputs. The morphisms $i: I \to \Gamma$ and $o : O \to \Gamma$ say how these nodes are included in $\Gamma.$ $\Gamma$ is our circuit made of resistors.

In general, any diagram shaped like this is called a cospan:

If we turned the arrows around it would be called a span. Cospans are good whenever you a thing with an ‘input end’ and an ‘output end’, and you want to describe how the ends are included in the thing. So, they’re precisely what we need for describing a circuit made of resistors, like this:

This makes us want to cook up a category $L\mathrm{Circ}$ where:

• an object $I$ is an $L$-labelled graph with no edges. We can alternatively think of it as a finite set: a set of nodes.

• a morphism from $I$ to $O$ is a cospan of $L$-labelled graphs:

We still need to say how to compose these morphisms. We know it will amount to attaching the outputs of one circuit to the inputs of the next—that’s all there is to it! But we need to make this precise and prove we get a category. And as I’ve hinted, we will actually get something bigger and better: a bicategory! This will come as no surprise to if you’re familiar with span and cospan bicategories–but it may induce a heart attack otherwise.

This bicategory can then be ‘watered down’ to give our category $L\mathrm{Circ}.$ And when we take

$L = (0,\infty)$

we’ll get the category where morphisms are circuits made of resistors! We’ll call this $\mathrm{ResCirc}.$

Then I’ll explain what we can do with this category! There’s no end of things we could do with it. But the main thing Brendan does is study the ‘black-boxing’ operation, where we take a circuit, forget its inner details, and only keep track of what it does. This turns out to be quite interesting.

### References

I thank Brendan Fong for drawing some of the pictures of circuits here. For the details of what I’m starting to explain here, read our paper:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

• Dean C. Karnopp, Donald L. Margolis and Ronald C. Rosenberg, System Dynamics: a Unified Approach, Wiley, New York, 1990.

• Forbes T. Brown, Engineering System Dynamics: a Unified Graph-Centered Approach, CRC Press, Boca Raton, 2007.

• Francois E. Cellier, Continuous System Modelling, Springer, Berlin, 1991.

## The Logic of Real and Complex Numbers

8 September, 2014

I’ve always liked logic. I studied it a bunch in high school and college. Nowadays it’s a kind of hobby. I turn to it for relief sometimes when I become frustrated trying to figure out what I can do about global warming. Lately I’ve been digging a bit deeper into the logic behind the real and complex numbers. And I’m teaching a graduate course on real analysis this fall, so I actually have a slight excuse for doing this.

There’s something about logic that’s both fascinated and terrified me ever since I was a kid: it’s how we can’t fully pin down infinite structures, like the real or complex number systems, using a language with finitely many symbols and a theory with finitely many axioms.

It’s terrifying that we don’t fully know what we’re talking about when we’re talking about numbers! But it’s fascinating that we can understand a lot about the limitations.

There are many different things to say about this, depending on what features of these number systems we want to describe, and what kind of logic we want to use.

Maybe I should start with the natural numbers, since that story is more famous. This can also serve as a lightning review of some basic concepts which I’ll pretend you already vaguely know: first-order versus second-order logic, proofs versus models, and so on. If you don’t know these, you can either fake it or read some of the many links in this article!

### Natural numbers

When Peano originally described the natural numbers he did so using axioms phrased in second-order logic. In first-order logic we can quantify over variables: for example, we can say

$\forall x \; (P(x)) \; \Rightarrow \; P(y)) \;$

which means that if the predicate $P$ holds for all $x$ it holds for any variable $y.$ In second-order logic we can also quantify over predicates: for example, we can say

$\forall P \; (P(x) \Leftrightarrow P(y)) \; \Leftrightarrow \; x = y$

which says that $x = y$ if and only if for every predicate $P,$ $P(x)$ is true precisely when $P(y)$ is true. Leibniz used this principle, called the identity of indiscernibles, to define equality… and this is a nice example of the greater power of second-order logic. In first-order logic we typically include equality as part of the language and add axioms describing its properties, like

$\forall x \; \forall y \; (x = y \Leftrightarrow y = x)$

In second-order logic we can define equality and prove these properties starting from the properties we already have for $\Leftrightarrow.$

Anyway, in his axioms for the natural numbers, Peano used second-order logic to formulate the principle of mathematical induction in this sort of way:

$\forall P \; \big[ P(0) \; \& \; \forall n \; ((P(n) \Rightarrow P(n+1)) \; \; \Rightarrow \; \; \forall n \; P(n))\big]$

This says that if you’ve got any predicate that’s true for $0$ and is true for $n+1$ whenever it’s true for $n,$ then it’s true for all natural numbers.

In 1888, Dedekind showed that Peano’s original axioms for the natural numbers are categorical, meaning all its models are isomorphic.

The concept of ‘model’ involves set theory. In a model you pick a set $S$ for your variables to range over, pick a subset of $S$ for each predicate—namely the subset where that predicate is true —and so on, in such a way that all the axioms in that theory are satisfied. If two models are isomorphic, they’re the same for all practical purposes.

So, in simple rough terms, a categorical theory is one that gives a full description of the mathematical structure it’s talking about.

This makes Dedekind’s result sound like great news. It sounds like Peano’s original second-order axioms for arithmetic completely describe the natural numbers.

However, there’s an important wrinkle. There are many inherently undetermined things about set theory! So in fact, a categorical theory only gives a full description of the mathematical structure it’s talking about relative to a choice of what sets are like.

So, Dedekind’s result just shoves everything mysterious and undetermined about the natural numbers under the carpet: they become mysterious and undetermined things about set theory. This became clear much later, thanks to Gödel and others. And in the process, it became clear that second-order logic is a bit problematic compared to first-order logic.

You see, first-order logic has a set of deduction rules that are:

sound: Every provable sentence holds in every model.

semantically complete: Every sentence that holds in every model is provable.

effective: There is an algorithm that can correctly decide whether any given sequence of symbols is a proof.

Second-order logic does not! It’s ‘too powerful’ to also have all three of these nice properties.

So, these days people often work with a first-order version of Peano’s axioms for arithmetic. Instead of writing down a single axiom for mathematical induction:

$\forall P \; \big[ P(0) \; \& \; \forall n \; P(n) \Rightarrow P(n+1)) \; \; \Rightarrow \;\; \forall n \; (P(n))\big]$

we write down an axiom schema—an infinite list of axioms—with one axiom like this:

$\phi(0) \; \& \; \forall n \; (\phi(n) \Rightarrow \phi(n+1)) \; \; \Rightarrow \;\; \forall n \; (\phi(n))$

for each formula $\phi$ that we can actually write down using the language of arithmetic.

This first-order version of Peano arithmetic is not categorical: it has lots of nonisomorphic models. People often pretend there’s one ‘best’ model: they call it the ‘standard’ natural numbers, and call all the others ‘nonstandard’. But there’s something a bit fishy about this.

Indeed, Gödel’s first incompleteness theorem says there are many statements about natural numbers that can neither be proved nor disproved starting from Peano’s axioms. It follows that for any such statement we can find a model of the Peano axioms in which that statement holds, and also a model in which it does not.

Furthermore, this remains true even if we add any list of extra axioms to Peano arithmetic, as long as there’s some algorithm that can list all these axioms.

So, I’d prefer to say there are many different ‘versions’ of the natural numbers, just as there are many different groups.

We can study these different versions, and it’s a fascinating subject:

• Wikipedia, Nonstandard models of arithmetic.

However, I want to talk about the situation for other number systems!

### The real numbers

The situation is better for the real numbers—at least if we are willing to think about them in a ‘purely algebraic’ way, leaving most analysis behind.

To do this, we can use the theory of a ‘real closed field’. This is a list of axioms, formulated in first-order logic, which describe how $+, \times, 0, 1$ and $\le$ work for the real numbers. You can think of these axioms as consisting of three parts:

• the field axioms: the usual algebraic identities involving $+, \times, 0$ and $1$ together with laws saying that everything has an additive inverse and everything except $0$ has a multiplicative inverse.

• the formally real field axiom, saying that $-1$ is not the square of anything. This implies that we can equip the field with a concept of $\le$ that makes it into an ordered field—but not necessarily in a unique way.

• the real closed field axioms, which says that also for any number $x,$ either $x$ or $-x$ has a square root, and every polynomial of odd degree has a root. Among other things this implies our field can be made into an ordered field in a unique way. To do this, we say $x \le y$ if and only if $y - x$ has a square root.

Tarski showed this theory is complete: any first-order sentence involving only the operations $+, \times, 0, 1$ and the relation $\le$ can either be proved or disproved starting from the above axioms.

Nonetheless, the theory of real closed fields is not categorical: besides the real numbers, there are many other models! These models are all elementarily equivalent: any sentence involving just $+, \times, 0, 1, \le$ and first-order logic that holds in one model holds in all the rest. But these models are not all isomorphic: we can’t get a bijection between them that preserves $+, \times, 0, 1$ and $\le.$

Indeed, only finite-sized mathematical structures can be ‘nailed down’ up to isomorphism by theories in first-order logic. You see, the Löwenheim–Skolem theorem says that if a first-order theory in a countable language has an infinite model, it has at least one model of each infinite cardinality. So, if we’re trying to use this kind of theory to describe an infinitely big mathematical structure, the most we can hope for is that after we specify its cardinality, the axioms completely determine it.

However, the real closed field axioms aren’t even this good. For starters, they have infinitely many nonisomorphic countable models. Here are a few:

• the algebraic real numbers: these are the real numbers that obey polynomial equations with integer coefficients.

• the computable real numbers: these are the real numbers that can be computed to arbitrary precision by a computer program.

• the arithmetical real numbers: these are the numbers definable in the language of arithmetic. More precisely, a real number $x$ is arithmetical if there is a formula $\phi$ in the language of first-order Peano arithmetic, with two free variables, such that

$\displaystyle{ \forall m \; \forall n \; (\frac{m}{n} \le x \; \; \Leftrightarrow \; \; \phi(n,m)) }$

Every computable real number is arithmetical, but not vice versa: just because you can define a real number in the above way does not mean you can actually compute it to arbitrary precision!

And indeed, there are other even bigger countable real closed fields, consisting of real numbers that are definable using more powerful methods, like second-order Peano arithmetic.

We can also get countable real closed fields using tricks like this: take the algebraic real numbers and throw in the number $\pi$ along with just enough other numbers to get a real closed field again. Or, we could throw in both $\pi$ and $e.$ This probably gives a bigger real closed field—but nobody knows, because for all we know, $\pi$ could equal $e$ plus some rational number! Everyone believes this is false, but nobody has proved it.

There are also lots of nonisomorphic uncountable real closed fields, including ones that include the usual real numbers.

For example, we can take the real numbers and throw in an element $\infty$ that is bigger than $1, 2, 3, \dots,$ and so on—and then do what it takes to get another real closed field. This involves throwing in elements like

$-\infty, \; \infty + 1, \; 1/\infty, \; \infty^2, \; \sqrt{\infty}, \; \sqrt{\infty^2 + 17} , \dots$

and so on. So, we get lots of infinities and infinitesimals.

It gets a bit confusing here, trying to figure out what equals what. But there’s another real closed field containing an infinite element that seems easier to manage. It’s called the field of real Puiseux series. These are series of the form

$\sum_{i = k}^\infty a_i z^{i/n}$

where $k$ is any integer, perhaps negative, $n$ is any
positive integer, and the coefficients $a_i$ are real.

What’s $z?$ It’s just a formal variable. But the real Puiseux series are real closed field, and $z$ acts like $1/\infty:$ it’s positive, but smaller than any positive real number.

With considerably more work, we can make up a real closed field that:

• contains the real numbers,

• contains an element $\infty$ bigger than $1,2,3, \dots,$ and

• obeys the transfer principle, which says that a first-order statement phrased in the usual language of set theory holds for the real numbers if and only if it holds for this other number system.

Any real closed field with these properties is called a system of hyperreal numbers. In the 1960s, the logician Abraham Robinson used them to make Leibniz’s old idea of infinitesimals in calculus fully rigorous. The resulting theory is called nonstandard analysis.

So, I hope you see there’s an exciting—or perhaps appalling—diversity of real closed fields. But don’t forget: they’re all elementarily equivalent. If a sentence involving just $+, \times, 0, 1,$ $\le$ and first-order logic holds in any one of these real closed fields, it holds in all of them!

Here the situation looks very different. In second-order logic we can do analysis, because we can quantify over predicates, which allows us to talk about subsets of real numbers. And in second-order logic we can write down a theory of real numbers that’s categorical! It’s called the theory of a Dedekind-complete ordered field. Again, we can group the axioms in three bunches:

• the field axioms: the usual algebraic identities involving $+, \times, 0$ and $1$ together with laws saying that everything has an additive inverse and everything except $0$ has a multiplicative inverse.

• the ordered field axiom, saying there is a total ordering $\le$ such that $x \le y$ and $x' \le y'$ implies $x + x' \le y + y'$ and $x,y \ge 0$ implies $x y \ge 0.$

• the Dedekind completeness axiom, which says that every nonempty subset with an upper bound has a least upper bound. But instead of talking about subsets, we talk about the predicates that hold on those subsets, so we say “for all predicates $P$ such that…”

Because they’re categorical, people often use these axioms to define the real numbers. But because they’re second-order, the problem of many nonisomorphic models has really just been swept under the rug. If we use second-order logic, we won’t have a concept of ‘proof’ that’s sound, semantically complete and effective. And if we use first-order axioms for set theory to explicitly talk about subsets instead of predicates, then our set theory will have many models! Each model will have a version of the real numbers in it that’s unique up to isomorphism… but the versions in different models will be really different.

In fact, there’s a precise sense in which the ‘standard real numbers’ in one model of set theory can be the ‘hyperreals’ in another. This was first shown by Abraham Robinson.

### The complex numbers

I mentioned that when we’re studying an infinite mathematical structure using first-order logic, the best we can hope for is to have one model of each size (up to isomorphism). The real numbers are far from being this nice… but the complex numbers come much closer!

More precisely, say $\kappa$ is some cardinal. A first-order theory describing structure on a single set is called κ-categorical if it has a unique model of cardinality $\kappa.$ And 1965, a logician named Michael Morley showed that if a list of axioms is $\kappa$-categorical for some uncountable $\kappa,$ it’s $\kappa$-categorical for every uncountable $\kappa.$ I haven’t worked my way through the proof, which seems to be full of interesting ideas. But such theories are called uncountably categorical.

A great example is the ‘purely algebraic’ theory of the complex numbers. By this I mean we only write down axioms involving $+, \times, 0$ and $1.$ We don’t include anything about $\le$ this time, nor anything about complex conjugation. You see, if we start talking about complex conjugation we can pick out the real numbers inside the complex numbers, and then we’re more or less back to the story we had for real numbers.

This theory is called the theory of an algebraically closed field of characteristic zero. Yet again, the axioms come in three bunches:

• the field axioms.

• the characteristic zero axioms: these are an infinite list of axioms saying that

$1 \ne 0, \quad 1+1 \ne 0, \quad 1+1+1 \ne 0, \dots$

• the algebraically closed axioms: these say that every non-constant polynomial has a root.

Pretty much any mathematician worth their salt knows that the complex numbers are a model of these axioms, whose cardinality is that of the continuum. There are lots of different countable models: the algebraic complex numbers, the computable complex numbers, and so on. But because the above theory is uncountably categorical, there is exactly one algebraically closed field of characteristic zero of each uncountable cardinality… up to isomorphism.

This implies some interesting things.

For example, we can take the complex numbers, throw in an extra element, and let it freely generate a bigger algebraically closed field. It’s ‘bigger’ in the sense that it contains the complex numbers as a proper subset, indeed a subfield. But since it has the same cardinality as the complex numbers, it’s isomorphic to the complex numbers!

And then, because this ‘bigger’ field is isomorphic to the complex numbers, we can turn this argument around. We can take the complex numbers, remove a lot of carefully chosen elements, and get a subfield that’s isomorphic to the complex numbers.

Or, if we like, we can take the complex numbers, adjoin a really huge set of extra elements, and let them freely generate an algebraically closed field of characteristic zero. The cardinality of this field can be as big as we want. It will be determined up to isomorphism by its cardinality.

One piece of good news is that thanks to a result of Tarski, the theory of an algebraically closed field of characteristic zero is complete, and thus, all its models are elementarily equivalent. In other words, all the same first-order sentences written in the language of $+, \times, 0$ and $1$ hold in every model.

But here’s a piece of strange news.

As I already mentioned, the theory of a real closed field is not uncountably categorical. This implies something really weird. Besides the ‘usual’ real numbers $\mathbb{R}$ we can choose another real closed field $\mathbb{R}',$ not isomorphic to $\mathbb{R},$ with the same cardinality. We can build the complex numbers $\mathbb{C}$ using pairs of real numbers. We can use the same trick to build a field $\mathbb{C}'$ using pairs of guys in $\mathbb{R}'.$ But it’s easy to check that this funny field $\mathbb{C}'$ is algebraically closed and of characteristic zero. Since it has the same cardinality as $\mathbb{C},$ it must be isomorphic to $\mathbb{C}.$

In short, different ‘versions’ of the real numbers can give rise to the same version of the complex numbers!

### References

So, I hope you see that the logical foundations of the real and complex number systems are quite slippery… yet with work, we can understand a lot about this slipperiness.

Besides the references I’ve given, I just want to mention two more. First, here’s a free introductory calculus textbook based on nonstandard analysis:

• H. Jerome Keisler, Elementary Calculus: an Infinitesimal Approach, available as a website or in PDF.

And here’s an expository paper that digs deeper into uncountably categorical theories:

• Nick Ramsey, Morley’s categoricity theorem.

## Exploring Climate Data (Part 1)

1 August, 2014

joint with Dara O Shayda

Emboldened by our experiments in El Niño analysis and prediction, people in the Azimuth Code Project have been starting to analyze weather and climate data. A lot of this work is exploratory, with no big conclusions. But it’s still interesting! So, let’s try some blog articles where we present this work.

This one will be about the air pressure on the island of Tahiti and in a city called Darwin in Australia: how they’re correlated, and how each one varies. This article will also be a quick introduction to some basic statistics, as well as ‘continuous wavelet transforms’.

### Darwin, Tahiti and El Niños

The El Niño Southern Oscillation is often studied using the air pressure in Darwin, Australia versus the air pressure in Tahiti. When there’s an El Niño, it gets stormy in the eastern Pacific so the air temperatures tend to be lower in Tahiti and higher in Darwin. When there’s a La Niña, it’s the other way around:

The Southern Oscillation Index or SOI is a normalized version of the monthly mean air pressure anomaly in Tahiti minus that in Darwin. Here anomaly means we subtract off the mean, and normalized means that we divide by the standard deviation.

So, the SOI tends to be negative when there’s an El Niño. On the other hand, when there’s an El Niño the Niño 3.4 index tends to be positive—this says it’s hotter than usual in a certain patch of the Pacific.

Here you can see how this works:

When the Niño 3.4 index is positive, the SOI tends to be negative, and vice versa!

It might be fun to explore precisely how well correlated they are. You can get the data to do that by clicking on the links above.

But here’s another question: how similar are the air pressure anomalies in Darwin and in Tahiti? Do we really need to take their difference, or are they so strongly anticorrelated that either one would be enough to detect an El Niño?

You can get the data to answer such questions here:

Southern Oscillation Index based upon annual standardization, Climate Analysis Section, NCAR/UCAR. This includes links to monthly sea level pressure anomalies in Darwin and Tahiti, in either ASCII format (click the second two links) or netCDF format (click the first one and read the explanation).

In fact this website has some nice graphs already made, which I might as well show you! Here’s the SOI and also the sum of the air pressure anomalies in Darwin and Tahiti, normalized in some way:

(Click to enlarge.)

If the sum were zero, the air pressure anomalies in Darwin and Tahiti would contain the same information and life would be simple. But it’s not!

How similar in character are the air pressure anomalies in Darwin and Tahiti? There are many ways to study this question. Dara tackled it by taking the air pressure anomaly data from 1866 to 2012 and computing some ‘continuous wavelet transforms’ of these air pressure anomalies. This is a good excuse for explaining how a continuous wavelet transform works.

### Very basic statistics

It helps to start with some very basic statistics. Suppose you have a list of numbers

$x = (x_1, \dots, x_n)$

You probably know how to take their mean, or average. People often write this with angle brackets:

$\displaystyle{ \langle x \rangle = \frac{1}{n} \sum_{i = 1}^n x_i }$

You can also calculate the mean of their squares:

$\displaystyle{ \langle x^2 \rangle = \frac{1}{n} \sum_{i = 1}^n x_i^2 }$

If you were naive you might think $\langle x^2 \rangle = \langle x \rangle^2,$ but in fact we have:

$\langle x^2 \rangle \ge \langle x \rangle^2$

and they’re equal only if all the $x_i$ are the same. The point is that if the numbers $x_i$ are spread out, the squares of the big ones (positive or negative) contribute more to the average of the squares than if we had averaged them out before squaring. The difference

$\langle x^2 \rangle - \langle x \rangle^2$

is called the variance; it says how spread out our numbers are. The square root of the variance is the standard deviation:

$\sigma_x = \sqrt{\langle x^2 \rangle - \langle x \rangle^2 }$

and this has the slight advantage that if you multiply all the numbers $x_i$ by some constant $c,$ the standard deviation gets multiplied by $|c|.$ (The variance gets multiplied by $c^2.$)

We can generalize the variance to a situation where we have two lists of numbers:

$x = (x_1, \dots, x_n)$

$y = (y_1, \dots, y_n)$

Namely, we can form the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle$

This reduces to the variance when $x = y.$ It measures how much $x$ and $y$ vary together — ‘hand in hand’, as it were. A bit more precisely: if $x_i$ is greater than its mean value mainly for $i$ such that $y_i$ is greater than its mean value, the covariance is positive. On the other hand, if $x_i$ tends to be greater than average when $y_i$ is smaller than average — like with the air pressures at Darwin and Tahiti — the covariance will be negative.

For example, if

$x = (1,-1), \quad y = (1,-1)$

then they ‘vary hand in hand’, and the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle = 1 - 0 = 1$

is positive. But if

$x = (1,-1), \quad y = (-1,1)$

then one is positive when the other is negative, so the covariance

$\langle x y \rangle - \langle x \rangle \langle y \rangle = -1 - 0 = -1$

is negative.

Of course the covariance will get bigger if we multiply both $x$ and $y$ by some big number. If we don’t want this effect, we can normalize the covariance and get the correlation:

$\displaystyle{ \frac{ \langle x y \rangle - \langle x \rangle \langle y \rangle }{\sigma_x \sigma_y} }$

which will always be between $-1$ and $1.$

For example, if we compute the correlation between the air pressure anomalies at Darwin and Tahiti, measured monthly from 1866 to 2012, we get
-0.253727. This indicates that when one goes up, the other tends to go down. But since we’re not getting -1, it means they’re not completely locked into a linear relationship where one is some negative number times the other.

Okay, we’re almost ready for continuous wavelet transforms! Here is the main thing we need to know. If the mean of either $x$ or $y$ is zero, the formula for covariance simplifies a lot, to

$\displaystyle{ \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }$

So, this quantity says how much the numbers $x_i$ ‘vary hand in hand’ with the numbers $y_i,$ in the special case when one (or both) has mean zero.

We can do something similar if $x, y : \mathbb{R} \to \mathbb{R}$ are functions of time defined for all real numbers $t.$ The sum becomes an integral, and we have to give up on dividing by $n.$ We get:

$\displaystyle{ \int_{-\infty}^\infty x(t) y(t)\; d t }$

This is called the inner product of the functions $x$ and $y,$ and often it’s written $\langle x, y \rangle,$ but it’s a lot like the covariance.

### Continuous wavelet transforms

What are continuous wavelet transforms, and why should we care?

People have lots of tricks for studying ‘signals’, like series of numbers $x_i$ or functions $x : \mathbb{R} \to \mathbb{R}.$ One method is to ‘transform’ the signal in a way that reveals useful information. The Fourier transform decomposes a signal into sines and cosines of different frequencies. This lets us see how much power the signal has at different frequencies, but it doesn’t reveal how the power at different frequencies changes with time. For that we should use something else, like the Gabor transform explained by Blake Pollard in a previous post.

Sines and cosines are great, but we might want to look for other patterns in a signal. A ‘continuous wavelet transform’ lets us scan a signal for appearances of a given pattern at different times and also at different time scales: a pattern could go by quickly, or in a stretched out slow way.

To implement the continuous wavelet transform, we need a signal and a pattern to look for. The signal could be a function $x : \mathbb{R} \to \mathbb{R}.$ The pattern would then be another function $y: \mathbb{R} \to \mathbb{R},$ usually called a wavelet.

Here’s an example of a wavelet:

If we’re in a relaxed mood, we could call any function that looks like a bump with wiggles in it a wavelet. There are lots of famous wavelets, but this particular one is the fourth derivative of a certain Gaussian. Mathematica calls this particular wavelet DGaussianWavelet[4], and you can look up the formula under ‘Details’ on their webpage.

However, the exact formula doesn’t matter at all now! If we call this wavelet $y,$ all that matters is that it’s a bump with wiggles on it, and that its mean value is 0, or more precisely:

$\displaystyle{ \int_{-\infty}^\infty y(t) \; d t = 0 }$

As we saw in the last section, this fact lets us take our function $x$ and the wavelet $y$ and see how much they ‘vary hand it hand’ simply by computing their inner product:

$\displaystyle{ \langle x , y \rangle = \int_{-\infty}^\infty x(t) y(t)\; d t }$

Loosely speaking, this measures the ‘amount of $y$-shaped wiggle in the function $x$’. It’s amazing how hard it is to say something in plain English that perfectly captures the meaning of a simple formula like the above one—so take the quoted phrase with a huge grain of salt. But it gives a rough intuition.

Our wavelet $y$ happens to be centered at $t = 0.$ However, we might be interested in $y$-shaped wiggles that are centered not at zero but at some other number $s.$ We could detect these by shifting the function $y$ before taking its inner product with $x$:

$\displaystyle{ \int_{-\infty}^\infty x(t) y(t-s)\; d t }$

We could also be interested in measuring the amount of some stretched-out or squashed version of a $y$-shaped wiggle in the function $x.$ Again we could do this by changing $y$ before taking its inner product with $x$:

$\displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t}{P}\right) \; d t }$

When $P$ is big, we get a stretched-out version of $y.$ People sometimes call $P$ the period, since the period of the wiggles in $y$ will be proportional to this (though usually not equal to it).

Finally, we can combine these ideas, and compute

$\displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt }$

This is a function of the shift $s$ and period $P$ which says how much of the $s$-shifted, $P$-stretched wavelet $y$ is lurking in the function $x.$ It’s a version of the continuous wavelet transform!

Mathematica implements this idea for time series, meaning lists of numbers $x = (x_1,\dots,x_n)$ instead of functions $x : \mathbb{R} \to \mathbb{R}.$ The idea is that we think of the numbers as samples of a function $x$:

$x_1 = x(\Delta t)$

$x_2 = x(2 \Delta t)$

and so on, where $\Delta t$ is some time step, and replace the integral above by a suitable sum. Mathematica has a function ContinuousWaveletTransform that does this, giving

$\displaystyle{ w(s,P) = \frac{1}{\sqrt{P}} \sum_{i = 1}^n x_i \; y\left(\frac{i \Delta t - s}{P}\right) }$

The factor of $1/\sqrt{P}$ in front is a useful extra trick: it’s the right way to compensate for the fact that when you stretch out out your wavelet $y$ by a factor of $P,$ it gets bigger. So, when we’re doing integrals, we should define the continuous wavelet transform of $y$ by:

$\displaystyle{ w(s,P) = \frac{1}{\sqrt{P}} \int_{-\infty}^\infty x(t) y(\frac{t- s}{P})\; dt }$

### The results

Dara Shayda started with the air pressure anomaly at Darwin and Tahiti, measured monthly from 1866 to 2012. Taking DGaussianWavelet[4] as his wavelet, he computed the continuous wavelet transform $w(s,P)$ as above. To show us the answer, he created a scalogram:

This is a 2-dimensional color plot showing roughly how big the continuous wavelet transform $w(s,P)$ is for different shifts $s$ and periods $P.$ Blue means it’s very small, green means it’s bigger, yellow means even bigger and red means very large.

Tahiti gave this:

You’ll notice that the patterns at Darwin and Tahiti are similar in character, but notably different in detail. For example, the red spots, where our chosen wavelet shows up strongly with period of order ~100 months, occur at different times.

Puzzle 1. What is the meaning of the ‘spikes’ in these scalograms? What sort of signal would give a spike of this sort?

Puzzle 2. Do a Gabor transform, also known as a ‘windowed Fourier transform’, of the same data. Blake Pollard explained the Gabor transform in his article Milankovitch vs the Ice Ages. This is a way to see how much a signal wiggles at a given frequency at a given time: we multiply the signal by a shifted Gaussian and then takes its Fourier transform.

Puzzle 3. Read about continuous wavelet transforms. If we want to reconstruct our signal $x$ from its continuous wavelet transform, why should we use a wavelet $y$ with

$\displaystyle{\int_{-\infty}^\infty y(t) \; d t = 0 ? }$

In fact we want a somewhat stronger condition, which is implied by the above equation when the Fourier transform of $y$ is smooth and integrable:

Continuous wavelet transform, Wikipedia.

### Another way to understand correlations

David Tweed mentioned another approach from signal processing to understanding the quantity

$\displaystyle{ \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }$

If we’ve got two lists of data $x$ and $y$ that we want to compare to see if they behave similarly, the first thing we ought to do is multiplicatively scale each one so they’re of comparable magnitude. There are various possibilities for assigning a scale, but a reasonable one is to ensure they have equal ‘energy’

$\displaystyle{ \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i^2 }$

(This can be achieved by dividing each list by its standard deviation, which is equivalent to what was done in the main derivation above.) Once we’ve done that then it’s clear that looking at

$\displaystyle{ \sum_{i=1}^n (x_i-y_i)^2 }$

gives small values when they have a very good match and progressively bigger values as they become less similar. Observe that

$\begin{array}{ccl} \displaystyle{\sum_{i=1}^n (x_i-y_i)^2 } &=& \displaystyle{ \sum_{i=1}^n (x_i^2 - 2 x_i y_i + y_i^2) }\\ &=& \displaystyle{ \sum_{i=1}^n x_i^2 - 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 } \end{array}$

Since we’ve scaled things so that $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n y_i^2$ are constants, we can see that when $\sum_{i=1}^n x_i y_i$ becomes bigger,

$\displaystyle{ \sum_{i=1}^n (x_i-y_i)^2 }$

becomes smaller. So,

$\displaystyle{\sum_{i=1}^n x_i y_i}$

serves as a measure of how close the lists are, under these assumptions.

## Chemical Reaction Network Talks

26 June, 2014

A while ago I blogged about David Soloveichik’s talk at this workshop:

Programming with Chemical Reaction Networks: Mathematical Foundations, Banff International Research Station, 8-13 June 2014.

Now the slides for his talk are available:

• David Soloveichik, U.C. San Francisco, The computational power of chemical reaction networks.

And now I’d like to tell you about three more talks!

The first two are about ways one chemical reaction network can simulate another. This is important for a couple of reasons. First, in biology, a bunch of different chemical reactions can ‘accomplish the same task’—and we’d like to make this idea precise. That’s what Luca Cardelli spoke about. Second, people trying to do computation with chemistry are starting to simulate quite general reactions using DNA! That’s what Sheung Woo Shin spoke about.

### Luca Cardelli

Luca Cardelli was at Oxford when I was visiting this spring, but unfortunately I didn’t meet him! He works for Microsoft on the interface of biology and computation. At Banff, he talked about ways one chemical reaction network can simulate another. His slides are here:

• Luca Cardelli, Morphisms of reaction networks.

He has a paper that gives a more detailed explanation of these ideas:

• Luca Cardelli, Morphisms of reaction networks that couple structure to function.

Here is my own disorganized explanation… with lots of informative but confusing digressions. A population protocol is a chemical reaction with only 2-in, 2-out reactions. For example, this paper presents a population protocol that does ‘approximate majority detection':

• Dana Angluin, James Aspnes, and David Eisenstat, A simple population protocol for fast robust approximate majority, Distributed Computing 21 (2008), 87–102.

What’s the idea? We start with two kinds of molecules, say $x$’s and $y$’s, and we want to see which one is in the majority, so we run these chemical reactions:

$x + y \to x + b$

$x + y \to y + b$

$x + b \to 2x$

$y + b \to 2y$

See? All the reactions have 2 molecules going in and 2 going out. The $b$ molecules act as ‘undecided voters’ who become either an $x$ or a $y,$ depending on who they meet first.

If we start with about $n$ molecules, in $O(n \log n)$ time these reactions are very likely to convert all $x$’s and $y$’s to whatever kind of molecule was in the majority initially… at least if the gap in the number of $x$’s and $y$’s is big enough.

Here’s another population protocol that also does the job:

$x + y \to 2b$

$x + b \to 2x$

$y + b \to 2y$

And here’s a proof that one of these algorithms actually works—most of the time, when the initial difference in populations is big enough:

• Etienne Perron, Dinkar Vasudevan, and Milan Vojonvic, Using three states for binary consensus on complete graphs, Technical Report, MSR-TR-2008-114, Microsoft, September 2008.

If we use a discrete-time formalism to describe the dynamics, the proof seems to get harder. See the paper by Angluin, Aspnes, and Eisenstat for the only known proof!

Anyway, Luca Cardelli is interested in chemical reaction networks actually found in biology. This approximate majority algorithm is seen quite clearly in a certain biological system: a certain ‘epigenetic switch’. However, it is usually ‘obfuscated’ or ‘encrypted': hidden in a bigger, more complicated chemical reaction network. For example, see:

• Luca Cardelli and Attila Csikász-Nagy, The cell cycle switch is approximate majority obfuscated, Scientific Reports 2 (2012).

This got him interested in developing a theory of morphisms between reaction networks, which could answer questions like: When can one CRN emulate another? But these questions turn out to be much easier if we use the rate equation than with the master equation. So, he asks: when can one CRN give the same rate equation as another?

He found a sufficient condition that’s ‘purely syntactic': you can tell if it holds by looking at the reaction networks, regardless of the rate constants.

Here’s the idea. We say one network emulates another if for any rate constants of the second, we can find rate constants for the first that makes its rate equation have solutions exactly mimicking that of the second, but where several species in the first correspond to one in the second.

For this to make sense, we assume there is a map sending:

• species to species
• reactions to reactions

In a chemical reaction network homomorphism, the map on reactions is determined by the map on species in the obvious way. For example, if species $A$ is sent to $f(A)$ and species $B$ is sent to $f(B)$ then the reaction

$2A + B \to 3B$

is sent to the reaction

$2f(A) + f(B) \to 3 f(B)$

In this situation, to make the first network emulate the second, we need to set equal the initial concentrations of all species in the inverse image of a given species.

A reactant homomorphism from one chemical reaction network to another is more general: it sends species to species, and for any reaction in the first chemical reaction network with input

$A + B + C \cdots$

there’s a reaction in the second with input

$f(A) + f(B) + f(C) + \cdots$

(Reactant is another name for input.)

A stoichiomorphism is a kind of morphism that takes rate constants into account. See Cardelli’s paper for the definition.

The main theorem: given a stoichiomorphism from one chemical reaction network to another that’s also a reactant homomorphism, then the first emulates the second.

For a better explanation, read his paper! Here’s a cool picture from his paper showing a bunch of chemical reaction networks including the approximate majority network (labelled AM), many of which show up in biology, and morphisms between them:

Click to enlarge! These chemical reaction networks are drawn in a special style: as influence networks, consisting of ‘gates’ where process activates or deactivates another. Each gate is a chemical reaction network of a certain form, schematically like this:

$\mathrm{off} \leftrightarrow \mathrm{intermediate} \leftrightarrow \mathrm{on}$

where the forward reactions are catalyzed by one chemical and the reverse reactions are catalyzed by another. A gate is like a switch that can be turned on or off.

While listening to this talk, I thought the way in which one CRN emulates another in Cardelli’s formalism looks awfully similar to the way one dynamical system emulates another in Eugene Lerman’s formalism:

• Eugene Lerman, Networks of dynamical systems, Azimuth, 18 March 2014.

The following picture from Cardelli’s paper shows that one of his morphisms of reaction networks is like ‘covering map’. This reminds me a lot of what’s happening in Lerman’s work.

Again, click to enlarge!

### Seung Woo Shin

Seung Woo Shin was actually Brendan Fong’s roommate at the National University of Singapore while Brendan was working with me on chemical reaction networks. Apparently they never talked about their work!

Shin spoke about some other concepts of ‘morphism’ between chemical reaction networks. These other concepts do not involve reaction rates, just which chemicals can turn into which. You can see his slides here:

• Seung Woo Shin, Verifying CRN implementations.

and read his thesis for more details:

• Seung Woo Shin, Compiling and verifying DNA-based chemical reaction network implementations, Masters thesis, Caltech, 2012.

Abstract: One goal of molecular programming and synthetic biology is to build chemical circuits that can control chemical processes at the molecular level. Remarkably, it has been shown that synthesized DNA molecules can be used to construct complex chemical circuits that operate without any enzyme or cellular component. However, designing DNA molecules at the individual nucleotide base level is often difficult and laborious, and thus chemical reaction networks (CRNs) have been proposed as a higher-level programming language. So far, several general-purpose schemes have been described for designing synthetic DNA molecules that simulate the behavior of arbitrary CRNs, and many more are being actively investigated.

Here, we solve two problems related to this topic. First, we present a general-purpose CRN-to-DNA compiler that can apply user-defined compilation schemes for translating formal CRNs to domain-level specifications for DNA molecules. In doing so, we develop a language in which such schemes can be concisely and precisely described. This compiler can greatly reduce the amount of tedious manual labor faced by researchers working in the field. Second, we present a general method for the formal verification of the correctness of such compilation. We first show that this problem reduces to testing a notion of behavioral equivalence between two CRNs, and then we construct a mathematical formalism in which that notion can be precisely defined. Finally, we provide algorithms for testing that notion. This verification process can be thought of as an equivalent of model checking in molecular computation, and we hope that the generality of our verification techniques will eventually allow us to apply them not only to DNA-based CRN implementations but to a wider class of molecular programs.

His thesis built on this earlier paper:

• David Soloveichik, Georg Seelig and Erik Winfree, DNA as a universal substrate for chemical kinetics, Proceedings of the National Academy of Sciences (2010).

I think this work is fascinating and deeply related to category theory, so I talked to Shin and Winfree about it, and this is what we came up with:

CRN equivalences: progress report.

This is one of several reports on progress people at the workshop made on various open problems.

### David Anderson

Brendan Fong and I wrote about David Anderson’s work in Part 9 of the network theory series. It’s so impressive that I expected him to be older… older than me, I guess. He’s not!

In his tutorial, he gave an overview of chemical reaction networks with an emphasis on the deficiency zero theorem. Since many people were puzzled by the ‘deficiency’ concept, they asked lots of questions. But I’ve already explained that idea in Part 21. So, I’ll just mention a couple of cool theorems he told us about!

Theorem (Horn and Jackson). If a reaction network has a complex balanced equilibrium, then:

1. It has no equilibria that are not complex balanced.

2. The reaction network must be weakly reversible.

3. Every stochiometric compatibility class contains precisely one complex balanced equilibrium.

I should have known this, since this work is classic. But I don’t think I knew that the existence of one complex balanced equilibrium implied all this stuff!

He also mentioned this paper:

• Guy Shinar and Martin Feinberg, Structural sources of robustness in biochemical reaction networks, Science (2010).

which contains this amazing theorem:

Theorem (Shinar and Feinberg). Suppose there is a chemical reaction network such that:

1. its deficiency equals one;

2. it has a positive steady state;

3. it has two “non-terminal complexes” that differ only in one species $S.$ (“Non-terminal” is a concept that’s easier to explain with a picture of a reaction network).

Then the species $S$ is absolutely robust: with any initial conditions, the rate equation will approach an equilibrium where the concentration of $S$ approaches a specific fixed value $c,$ independent of the initial conditions!

However, things work very differently if we treat the system stochastically, using the master equation:

• David F. Anderson, German A. Enciso and Matthew D. Johnston, Stochastic analysis of biochemical reaction networks with absolute concentration robustness.

### More

A lot more happened at this workshop! There was a huge amount of discussion of the leader election problem, which is about how to cook up chemical reactions that create a ‘leader': a single molecule of some sort.

Leader election: the problem, and references.