Network Theory Seminar (Part 4)

5 November, 2014

 

Since I was in Banff, my student Franciscus Rebro took over this week and explained more about cospan categories. These are a tool for constructing categories where the morphisms are networks such as electrical circuit diagrams, signal flow diagrams, Markov processes and the like. For some more details see:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

Cospan categories are really best thought of as bicategories, and Franciscus gets into this aspect too.


Network Theory (Part 33)

4 November, 2014

Last time I came close to describing the ‘black box functor’, which takes an electrical circuit made of resistors

and sends it to its behavior as viewed from outside. From outside, all you can see is the relation between currents and potentials at the ‘terminals’—the little bits of wire that poke out of the black box:

I came close to defining the black box functor, but I didn’t quite make it! This time let’s finish the job.

The categories in question

The black box functor

\blacksquare : \mathrm{ResCirc} \to \mathrm{LinRel}

goes from the category \mathrm{ResCirc}, where morphisms are circuits made of resistors, to the category \mathrm{LinRel}, where morphisms are linear relations. Let me remind you how these categories work, and introduce a bit of new notation.

Here is the category \mathrm{ResCirc}:

• an object is a finite set;

• a morphism from X to Y is an isomorphism class of cospans

in the category of graphs with edges labelled by resistances: numbers in (0,\infty). Here we think of the finite sets X and Y as graphs with no edges. We call X the set of inputs and Y the set of outputs.

• we compose morphisms in \mathrm{ResCirc} by composing isomorphism classes of cospans.

And here is the category \mathrm{LinRel}:

• an object is a finite-dimensional real vector space;

• a morphism from U to V is a linear relation R : U \leadsto V, meaning a linear subspace R \subseteq U \times V;

• we compose a linear relation R \subseteq U \times V and a linear relation S \subseteq V \times W in the usual way we compose relations, getting:

SR = \{(u,w) \in U \times W : \; \exists v \in V \; (u,v) \in R \mathrm{\; and \;} (v,w) \in S \}

In case you’re wondering: I’ve just introduced the wiggly arrow notation

R : U \leadsto V

for a linear relation from U to V, because it suggests that a relation is a bit like a function but more general. Indeed, a function is a special case of a relation, and composing functions is a special case of composing relations.

The black box functor

Now, how do we define the black box functor?

Defining it on objects is easy. An object of \mathrm{ResCirc} is a finite set S, and we define

\blacksquare{S} = \mathbb{R}^S \times \mathbb{R}^S

The idea is that S could be a set of inputs or outputs, and then

(\phi, I) \in \mathbb{R}^S \times \mathbb{R}^S

is a list of numbers: the potentials and currents at those inputs or outputs.

So, the interesting part is defining the black box functor on morphisms!

For this we start with a morphism in \mathrm{ResCirc}:

The labelled graph \Gamma consists of:

• a set N of nodes,

• a set E of edges,

• maps s, t : E \to N sending each edge to its source and target,

• a map r : E \to (0,\infty) sending each edge to its resistance.

The cospan gives maps

i: X \to N, \qquad o: Y \to N

These say how the inputs and outputs are interpreted as nodes in the circuit. We’ll call the nodes that come from inputs or outputs ‘terminals’. So, mathematically,

T = \mathrm{im}(i) \cup \mathrm{im}(o) \subseteq N

is the set of terminals: the union of the images of i and o.

In the simplest case, the maps i and o are one-to-one, with disjoint ranges. Then each terminal either comes from a single input, or a single output, but not both! This is a good picture to keep in mind. But some subtleties arise when we leave this simplest case and consider other cases.

Now, the black box functor is supposed to send our circuit to a linear relation. I’ll call the circuit \Gamma for short, though it’s really the whole cospan

So, our black box functor is supposed to send this circuit to a linear relation

\blacksquare(\Gamma) : \mathbb{R}^X \times \mathbb{R}^X \leadsto \mathbb{R}^Y \times \mathbb{R}^Y

This is a relation between the potentials and currents at the input terminals and the potentials and currents at the output terminals! How is it defined?

I’ll start by outlining how this works.

First, our circuit picks out a subspace

dQ \subseteq \mathbb{R}^T \times \mathbb{R}^T

This is the subspace of allowed potentials and currents on the terminals. I’ll explain this and why it’s called dQ a bit later. Briefly, it comes from the principle of minimum power, described last time.

Then, the map

i: X \to T

gives a linear relation

S(i) : \mathbb{R}^X \times \mathbb{R}^X \leadsto \mathbb{R}^T \times \mathbb{R}^T

This says how the potentials and currents at the inputs are related to those at the terminals. Similarly, the map

o: Y \to T

gives a linear relation

S(o) : \mathbb{R}^Y \times \mathbb{R}^Y \leadsto \mathbb{R}^T \times \mathbb{R}^T

This says how the potentials and currents at the outputs are related to those at the terminals.

Next, we can ‘turn around’ any linear relation

R : \mathbb{R}^Y \times \mathbb{R}^Y \leadsto \mathbb{R}^T \times \mathbb{R}^T

to get a relation

R^\dagger : \mathbb{R}^T \times \mathbb{R}^T  \leadsto \mathbb{R}^Y \times \mathbb{R}^Y

defined by

R^\dagger = \{(\phi',-I',\phi,-I) : (\phi, I, \phi', I') \in R \}

Here we are just switching the input and output potentials, but when we switch the currents we also throw in a minus sign. The reason is that we care about the current flowing in to an input, but out of an output.

Finally, one more trick: given a linear subspace

L \subseteq V

of a vector space V we get a linear relation

1|_L : V \leadsto V

called the identity restricted to L, defined like this:

1|_L = \{ (v, v) :\; v \in L \} \subseteq V \times V

If L is all of V this relation is actually the identity function on V. Otherwise it’s a partially defined function that’s defined only on L, and is the identity there. (A partially defined function is an example of a relation.) My notation 1|_L is probably bad, but I don’t know a better one, so bear with me.

Let’s use all these ideas to define

\blacksquare(\Gamma) : \mathbb{R}^X \times \mathbb{R}^X \leadsto \mathbb{R}^Y \times \mathbb{R}^Y

To do this, we compose three linear relations:

1) We start with

S(i) : \mathbb{R}^X \times \mathbb{R}^X \leadsto \mathbb{R}^T \times \mathbb{R}^T

2) We compose this with

1|_{dQ} : \mathbb{R}^T \times \mathbb{R}^T \leadsto \mathbb{R}^T \times \mathbb{R}^T

3) Then we compose this with

S(o)^\dagger : \mathbb{R}^T \times \mathbb{R}^T \leadsto \mathbb{R}^Y \times \mathbb{R}^Y

Note that:

1) says how the potentials and currents at the inputs are related to those at the terminals,

2) picks out which potentials and currents at the terminals are actually allowed, and

3) says how the potentials and currents at the terminals are related to those at the outputs.

So, I hope all makes sense, at least in some rough way. In brief, here’s the formula:

\blacksquare(\Gamma) = S(o)^\dagger \; 1|_{dQ} \; S(i)

Now I just need to fill in some details. First, how do we define S(i) and S(o)? They work exactly the same way, by ‘copying potentials and adding currents’, so I’ll just talk about one. Second, how do we define the subspace dQ? This uses the principle of minimum power.

Duplicating potentials and adding currents

Any function between finite sets

i: X \to T

gives a linear map

i^* : \mathbb{R}^T \to \mathbb{R}^X

Mathematicians call this linear map the pullback along i, and for any \phi \in \mathbb{R}^T it’s defined by

i^*(\phi)(x) = \phi(i(x))

In our application, we think of \phi as a list of potentials at terminals. The function i could map a bunch of inputs to the same terminal, and the above formula says the potential at this terminal gives the potential at all those inputs. So, we are copying potentials.

We also get a linear map going the other way:

i_* : \mathbb{R}^X \to \mathbb{R}^T

Mathematicians call this the pushforward along i, and for any I \in \mathbb{R}^X it’s defined by

\displaystyle{ i_*(I)(t) = \sum_{x \; : \; i(x) = t } I(x) }

In our application, we think of I as a list of currents entering at some inputs. The function i could map a bunch of inputs to the same terminal, and the above formula says the current at this terminal is the sum of the currents at all those inputs. So, we are adding currents.

Putting these together, our map

i : X \to T

gives a linear relation

S(i) : \mathbb{R}^X \times \mathbb{R}^X \leadsto \mathbb{R}^T \times \mathbb{R}^T

where the pair (\phi, I) \in \mathbb{R}^X \times \mathbb{R}^X is related to the pair (\phi', I') \in \mathbb{R}^T \times \mathbb{R}^T iff

\phi = i^*(\phi')

and

I' = i_*(I)

So, here’s the rule of thumb when attaching the points of X to the input terminals of our circuit: copy potentials, but add up currents. More formally:

\begin{array}{ccl} S(i) &=& \{ (\phi, I, \phi', I') : \; \phi = i^*(\phi') , \; I' = i_*(I) \}  \\ \\  &\subseteq& \mathbb{R}^X \times \mathbb{R}^X \times \mathbb{R}^T \times \mathbb{R}^T \end{array}

The principle of minimum power

Finally, how does our circuit define a subspace

dQ \subseteq \mathbb{R}^T \times \mathbb{R}^T

of allowed potential-current pairs at the terminals? The trick is to use the ideas we discussed last time. If we know the potential at all nodes of our circuit, say \phi \in \mathbb{R}^N, we know the power used by the circuit:

P(\phi) = \displaystyle{ \sum_{e \in E} \frac{1}{r_e} \big(\phi(s(e)) - \phi(t(e))\big)^2 }

We saw last time that if we fix the potentials at the terminals, the circuit will choose potentials at the other nodes to minimize this power. We can describe the potential at the terminals by

\psi \in \mathbb{R}^T

So, the power for a given potential at the terminals is

Q(\psi) = \displaystyle{ \frac{1}{2} \min_{\phi \in \mathbb{R}^N \; : \; \phi|_T = \psi} \sum_{e \in E} \frac{1}{r_e} \big(\phi(s(e)) - \phi(t(e))\big)^2 }

Actually this is half the power: I stuck in a factor of 1/2 for some reason we’ll soon see. This Q is a quadratic function

Q : \mathbb{R}^T \to \mathbb{R}

so its derivative is linear. And, our work last time showed something interesting: to compute the current J_x flowing into a terminal x \in T, we just differentiate Q with respect to the potential at that terminal:

\displaystyle{ J_x = \frac{\partial Q(\psi)}{\partial \psi_x} }

This is the reason for the 1/2: when we take the derivative of Q, we bring down a 2 from differentiating all those squares, and to make that go away we need a 1/2.

The space of allowed potential-current pairs at the terminals is thus the linear subspace

dQ = \{ (\psi, J) : \; \displaystyle{ J_x = \frac{\partial Q(\psi)}{\partial \psi_x} \}  \subseteq \mathbb{R}^T \times \mathbb{R}^T }

And this completes our precise description of the black box functor!

The hard part is this:

Theorem. \blacksquare : \mathrm{ResCirc} \to \mathrm{LinRel} is a functor.

In other words, we have to prove that it preserves composition:

\blacksquare(fg) = \blacksquare(f) \blacksquare(g)

For that, read our paper:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.


Network Theory (Part 32)

20 October, 2014

Okay, today we will look at the ‘black box functor’ for circuits made of resistors. Very roughly, this takes a circuit made of resistors with some inputs and outputs:

and puts a ‘black box’ around it:

forgetting the internal details of the circuit and remembering only how the it behaves as viewed from outside. As viewed from outside, all the circuit does is define a relation between the potentials and currents at the inputs and outputs. We call this relation the circuit’s behavior. Lots of different choices of the resistances R_1, \dots, R_6 would give the same behavior. In fact, we could even replace the whole fancy circuit by a single edge with a single resistor on it, and get a circuit with the same behavior!

The idea is that when we use a circuit to do something, all we care about is its behavior: what it does as viewed from outside, not what it’s made of.

Furthermore, we’d like the behavior of a system made of parts to depend in a simple way on the external behaviors of its parts. We don’t want to have to ‘peek inside’ the parts to figure out what the whole will do! Of course, in some situations we do need to peek inside the parts to see what the whole will do. But in this particular case we don’t—at least in the idealization we are considering. And this fact is described mathematically by saying that black boxing is a functor.

So, how do circuits made of resistors behave? To answer this we first need to remember what they are!

Review

Remember that for us, a circuit made of resistors is a mathematical structure like this:

It’s a cospan where:

\Gamma is a graph labelled by resistances. So, it consists of a finite set N of nodes, a finite set E of edges, two functions

s, t : E \to N

sending each edge to its source and target nodes, and a function

r : E \to (0,\infty)

that labels each edge with its resistance.

i: I \to \Gamma is a map of graphs labelled by resistances, where I has no edges. A labelled graph with no edges has nothing but nodes! So, the map i is just a trick for specifying a finite set of nodes called inputs and mapping them to N. Thus i picks out some nodes of \Gamma and declares them to be inputs. (However, i may not be one-to-one! We’ll take advantage of that subtlety later.)

o: O \to \Gamma is another map of graphs labelled by resistances, where O again has no edges, and we call its nodes outputs.

The principle of minimum power

So what does a circuit made of resistors do? This is described by the principle of minimum power.

Recall from Part 27 that when we put it to work, our circuit has a current I_e flowing along each edge e \in E. This is described by a function

I: E \to \mathbb{R}

It also has a voltage across each edge. The word ‘across’ is standard here, but don’t worry about it too much; what matters is that we have another function

V: E \to \mathbb{R}

describing the voltage V_e across each edge e.

Resistors heat up when current flows through them, so they eat up electrical power and turn this power into heat. How much? The power is given by

\displaystyle{ P = \sum_{e \in E} I_e V_e }

So far, so good. But what does it mean to minimize power?

To understand this, we need to manipulate the formula for power using the laws of electrical circuits described in Part 27. First, Ohm’s law says that for linear resistors, the current is proportional to the voltage. More precisely, for each edge e \in E,

\displaystyle{ I_e = \frac{V_e}{r_e} }

where r_e is the resistance of that edge. So, the bigger the resistance, the less current flows: that makes sense. Using Ohm’s law we get

\displaystyle{ P = \sum_{e \in E} \frac{V_e^2}{r_e} }

Now we see that power is always nonnegative! Now it makes more sense to minimize it. Of course we could minimize it simply by setting all the voltages equal to zero. That would work, but that would be boring: it gives a circuit with no current flowing through it. The fun starts when we minimize power subject to some constraints.

For this we need to remember another law of electrical circuits: a spinoff of Kirchhoff’s voltage law. This says that we can find a function called the potential

\phi: N \to \mathbb{R}

such that

V_e = \phi_{s(e)} - \phi_{t(e)}

for each e \in E. In other words, the voltage across each edge is the difference of potentials at the two ends of this edge.

Using this, we can rewrite the power as

\displaystyle{ P = \sum_{e \in E} \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)})^2 }

Now we’re really ready to minimize power! Our circuit made of resistors has certain nodes called terminals:

T \subseteq N

These are the nodes that are either inputs or outputs. More precisely, they’re the nodes in the image of

i: I \to \Gamma

or

o: O \to \Gamma

The principle of minimum power says that:

If we fix the potential \phi on all terminals, the potential at other nodes will minimize the power

\displaystyle{ P(\phi) = \sum_{e \in E} \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)})^2 }

subject to this constraint.

This should remind you of all the other minimum or maximum principles you know, like the principle of least action, or the way a system in thermodynamic equilibrium maximizes its entropy. All these principles—or at least, most of them—are connected. I could talk about this endlessly. But not now!

Now let’s just use the principle of minimum power. Let’s see what it tells us about the behavior of an electrical circuit.

Let’s imagine changing the potential \phi by adding some multiple of a function

\psi: N \to \mathbb{R}

If this other function vanishes at the terminals:

\forall n \in T \; \; \psi(n) = 0

then \phi + x \psi doesn’t change at the terminals as we change the number x.

Now suppose \phi obeys the principle of minimum power. In other words, supposes it minimizes power subject to the constraint of taking the values it does at the terminals. Then we must have

\displaystyle{ \frac{d}{d x} P(\phi + x \psi)\Big|_{x = 0} }

whenever

\forall n \in T \; \; \psi(n) = 0

This is just the first derivative test for a minimum. But the converse is true, too! The reason is that our power function is a sum of nonnegative quadratic terms. Its graph will look like a paraboloid. So, the power has no points where its derivative vanishes except minima, even when we constrain \phi by making it lie on a linear subspace.

We can go ahead and start working out the derivative:

\displaystyle{ \frac{d}{d x} P(\phi + x \psi)! = ! \frac{d}{d x} \sum_{e \in E} \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)} + x(\psi_{s(e)} -\psi_{t(e)}))^2  }

To work out the derivative of these quadratic terms at x = 0, we only need to keep the part that’s proportional to x. The rest gives zero. So:

\begin{array}{ccl} \displaystyle{ \frac{d}{d t} P(\phi + x \psi)\Big|_{x = 0} } &=& \displaystyle{ \frac{d}{d x} \sum_{e \in E} \frac{x}{r_e} (\phi_{s(e)} - \phi_{t(e)}) (\psi_{s(e)} - \psi_{t(e)}) \Big|_{x = 0} } \\ \\  &=&   \displaystyle{  \sum_{e \in E} \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) (\psi_{s(e)} - \psi_{t(e)}) }  \end{array}

The principle of minimum power says this is zero whenever \psi : N \to \mathbb{R} is a function that vanishes at terminals. By linearity, it’s enough to consider functions \psi that are zero at every node except one node n that is not a terminal. By linearity we can also assume \psi(n) = 1.

Given this, the only nonzero terms in the sum

\displaystyle{ \sum_{e \in E} \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) (\psi_{s(e)} - \psi_{t(e)}) }

will be those involving edges whose source or target is n. We get

\begin{array}{ccc} \displaystyle{ \frac{d}{d x} P(\phi + x \psi)\Big|_{x = 0} } &=& \displaystyle{ \sum_{e: \; s(e) = n}  \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)})}  \\  \\        && -\displaystyle{ \sum_{e: \; t(e) = n}  \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) }   \end{array}

So, the principle of minimum power says precisely

\displaystyle{ \sum_{e: \; s(e) = n}  \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) = \sum_{e: \; t(e) = n}  \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) }

for all nodes n that aren’t terminals.

What does this mean? You could just say it’s a set of linear equations that must be obeyed by the potential \phi. So, the principle of minimum power says that fixing the potential at terminals, the potential at other nodes must be chosen in a way that obeys a set of linear equations.

But what do these equations mean? They have a nice meaning. Remember, Kirchhoff’s voltage law says

V_e = \phi_{s(e)} - \phi_{t(e)}

and Ohm’s law says

\displaystyle{ I_e = \frac{V_e}{r_e} }

Putting these together,

\displaystyle{ I_e = \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) }

so the principle of minimum power merely says that

\displaystyle{ \sum_{e: \; s(e) = n} I_e = \sum_{e: \; t(e) = n}  I_e }

for any node n that is not a terminal.

This is Kirchhoff’s current law: for any node except a terminal, the total current flowing into that node must equal the total current flowing out! That makes a lot of sense. We allow current to flow in or out of our circuit at terminals, but ‘inside’ the circuit charge is conserved, so if current flows into some other node, an equal amount has to flow out.

In short: the principle of minimum power implies Kirchoff’s current law! Conversely, we can run the whole argument backward and derive the principle of minimum power from Kirchhoff’s current law. (In both the forwards and backwards versions of this argument, we use Kirchhoff’s voltage law and Ohm’s law.)

When the node n is a terminal, the quantity

\displaystyle{  \sum_{e: \; s(e) = n} I_e \; - \; \sum_{e: \; t(e) = n}  I_e }

need not be zero. But it has an important meaning: it’s the amount of current flowing into that terminal!

We’ll call this I_n, the current at the terminal n \in T. This is something we can measure even when our circuit has a black box around it:

So is the potential \phi_n at the terminal n. It’s these currents and potentials at terminals that matter when we try to describe the behavior of a circuit while ignoring its inner workings.

Black boxing

Now let me quickly sketch how black boxing becomes a functor.

A circuit made of resistors gives a linear relation between the potentials and currents at terminals. A relation is something that can hold or fail to hold. A ‘linear’ relation is one defined using linear equations.

A bit more precisely, suppose we choose potentials and currents at the terminals:

\psi : T \to \mathbb{R}

J : T \to \mathbb{R}

Then we seek potentials and currents at all the nodes and edges of our circuit:

\phi: N \to \mathbb{R}

I : E \to \mathbb{R}

that are compatible with our choice of \psi and J. Here compatible means that

\psi_n = \phi_n

and

J_n = \displaystyle{  \sum_{e: \; s(e) = n} I_e \; - \; \sum_{e: \; t(e) = n}  I_e }

whenever n \in T, but also

\displaystyle{ I_e = \frac{1}{r_e} (\phi_{s(e)} - \phi_{t(e)}) }

for every e \in E, and

\displaystyle{  \sum_{e: \; s(e) = n} I_e \; = \; \sum_{e: \; t(e) = n}  I_e }

whenever n \in N - T. (The last two equations combine Kirchoff’s laws and Ohm’s law.)

There either exist I and \phi making all these equations true, in which case we say our potentials and currents at the terminals obey the relation… or they don’t exist, in which case we say the potentials and currents at the terminals don’t obey the relation.

The relation is clearly linear, since it’s defined by a bunch of linear equations. With a little work, we can make it into a linear relation between potentials and currents in

\mathbb{R}^I \oplus \mathbb{R}^I

and potentials and currents in

\mathbb{R}^O \oplus \mathbb{R}^O

Remember, I is our set of inputs and O is our set of outputs.

In fact, this process of getting a linear relation from a circuit made of resistors defines a functor:

\blacksquare : \mathrm{ResCirc} \to \mathrm{LinRel}

Here \mathrm{ResCirc} is the category where morphisms are circuits made of resistors, while \mathrm{LinRel} is the category where morphisms are linear relations.

More precisely, here is the category \mathrm{ResCirc}:

• an object of \mathrm{ResCirc} is a finite set;

• a morphism from I to O is an isomorphism class of circuits made of resistors:

having I as its set of inputs and O as its set of outputs;

• we compose morphisms in \mathrm{ResCirc} by composing isomorphism classes of cospans.

(Remember, circuits made of resistors are cospans. This lets us talk about isomorphisms between them. If you forget the how isomorphism between cospans work, you can review it in Part 31.)

And here is the category \mathrm{LinRel}:

• an object of \mathrm{LinRel} is a finite-dimensional real vector space;

• a morphism from U to V is a linear relation R \subseteq U \times V, meaning a linear subspace of the vector space U \times V;

• we compose a linear relation R \subseteq U \times V and a linear relation S \subseteq V \times W in the usual way we compose relations, getting:

SR = \{(u,w) \in U \times W : \; \exists v \in V \; (u,v) \in R \mathrm{\; and \;} (v,w) \in S \}

Next steps

So far I’ve set up most of the necessary background but not precisely defined the black boxing functor

\blacksquare : \mathrm{ResCirc} \to \mathrm{LinRel}

There are some nuances I’ve glossed over, like the difference between inputs and outputs as elements of I and O and their images in N. If you want to see the precise definition and the proof that it’s a functor, read our paper:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

The proof is fairly long: there may be a much quicker one, but at least this one has the virtue of introducing a lot of nice ideas that will be useful elsewhere.

Next time I’ll define the black box functor more carefully.


Network Theory Seminar (Part 2)

16 October, 2014

 

This time I explain more about how ‘cospans’ represent gadgets with two ends, an input end and an output end:

I describe how to glue such gadgets together by composing cospans. We compose cospans using a category-theoretic construction called a ‘pushout’, so I also explain pushouts. At the end, I explain how this gives us a category where the morphisms are electrical circuits made of resistors, and sketch what we’ll do next: study the behavior of these circuits.

These lecture notes provide extra details:

Network theory (part 31).


El Niño Project (Part 8)

14 October, 2014

So far we’ve rather exhaustively studied a paper by Ludescher et al which uses climate networks for El Niño prediction. This time I’d like to compare another paper:

• Y. Berezin, Avi Gozolchiani, O. Guez and Shlomo Havlin, Stability of climate networks with time, Scientific Reports 2 (2012).

Some of the authors are the same, and the way they define climate networks is very similar. But their goal here is different: they want to see see how stable climate networks are over time. This is important, since the other paper wants to predict El Niños by changes in climate networks.

They divide the world into 9 zones:

For each zone they construct several climate networks. Each one is an array of numbers W_{l r}^y, one for each year y and each pair of grid points l, r in that zone. They call W_{l r}^y a link strength: it’s a measure of how how correlated the weather is at those two grid points during that year.

I’ll say more later about how they compute these link strengths. In Part 3 we explained one method for doing it. This paper uses a similar but subtly different method.

The paper’s first big claim is that W_{l r}^y doesn’t change much from year to year, “in complete contrast” to the pattern of local daily air temperature and pressure fluctuations. In simple terms: the strength of the correlation between weather at two different points tends to be quite stable.

Moreover, the definition of link strength involves an adjustable time delay, \tau. We can measure the correlation between the weather at point l at any given time and point r at a time \tau days later. The link strength is computed by taking a maximum over time delays \tau. Naively speaking, the value of \tau that gives the maximum correlation is “how long it typically takes for weather at point l to affect weather at point r”. Or the other way around, if \tau is negative.

This is a naive way of explaining the idea, because I’m mixing up correlation with causation. But you get the idea, I hope.

Their second big claim is that when the link strength between two points l and r is big, the value of \tau that gives the maximum correlation doesn’t change much from year to year. In simple terms: if the weather at two locations is strongly correlated, the amount of time it takes for weather at one point to reach the other point doesn’t change very much.

The data

How do Berezin et al define their climate network?

They use data obtained from here:

NCEP-DOE Reanalysis 2.

This is not exactly the same data set that Ludescher et al use, namely:

NCEP/NCAR Reanalysis 1.

“Reanalysis 2″ is a newer attempt to reanalyze and fix up the same pile of data. That’s a very interesting issue, but never mind that now!

Berezin et al use data for:

• the geopotential height for six different pressures

and

• the air temperature at those different heights

The geopotential height for some pressure says roughly how high you have to go for air to have that pressure. Click the link if you want a more precise definition! Here’s the geopotential height field for the pressure of 500 millibars on some particular day of some particular year:



The height is in meters.

Berezin et al use daily values for this data for:

• locations world-wide on a grid with a resolution of 5° × 5°,

during:

• the years from 1948 to 2006.

They divide the globe into 9 zones, and separately study each zone:


So, they’ve got twelve different functions of space and time, where space is a rectangle discretized using a 5° × 5° grid, and time is discretized in days. From each such function they build a ‘climate network’.

How do they do it?

The climate networks

Berezin’s method of defining a climate network is similar to Ludescher et al‘s, but different. Compare Part 3 if you want to think about this.

Let \tilde{S}^y_l(t) be any one of their functions, evaluated at the grid point l on day t of year y.

Let S_l^y(t) be \tilde{S}^y_l(t) minus its climatological average. For example, if t is June 1st and y is 1970, we average the temperature at location l over all June 1sts from 1948 to 2006, and subtract that from \tilde{S}^y_l(t) to get S^y_l(t). In other words:

\displaystyle{  \tilde{S}^y_l(t) = S^y_l(t) - \frac{1}{N} \sum_y S^y_l(t)  }

where N is the number of years considered.

For any function of time f, let \langle f^y(t) \rangle be the average of the function over all days in year y. This is different than the ‘running average’ used by Ludescher et al, and I can’t even be 100% sure that Berezin mean what I just said: they use the notation \langle f^y(t) \rangle.

Let l and r be two grid points, and \tau any number of days in the interval [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}]. Define the cross-covariance function at time t by:

\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)

I believe Berezin mean to consider this quantity, because they mention two grid points l and r. Their notation omits the subscripts l and r so it is impossible to be completely sure what they mean! But what I wrote is the reasonable quantity to consider here, so I’ll assume this is what they meant.

They normalize this quantity and take its absolute value, forming:

\displaystyle{ X_{l r}^y(\tau) = \frac{\Big|\Big(f_l(t) - \langle f_l(t) \rangle\Big) \; \Big( f_r(t + \tau) - \langle f_r(t + \tau) \rangle \Big)\Big|}   {\sqrt{\Big\langle \Big(f_l(t)      - \langle f_l(t)\rangle \Big)^2 \Big\rangle  }  \; \sqrt{\Big\langle \Big(f_r(t+\tau) - \langle f_r(t+\tau)\rangle\Big)^2 \Big\rangle  } }  }

They then take the maximum value of X_{l r}^y(\tau) over delays \tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}], subtract its mean over delays in this range, and divide by the standard deviation. They write something like this:

\displaystyle{ W_{l r}^y = \frac{\mathrm{MAX}\Big( X_{l r}^y - \langle X_{l r}^y\rangle \Big) }{\mathrm{STD} X_{l r}^y} }

and say that the maximum, mean and standard deviation are taken over the (not written) variable \tau \in [-\tau_{\mathrm{max}}, \tau_{\mathrm{max}}].

Each number W_{l r}^y is called a link strength. For each year, the matrix of numbers W_{l r}^y where l and r range over all grid points in our zone is called a climate network.

We can think of a climate network as a weighted complete graph with the grid points l as nodes. Remember, an undirected graph is one without arrows on the edges. A complete graph is an undirected graph with one edge between any pair of nodes:



A weighted graph is an undirected graph where each edge is labelled by a number called its weight. But right now we’re also calling the weight the ‘link strength’.

A lot of what’s usually called ‘network theory’ is the study of weighted graphs. You can learn about it here:

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford U. Press, Oxford, 2011.

Suffice it to say that given a weighted graph, there are lot of quantities you can compute from it, which are believed to tell us interesting things!

The conclusions

I will not delve into the real meat of the paper, namely what they actually do with their climate networks! The paper is free online, so you can read this yourself.

I will just quote their conclusions and show you a couple of graphs.

The conclusions touch on an issue that’s important for the network-based approach to El Niño prediction. If climate networks are ‘stable’, not changing much in time, why would we use them to predict a time-dependent phenomenon like the El Niño Southern Oscillation?

We have established the stability of the network of connections between the dynamics of climate variables (e.g. temperatures and geopotential heights) in different geographical regions. This stability stands in fierce contrast to the observed instability of the original climatological field pattern. Thus the coupling between different regions is, to a large extent, constant and predictable. The links in the climate network seem to encapsulate information that is missed in analysis of the original field.

The strength of the physical connection, W_{l r}, that each link in this network represents, changes only between 5% to 30% over time. A clear boundary between links that represent real physical dependence and links that emerge due to noise is shown to exist. The distinction is based on both the high link average strength \overline{W_{l r}} and on the low variability of time delays \mathrm{STD}(T_{l r}).

Recent studies indicate that the strength of the links in the climate network changes during the El Niño Southern Oscillation and the North Atlantic Oscillation cycles. These changes are within the standard deviation of the strength of the links found here. Indeed in Fig. 3 it is clearly seen that the coefficient of variation of links in the El Niño basin (zone 9) is larger than other regions such as zone 1. Note that even in the El Niño basin the coefficient of variation is relatively small (less than 30%).

Beside the stability of single links, also the hierarchy of the link strengths in the climate network is preserved to a large extent. We have shown that this hierarchy is partially due to the two dimensional space in which the network is embedded, and partially due to pure physical coupling processes. Moreover the contribution of each of these effects, and the level of noise was explicitly estimated. The spatial effect is typically around 50% of the observed stability, and the noise reduces the stability value by typically 5%–10%.

The network structure was further shown to be consistent across different altitudes, and a monotonic relation between the altitude distance and the correspondence between the network structures is shown to exist. This yields another indication that the observed network structure represents effects of physical coupling.

The stability of the network and the contributions of different effects were summarized in specific relation to different geographical areas, and a clear distinction between equatorial and off–equatorial areas was observed. Generally, the network structure of equatorial regions is less stable and more fluctuative.

The stability and consistence of the network structure during time and across different altitudes stands in contrast to the known unstable variability of the daily anomalies of climate variables. This contrast indicates an analogy between the behavior of nodes in the climate network and the behavior of coupled chaotic oscillators. While the fluctuations of each coupled oscillators are highly erratic and unpredictable, the interactions between the oscillators is stable and can be predicted. The possible outreach of such an analogy lies in the search for known behavior patterns of coupled chaotic oscillators in the climate system. For example, existence of phase slips in coupled chaotic oscillators is one of the fingerprints for their cooperated behavior, which is evident in each of the individual oscillators. Some abrupt changes in climate variables, for example, might be related to phase slips, and can be understood better in this context.

On the basis of our measured coefficient of variation of single links (around 15%), and the significant overall network stability of 20–40%, one may speculatively assess the extent of climate change. However, for this assessment our current available data is too short and does not include enough time from periods before the temperature trends. An assessment of the relation between the network stability and climate change might be possible mainly through launching of global climate model “experiments” realizing other climate conditions, which we indeed intend to perform.

A further future outreach of our work can be a mapping between network features (such as network motifs) and known physical processes. Such a mapping was previously shown to exist between an autonomous cluster in the climate network and El Niño. Further structures without such a climate interpretation might point towards physical coupling processes which were not observed earlier.

(I have expanded some acronyms and deleted some reference numbers.)

Finally, here two nice graphs showing the average link strength as a function of distance. The first is based on four climate networks for Zone 1, the southern half of South America:



The second is based on four climate networks for Zone 9, a big patch of the Pacific north of the Equator which roughly corresponds to the ‘El Niño basin':



As we expect, temperatures and geopotential heights get less correlated at points further away. But the rate at which the correlation drops off conveys interesting information! Graham Jones has made some interesting charts of this for the rectangle of the Pacific that Ludescher et al use for El Niño prediction, and I’ll show you those next time.

The series so far

El Niño project (part 1): basic introduction to El Niño and our project here.

El Niño project (part 2): introduction to the physics of El Niño.

El Niño project (part 3): summary of the work of Ludescher et al.

El Niño project (part 4): how Graham Jones replicated the work by Ludescher et al, using software written in R.

El Niño project (part 5): how to download R and use it to get files of climate data.

El Niño project (part 6): Steve Wenner’s statistical analysis of the work of Ludescher et al.

El Niño project (part 7): the definition of El Niño.

El Niño project (part 8): Berezin et al on the stability of climate networks.


Network Theory (Part 31)

13 October, 2014

Last time we came up with a category of labelled graphs and described circuits as ‘cospans’ in this category.

Cospans may sound scary, but they’re not. A cospan is just a diagram consisting of an object with two morphisms going into it:

We can talk about cospans in any category. A cospan is an abstract way of thinking about a ‘chunk of stuff’ \Gamma with two ‘ends’ I and O. It could be any sort of stuff: a set, a graph, an electrical circuit, a network of any kind, or even a piece of matter (in some mathematical theory of matter).

We call the object \Gamma the apex of the cospan and call the morphisms i: I \to \Gamma, o : O \to \Gamma the legs of the cospan. We sometimes call the objects I and O the feet of the cospan. We call I the input and O the output. We say the cospan goes from I to O, though the direction is just a convention: we can flip a cospan and get a cospan going the other way!

If you’re wondering about the name ‘cospan’, it’s because a span is a diagram like this:

Since a ‘span’ is another name for a bridge, and this looks like a bridge from I to O, category theorists called it a span! And category theorists use the prefix ‘co-‘ when they turn all the arrows around. Spans came first historically, and we will use those too at times. But now let’s think about how to compose cospans.

Composing cospans is supposed to be like gluing together chunks of stuff by attaching the output of the first to the input of the second. So, we say two cospans are composable if the output of the first equals the input of the second, like this:

We then compose them by forming a new cospan going all the way from X to Z:

The new object \Gamma +_Y \Gamma' and the new morphisms i'', o'' are built using a process called a ‘pushout’ which I’ll explain in a minute. The result is cospan from X to Z, called the composite of the cospans we started with. Here it is:

So how does a pushout work? It’s a general construction that you can define in any category, though it only exists if the category is somewhat nice. (Ours always will be.) You start with a diagram like this:

and you want to get a commuting diamond like this:

which is in some sense ‘the best’ given the diagram we started with. For example, suppose we’re in the category of sets and Y is a set included in both \Gamma and \Gamma'. Then we’d like A to be the union of \Gamma and \Gamma. There are other choices of A that would give a commuting diamond, but the union is the best. Something similar is happening when we compose circuits, but instead of the category of sets we’re using the category of labelled graphs we discussed last time.

How do we make precise the idea that A is ‘the best’? We consider any other potential solution to this problem, that is, some other commuting diamond:

Then A is ‘the best’ if there exists a unique morphism q from A to the ‘competitor’ Q making the whole combined diagram commute:

This property is called a universal property: instead of saying that A is the ‘best’, grownups say it is universal.

When A has this universal property we call it the pushout of the original diagram, and we may write it as \Gamma +_Y \Gamma'. Actually we should call the whole diagram

the pushout, or a pushout square, because the morphisms i'', o'' matter too. The universal property is not really a property just of A, but of the whole pushout square. But often we’ll be sloppy and call just the object A the pushout.

Puzzle 1. Suppose we have a diagram in the category of sets

where Y = \Gamma \cap \Gamma' and the maps i, o' are the inclusions of this intersection in the sets \Gamma and \Gamma'. Prove that A = \Gamma \cup \Gamma' is the pushout, or more precisely the diagram

is a pushout square, where i'', o'' are the inclusions of \Gamma and \Gamma in the union A = \Gamma \cup \Gamma'.

More generally, a pushout in the category of sets is a way of gluing together sets \Gamma and \Gamma' with some ‘overlap’ given by the maps

And this works for labelled graphs, too!

Puzzle 2. Suppose we have two circuits of resistors that are composable, like this:

and this:

These give cospans in the category L\mathrm{Graph} where

L = (0,\infty)

(Remember from last time that L\mathrm{Graph} is the category of graphs with edges labelled by elements of some set L.) Show that if we compose these cospans we get a cospan corresponding to this circuit:

If you’re a mathematician you might find it easier to solve this kind of problem in general, which requires pondering how pushouts work in L\mathrm{Graph}. Alternatively, you might find it easier to think about this particular example: then you can just check that the answer we want has the desired property of a pushout!

If this stuff seems complicated, well, just know that category theory is a very general, powerful tool and I’m teaching you just the microscopic fragment of it that we need right now. Category theory ultimately seems very simple: I can’t really think of any math that’s simpler! It only seem complicated when it’s unfamiliar and you have a fragmentary view of it.

So where are we? We know that circuits made of resistors are a special case of cospans. We know how to compose cospans. So, we know how to compose circuits… and in the last puzzle, we saw this does just what we want.

The advantage of this rather highbrow approach is that a huge amount is known about composing cospans! In particular, suppose we have any category C where pushouts exist: that is, where we can always complete any diagram like this:

to a pushout square. Then we can form a category \mathrm{Cospan}(C) where:

• an object is an object of C

• a morphism from an object I \in C to an object O \in C is an equivalence classes of cospans from I to O:

• we compose cospans in the manner just described.

Why did I say ‘equivalence class’? It’s because the pushout is not usually unique. It’s unique only up to isomorphism. So, composing cospans would be ill-defined unless we work with some kind of equivalence class of cospans.

To be precise, suppose we have two cospans from I to O:

Then a map of cospans from one to the other is a commuting diagram like this:

We say that this is an isomorphism of cospans if f is an isomorphism.

This gives our equivalence relation on cospans! It’s an old famous theorem in category theory—so famous that it’s hard to find a reference for the proof—that whenever C is a category with pushouts, there’s a category \mathrm{Cospan}(C) where:

• an object is an object of C

• a morphism from an object I \in C to an object O \in C is an isomorphism class of cospans from I to O.

• we compose isomorphism classes of cospans by picking representatives, composing them and then taking the isomorphism class.

This takes some work to prove, but it’s true, so this is how we get our category of circuits!

Next time we’ll do something with this category. Namely, we’ll cook up a category of ‘behaviors’. The behavior of a circuit made of resistors just says which currents and potentials its terminals can have. If we put a circuit in a metaphorical ‘black box’ and refuse to peek inside, all we can see is its behavior.

Then we’ll cook up a functor from the category of circuits to the category of behaviors. We’ll call this the ‘black box functor’. Saying that it’s a functor mainly means that

\blacksquare(f g) = \blacksquare(f) \blacksquare(g)

Here f and g are circuits that we can compose, and f g is their composite. The black square is the black box functor, so \blacksquare(fg) is the behavior of the circuit f g. There’s a way to compose behaviors, too, and the equation above says that the behavior of the composite circuit is the composite of their behaviors!

This is very important, because it says we can figure out what a big circuit does if we know what its pieces do. And this is one of the grand themes of network theory: understanding big complicated networks by understanding their pieces. We may not always be able to do this, in practice! But it’s something we’re always concerned with.


Network Theory Seminar (Part 1)

11 October, 2014

 

Check out this video! I start with a quick overview of network theory, and then begin building a category where the morphisms are electrical circuits. These lecture notes provide extra details:

Network theory (part 30).

With luck, this video will be the first of a series. I’m giving a seminar on network theory at U.C. Riverside this fall. I’ll start by sketching the results here:

• John Baez and Brendan Fong, A compositional framework for passive linear networks.

But this is a big paper, and I also want to talk about other papers, so I certainly won’t explain everything in here—just enough to help you get started! If you have questions, don’t be shy about asking them.

I thank Blake Pollard for filming this seminar, and Muhammad “Siddiq” Siddiqui-Ali for providing the videocamera and technical support.


Follow

Get every new post delivered to your Inbox.

Join 2,874 other followers