The Large-Number Limit for Reaction Networks (Part 3)

8 October, 2014

joint with Arjun Jain

We used to talk about reaction networks quite a lot here. When Arjun Jain was visiting the CQT, we made a lot of progress understanding how the master equation reduces to the rate equation in the limit where there are very large numbers of things of each kind. But we never told you the end of the story, and by now it’s been such a long time that you may need a reminder of some basic facts!

So…

The rate equation treats the number of things of each kind as continuous—a nonnegative real number—and says how it changes in a deterministic way.

The master equation treats the number of things of each kind as discrete—a nonnegative integer—and says how it changes in a probabilistic way.

You can think of the master equation as the ‘true’ description, and the rate equation as an approximation that’s good in some limit where there are large numbers of molecules — or more precisely, where the probability distribution of having some number of molecules of each kind is sharply peaked near some large value.

You may remember that in the master equation, the state of a chemical system is described by a vector \psi in a kind of ‘Fock space’, while time evolution is described with the help of an operator on this space, called the ‘Hamiltonian’ H:

\displaystyle{ \frac{d}{dt} \psi(t) = H \psi(t) }

The Hamiltonian is built from annihilation and creation operators, so all of this looks very much like quantum field theory. The details are here, and we won’t try to review them all:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics.

The point is this: the ‘large-number limit’ where the master equation reduces to the rate equation smells a lot like the ‘classical limit’ of quantum field theory, where the description of light in terms of photons reduces to the good old Maxwell equations. So, maybe we can understand the large-number limit by borrowing techniques from quantum field theory!

How do we take the classical limit of quantum electromagnetism and get the classical Maxwell equations? For simplicity let’s ignore charged particles and consider the ‘free electromagnetic field’: just photons, described by the quantum version of Maxwell’s equations. When we take the classical limit we let Planck’s constant \hbar go to zero: that much is obvious. However, that’s not all! The energy of each photon is proportional to \hbar, so to take the classical limit and get a solution of the classical Maxwell’s equations with nonzero energy we also need to increase the number of photons. We cleverly do this in such a way that the total energy remains constant as \hbar \to 0.

So, in quantum electromagnetism the classical limit is also a large-number limit!

That’s a good sign. It suggests the same math will also apply to our reaction network problem.

But then we hit an apparent roadblock. What’s the analogue of Planck’s constant in chemical reaction networks? What should go to zero?

We told you the answer to that puzzle a while ago: it’s the reciprocal of Avogadro’s number!

You see, chemists measure numbers of molecules in ‘moles’. There’s a large number of molecules in each mole: Avogadro’s number. If we let the reciprocal of Avogadro’s number go to zero, we are taking a limit where chemistry becomes ‘continuous’ and the discreteness of molecules goes away. Of course this is just a mathematical trick, but it’s a very useful one.

So, we got around that roadblock. And then something nice happened.

When taking the classical limit of quantum electromagnetism, we focus attention on certain quantum states that are the ‘best approximations’ to classical states. These are called ‘coherent states’, and it’s very easy to study how they behave as we simultaneously let \hbar \to 0 and let the expected number of photons go to infinity.

And the nice thing is that these coherent states are also important in chemistry! But because chemistry involves probabilities rather than amplitudes, they have a different name: ‘Poisson distributions’. On this blog, Brendan Fong used them to give a new proof of a great result in mathematical chemistry, the Anderson–Craciun–Kurtz theorem.

So, we have most of the concepts and tools in place, and we can tackle the large-number limit using quantum techniques.

You can review the details here:

• The large-number limit for reaction networks (part 1).

• The large-number limit for reaction networks (part 2).

So, after a quick refresher on the notation, we’ll plunge right in.

As you’ll see, we solve the problem except for one important technical detail: passing a derivative through a limit! This means our main result is not a theorem. Rather, it’s an idea for how to prove a theorem. Or if we act like physicists, we can call it a theorem.

Review of notation

The rate equation says

\displaystyle{\frac{d}{dt}{x}(t) = \sum_{\tau \in T} r(\tau) (t(\tau)-s(\tau)) {x}(t)^{s(\tau)} }

where:

• x(t) \in \mathbb{R}^k is a vector describing concentrations of k different species at time t. In chemistry these species could be different kinds of molecules.

• Each \tau \in T is a transition, or in chemistry, a reaction.

• s(\tau) \in \mathbb{N}^k is a vector of natural numbers saying how many items of each species appear as inputs to the reaction \tau. This is called the source of the reaction.

• t(\tau) \in \mathbb{N}^k is a vector of natural numbers saying how many items of each species appear as outputs of the reaction \tau. This is called the target of the reaction. So, t(\tau) - s(\tau) says the net change of the number of items of each species in the reaction \tau.

• The rate at which the reaction \tau occurs is proportional to the rate constant r(\tau) times the number

x(t)^{s(\tau)}

Here we are raising a vector to a vector power and getting a number, using this rule:

x^r = x_1^{r_1} \cdots x_k^{r_k}

where r is any vector of natural numbers (r_1, \dots, r_k), and x is any vector of nonnegative real numbers. From now on we’ll call a vector of natural numbers a multi-index.
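In code this rule is a one-liner. Here is a javascript sketch (the function name is ours, not anything official):

```javascript
// Raise a vector x to a multi-index power r:
// x^r = x_1^{r_1} * ... * x_k^{r_k}
function multiPower(x, r) {
  return x.reduce((prod, xi, i) => prod * Math.pow(xi, r[i]), 1);
}
```

For example, multiPower([2, 3], [1, 2]) gives 2¹ · 3² = 18, and any vector raised to the zero multi-index gives 1.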

In this paper:

• John Baez, Quantum techniques for reaction networks.

it was shown that the master equation implies

\displaystyle{\frac{d}{dt}\langle N_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle N^{\, \underline{s(\tau)}} \, \psi(t)\rangle }

Here:

• \psi(t) is the stochastic state saying the probability of having any particular number of items of each species at each time t. We won’t review the precise details of how this works; for that, reread the relevant bit of Part 8.

• N_i is the ith number operator, defined using annihilation and creation operators as in quantum mechanics:

N_i = a_i^\dagger a_i

For the annihilation and creation operators, see Part 8.

• \langle N_i \psi(t) \rangle is the expected number of items of the ith species at time t.

• Similarly, \langle N^{\underline{s(\tau)}} \psi(t)\rangle is the expected value of a certain product of operators. For any multi-index r we define the falling power

N_i^{\underline{r_i}} = N_i (N_i - 1) (N_i - 2) \cdots (N_i - r_i +1)

and then we define

N^{\underline{r}} = N_1^{\underline{r_1}} \cdots N_k^{\underline{r_k}}
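Falling powers are just as easy to compute as ordinary powers. A javascript sketch (names ours):

```javascript
// Falling power: n^(r) = n (n-1) ... (n-r+1), with the empty product = 1.
function fallingPower(n, r) {
  let prod = 1;
  for (let j = 0; j < r; j++) prod *= (n - j);
  return prod;
}

// Multi-index version: N^(r) = N_1^(r_1) * ... * N_k^(r_k)
function multiFallingPower(n, r) {
  return n.reduce((prod, ni, i) => prod * fallingPower(ni, r[i]), 1);
}
```

Note that fallingPower(n, r) vanishes whenever r exceeds n, reflecting the fact that you can’t pick r distinct items out of fewer than r.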

The large-number limit

Okay. Even if you don’t understand any of what we just said, you’ll see the master and rate equations look similar. The master equation implies this:

\displaystyle{\frac{d}{dt}\langle N_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle N^{\, \underline{s(\tau)}} \, \psi(t)\rangle }

while the rate equation says this:

\displaystyle{\frac{d}{dt}{x}(t) = \sum_{\tau \in T} r(\tau) (t(\tau)-s(\tau)) \; {x}(t)^{s(\tau)} }

So, we will try to get from the first to the second with the help of a ‘large-number limit’.

We start with a few definitions. We introduce an adjustable dimensionless quantity which we call \hbar. This is just a positive number, which has nothing to do with quantum theory except that we’re using a mathematical analogy to quantum mechanics to motivate everything we’re doing.

Definition. The rescaled number operators are defined as \widetilde{N}_i = \hbar N_i. This can be thought of as a rescaling of the number of objects, so that instead of counting objects individually, we count them in bunches of size 1/\hbar.

Definition. For any multi-index r we define the rescaled falling power of the number operator N_i by:

\widetilde{N}_i^{\underline{r_i}} = \widetilde{N}_i (\widetilde{N}_i - \hbar)(\widetilde{N}_i-2\hbar)\cdots (\widetilde{N}_i-r_i\hbar+\hbar)

and also define

\widetilde{N}^{\underline{r}} = \widetilde{N}_1^{\underline{r_1}} \; \cdots \;\widetilde{N}_k^{\underline{r_k}}

for any multi-index r.

Using these, we get the following equation:

\displaystyle{\frac{1}{\hbar}\frac{d}{dt} \langle \widetilde{N}_i \psi(t)\rangle = \sum_{\tau \in T} r(\tau) (t_i(\tau)-s_i(\tau)) \; \langle \widetilde{N}^{\underline{s(\tau)}} \psi(t)\rangle \; \frac{1}{\hbar^{|s(\tau)|}} }

where for any multi-index r we set

|r| = r_1 + \cdots + r_k

This suggests a way to rescale the rate constants in the master equation:

Definition. The rescaled rate constants are

\displaystyle{\widetilde{r}(\tau) = \frac{r(\tau)}{\hbar^{|s(\tau)|-1}}}

From here onwards, we change our viewpoint. We consider the rescaled rate constants \widetilde{r}(\tau) to be fixed, instead of the original rate constants r(\tau). So, as we decrease \hbar, we are studying situations where the original rate constants change to ensure that the rescaled rate constants stay fixed!

So, we switch to working with a rescaled master equation:

Definition. The rescaled master equation is:

\displaystyle{\frac{d}{dt} \langle \widetilde{N}_i\widetilde{\psi}(t)\rangle = \sum_{\tau \in T} \widetilde{r}(\tau) (t_i(\tau)-s_i(\tau)) \; \langle \widetilde{N}^{\underline{s(\tau)}} \widetilde{\psi}(t)\rangle }

This is really a one-parameter family of equations, depending on \hbar\in (0,\infty). We write a solution of the rescaled master equation as \widetilde{\psi}(t), but it is really one solution for each value of \hbar.

Following the same procedure as above, we can rescale the rate equation, using the same definition of the rescaled rate constants:

Definition. The rescaled number of objects of the i\mathrm{th} species is defined as \widetilde{x_i}=\hbar x_i, where x_i is the original number of objects of the i\mathrm{th} species. Here again, we are counting in bunches of 1/\hbar.

Using this to rescale the rate equation, we get

Definition. The rescaled rate equation is

\displaystyle{\frac{d}{dt}\widetilde{x}(t) = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\; \widetilde{x}(t)^{s(\tau)} }

where

\widetilde{x}(t)=(\widetilde{x_1}(t),\widetilde{x_2}(t),\dots, \widetilde{x_k}(t))
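The rescaled rate equation is easy to integrate numerically. Here is a minimal forward-Euler sketch in javascript; the data layout for a transition (an object with rate, source and target) is our own choice, not anything from the papers:

```javascript
// x^r = x_1^{r_1} * ... * x_k^{r_k}
function multiPower(x, r) {
  return x.reduce((prod, xi, i) => prod * Math.pow(xi, r[i]), 1);
}

// One forward-Euler step of dx/dt = sum_tau rate(tau) (t(tau)-s(tau)) x^{s(tau)}
function rateEquationStep(x, transitions, dt) {
  const dx = x.map(() => 0);
  for (const { rate, source, target } of transitions) {
    const flux = rate * multiPower(x, source);      // rate(tau) x^{s(tau)}
    for (let i = 0; i < x.length; i++) {
      dx[i] += flux * (target[i] - source[i]);      // net change of species i
    }
  }
  return x.map((xi, i) => xi + dt * dx[i]);
}
```

For instance, a single decay reaction X → ∅ with rescaled rate constant 1 takes x = (1) to (0.9) after one step of size 0.1.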

Therefore, to go from the rescaled master equation to the rescaled rate equation, we require that

\langle\widetilde{N}^{\underline{r}}\, \widetilde{\psi}(t)\rangle \to \langle\widetilde{N}\widetilde{\psi}(t)\rangle^r

as \hbar \to 0. If this holds, we can identify \langle\widetilde{N}\widetilde{\psi}(t)\rangle with \widetilde{x}(t) and get the rate equation from the master equation!

To this end, we introduce the following crucial idea:

Definition. A semiclassical family of states, \widetilde{\psi}, is defined as a one-parameter family of states depending on \hbar \in (0,\infty) such that for some \widetilde{c} \in [0,\infty)^k we have

\langle\widetilde{N}^r\widetilde{\psi}\rangle \to \widetilde{c}^{\, r}

for every r\in \mathbb{N}^k as \hbar \to 0.

In particular, this implies

\langle\widetilde{N}_i\widetilde{\psi}\rangle \to \widetilde{c}_i

for every index i.

Intuitively, a semiclassical family is a family of probability distributions that becomes more and more sharply peaked with a larger and larger mean as \hbar decreases. We would like to show that in this limit, the rescaled master equation gives the rescaled rate equation.
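For a quick numerical feel for this, take \widetilde{\psi} to be a Poisson distribution with mean \widetilde{c}/\hbar. For a Poisson distribution with mean \lambda we have E[N] = \lambda, \mathrm{Var}(N) = \lambda and E[N^2] = \lambda + \lambda^2, so the rescaled moments can be computed in closed form. A javascript sketch (function names ours):

```javascript
// Rescaled second moment <(hbar N)^2> for N Poisson with mean c/hbar:
// hbar^2 (lambda + lambda^2) = hbar c + c^2  ->  c^2 as hbar -> 0.
function rescaledSecondMoment(c, hbar) {
  const lambda = c / hbar;
  return hbar * hbar * (lambda + lambda * lambda);
}

// Rescaled variance hbar^2 Var(N) = hbar c -> 0: the family gets sharply peaked.
function rescaledVariance(c, hbar) {
  return hbar * hbar * (c / hbar);
}
```

As \hbar \to 0 the rescaled second moment approaches \widetilde{c}^{\,2}, matching the definition with r = 2, while the rescaled variance shrinks to zero.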

We make this precise in the following propositions.

Proposition 1. If \widetilde{\psi} is a semiclassical family as defined above, then in the \hbar \to 0 limit, we have \langle\widetilde{N}^{\underline{r}}\widetilde{\psi}\rangle \to \widetilde{c}^{\; r} as well.

Proof. For each index i,

\displaystyle{ \langle\widetilde{N}_i^{\; \underline{r_i}}\, \widetilde{\psi}\rangle = \displaystyle{ \langle \widetilde{N}_i (\widetilde{N}_i - \hbar)(\widetilde{N}_i-2\hbar)\cdots(\widetilde{N}_i-r_i\hbar+\hbar)\,\widetilde{\psi}\rangle} }

\displaystyle{ = \Big\langle\Big(\widetilde{N}_i^{r_i} - \hbar\frac{r_i(r_i-1)}{2}\widetilde{N}_i^{r_i-1}+\cdots + (-\hbar)^{r_i-1}(r_i-1)!\,\widetilde{N}_i\Big)\,\widetilde{\psi}\Big\rangle }

By the definition of a semiclassical family,

\displaystyle{ \lim_{\hbar\to 0} \Big\langle\Big(\widetilde{N}_i^{r_i} - \hbar\frac{r_i(r_i-1)}{2}\widetilde{N}_i^{r_i-1}+\cdots + (-\hbar)^{r_i-1}(r_i-1)!\,\widetilde{N}_i\Big)\,\widetilde{\psi}\Big\rangle = \widetilde{c}_i^{\; r_i} }

since every term but the first approaches zero. Thus, we have

\displaystyle{ \lim_{\hbar \to 0} \langle\widetilde{N}_i^{\; \underline{r_i}}\, \widetilde{\psi}\rangle =   \widetilde{c}_i^{\; r_i} }

A similar but more elaborate calculation shows that

\displaystyle{ \lim_{\hbar \to 0} \langle\widetilde{N}_1^{\, \underline{r_1}} \cdots\widetilde{N}_k^{\, \underline{r_k}} \, \widetilde{\psi}\rangle = \lim_{\hbar \to 0} \langle\widetilde{N}_1^{\, r_1}\cdots \widetilde{N}_k^{\, r_k} \widetilde{\psi}\rangle= \lim_{\hbar \to 0}\langle\widetilde{N}^{\, r} \, \widetilde{\psi}\rangle }

or in other words

\langle\widetilde{N}^{\, \underline{r}}\,\widetilde{\psi}\rangle \to \widetilde{c}^{\, r}

as \hbar \to 0.   █

Proposition 2. If \widetilde{\psi} is a semiclassical family of states, then

\displaystyle{  \langle (\widetilde{N}-\widetilde{c})^{r}\, \widetilde{\psi}\rangle \to 0 }

for any multi-index r.

Proof. Consider the r_i\mathrm{th} centered moment of the i\mathrm{th} number operator:

\displaystyle{\langle(\widetilde{N}_i-\widetilde{c}_i)^{r_i}\widetilde{\psi}\rangle = \sum_{p =0}^{r_i} {r_i \choose p}\langle\widetilde{N}_i^p\widetilde{\psi}\rangle(-\widetilde{c}_i)^{r_i-p} }

Taking the limit as \hbar goes to zero, this becomes

\begin{array}{ccl} \displaystyle{ \lim_{\hbar \to 0}\sum_{p =0}^{r_i} {r_i \choose p}\langle\widetilde{N}_i^p\widetilde{\psi}\rangle(-\widetilde{c}_i)^{r_i-p} } &=& \displaystyle{ \sum_{p =0}^{r_i} {r_i \choose p}(\widetilde{c}_i)^p(-\widetilde{c}_i)^{r_i-p} } \\ \\  &=& (\widetilde{c}_i-\widetilde{c}_i)^{r_i} \\ \\  &=& 0 \end{array}

For a general multi-index r we can prove the same sort of thing with a more elaborate calculation. First note that

\langle (\widetilde{N}-\widetilde{c})^{r}\widetilde{\psi}\rangle=\langle(\widetilde{N_1}-\widetilde{c_1})^{r_1} \cdots (\widetilde{N_k}-\widetilde{c_k})^{r_k}\,\widetilde{\psi}\rangle

The right-hand side can be expanded as

\displaystyle{ \langle(\sum_{p_1 =0}^{r_1} {r_1 \choose p_1}\widetilde{N}_1^{p_1}(-\widetilde{c}_1)^{r_1-p_1} ) \cdots (\sum_{p_k =0}^{r_k} {r_k \choose p_k}\widetilde{N}_k^{p_k}(-\widetilde{c}_k)^{r_k-p_k} )\widetilde{\psi} }\rangle

We can write this more tersely as

\displaystyle{ \sum_{p=0}^r} {r\choose p} \langle \widetilde{N}^p\widetilde{\psi}\rangle (-\widetilde{c})^{r-p}

where for any multi-index r we define

\displaystyle{{\sum_{p=0}^{r}}= \sum_{p_1 =0}^{r_1} \cdots  \sum_{p_k =0}^{r_k}  }

and for any multi-indices r, p we define

\displaystyle{ {r \choose p}={r_1 \choose p_1}{r_2 \choose p_2}\cdots {r_k \choose p_k}}

Now using the definition of a semiclassical state, we see

\displaystyle{ \lim_{\hbar \to 0} \sum_{p=0}^r} {r\choose p} \langle \widetilde{N}^p\widetilde{\psi}\rangle (-\widetilde{c})^{r-p}= \displaystyle{ \sum_{p=0}^r} {r\choose p} (\widetilde{c})^{p} (-\widetilde{c})^{r-p}

But this equals zero, as the last expression, expanded, is

\displaystyle{ (\widetilde{c})^r \left( \sum_{p_1=0}^{r_1} {r_1\choose p_1} (-1)^{r_1-p_1}\right) \cdots \left( \sum_{p_k=0}^{r_k} {r_k\choose p_k} (-1)^{r_k-p_k} \right) }

where each individual sum is zero.   █

Here is the theorem that would finish the job if we could give a fully rigorous proof:

“Theorem.” If \widetilde{\psi}(t) is a solution of the rescaled master equation and also a semiclassical family for the time interval [t_0,t_1], then \widetilde{x}(t) = \langle \widetilde{N} \widetilde{\psi}(t) \rangle is a solution of the rescaled rate equation for t \in [t_0,t_1].

Proof sketch. We sketch a proof that relies on the assumption that we can pass the \hbar \to 0 limit through a time derivative. Of course, to make this rigorous, we would need to justify this. Perhaps it is true only in certain cases.

Assuming that we can pass the limit through the derivative:

\displaystyle{\lim_{\hbar \to 0}\frac{d}{dt} \langle \widetilde{N}\widetilde{\psi}(t)\rangle = \lim_{\hbar \to 0} \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\langle \widetilde{N}^{\, \underline{s(\tau)}} \, \widetilde{\psi}(t)\rangle }

and thus

\displaystyle{\frac{d}{dt}\lim_{\hbar \to 0} \langle \widetilde{N}\widetilde{\psi}(t)\rangle = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau)) \lim_{\hbar \to 0}\langle \widetilde{N}^{\, \underline{s(\tau)}} \, \widetilde{\psi}(t)\rangle }

and thus

\displaystyle{\frac{d}{dt}\widetilde{x}(t) = \sum_{\tau \in T} \widetilde{r}(\tau) (t(\tau)-s(\tau))\widetilde{x}^{\, s(\tau)} }.

As expected, we obtain the rescaled rate equation.   █

Another question is this: if we start with a semiclassical family of states as our initial data, does it remain semiclassical as we evolve it in time? This will probably be true only in certain cases.

An example: rescaled coherent states

The best-behaved semiclassical states are the coherent states. Consider the family of coherent states

\displaystyle{\widetilde{\psi}_{\widetilde{c}} = \frac{e^{(\widetilde{c}/\hbar) z}}{e^{\widetilde{c}/\hbar}}}

using the notation developed in the paper mentioned earlier. In that paper it was shown that for any multi-index m and any coherent state \Psi we have

\langle N^{\underline{m}}\Psi\rangle = \langle N\Psi \rangle^m

Using this result for \widetilde{\psi}_{\widetilde{c}} we get

\displaystyle{\langle \widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~=~\hbar^{|m|}\langle N^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~=~\hbar^{|m|}\langle N\widetilde{\psi}_{\widetilde{c}}\rangle^m~=~\hbar^{|m|}\frac{\widetilde{c}^m}{\hbar^{|m|}}~=~\widetilde{c}^m}

Since \langle\widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle equals \langle \widetilde{N}^{m} \widetilde{\psi}_{\widetilde{c}}\rangle plus terms of order \hbar, as \hbar \to 0 we have

\langle\widetilde{N}^{\underline{m}}\widetilde{\psi}_{\widetilde{c}}\rangle~\to~\langle\widetilde{N}^{m}\widetilde{\psi}_{\widetilde{c}}\rangle=\widetilde{c}^{m}

showing that our chosen \widetilde{\psi}_{\widetilde{c}} is indeed a semiclassical family.
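You can also check the identity \langle N^{\underline{m}}\Psi\rangle = \langle N\Psi \rangle^m numerically for a Poisson distribution, by summing falling powers against the Poisson probabilities. A javascript sketch (names ours; the sum is truncated at 200 terms, which is plenty for small means):

```javascript
// <N^(m)> for N Poisson with mean lambda, by direct summation.
// Should equal lambda^m = <N>^m exactly (up to truncation error).
function poissonFallingMoment(lambda, m, maxN = 200) {
  let moment = 0;
  let pmf = Math.exp(-lambda);       // P(N = 0)
  for (let n = 0; n <= maxN; n++) {
    let falling = 1;                 // n (n-1) ... (n-m+1)
    for (let j = 0; j < m; j++) falling *= (n - j);
    moment += falling * pmf;
    pmf *= lambda / (n + 1);         // P(N = n+1)
  }
  return moment;
}
```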


The Stochastic Resonance Program (Part 2)

28 August, 2014

guest post by David Tanzer

Last time we introduced the concept of stochastic resonance. Briefly, it’s a way that noise can amplify a signal, by giving an extra nudge that helps a system receiving that signal make the jump from one state to another. Today we’ll describe a program that demonstrates this concept. But first, check it out:

Stochastic resonance.

No installation required! It runs as a web page which allows you to set the parameters of the model and observe the resulting output signal. It responds quickly, because it runs right in your browser, as javascript.

There are sliders for controlling the amounts of sine wave and noise involved in the mix. As explained in the previous article, when we set the wave to a level not quite sufficient to cause the system to oscillate between states, and we add in the right amount of noise, stochastic resonance should kick in:


The program implements a mathematical model that runs in discrete time. It has two stable states, and is driven by a combination of a sine forcing function and a noise source.

The code builds on top of a library called JSXGraph, which supports function plotting, interactive graphics, and data visualization.

Running the program

If you haven’t already, go try the program. On one plot it shows a sine wave, called the forcing signal, and a chaotic time-series, called the output signal.

There are four sliders, which we’ll call Amplitude, Frequency, Noise and Sample-Path.

• The Amplitude and Frequency sliders control the sine wave. Try them out.

• The output signal depends, in a complex way, on the sine wave. Vary Amplitude and Frequency to see how they affect the output signal.

• The amount of randomization involved in the process is controlled by the Noise slider. Verify this.

• Change the Sample-Path slider to alter the sequence of random numbers that are fed to the process. This will cause a different instance of the process to be displayed.

Now try to get stochastic resonance to kick in…

Going to the source

Time to look at the blueprints. It’s easy.

• Open the model web page. The code is now running in your browser.

• While there, run your browser’s view-source function. For Firefox on the Mac, press Cmd-U. For Firefox on the PC, press Ctrl-U.

• You should see the html file for the web page itself.

• See the “script” directives at the head of this file. Each one refers to a javascript program on the internet. When the browser sees such a directive, the program is fetched and loaded into the browser’s internal javascript interpreter. Here are the directives:

http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default

http://cdnjs.cloudflare.com/ajax/libs/jsxgraph/0.93/jsxgraphcore.js

./StochasticResonanceEuler.js

./normals.js

The first one loads MathJax, which is a formula-rendering engine. Next comes JSXGraph, a library that provides support for plotting and interactive graphics. Next, StochasticResonanceEuler.js is the main code for the model, and finally, normals.js provides random numbers.

• In the source window, click on the link for StochasticResonanceEuler.js — and you’ve reached the source!

Anatomy of the program

The program implements a stochastic difference equation, which defines the changes in the output signal as a function of its current value and a random noise value.

It consists of the following components:

  1. Interactive controls to set parameters

  2. Plot of the forcing signal

  3. Plot of the output signal

  4. A function that defines a particular SDE

  5. A simulation loop, which renders the output signal

The program contains seven functions. The top-level function is initCharts. It dispatches to initControls, which builds the sliders, and initSrBoard, which builds the curve objects for the forcing function and the output signal (called “position curve” in the program). Each curve object is assigned a function that computes the (t,x) values for the time series, which gets called whenever the input parameters change. The function that is assigned to the forcing curve computes the sine wave, and reads the amplitude and frequency values from the sliders.

The calculation method for the output signal is set to the function mkSrPlot, which performs the simulation. It begins by defining a function for the deterministic part of the derivative:

deriv = Deriv(t,x) = SineCurve(t) + BiStable(x),

Then it constructs a “stepper” function, through the call Euler(deriv, tStep). A stepper function maps the current point (t,x) and a noise sample to the next point (t’,x’). The Euler stepper maps

((t,x), noiseSample)

to

(t + tStep, x + tStep * Deriv(t,x) + noiseSample).
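Here is a self-contained sketch of that stepper in javascript. The drift x − x³ is our guess at a standard double-well ‘BiStable’ term, and the slider values are made up; the actual definitions live in StochasticResonanceEuler.js:

```javascript
// Build an Euler stepper: maps ([t, x], noiseSample) to the next point.
function makeEulerStepper(deriv, tStep) {
  return ([t, x], noiseSample) =>
    [t + tStep, x + tStep * deriv(t, x) + noiseSample];
}

const amplitude = 0.3, frequency = 0.1;  // hypothetical slider values
const sineCurve = t => amplitude * Math.sin(2 * Math.PI * frequency * t);
const biStable = x => x - x * x * x;     // assumed double-well drift
const deriv = (t, x) => sineCurve(t) + biStable(x);
const step = makeEulerStepper(deriv, 0.1);
```

Note that x = ±1 are the two stable states of the assumed drift: there, biStable(x) = 0 and only the forcing and the noise move the system.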

The simulation loop is then performed by the function sdeLoop, which is given:

• The stepper function

• The noise amplitude (“dither”)

• The initial point (t0,x0)

• A randomization offset

• The number of points to generate

The current point is initialized to (t0,x0), and then the stepper is repeatedly applied to the current point and the current noise sample. The output returned is the sequence of (t,x) values.

The noise samples are normally distributed random numbers stored in an array. They get scaled by the noise amplitude when they are used. The array contains more values than are needed. By changing the starting point in the array, different instances of the process are obtained.
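A reconstructed sketch of such a loop (the names mirror the description above, but this is not the actual source):

```javascript
// Repeatedly apply the stepper, scaling precomputed noise by the dither
// level and starting at a randomization offset into the noise array.
function sdeLoop(stepper, dither, t0, x0, offset, numPoints, normals) {
  const points = [[t0, x0]];
  let current = [t0, x0];
  for (let i = 0; i < numPoints; i++) {
    current = stepper(current, dither * normals[offset + i]);
    points.push(current);
  }
  return points;  // the sequence of (t, x) values
}
```

Changing offset while keeping everything else fixed gives a different instance of the same stochastic process, which is exactly what the Sample-Path slider does.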

Making your own version of the program

Now let’s tweak the program to do new things.

First let’s make a copy of the program on your local machine, and get it to run there. Make a directory, say /Users/macbookpro/stochres. Open the html file in the view source window. Paste it into the file /Users/macbookpro/stochres/stochres.html. Next, in the view source window, click on the link to StochasticResonanceEuler.js. Paste the text into /Users/macbookpro/stochres/StochasticResonanceEuler.js.

Now point your browser to the file, with the URL file:///Users/macbookpro/stochres/stochres.html. To prove that you’re really executing the local copy, make a minor edit to the html text, and check that it shows up when you reload the page. Then make a minor edit to StochasticResonanceEuler.js, say by changing the label text on the slider from “forcing function” to “forcing signal.”

Programming exercises

Now let’s get warmed up with some bite-sized programming exercises.

  1. Change the color of the sine wave.

  2. Change the exponent in the bistable polynomial to values other than 2, to see how this affects the output.

  3. Add an integer-valued slider to control this exponent.

  4. Modify the program to perform two runs of the process, and show the output signals in different colors.

  5. Modify it to perform ten runs, and change the output signal to display the point-wise average of these ten runs.

  6. Add an input slider to control the number of runs.

  7. Add another plot, which shows the standard deviation of the output signals, at each point in time.

  8. Replace the precomputed array of normally distributed random numbers with a run-time computation that uses a random number generator. Use the Sample-Path slider to seed the random number generator.

  9. When the sliders are moved, explain the flow of events that causes the recalculation to take place.

  • A small research project

    What is the impact of the frequency of the forcing signal on its transmission through stochastic resonance?

    • Make a hypothesis about the relationship.

    • Check your hypothesis by varying the Frequency slider.

    • Write a function to measure the strength of the output signal at the forcing frequency. Let sinwave be a discretely sampled sine wave at the forcing frequency, and coswave be a discretely sampled cosine wave. Let sindot = the dot product of sinwave and the output signal, and similarly for cosdot. Then the power measure is sindot^2 + cosdot^2.

    • Modify the program to perform N trials at each frequency over some specified range of frequency, and measure the average power over all the N trials. Plot the power as a function of frequency.

    • The above plot required you to fix a wave amplitude and noise level. Choose five different noise levels, and plot the five curves in one figure. Choose your noise levels in order to explore the range of qualitative behaviors.

    • Produce several versions of this five-curve plot, one for each sine amplitude. Again, choose your amplitudes in order to explore the range of qualitative behaviors.
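The power measure described a few steps back can be sketched in javascript like this (the function and argument names are ours):

```javascript
// Power of the output signal at a given frequency: project onto sampled
// sine and cosine waves and sum the squared dot products.
function powerAtFrequency(output, freq, tStep) {
  let sindot = 0, cosdot = 0;
  for (let i = 0; i < output.length; i++) {
    const phase = 2 * Math.PI * freq * i * tStep;
    sindot += Math.sin(phase) * output[i];
    cosdot += Math.cos(phase) * output[i];
  }
  return sindot * sindot + cosdot * cosdot;
}
```

For a pure cosine wave at the measured frequency, sampled over whole periods, each dot product is about half the sample count, so the power is large; for a flat signal it is essentially zero.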


    Exploring Climate Data (Part 1)

    1 August, 2014

    joint with Dara O Shayda

    Emboldened by our experiments in El Niño analysis and prediction, people in the Azimuth Code Project have been starting to analyze weather and climate data. A lot of this work is exploratory, with no big conclusions. But it’s still interesting! So, let’s try some blog articles where we present this work.

    This one will be about the air pressure on the island of Tahiti and in a city called Darwin in Australia: how they’re correlated, and how each one varies. This article will also be a quick introduction to some basic statistics, as well as ‘continuous wavelet transforms’.

    Darwin, Tahiti and El Niños

    The El Niño Southern Oscillation is often studied using the air pressure in Darwin, Australia versus the air pressure in Tahiti. When there’s an El Niño, it gets stormy in the eastern Pacific so the air pressures tend to be lower in Tahiti and higher in Darwin. When there’s a La Niña, it’s the other way around:



    The Southern Oscillation Index or SOI is a normalized version of the monthly mean air pressure anomaly in Tahiti minus that in Darwin. Here anomaly means we subtract off the mean, and normalized means that we divide by the standard deviation.

    So, the SOI tends to be negative when there’s an El Niño. On the other hand, when there’s an El Niño the Niño 3.4 index tends to be positive—this says it’s hotter than usual in a certain patch of the Pacific.

    Here you can see how this works:



    When the Niño 3.4 index is positive, the SOI tends to be negative, and vice versa!

    It might be fun to explore precisely how well correlated they are. You can get the data to do that by clicking on the links above.

    But here’s another question: how similar are the air pressure anomalies in Darwin and in Tahiti? Do we really need to take their difference, or are they so strongly anticorrelated that either one would be enough to detect an El Niño?

    You can get the data to answer such questions here:

    Southern Oscillation Index based upon annual standardization, Climate Analysis Section, NCAR/UCAR. This includes links to monthly sea level pressure anomalies in Darwin and Tahiti, in either ASCII format (click the second two links) or netCDF format (click the first one and read the explanation).

    In fact this website has some nice graphs already made, which I might as well show you! Here’s the SOI and also the sum of the air pressure anomalies in Darwin and Tahiti, normalized in some way:



    If the sum were zero, the air pressure anomalies in Darwin and Tahiti would contain the same information and life would be simple. But it’s not!

    How similar in character are the air pressure anomalies in Darwin and Tahiti? There are many ways to study this question. Dara tackled it by taking the air pressure anomaly data from 1866 to 2012 and computing some ‘continuous wavelet transforms’ of these air pressure anomalies. This is a good excuse for explaining how a continuous wavelet transform works.

    Very basic statistics

    It helps to start with some very basic statistics. Suppose you have a list of numbers

    x = (x_1, \dots, x_n)

    You probably know how to take their mean, or average. People often write this with angle brackets:

    \displaystyle{ \langle x \rangle = \frac{1}{n} \sum_{i = 1}^n x_i }

    You can also calculate the mean of their squares:

    \displaystyle{  \langle x^2 \rangle = \frac{1}{n} \sum_{i = 1}^n x_i^2 }

    If you were naive you might think \langle x^2 \rangle = \langle x \rangle^2, but in fact we have:

    \langle x^2 \rangle \ge \langle x \rangle^2

    and they’re equal only if all the x_i are the same. The point is that if the numbers x_i are spread out, the squares of the big ones (positive or negative) contribute more to the average of the squares than if we had averaged them out before squaring. The difference

    \langle x^2 \rangle - \langle x \rangle^2

    is called the variance; it says how spread out our numbers are. The square root of the variance is the standard deviation:

    \sigma_x = \sqrt{\langle x^2 \rangle - \langle x \rangle^2 }

    and this has the slight advantage that if you multiply all the numbers x_i by some constant c, the standard deviation gets multiplied by |c|. (The variance gets multiplied by c^2.)
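These formulas are one-liners in code. A javascript transcription (names ours):

```javascript
// <x>: the mean of a list of numbers
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// variance: <x^2> - <x>^2
const variance = xs => mean(xs.map(x => x * x)) - mean(xs) ** 2;

// standard deviation: the square root of the variance
const stdDev = xs => Math.sqrt(variance(xs));
```

You can check the scaling property directly: doubling every entry of a list doubles its standard deviation and quadruples its variance.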

    We can generalize the variance to a situation where we have two lists of numbers:

    x = (x_1, \dots, x_n)

    y = (y_1, \dots, y_n)

    Namely, we can form the covariance

    \langle x y \rangle - \langle x \rangle \langle y \rangle

    This reduces to the variance when x = y. It measures how much x and y vary together — ‘hand in hand’, as it were. A bit more precisely: if x_i is greater than its mean value mainly for i such that y_i is greater than its mean value, the covariance is positive. On the other hand, if x_i tends to be greater than average when y_i is smaller than average — like with the air pressures at Darwin and Tahiti — the covariance will be negative.

    For example, if

    x = (1,-1), \quad y = (1,-1)

    then they ‘vary hand in hand’, and the covariance

    \langle x y \rangle - \langle x \rangle \langle y \rangle = 1 - 0 = 1

    is positive. But if

    x = (1,-1), \quad y = (-1,1)

    then one is positive when the other is negative, so the covariance

    \langle x y \rangle - \langle x \rangle \langle y \rangle = -1 - 0 = -1

    is negative.

    Of course the covariance will get bigger if we multiply both x and y by some big number. If we don’t want this effect, we can normalize the covariance and get the correlation:

    \displaystyle{ \frac{ \langle x y \rangle - \langle x \rangle \langle y \rangle }{\sigma_x \sigma_y} }

    which will always be between -1 and 1.
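    Here is a small Python sketch of covariance and correlation, using the toy pairs of lists from above:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    return math.sqrt(mean([v * v for v in xs]) - mean(xs) ** 2)

def covariance(xs, ys):
    # <xy> - <x><y>
    return mean([a * b for a, b in zip(xs, ys)]) - mean(xs) * mean(ys)

def correlation(xs, ys):
    # the covariance, normalized to lie between -1 and 1
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

print(covariance([1, -1], [1, -1]))   # 1.0: they vary hand in hand
print(covariance([1, -1], [-1, 1]))   # -1.0: one is up when the other is down
print(correlation([2, -2], [1, -1]))  # 1.0: rescaling doesn't change correlation
```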

    For example, if we compute the correlation between the air pressure anomalies at Darwin and Tahiti, measured monthly from 1866 to 2012, we get -0.253727. This indicates that when one goes up, the other tends to go down. But since we’re not getting -1, it means they’re not completely locked into a linear relationship where one is some negative number times the other.

    Okay, we’re almost ready for continuous wavelet transforms! Here is the main thing we need to know. If the mean of either x or y is zero, the formula for covariance simplifies a lot, to

    \displaystyle{  \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }

    So, this quantity says how much the numbers x_i ‘vary hand in hand’ with the numbers y_i, in the special case when one (or both) has mean zero.

    We can do something similar if x, y : \mathbb{R} \to \mathbb{R} are functions of time defined for all real numbers t. The sum becomes an integral, and we have to give up on dividing by n. We get:

    \displaystyle{  \int_{-\infty}^\infty x(t) y(t)\; d t }

    This is called the inner product of the functions x and y, and often it’s written \langle x, y \rangle, but it’s a lot like the covariance.

    Continuous wavelet transforms

    What are continuous wavelet transforms, and why should we care?

    People have lots of tricks for studying ‘signals’, like series of numbers x_i or functions x : \mathbb{R} \to \mathbb{R}. One method is to ‘transform’ the signal in a way that reveals useful information. The Fourier transform decomposes a signal into sines and cosines of different frequencies. This lets us see how much power the signal has at different frequencies, but it doesn’t reveal how the power at different frequencies changes with time. For that we should use something else, like the Gabor transform explained by Blake Pollard in a previous post.

    Sines and cosines are great, but we might want to look for other patterns in a signal. A ‘continuous wavelet transform’ lets us scan a signal for appearances of a given pattern at different times and also at different time scales: a pattern could go by quickly, or in a stretched out slow way.

    To implement the continuous wavelet transform, we need a signal and a pattern to look for. The signal could be a function x : \mathbb{R} \to \mathbb{R}. The pattern would then be another function y: \mathbb{R} \to \mathbb{R}, usually called a wavelet.

    Here’s an example of a wavelet:


    If we’re in a relaxed mood, we could call any function that looks like a bump with wiggles in it a wavelet. There are lots of famous wavelets, but this particular one is the fourth derivative of a certain Gaussian. Mathematica calls this particular wavelet DGaussianWavelet[4], and you can look up the formula under ‘Details’ on their webpage.

    However, the exact formula doesn’t matter at all now! If we call this wavelet y, all that matters is that it’s a bump with wiggles on it, and that its mean value is 0, or more precisely:

    \displaystyle{ \int_{-\infty}^\infty y(t) \; d t = 0 }

    As we saw in the last section, this fact lets us take our function x and the wavelet y and see how much they ‘vary hand in hand’ simply by computing their inner product:

    \displaystyle{ \langle x , y \rangle = \int_{-\infty}^\infty x(t) y(t)\; d t }

    Loosely speaking, this measures the ‘amount of y-shaped wiggle in the function x’. It’s amazing how hard it is to say something in plain English that perfectly captures the meaning of a simple formula like the above one—so take the quoted phrase with a huge grain of salt. But it gives a rough intuition.

    Our wavelet y happens to be centered at t  = 0. However, we might be interested in y-shaped wiggles that are centered not at zero but at some other number s. We could detect these by shifting the function y before taking its inner product with x:

    \displaystyle{ \int_{-\infty}^\infty x(t) y(t-s)\; d t }

    We could also be interested in measuring the amount of some stretched-out or squashed version of a y-shaped wiggle in the function x. Again we could do this by changing y before taking its inner product with x:

    \displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t}{P}\right) \; d t }

    When P is big, we get a stretched-out version of y. People sometimes call P the period, since the period of the wiggles in y will be proportional to this (though usually not equal to it).

    Finally, we can combine these ideas, and compute

    \displaystyle{ \int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt }

    This is a function of the shift s and period P which says how much of the s-shifted, P-stretched wavelet y is lurking in the function x. It’s a version of the continuous wavelet transform!

    Mathematica implements this idea for time series, meaning lists of numbers x = (x_1,\dots,x_n) instead of functions x : \mathbb{R} \to \mathbb{R}. The idea is that we think of the numbers as samples of a function x:

    x_1 = x(\Delta t)

    x_2 = x(2 \Delta t)

    and so on, where \Delta t is some time step, and replace the integral above by a suitable sum. Mathematica has a function ContinuousWaveletTransform that does this, giving

    \displaystyle{  w(s,P) = \frac{1}{\sqrt{P}} \sum_{i = 1}^n x_i \; y\left(\frac{i \Delta t - s}{P}\right) }

    The factor of 1/\sqrt{P} in front is a useful extra trick: it’s the right way to compensate for the fact that when you stretch out your wavelet y by a factor of P, it gets bigger. So, when we’re doing integrals, we should define the continuous wavelet transform of x by:

    \displaystyle{ w(s,P) = \frac{1}{\sqrt{P}} \int_{-\infty}^\infty x(t) \; y\left(\frac{t- s}{P}\right)\; dt }
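    Here is that discrete sum in Python. The scalograms below use Mathematica’s DGaussianWavelet[4], but any zero-mean bump with wiggles works for illustration, so this sketch uses the ‘Mexican hat’ wavelet (built from the second derivative of a Gaussian) and a made-up toy signal:

```python
import math

def ricker(t):
    # the 'Mexican hat' wavelet: a bump with wiggles whose integral is zero
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(xs, wavelet, s, P, dt=1.0):
    # w(s, P) = (1/sqrt(P)) * sum_i x_i * y((i*dt - s) / P)
    return sum(x * wavelet((i * dt - s) / P)
               for i, x in enumerate(xs, start=1)) / math.sqrt(P)

# toy signal: a short burst of oscillation centered near t = 50
signal = [math.exp(-((i - 50) / 10.0) ** 2) * math.cos(2 * math.pi * i / 10.0)
          for i in range(1, 101)]

# the transform is large when the shift s lines up with the burst...
print(cwt(signal, ricker, s=50.0, P=5.0))
# ...and nearly zero when it doesn't
print(cwt(signal, ricker, s=10.0, P=5.0))
```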

    The results

    Dara Shayda started with the air pressure anomaly at Darwin and Tahiti, measured monthly from 1866 to 2012. Taking DGaussianWavelet[4] as his wavelet, he computed the continuous wavelet transform w(s,P) as above. To show us the answer, he created a scalogram:


    This is a 2-dimensional color plot showing roughly how big the continuous wavelet transform w(s,P) is for different shifts s and periods P. Blue means it’s very small, green means it’s bigger, yellow means even bigger and red means very large.

    Tahiti gave this:


    You’ll notice that the patterns at Darwin and Tahiti are similar in character, but notably different in detail. For example, the red spots, where our chosen wavelet shows up strongly with a period on the order of 100 months, occur at different times.

    Puzzle 1. What is the meaning of the ‘spikes’ in these scalograms? What sort of signal would give a spike of this sort?

    Puzzle 2. Do a Gabor transform, also known as a ‘windowed Fourier transform’, of the same data. Blake Pollard explained the Gabor transform in his article Milankovitch vs the Ice Ages. This is a way to see how much a signal wiggles at a given frequency at a given time: we multiply the signal by a shifted Gaussian and then take its Fourier transform.

    Puzzle 3. Read about continuous wavelet transforms. If we want to reconstruct our signal x from its continuous wavelet transform, why should we use a wavelet y with

    \displaystyle{\int_{-\infty}^\infty y(t) \; d t = 0 ? }

    In fact we want a somewhat stronger condition, which is implied by the above equation when the Fourier transform of y is smooth and integrable:

    Continuous wavelet transform, Wikipedia.

    Another way to understand correlations

    David Tweed mentioned another approach from signal processing to understanding the quantity

    \displaystyle{  \langle x y \rangle = \frac{1}{n} \sum_{i = 1}^n x_i y_i }

    If we’ve got two lists of data x and y that we want to compare to see if they behave similarly, the first thing we ought to do is multiplicatively scale each one so they’re of comparable magnitude. There are various possibilities for assigning a scale, but a reasonable one is to ensure they have equal ‘energy’

    \displaystyle{  \sum_{i=1}^n x_i^2 = \sum_{i=1}^n y_i^2 }

    (This can be achieved by dividing each list by its standard deviation, which is equivalent to what was done in the main derivation above.) Once we’ve done that, it’s clear that looking at

    \displaystyle{  \sum_{i=1}^n (x_i-y_i)^2 }

    gives small values when they have a very good match and progressively bigger values as they become less similar. Observe that

    \begin{array}{ccl}  \displaystyle{\sum_{i=1}^n (x_i-y_i)^2 }  &=& \displaystyle{ \sum_{i=1}^n (x_i^2 - 2 x_i y_i + y_i^2) }\\  &=& \displaystyle{ \sum_{i=1}^n x_i^2 - 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 }  \end{array}

    Since we’ve scaled things so that \sum_{i=1}^n x_i^2 and \sum_{i=1}^n y_i^2 are constants, we can see that when \sum_{i=1}^n x_i y_i becomes bigger,

    \displaystyle{ \sum_{i=1}^n (x_i-y_i)^2 }

    becomes smaller. So,

    \displaystyle{\sum_{i=1}^n x_i y_i}

    serves as a measure of how close the lists are, under these assumptions.
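    A quick numeric check of this identity in Python, with two made-up lists scaled to unit energy:

```python
import math

def unit_energy(xs):
    # scale the list so its 'energy' (sum of squares) equals 1
    e = math.sqrt(sum(v * v for v in xs))
    return [v / e for v in xs]

x = unit_energy([1.0, -2.0, 3.0, -1.0])
y = unit_energy([0.5, -1.8, 2.9, -0.7])

sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
inner = sum(a * b for a, b in zip(x, y))

# expanding the square: sq_dist = 1 - 2*inner + 1, so a bigger inner
# product means a smaller squared distance
print(sq_dist, 2.0 - 2.0 * inner)  # the two agree
```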


    The Stochastic Resonance Program (Part 1)

    10 May, 2014

    guest post by David Tanzer

    At the Azimuth Code Project, we are aiming to produce educational software that is relevant to the Earth sciences and the study of climate. Our present software takes the form of interactive web pages, which allow you to experiment with the parameters of models and view their outputs. But to fully understand the meaning of a program, we need to know about the concepts and theories that inform it. So we will be writing articles to explain both the programs themselves and the math and science behind them.

    In this two-part series, I’ll explain this program:

    Stochastic resonance.

    Check it out—it runs on your browser! It was created by Allan Erskine and Glyn Adgie. In the Azimuth blog article Increasing the Signal-to-Noise Ratio with More Noise, Glyn Adgie and Tim van Beek give a nice explanation of the idea of stochastic resonance, which includes some clear and exciting graphs.

    My goal today is to give a compact, developer-oriented introduction to stochastic resonance, which will set the context for the next blog article, where I’ll dissect the program itself. By way of introduction, I am a software developer with research training in computer science. It’s a new area for me, and any clarifications will be welcome!

    The concept of stochastic resonance

    Stochastic resonance is a phenomenon, occurring under certain circumstances, in which a noise source may amplify the effect of a weak signal. This concept was used in an early hypothesis about the timing of ice-age cycles, and has since been applied to a wide range of phenomena, including neuronal detection mechanisms and patterns of traffic congestion.

    Suppose we have a signal detector whose internal, analog state is driven by an input signal, and suppose the analog states are partitioned into two regions, called “on” and “off” — this is a digital state, abstracted from the analog state. With a light switch, we could take the force as the input signal, the angle as the analog state, and the up/down classification of the angle as the digital state.

    Consider the effect of a periodic input signal on the digital state. Suppose the wave amplitude is not large enough to change the digital state, yet large enough to drive the analog state close to the digital state boundary. Then, a bit of random noise, occurring near the peak of an input cycle, may “tap” the system over to the other digital state. So we will see a probability of state-transitions that is synchronized with the input signal. In a complex way, the noise has amplified the input signal.

    But it’s a pretty funky amplifier! Here is a picture from the Azimuth library article on stochastic resonance:

    Stochastic resonance has been found in the signal detection mechanisms of neurons. There are, for example, cells in the tails of crayfish that are tuned to low-frequency signals in the water caused by predator motions. These signals are too weak to cross the firing threshold for the neurons, but with the right amount of noise, they do trigger the neurons.

    See:

    Stochastic resonance, Azimuth Library.

    Stochastic resonance in neurobiology, David Lyttle.

    Bistable stochastic resonance and Milankovitch theories of ice-age cycles

    Stochastic resonance was originally formulated in terms of systems that are bistable — where each digital state is the basin of attraction of a stable equilibrium.

    An early application of stochastic resonance was to a hypothesis, within the framework of bistable climate dynamics, about the timing of the ice-age cycles. Although it has not been confirmed, it remains of interest (1) historically, (2) because the timing of ice-age cycles remains an open problem, and (3) because the Milankovitch hypothesis upon which it rests is an active part of the current research.

    In the bistable model, the climate states are a cold, “snowball” Earth and a hot, iceless Earth. The snowball Earth is stable because it is white, and hence reflects solar energy, which keeps it frozen. The iceless Earth is stable because it is dark, and hence absorbs solar energy, which keeps it melted.

    The Milankovitch hypothesis states that the drivers of climate state change are long-duration cycles in the insolation — the solar energy received in the northern latitudes — caused by periodic changes in the Earth’s orbital parameters. The north is significant because that is where the glaciers are concentrated, and so a sufficient “pulse” in northern temperatures could initiate a state change.

    Three relevant astronomical cycles have been identified:

    • Changing of the eccentricity of the Earth’s elliptical orbit, with a period of 100 kiloyears

    • Changing of the obliquity (tilt) of the Earth’s axis, with a period of 41 kiloyears

    • Precession (swiveling) of the Earth’s axis, with a period of 23 kiloyears

    In the stochastic resonance hypothesis, the Milankovitch signal is amplified by random events to produce climate state changes. In more recent Milankovitch theories, a deterministic forcing mechanism is used. In a theory by Didier Paillard, the climate is modeled with three states, called interglacial, mild glacial and full glacial, and the state changes depend on the volume of ice as well as the insolation.

    See:

    Milankovitch cycle, Azimuth Library.

    Mathematics of the environment (part 10), John Baez. This gives an exposition of Paillard’s theory.

    Bistable systems defined by a potential function

    Any smooth function with two local minima can be used to define a bistable system. For instance, consider the function V(x) = x^4/4 - x^2/2:

    To define the bistable system, construct a differential equation where the time derivative of x is set to the negative of the derivative of the potential at x:

    dx/dt = -V'(x) = -x^3 + x = x(1 - x^2)

    So, for instance, where the potential graph is sloping upward as x increases, -V'(x) is negative, and this sends x(t) ‘downhill’ towards the minimum.

    The roots of V'(x) yield stable equilibria at 1 and -1, and an unstable equilibrium at 0. The latter separates the basins of attraction for the stable equilibria.
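    A few lines of Python illustrate this (a sketch with made-up starting points and step size): following the negative gradient carries any nonzero starting point to the bottom of its basin.

```python
def V_prime(x):
    # derivative of V(x) = x^4/4 - x^2/2
    return x ** 3 - x

def flow(x0, dt=0.01, steps=2000):
    # crude Euler integration of dx/dt = -V'(x)
    x = x0
    for _ in range(steps):
        x -= V_prime(x) * dt
    return x

print(flow(0.5))   # settles near the stable equilibrium at +1
print(flow(-0.5))  # settles near -1
print(flow(0.0))   # stays at the unstable equilibrium at 0
```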

    Discrete stochastic resonance

    Now let’s look at a discrete-time model which exhibits stochastic resonance. This is the model used in the Azimuth demo program.

    We construct the discrete-time derivative, using the potential function, a sampled sine wave, and a normally distributed random number:

    \Delta X_t = -V'(X_t) \, \Delta t + \mathrm{Wave}(t) + \mathrm{Noise}(t) =
    X_t (1 - X_t^2) \, \Delta t + \alpha \sin(\omega t) + \beta \, \mathrm{GaussianSample}(t)

    where \Delta t is a constant and t is restricted to multiples of \Delta t.

    This equation is the discrete-time counterpart to a continuous-time stochastic differential equation.
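    Here is a sketch of this update rule in Python (the parameter values are invented for illustration; the Azimuth demo itself is an interactive web page). With the subthreshold wave alone the digital state never flips, but adding noise produces flips:

```python
import math, random

def count_flips(alpha, beta, dt=0.1, steps=5000, omega=2 * math.pi / 100.0, seed=1):
    # Delta X = X(1 - X^2) dt + alpha*sin(omega t) + beta*GaussianSample
    random.seed(seed)
    x, state, flips = 1.0, 1, 0
    for n in range(steps):
        t = n * dt
        x += x * (1.0 - x * x) * dt + alpha * math.sin(omega * t) \
             + beta * random.gauss(0.0, 1.0)
        s = 1 if x >= 0.0 else -1
        if s != state:           # the digital state changed
            state, flips = s, flips + 1
    return flips

print(count_flips(alpha=0.01, beta=0.0))  # 0: the wave alone is subthreshold
print(count_flips(alpha=0.01, beta=0.2))  # > 0: noise 'taps' the state across
```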

    Next time, we will look into the Azimuth demo program itself.


    Noether’s Theorem: Quantum vs Stochastic

    3 May, 2014

    guest post by Ville Bergholm

    In 1915 Emmy Noether discovered an important connection between the symmetries of a system and its conserved quantities. Her result has become a staple of modern physics and is known as Noether’s theorem.

    Photo of Emmy Noether

    The theorem and its generalizations have found particularly wide use in quantum theory. Those of you following the Network Theory series here on Azimuth might recall Part 11 where John Baez and Brendan Fong proved a version of Noether’s theorem for stochastic systems. Their result is now published here:

    • John Baez and Brendan Fong, A Noether theorem for stochastic mechanics, J. Math. Phys. 54:013301 (2013).

    One goal of the network theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. John and Brendan proved their stochastic version of Noether’s theorem by exploiting ‘stochastic mechanics’ which was formulated in the network theory series to mathematically resemble quantum theory. Their result, which we will outline below, was different than what would be expected in quantum theory, so it is interesting to try to figure out why.

    Recently Jacob Biamonte, Mauro Faccin and myself have been working to try to get to the bottom of these differences. What we’ve done is prove a version of Noether’s theorem for Dirichlet operators. As you may recall from Parts 16 and 20 of the network theory series, these are the operators that generate both stochastic and quantum processes. In the language of the series, they lie in the intersection of stochastic and quantum mechanics. So, they are a subclass of the infinitesimal stochastic operators considered in John and Brendan’s work.

    The extra structure of Dirichlet operators—compared with the wider class of infinitesimal stochastic operators—provided a handle for us to dig a little deeper into understanding the intersection of these two theories. By the end of this article, astute readers will be able to prove that Dirichlet operators generate doubly stochastic processes.

    Before we get into the details of our proof, let’s recall first how conservation laws work in quantum mechanics, and then contrast this with what John and Brendan discovered for stochastic systems. (For a more detailed comparison between the stochastic and quantum versions of the theorem, see Part 13 of the network theory series.)

    The quantum case

    I’ll assume you’re familiar with quantum theory, but let’s start with a few reminders.

    In standard quantum theory, when we have a closed system with n states, the unitary time evolution of a state |\psi(t)\rangle is generated by a self-adjoint n \times n matrix H called the Hamiltonian. In other words, |\psi(t)\rangle satisfies Schrödinger’s equation:

    i \hbar \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle.

    The state of a system starting off at time zero in the state |\psi_0 \rangle and evolving for a time t is then given by

    |\psi(t) \rangle = e^{-i t H}|\psi_0 \rangle.

    The observable properties of a quantum system are associated with self-adjoint operators. In the state |\psi \rangle, the expected value of the observable associated to a self-adjoint operator O is

    \langle O \rangle_{\psi} = \langle \psi | O | \psi \rangle

    This expected value is constant in time for all states if and only if O commutes with the Hamiltonian H:

    [O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t.

    In this case we say O is a ‘conserved quantity’. The fact that we have two equivalent conditions for this is a quantum version of Noether’s theorem!
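    We can check this in a small numerical example (a Python sketch with \hbar set to 1, using 2×2 Pauli matrices: X commutes with the Hamiltonian H = X, while Z does not):

```python
def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def evolve(H, t, v, terms=40):
    # apply e^{-i t H} to v via a truncated Taylor series (fine for small t, H)
    result, term = list(v), list(v)
    for k in range(1, terms):
        term = [(-1j * t / k) * w for w in mat_vec(H, term)]
        result = [r + w for r, w in zip(result, term)]
    return result

def expectation(O, psi):
    # <psi| O |psi>, real for self-adjoint O
    return sum(p.conjugate() * q for p, q in zip(psi, mat_vec(O, psi))).real

X = [[0.0, 1.0], [1.0, 0.0]]   # the Hamiltonian; X commutes with itself
Z = [[1.0, 0.0], [0.0, -1.0]]  # does not commute with X

psi0 = [1.0 + 0j, 0.0 + 0j]
for t in (0.0, 0.5, 1.0):
    psi = evolve(X, t, psi0)
    print(t, expectation(X, psi), expectation(Z, psi))
# <X> stays fixed at 0 for all t; <Z> drifts away from 1 as t grows
```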

    The stochastic case

    In stochastic mechanics, the story changes a bit. Now a state |\psi(t)\rangle is a probability distribution: a vector with n nonnegative components that sum to 1. Schrödinger’s equation gets replaced by the master equation:

    \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle

    If we start with a probability distribution |\psi_0 \rangle at time zero and evolve it according to this equation, at any later time we have

    |\psi(t)\rangle = e^{t H} |\psi_0 \rangle.

    We want this always to be a probability distribution. To ensure that this is so, the Hamiltonian H must be infinitesimal stochastic: that is, a real-valued n \times n matrix where the off-diagonal entries are nonnegative and the entries of each column sum to zero. It no longer needs to be self-adjoint!

    When H is infinitesimal stochastic, the operators e^{t H} map the set of probability distributions to itself whenever t \ge 0, and we call this family of operators a continuous-time Markov process, or more precisely a Markov semigroup.
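    Here is a small numerical illustration (a Python sketch; the 3-state H below is made up, with nonnegative off-diagonal entries and each column summing to zero):

```python
def mat_exp(A, terms=40):
    # truncated Taylor series for e^A; adequate for a small matrix like this
    n = len(A)
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in U]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)]
                for i in range(n)]
        U = [[U[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return U

# a hypothetical infinitesimal stochastic Hamiltonian
H = [[-1.0, 0.5, 0.0],
     [ 1.0, -0.5, 2.0],
     [ 0.0, 0.0, -2.0]]

U = mat_exp(H)  # e^{tH} at t = 1
psi0 = [0.2, 0.3, 0.5]
psi1 = [sum(U[i][j] * psi0[j] for j in range(3)) for i in range(3)]
print(sum(psi1))                   # ≈ 1: total probability is conserved
print(all(p >= 0 for p in psi1))   # True: the entries stay nonnegative
```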

    In stochastic mechanics, an observable O is a real diagonal n \times n matrix, and its expected value is given by

    \langle O\rangle_{\psi} = \langle \hat{O} | \psi \rangle

    where \hat{O} is the vector built from the diagonal entries of O. More concretely,

    \langle O\rangle_{\psi} = \displaystyle{ \sum_i O_{i i} \psi_i }

    where \psi_i is the ith component of the vector |\psi\rangle.

    Here is a version of Noether’s theorem for stochastic mechanics:

    Noether’s Theorem for Markov Processes (Baez–Fong). Suppose H is an infinitesimal stochastic operator and O is an observable. Then

    [O,H] =0

    if and only if

    \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0

    and

    \displaystyle{\frac{d}{d t}}\langle O^2 \rangle_{\psi(t)} = 0

    for all t \ge 0 and all \psi(t) obeying the master equation.   █

    So, just as in quantum mechanics, whenever [O,H]=0 the expected value of O will be conserved:

    \displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

    for any \psi_0 and all t \ge 0. However, John and Brendan saw that—unlike in quantum mechanics—you need more than just the expectation value of the observable O to be constant to obtain the equation [O,H]=0. You really need both

    \displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0

    together with

    \displaystyle{\frac{d}{d t}} \langle O^2\rangle_{\psi(t)} = 0

    for all initial data \psi_0 to be sure that [O,H]=0.

    So it’s a bit subtle, but symmetries and conserved quantities have a rather different relationship than they do in quantum theory.
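    To see the theorem in action, here is a Python sketch with a made-up 3-state H whose transition graph has two connected components, {0,1} and {2}, and an observable constant on each component, so that [O,H] = 0:

```python
# H is infinitesimal stochastic: columns sum to zero, off-diagonals nonnegative;
# states 0 and 1 form one connected component, state 2 another
H = [[-1.0, 2.0, 0.0],
     [ 1.0, -2.0, 0.0],
     [ 0.0, 0.0, 0.0]]
O = [3.0, 3.0, 7.0]   # diagonal entries of O: constant on each component,
                      # so (O_ii - O_jj) H_ij = 0 for all i, j, i.e. [O,H] = 0

def expect(obs, psi):
    # <O>_psi = sum_i O_ii psi_i
    return sum(o * p for o, p in zip(obs, psi))

def evolve(psi, H, t, steps=20000):
    # Euler integration of the master equation d psi/dt = H psi
    dt = t / steps
    n = len(psi)
    for _ in range(steps):
        Hpsi = [sum(H[i][j] * psi[j] for j in range(n)) for i in range(n)]
        psi = [p + dt * q for p, q in zip(psi, Hpsi)]
    return psi

psi0 = [0.5, 0.1, 0.4]
psi1 = evolve(psi0, H, t=2.0)

# the theorem says both <O> and <O^2> are conserved
print(expect(O, psi0), expect(O, psi1))
print(expect([o * o for o in O], psi0), expect([o * o for o in O], psi1))
```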

    A Noether theorem for Dirichlet operators

    But what if the infinitesimal generator of our Markov semigroup is also self-adjoint? In other words, what if H is both an infinitesimal stochastic matrix but also its own transpose: H = H^\top? Then it’s called a Dirichlet operator… and we found that in this case, we get a stochastic version of Noether’s theorem that more closely resembles the usual quantum one:

    Noether’s Theorem for Dirichlet Operators. If H is a Dirichlet operator and O is an observable, then

    [O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

    Proof. The \Rightarrow direction is easy to show, and it follows from John and Brendan’s theorem. The point is to show the \Leftarrow direction. Since H is self-adjoint, we may use a spectral decomposition:

    H = \displaystyle{ \sum_k E_k |\phi_k \rangle \langle \phi_k |}

    where \phi_k are an orthonormal basis of eigenvectors, and E_k are the corresponding eigenvalues. We then have:

    \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = \langle \hat{O} | H e^{t H} |\psi_0 \rangle = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0

    \iff \quad \langle \hat{O}| H e^{t H} = 0 \quad \forall t \ge 0

    \iff \quad \sum_k \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} \langle \phi_k| = 0 \quad \forall t \ge 0

    \iff \quad \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} = 0 \quad \forall k, \forall t \ge 0

    \iff \quad |\hat{O} \rangle \in \mathrm{Span}\{|\phi_k \rangle \, : \; E_k = 0\} = \ker \: H,

    where the third equivalence is due to the vectors |\phi_k \rangle being linearly independent. For any infinitesimal stochastic operator H the corresponding transition graph consists of m connected components iff we can reorder (permute) the states of the system such that H becomes block-diagonal with m blocks. Now it is easy to see that the kernel of H is spanned by m eigenvectors, one for each block. Since H is also symmetric, the elements of each such vector can be chosen to be ones within the block and zeros outside it. Consequently

    |\hat{O} \rangle \in \ker \: H

    implies that we can choose the basis of eigenvectors of O to be the vectors |\phi_k \rangle, which implies

    [O, H] = 0

    Alternatively,

    |\hat{O} \rangle \in \ker \, H

    implies that

    |\hat{O^2} \rangle \in \ker \: H \; \iff \; \cdots \; \iff \; \displaystyle{\frac{d}{d t}} \langle O^2 \rangle_{\psi(t)} = 0 \; \forall \: |\psi_0 \rangle, \forall t \ge 0,

    where we have used the above sequence of equivalences backwards. Now, using John and Brendan’s original proof, we can obtain [O, H] = 0.   █

    In summary, by restricting ourselves to the intersection of quantum and stochastic generators, we have found a version of Noether’s theorem for stochastic mechanics that looks formally just like the quantum version! However, this simplification comes at a cost. We find that the only observables O whose expected value remains constant with time are those of the very restricted type described above, where the observable has the same value in every state in a connected component.
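    As a numerical illustration of the doubly stochastic claim above, here is a Python sketch. The Dirichlet operator below is made up: minus the graph Laplacian of the path graph on three vertices, hence both symmetric and infinitesimal stochastic.

```python
def mat_exp(A, terms=40):
    # truncated Taylor series for e^A; adequate for a small matrix like this
    n = len(A)
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in U]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)]
                for i in range(n)]
        U = [[U[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return U

# a Dirichlet operator: infinitesimal stochastic *and* symmetric
H = [[-1.0, 1.0, 0.0],
     [ 1.0, -2.0, 1.0],
     [ 0.0, 1.0, -1.0]]

U = mat_exp(H)  # e^{tH} at t = 1
print([sum(row) for row in U])                             # each row sums to ≈ 1
print([sum(U[i][j] for i in range(3)) for j in range(3)])  # so does each column
```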

    Puzzles

    Suppose we have a graph whose graph Laplacian matrix H generates a Markov semigroup as follows:

    U(t) = e^{t H}

    Puzzle 1. Suppose that also H = H^\top, so that H is a Dirichlet operator and hence i H generates a 1-parameter unitary group. Show that the indegree and outdegree of any node of our graph must be equal. Graphs with this property are called balanced.

    Puzzle 2. Suppose that U(t) = e^{t H} is a doubly stochastic Markov semigroup, meaning that for all t \ge 0 each row and each column of U(t) sums to 1:

    \displaystyle{ \sum_i U(t)_{i j} = \sum_j U(t)_{i j} = 1 }

    and all the matrix entries are nonnegative. Show that the Hamiltonian H obeys

    \displaystyle{\sum_i H_{i j} = \sum_j H_{i j} = 0 }

    and all the off-diagonal entries of H are nonnegative. Show the converse is also true.

    Puzzle 3. Prove that any doubly stochastic Markov semigroup U(t) is of the form e^{t H} where H is the graph Laplacian of a balanced graph.

    Puzzle 4. Let O(t) be a possibly time-dependent observable, and write \langle O(t) \rangle_{\psi(t)} for its expected value with respect to some initial state \psi_0 evolving according to the master equation. Show that

    \displaystyle{ \frac{d}{d t}\langle O(t)\rangle_{\psi(t)} = \left\langle [O(t), H] \right\rangle_{\psi(t)} + \left\langle \frac{\partial O(t)}{\partial t}\right\rangle_{\psi(t)} }

    This is a stochastic version of the Ehrenfest theorem.


    Network Theory III

    16 March, 2014

     

    In the last of my Oxford talks I explain how entropy and relative entropy can be understood using certain categories related to probability theory… and how these categories also let us understand Bayesian networks!

    The first two parts are explanations of these papers:

    • John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss

    • John Baez and Tobias Fritz, A Bayesian characterization of relative entropy.

    Somewhere around here the talk was interrupted by a fire drill, waking up the entire audience!

    By the way, in my talk I mistakenly said that relative entropy is a continuous functor; in fact it’s just lower semicontinuous. I’ve fixed this in my slides.

    The third part of my talk was my own interpretation of Brendan Fong’s master’s thesis:

    • Brendan Fong, Causal Theories: a Categorical Perspective on Bayesian Networks.

    I took a slightly different approach, by saying that a causal theory \mathcal{C}_G is the free category with products on certain objects and morphisms coming from a directed acyclic graph G. In his thesis he said \mathcal{C}_G was the free symmetric monoidal category where each generating object is equipped with a cocommutative comonoid structure. This is close to a category with finite products, though perhaps not quite the same: a symmetric monoidal category where every object is equipped with a cocommutative comonoid structure in a natural way (i.e., making a bunch of squares commute) is a category with finite products. It would be interesting to see if this difference hurts or helps.

    By making this slight change, I am claiming that causal theories can be seen as algebraic theories in the sense of Lawvere. This would be a very good thing, since we know a lot about those.

    You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!


    Network Theory II

    12 March, 2014

     

    Chemists are secretly doing applied category theory! When chemists list a bunch of chemical reactions like

    C + O₂ → CO₂

    they are secretly describing a ‘category’.

    That shouldn’t be surprising. A category is simply a collection of things called objects together with things called morphisms going from one object to another, often written

    f: x → y

    The rules of a category say:

    1) we can compose a morphism f: x → y and another morphism g: y → z to get an arrow gf: x → z,

    2) (hg)f = h(gf), so we don’t need to bother with parentheses when composing arrows,

    3) every object x has an identity morphism 1ₓ: x → x that obeys 1ₓ f = f and f 1ₓ = f.
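    These rules are easy to play with in code. Here is a minimal Python sketch (the objects and morphisms are invented), representing a morphism as a source, a target, and a function:

```python
class Morphism:
    def __init__(self, src, tgt, fn):
        self.src, self.tgt, self.fn = src, tgt, fn

def compose(g, f):
    # gf: x -> z, defined only when f: x -> y and g: y -> z match up
    assert f.tgt == g.src, "target of f must equal source of g"
    return Morphism(f.src, g.tgt, lambda a: g.fn(f.fn(a)))

def identity(obj):
    return Morphism(obj, obj, lambda a: a)

f = Morphism('x', 'y', lambda n: n + 1)   # f: x -> y
g = Morphism('y', 'z', lambda n: 2 * n)   # g: y -> z

gf = compose(g, f)
print(gf.fn(3))                           # 2 * (3 + 1) = 8
print(compose(g, identity('y')).fn(3))    # identity law: same as g alone
```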

    Whenever we have a bunch of things (objects) and processes (arrows) that take one thing to another, we’re likely to have a category. In chemistry, the objects are bunches of molecules and the arrows are chemical reactions. But we can ‘add’ bunches of molecules and also add reactions, so we have something more than a mere category: we have something called a symmetric monoidal category.

    My talk here, part of a series, is an explanation of this viewpoint and how we can use it to take ideas from elementary particle physics and apply them to chemistry! For more details try this free book:

    • John Baez and Jacob Biamonte, A Course on Quantum Techniques for Stochastic Mechanics.

    as well as this paper on the Anderson–Craciun–Kurtz theorem (discussed in my talk):

    • John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

    You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

