## The Stochastic Resonance Program (Part 1)

10 May, 2014

guest post by David Tanzer

At the Azimuth Code Project, we are aiming to produce educational software that is relevant to the Earth sciences and the study of climate. Our present software takes the form of interactive web pages, which allow you to experiment with the parameters of models and view their outputs. But to fully understand the meaning of a program, we need to know about the concepts and theories that inform it. So we will be writing articles to explain both the programs themselves and the math and science behind them.

In this two-part series, I’ll explain this program:

Check it out—it runs on your browser! It was created by Allan Erskine and Glyn Adgie. In the Azimuth blog article Increasing the Signal-to-Noise Ratio with More Noise, Glyn Adgie and Tim van Beek give a nice explanation of the idea of stochastic resonance, which includes some clear and exciting graphs.

My goal today is give a compact, developer-oriented introduction to stochastic resonance, which will set the context for the next blog article, where I’ll dissect the program itself. By way of introduction, I am a software developer with research training in computer science. It’s a new area for me, and any clarifications will be welcome!

### The concept of stochastic resonance

Stochastic resonance is a phenomenon, occurring under certain circumstances, in which a noise source may amplify the effect of a weak signal. This concept was used in an early hypothesis about the timing of ice-age cycles, and has since been applied to a wide range of phenomena, including neuronal detection mechanisms and patterns of traffic congestion.

Suppose we have a signal detector whose internal, analog state is driven by an input signal, and suppose the analog states are partitioned into two regions, called “on” and “off” — this is a digital state, abstracted from the analog state. With a light switch, we could take the force as the input signal, the angle as the analog state, and the up/down classification of the angle as the digital state.

Consider the effect of a periodic input signal on the digital state. Suppose the wave amplitude is not large enough to change the digital state, yet large enough to drive the analog state close to the digital state boundary. Then, a bit of random noise, occurring near the peak of an input cycle, may “tap” the system over to the other digital state. So we will see a probability of state-transitions that is synchronized with the input signal. In a complex way, the noise has amplified the input signal.

But it’s a pretty funky amplifier! Here is a picture from the Azimuth library article on stochastic resonance:

Stochastic resonance has been found in the signal detection mechanisms of neurons. There are, for example, cells in the tails of crayfish that are tuned to low-frequency signals in the water caused by predator motions. These signals are too weak to cross the firing threshold for the neurons, but with the right amount of noise, they do trigger the neurons.

See:

Stochastic resonance, Azimuth Library.

Stochastic resonance in neurobiology, David Lyttle.

### Bistable stochastic resonance and Milankovitch theories of ice-age cycles

Stochastic resonance was originally formulated in terms of systems that are bistable — where each digital state is the basin of attraction of a stable equilibrium.

An early application of stochastic resonance was to a hypothesis, within the framework of bistable climate dynamics, about the timing of the ice-age cycles. Although it has not been confirmed, it remains of interest (1) historically, (2) because the timing of ice-age cycles remains an open problem, and (3) because the Milankovitch hypothesis upon which it rests is an active part of the current research.

In the bistable model, the climate states are a cold, “snowball” Earth and a hot, iceless Earth. The snowball Earth is stable because it is white, and hence reflects solar energy, which keeps it frozen. The iceless Earth is stable because it is dark, and hence absorbs solar energy, which keeps it melted.

The Milankovitch hypothesis states that the drivers of climate state change are long-duration cycles in the insolation — the solar energy received in the northern latitudes — caused by periodic changes in the Earth’s orbital parameters. The north is significant because that is where the glaciers are concentrated, and so a sufficient “pulse” in northern temperatures could initiate a state change.

Three relevant astronomical cycles have been identified:

• Changing of the eccentricity of the Earth’s elliptical orbit, with a period of 100 kiloyears

• Changing of the obliquity (tilt) of the Earth’s axis, with a period of 41 kiloyears

• Precession (swiveling) of the Earth’s axis, with a period of 23 kiloyears

In the stochastic resonance hypothesis, the Milankovitch signal is amplified by random events to produce climate state changes. In more recent Milankovitch theories, a deterministic forcing mechanism is used. In a theory by Didier Paillard, the climate is modeled with three states, called interglacial, mild glacial and full glacial, and the state changes depend on the volume of ice as well as the insolation.

See:

Milankovitch cycle, Azimuth Library.

Mathematics of the environment (part 10), John Baez. This gives an exposition of Paillard’s theory.

### Bistable systems defined by a potential function

Any smooth function with two local minima can be used to define a bistable system. For instance, consider the function $V(x) = x^4/4 - x^2/2$:

To define the bistable system, construct a differential equation where the time derivative of x is set to the negative of the derivative of the potential at x:

$dx/dt = -V'(x) = -x^3 + x = x(1 - x^2)$

So, for instance, where the potential graph is sloping upward as $x$ increases, $-V'(x)$ is negative, and this sends $X(t)$ ‘downhill’ towards the minimum.

The roots of $V'(x)$ yield stable equilibria at 1 and -1, and an unstable equilibrium at 0. The latter separates the basins of attraction for the stable equilibria.

### Discrete stochastic resonance

Now let’s look at a discrete-time model which exhibits stochastic resonance. This is the model used in the Azimuth demo program.

We construct the discrete-time derivative, using the potential function, a sampled sine wave, and a normally distributed random number:

$\Delta X_t = -V'(X_t) * \Delta t + \mathrm{Wave}(t) + \mathrm{Noise}(t) =$
$X_t (1 - X_t^2) \Delta t + \alpha * \sin(\omega t) + \beta * \mathrm{GaussianSample}(t)$

where $\Delta t$ is a constant and $t$ is restricted to multiples of $\Delta t.$

This equation is the discrete-time counterpart to a continuous-time stochastic differential equation.

Next time, we will look into the Azimuth demo program itself.

## Noether’s Theorem: Quantum vs Stochastic

3 May, 2014

guest post by Ville Bergholm

In 1915 Emmy Noether discovered an important connection between the symmetries of a system and its conserved quantities. Her result has become a staple of modern physics and is known as Noether’s theorem.

The theorem and its generalizations have found particularly wide use in quantum theory. Those of you following the Network Theory series here on Azimuth might recall Part 11 where John Baez and Brendan Fong proved a version of Noether’s theorem for stochastic systems. Their result is now published here:

• John Baez and Brendan Fong, A Noether theorem for stochastic mechanics, J. Math. Phys. 54:013301 (2013).

One goal of the network theory series here on Azimuth has been to merge ideas appearing in quantum theory with other disciplines. John and Brendan proved their stochastic version of Noether’s theorem by exploiting ‘stochastic mechanics’ which was formulated in the network theory series to mathematically resemble quantum theory. Their result, which we will outline below, was different than what would be expected in quantum theory, so it is interesting to try to figure out why.

Recently Jacob Biamonte, Mauro Faccin and myself have been working to try to get to the bottom of these differences. What we’ve done is prove a version of Noether’s theorem for Dirichlet operators. As you may recall from Parts 16 and 20 of the network theory series, these are the operators that generate both stochastic and quantum processes. In the language of the series, they lie in the intersection of stochastic and quantum mechanics. So, they are a subclass of the infinitesimal stochastic operators considered in John and Brendan’s work.

The extra structure of Dirichlet operators—compared with the wider class of infinitesimal stochastic operators—provided a handle for us to dig a little deeper into understanding the intersection of these two theories. By the end of this article, astute readers will be able to prove that Dirichlet operators generate doubly stochastic processes.

Before we get into the details of our proof, let’s recall first how conservation laws work in quantum mechanics, and then contrast this with what John and Brendan discovered for stochastic systems. (For a more detailed comparison between the stochastic and quantum versions of the theorem, see Part 13 of the network theory series.)

### The quantum case

I’ll assume you’re familiar with quantum theory, but let’s start with a few reminders.

In standard quantum theory, when we have a closed system with $n$ states, the unitary time evolution of a state $|\psi(t)\rangle$ is generated by a self-adjoint $n \times n$ matrix $H$ called the Hamiltonian. In other words, $|\psi(t)\rangle$ satisfies Schrödinger’s equation:

$i \hbar \displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle.$

The state of a system starting off at time zero in the state $|\psi_0 \rangle$ and evolving for a time $t$ is then given by

$|\psi(t) \rangle = e^{-i t H}|\psi_0 \rangle.$

The observable properties of a quantum system are associated with self-adjoint operators. In the state $|\psi \rangle,$ the expected value of the observable associated to a self-adjoint operator $O$ is

$\langle O \rangle_{\psi} = \langle \psi | O | \psi \rangle$

This expected value is constant in time for all states if and only if $O$ commutes with the Hamiltonian $H$:

$[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t.$

In this case we say $O$ is a ‘conserved quantity’. The fact that we have two equivalent conditions for this is a quantum version of Noether’s theorem!

### The stochastic case

In stochastic mechanics, the story changes a bit. Now a state $|\psi(t)\rangle$ is a probability distribution: a vector with $n$ nonnegative components that sum to 1. Schrödinger’s equation gets replaced by the master equation:

$\displaystyle{\frac{d}{d t}} |\psi(t) \rangle = H |\psi(t) \rangle$

If we start with a probability distribution $|\psi_0 \rangle$ at time zero and evolve it according to this equation, at any later time have

$|\psi(t)\rangle = e^{t H} |\psi_0 \rangle.$

We want this always be a probability distribution. To ensure that this is so, the Hamiltonian $H$ must be infinitesimal stochastic: that is, a real-valued $n \times n$ matrix where the off-diagonal entries are nonnegative and the entries of each column sum to zero. It no longer needs to be self-adjoint!

When $H$ is infinitesimal stochastic, the operators $e^{t H}$ map the set of probability distributions to itself whenever $t \ge 0,$ and we call this family of operators a continuous-time Markov process, or more precisely a Markov semigroup.

In stochastic mechanics, we say an observable $O$ is a real diagonal $n \times n$ matrix, and its expected value is given by

$\langle O\rangle_{\psi} = \langle \hat{O} | \psi \rangle$

where $\hat{O}$ is the vector built from the diagonal entries of $O.$ More concretely,

$\langle O\rangle_{\psi} = \displaystyle{ \sum_i O_{i i} \psi_i }$

where $\psi_i$ is the $i$th component of the vector $|\psi\rangle.$

Here is a version of Noether’s theorem for stochastic mechanics:

Noether’s Theorem for Markov Processes (Baez–Fong). Suppose $H$ is an infinitesimal stochastic operator and $O$ is an observable. Then

$[O,H] =0$

if and only if

$\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0$

and

$\displaystyle{\frac{d}{d t}}\langle O^2 \rangle_{\psi(t)} = 0$

for all $t \ge 0$ and all $\psi(t)$ obeying the master equation.   █

So, just as in quantum mechanics, whenever $[O,H]=0$ the expected value of $O$ will be conserved:

$\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0$

for any $\psi_0$ and all $t \ge 0.$ However, John and Brendan saw that—unlike in quantum mechanics—you need more than just the expectation value of the observable $O$ to be constant to obtain the equation $[O,H]=0.$ You really need both

$\displaystyle{\frac{d}{d t}} \langle O\rangle_{\psi(t)} = 0$

together with

$\displaystyle{\frac{d}{d t}} \langle O^2\rangle_{\psi(t)} = 0$

for all initial data $\psi_0$ to be sure that $[O,H]=0.$

So it’s a bit subtle, but symmetries and conserved quantities have a rather different relationship than they do in quantum theory.

### A Noether theorem for Dirichlet operators

But what if the infinitesimal generator of our Markov semigroup is also self-adjoint? In other words, what if $H$ is both an infinitesimal stochastic matrix but also its own transpose: $H = H^\top$? Then it’s called a Dirichlet operator… and we found that in this case, we get a stochastic version of Noether’s theorem that more closely resembles the usual quantum one:

Noether’s Theorem for Dirichlet Operators. If $H$ is a Dirichlet operator and $O$ is an observable, then

$[O, H] = 0 \quad \iff \quad \displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0$

Proof. The $\Rightarrow$ direction is easy to show, and it follows from John and Brendan’s theorem. The point is to show the $\Leftarrow$ direction. Since $H$ is self-adjoint, we may use a spectral decomposition:

$H = \displaystyle{ \sum_k E_k |\phi_k \rangle \langle \phi_k |}$

where $\phi_k$ are an orthonormal basis of eigenvectors, and $E_k$ are the corresponding eigenvalues. We then have:

$\displaystyle{\frac{d}{d t}} \langle O \rangle_{\psi(t)} = \langle \hat{O} | H e^{t H} |\psi_0 \rangle = 0 \quad \forall \: |\psi_0 \rangle, \forall t \ge 0$

$\iff \quad \langle \hat{O}| H e^{t H} = 0 \quad \forall t \ge 0$

$\iff \quad \sum_k \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} \langle \phi_k| = 0 \quad \forall t \ge 0$

$\iff \quad \langle \hat{O} | \phi_k \rangle E_k e^{t E_k} = 0 \quad \forall t \ge 0$

$\iff \quad |\hat{O} \rangle \in \mathrm{Span}\{|\phi_k \rangle \, : \; E_k = 0\} = \ker \: H,$

where the third equivalence is due to the vectors $|\phi_k \rangle$ being linearly independent. For any infinitesimal stochastic operator $H$ the corresponding transition graph consists of $m$ connected components iff we can reorder (permute) the states of the system such that $H$ becomes block-diagonal with $m$ blocks. Now it is easy to see that the kernel of $H$ is spanned by $m$ eigenvectors, one for each block. Since $H$ is also symmetric, the elements of each such vector can be chosen to be ones within the block and zeros outside it. Consequently

$|\hat{O} \rangle \in \ker \: H$

implies that we can choose the basis of eigenvectors of $O$ to be the vectors $|\phi_k \rangle,$ which implies

$[O, H] = 0$

Alternatively,

$|\hat{O} \rangle \in \ker \, H$

implies that

$|\hat{O^2} \rangle \in \ker \: H \; \iff \; \cdots \; \iff \; \displaystyle{\frac{d}{d t}} \langle O^2 \rangle_{\psi(t)} = 0 \; \forall \: |\psi_0 \rangle, \forall t \ge 0,$

where we have used the above sequence of equivalences backwards. Now, using John and Brendan’s original proof, we can obtain $[O, H] = 0.$   █

In summary, by restricting ourselves to the intersection of quantum and stochastic generators, we have found a version of Noether’s theorem for stochastic mechanics that looks formally just like the quantum version! However, this simplification comes at a cost. We find that the only observables $O$ whose expected value remains constant with time are those of the very restricted type described above, where the observable has the same value in every state in a connected component.

### Puzzles

Suppose we have a graph whose graph Laplacian matrix $H$ generates a Markov semigroup as follows:

$U(t) = e^{t H}$

Puzzle 1. Suppose that also $H = H^\top,$ so that $H$ is a Dirichlet operator and hence $i H$ generates a 1-parameter unitary group. Show that the indegree and outdegree of any node of our graph must be equal. Graphs with this property are called balanced.

Puzzle 2. Suppose that $U(t) = e^{t H}$ is doubly stochastic Markov semigroup, meaning that for all $t \ge 0$ each row and each column of $U(t)$ sums to 1:

$\displaystyle{ \sum_i U(t)_{i j} = \sum_j U(t)_{i j} = 1 }$

and all the matrix entries are nonnegative. Show that the Hamiltonian $H$ obeys

$\displaystyle{\sum_i H_{i j} = \sum_j H_{i j} = 0 }$

and all the off-diagonal entries of $H$ are nonnegative. Show the converse is also true.

Puzzle 3. Prove that any doubly stochastic Markov semigroup $U(t)$ is of the form $e^{t H}$ where $H$ is the graph Laplacian of a balanced graph.

Puzzle 4. Let $O(t)$ be a possibly time-dependent observable, and write $\langle O(t) \rangle_{\psi(t)}$ for its expected value with respect to some initial state $\psi_0$ evolving according to the master equation. Show that

$\displaystyle{ \frac{d}{d t}\langle O(t)\rangle_{\psi(t)} = \left\langle [O(t), H] \right\rangle_{\psi(t)} + \left\langle \frac{\partial O(t)}{\partial t}\right\rangle_{\psi(t)} }$

This is a stochastic version of the Ehrenfest theorem.

## Network Theory III

16 March, 2014

In the last of my Oxford talks I explain how entropy and relative entropy can be understood using certain categories related to probability theory… and how these categories also let us understand Bayesian networks!

The first two parts are explanations of these papers:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy.

Somewhere around here the talk was interrupted by a fire drill, waking up the entire audience!

By the way, in my talk I mistakenly said that relative entropy is a continuous functor; in fact it’s just lower semicontinuous. I’ve fixed this in my slides.

The third part of my talk was my own interpretation of Brendan Fong’s master’s thesis:

• Brendan Fong, Causal Theories: a Categorical Perspective on Bayesian Networks.

I took a slightly different approach, by saying that a causal theory $\mathcal{C}_G$ is the free category with products on certain objects and morphisms coming from a directed acyclic graph $G$. In his thesis he said $\mathcal{C}_G$ was the free symmetric monoidal category where each generating object is equipped with a cocommutative comonoid structure. This is close to a category with finite products, though perhaps not quite the same: a symmetric monoidal category where every object is equipped with a cocommutative comonoid structure in a natural way (i.e., making a bunch of squares commute) is a category with finite products. It would be interesting to see if this difference hurts or helps.

By making this slight change, I am claiming that causal theories can be seen as algebraic theories in the sense of Lawvere. This would be a very good thing, since we know a lot about those.

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

## Network Theory II

12 March, 2014

Chemists are secretly doing applied category theory! When chemists list a bunch of chemical reactions like

C + O₂ → CO₂

they are secretly describing a ‘category’.

That shouldn’t be surprising. A category is simply a collection of things called objects together with things called morphisms going from one object to another, often written

f: x → y

The rules of a category say:

1) we can compose a morphism f: x → y and another morphism g: y → z to get an arrow gf: x → z,

2) (hg)f = h(gf), so we don’t need to bother with parentheses when composing arrows,

3) every object x has an identity morphism 1ₓ: x → x that obeys 1ₓ f = f and f 1ₓ = f.

Whenever we have a bunch of things (objects) and processes (arrows) that take one thing to another, we’re likely to have a category. In chemistry, the objects are bunches of molecules and the arrows are chemical reactions. But we can ‘add’ bunches of molecules and also add reactions, so we have something more than a mere category: we have something called a symmetric monoidal category.

My talk here, part of a series, is an explanation of this viewpoint and how we can use it to take ideas from elementary particle physics and apply them to chemistry! ﻿For more details try this free book:

• John Baez and Jacob Biamonte, A Course on Quantum Techniques for Stochastic Mechanics.

as well as this paper on the Anderson–Craciun–Kurtz theorem (discussed in my talk):

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

You can also see the slides of this talk. Click on any picture in the slides, or any text in blue, and get more information!

## Markov Models of Social Change (Part 1)

24 February, 2014

guest post by Alastair Jamieson-Lane

The world is complex, and making choices in a complex world is sometimes difficult.

As any leader knows, decisions must often be made with incomplete information. To make matters worse, the experts and scientists who are meant to advise on these important matters are also doing so with incomplete information—usually limited to only one or two specialist fields. When decisions need to be made that are dependent on multiple real-world systems, and your various advisors find it difficult to communicate, this can be problematic!

The generally accepted approach is to listen to whichever advisor tells you the things you want to hear.

When such an approach fails (for whatever mysterious and inexplicable reason) it might be prudent to consider such approaches as Bayesian inference, analysis of competing hypotheses or cross-impact balance analysis.

Because these methods require experts to formalize their opinions in an explicit, discipline neutral manner, we avoid many of the problems mentioned above. Also, if everything goes horribly wrong, you can blame the algorithm, and send the rioting public down to the local university to complain there.

In this blog article I will describe cross-impact balance analysis and a recent extension to this method, explaining its use, as well as some basic mathematical underpinnings. No familiarity with cross-impact balance analysis will be required.

### Wait—who is this guy?

Since this is my first time writing a blog post here, I hear introductions are in order.

Hi. I’m Alastair.

I am currently a Master’s student at the University of British Columbia, studying mathematics. In particular, I’m aiming to use evolutionary game theory to study academic publishing and hiring practices… and from there hopefully move on to studying governments (we’ll see how the PhD goes). I figure that both those systems seem important to solving the problems we’ve built for ourselves, and both may be under increasing pressure in coming years.

But that’s not what I’m here for today! Today I’m here to tell the story of cross-impact balance analysis, a tool I was introduced to at the complex systems summer school in Santa Fe.

### The story

Suppose (for example) that the local oracle has foretold that burning the forests will anger the nature gods

… and that if you do not put restrictions in place, your crops will wither and die.

Well, that doesn’t sound very good.

The merchant’s guild claims that such restrictions will cause all trade to grind to a halt.

Your most trusted generals point out that weakened trade will leave you vulnerable to invasion from all neighboring kingdoms.

The sailors guild adds that the wrath of Poseidon might make nautical trade more difficult.

The alchemists propose alternative sources of heat…

… while the druids propose special crops as a way of resisting the wrath of the gods…

… and so on.

Given this complex web of interaction, it might be a good time to consult the philosophers.

### Overview of CIB

This brings us to the question of what CIB (Cross-Impact Balance) analysis is, and how to use it.

At its heart, CIB analysis demands this: first, you must consider what aspects of the world you are interested in studying. This could be environmental or economic status, military expenditure, or the laws governing genetic modification. These we refer to as “descriptors”. For each “descriptor” we must create a list of possible “states”.

For example, if the descriptor we are interested in were “global temperature change” our states might be “+5 degree”, “+4 degrees” and so on down to “-2 degrees”.

The states of a descriptor are not meant to be all-encompassing, or offer complete detail, and they need not be numerical. For example, the descriptor “Agricultural policy” might have such states as “Permaculture subsidy”, “Genetic engineering”, “Intensive farming” or “No policy”.

For each of these states, we ask our panel of experts whether such a state would increase or decrease the tendency for some other descriptor to be in a particular state.

For example, we might ask: “On a scale from -3 to 3, how much does the agricultural policy of Intensive farming increase the probability that we will see global temperature increases of +2 degrees?”

By combining the opinions of a variety of experts in each field, and weighting based on certainty and expertise, we are able to construct matrices, much like the one below:

The above matrix is a description of my ant farm. The health of my colony is determined by the population, income, and education levels of my ants. For a less ant focused version of the above, please refer to:

• Elisabeth A. Lloyd and Vanessa J. Schweizer, Objectivity and a comparison of methodological scenario approaches for climate change research, Synthese (2013).

For any possible combination of descriptor states (referred to as a scenario) we can calculate the total impact on all possible descriptors. In the current scenario we have low population, high income and medium education (see highlighted rows).

Because the current scenario has high ant income, this strongly influences us to have low population (+3) and prevents a jump to high population (-3). This combined with the non-influence from education (zeros) leads to low population being the most favoured state for our population descriptor. Thus we expect no change. We say this is “consistent”.

Education however sees a different story. Here we have a strong influence towards high education levels (summing the column gives a total of 13). Thus our current state (medium education) is inconsistent, and we would expect the abundance of ant wealth to lead to an improvements in the ant schooling system.

Classical CIB analysis acts as a way to classify which hypothetical situations are consistent, and which are not.

Now, it is all well and good to claim that some scenarios are stable, but the real use of such a tool is in predicting (and influencing) the future.

By applying a deterministic rule that determines how inconsistencies are resolved, we can produce a “succession rule”. The most straight-forward example is to replace all descriptor states with whichever state is most favoured by the current scenario. In the example above we would switch to “Low population, medium income, high education”. A generation later we would switch back to “Low population, High income, medium education”, soon finding ourselves trapped in a loop.

All such rules will always lead to either a loop or a “sink”: a self consistent scenario which is succeeded only by itself.

So, how can we use this? How will this help us deal with the wrath of the gods (or ant farms)?

Firstly: we can identify loops and consistent scenarios which we believe are most favourable. It’s all well and good imagining some future utopia, but if it is inconsistent with itself, and will immediately lead to a slide into less favourable scenarios then we should not aim for it, we should find that most favourable realistic scenario and aim for that one.

Secondly: We can examine all our consistent scenarios, and determine whose “basin of attraction” we find ourselves in: that is, which scenario are we likely to end up in.

Thirdly: Suppose we could change our influence matrix slightly? How would we change it to favour scenarios we most prefer? If you don’t like the rules, change the game—or at the very least find out WHAT we would need to change to have the best effect.

### Concerns and caveats

So… what are the problems we might encounter? What are the drawbacks?

Well, first of all, we note that the real world does not tend to reach any form of eternal static scenario or perfect cycle. The fact that our model does might be regarded as reason for suspicion.

Secondly, although the classical method contains succession analysis, this analysis is not necessarily intended as a completely literal “prediction” of events. It gives a rough idea of the basins of attraction of our cycles and consistent scenarios, but is also somewhat arbitrary. What succession rule is most appropriate? Do all descriptors update simultaneously? Or only the one with the most “pressure”? Are our descriptors given in order of malleability, and only the fastest changing descriptor will change?

Thirdly, in collapsing our description of the world down into a finite number of states we are ignoring many tiny details. Most of these details are not important, but in assuming that our succession rules are deterministic, we imply that these details have no impact whatsoever.

If we instead treat succession as a somewhat random process, the first two of these problems can be solved, and the third somewhat reduced.

### Stochastic succession

In the classical CIB succession analysis, some rule is selected which deterministically decides which scenario follows from the present. Stochastic succession analysis instead tells us the probability that a given scenario will lead to another.

The simplest example of a stochastic succession rule is to simply select a single descriptor at random each time step, and only consider updates that might happen to that descriptor. This we refer to as dice succession. This (in some ways) represents hidden information: two systems that might look identical on the surface from the point of view of our very blockish CIB analysis might be different enough underneath to lead to different outcomes. If we have a shaky agricultural system, but a large amount of up-and-coming research, then which of these two factors becomes important first is down to the luck of the draw. Rather than attempt to model this fine detail, we instead merely accept it and incorporate this uncertainty into our model.

Even this most simplistic change leads to dramatics effects on our system. Most importantly, almost all cycles vanish from our results, as forks in the road allow us to diverge from the path of the cycle.

We can take stochastic succession further and consider more exotic rules for our transitions, ones that allow any transition to take place, not merely those that are most favored. For example:

$P(x,y) = A e^{I_x(y)/T}$

Here $x$ is our current scenario, $y$ is some possible future scenario, and $I_x(y)$ is the total impact score of $y$ from the perspective of $x$. $A$ is a simple normalizing constant, and $T$ is our system’s temperature. High temperature systems are dominated by random noise, while low temperature systems are dominated by the influences described by our experts.

Impact score is calculated by summing the impact of each state of our current scenario, on each state of our target scenario. For example, for the above, suppose we want to find $I_x(y)$ when $x$ is the given scenario “Low population, High income, medium education” and $y$ was the scenario “Medium population, medium income, High education”. We consider all numbers that are in rows which were states of $x$ and in columns that are states of $y$. This would give:

$I_x(y)= (0+0+0) + (-2 +0 +10) +(6+7+0) = 21$

Here each bracket refers to the sum of a particular column.
More generically we can write the formula as:

$\displaystyle{ I_x(y)= \sum_{i \subset x, \;j \subset y} M_{i,j} }$

Here $M_{i,j}$ refers to an entry in our cross-impact balance matrix, $i$ and $j$ are both states, and $i \subset x$ reads as “$i$ is a state of $x$”.

We refer to this function for computing transition probabilities as the Boltzmann succession law, due to its similarity to the Boltzmann distribution found in physics. We use it merely as an example, and by no means wish to imply that we expect the transitions for our true system to act in a precisely Boltzmann-like manner. Alternative functions can, and should, be experimented with. The Boltzmann succession law is however an effective example and has a number of nice properties: $P(x,y)$ is always positive, unchanged by adding a constant to every element of the cross-impact balance matrix, contains adjustable parameters, and unbounded above.

The Boltzmann succession rule is what I will refer to as fully stochastic: it allows transitions even against our experts’ judgement (with low probability). This is in contrast to dice succession which picks a direction at random, but still contains scenarios from which our system can not escape.

### Effects of stochastic succession

‘Partially stochastic’ processes such as the dice rule have very limited effect on the long term behavior of the model. Aside from removing most cycles, they behave almost exactly like our deterministic succession rules. So, let us instead discuss the more interesting fully stochastic succession rules.

In the fully stochastic system we can ask “after a very long time, what is the probability we will be in scenario $x$?”

By asking this question we can get some idea of the relative importance of all our future scenarios and states.

For example, if the scenario “high population, low education, low income” has a 40% probability in the long term, while most other scenarios have a probability of 0.2%, we can see that this scenario is crucial to the understanding of our system. Often scenarios already identified by deterministic succession analysis are the ones with the greatest long term probability—but by looking at long term probability we also gain information about the relative importance of each scenario.

In addition, we can encounter scenarios which are themselves inconsistent, but form cycles and/or clusters of interconnected scenarios. We can also notice scenarios that while technically ‘consistent’ in the deterministic rules are only barely so, and have limited weight due to a limited basin of attraction. We might identify scenarios that seem familiar in the real world, but are apparently highly unlikely in our analysis, indicating either that we should expect change… or perhaps suggesting a missing descriptor or a cross-impact in need of tweaking.

Armed with such a model, we can investigate what we can do to increase the short term and long term likelihood of desirable scenarios, and decrease the likelihood of undesirable scenarios.

As a last note, here are a few freely available resources that may prove useful. For a more formal introduction to CIB, try:

• Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

• Wolfgang Weimer-Jehle, Properties of cross-impact balance analysis.

You can find free software for doing a classical CIB analysis here:

• ZIRIUS, ScenarioWizard.

ZIRIUS is the Research Center for Interdisciplinary Risk and Innovation Studies of the University of Stuttgart.

Here are some examples of CIB in action:

• Gerhard Fuchs, Ulrich Fahl, Andreas Pyka, Udo Staber, Stefan Voegele and Wolfgang Weimer-Jehle, Generating innovation scenarios using the cross-impact methodology, Department of Economics, University of Bremen, Discussion-Papers Series No. 007-2008.

• Ortwin Renn, Alexander Jager, Jurgen Deuschle and Wolfgang Weimer-Jehle, A normative-functional concept of sustainability and its indicators, International Journal of Global Environmental Issues, 9 (2008), 291–317.

Finally, this page contains a more complete list of articles, both practical and theoretical:

## Logic, Probability and Reflection

26 December, 2013

Last week I attended the Machine Intelligence Research Institute’s sixth Workshop on Logic, Probability, and Reflection. This one was in Berkeley, where the institute has its headquarters.

You may know this institute under their previous name: the Singularity Institute. It seems to be the brainchild of Eliezer Yudkowsky, a well-known advocate of ‘friendly artificial intelligence’, whom I interviewed in week311, week312 and week313 of This Week’s Finds. He takes an approach to artificial intelligence that’s heavily influenced by mathematical logic, and I got invited to the workshop because I blogged about a paper he wrote with Mihaly Barasz, Paul Christiano and Marcello Herresho ff on probability theory and logic.

I only have the energy to lay the groundwork for a good explanation of what happened in the workshop. So, after you read my post, please read this:

• Benja Fallenstein, Results from MIRI’s December workshop, Less Wrong, 28 December 2013.

The workshop had two main themes, so let me tell you what they were.

### Scientific induction in mathematics

The first theme is related to that paper I just mentioned. How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign

probability 1 to every mathematical statement that’s provable,

probability 0 to every statement whose negation is provable,

and

to every statement that is neither provable nor disprovable.

But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.

How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?

It’s natural to try a version of Solomonoff induction, where our probability estimate, before any evidence comes in, favors statements that are simple. However, this runs up against problems. If you’re interested in learning more about this, try:

• Jeremy Hahn, Scientific induction in probabilistic mathematics.

It’s a summary of ideas people came up with during the workshop. I would like to explain them sometime, but for now I should move on.

### The Löbian obstacle

The second main theme was the ‘Löbian obstacle’. Löb’s theorem is the flip side of Gödel’s first incompleteness theorem, less famous but just as shocking. It seems to put limitations on how much a perfectly rational being can trust itself.

Since it’s the day after Christmas, let’s ease our way into these deep waters with the Santa Claus paradox, also known as Curry’s paradox.

If you have a child who is worried that Santa Claus might not exist, you can reassure them using this sentence:

If this sentence is true, Santa Claus exists.

Call it P, for short.

Assume, for the sake of argument, that P is true. Then what it says is true: “If P is true, Santa Claus exists.” And we’re assuming P is true. So, Santa Claus exists.

So, we’ve proved that if P is true, Santa Claus exists.

But that’s just what P says!

So, P is true.

So, Santa Claus exists!

There must be something wrong about this argument, even if Santa Claus does exist, because if it were valid you could you use it to prove anything at all. The self-reference is obviously suspicious. The sentence in question is a variant of the Liar Paradox:

This sentence is false.

since we can rewrite the Liar Paradox as

If this sentence is true, 0 = 1.

and then replace “0=1” by any false statement you like.

However, Gödel figured out a way to squeeze solid insights from these dubious self-referential sentences. He did this by creating a statement in the language of arithmetic, referring to nothing but numbers, which nonetheless manages to effectively say

This sentence is unprovable.

If it were provable, you’d get a contradiction! So, either arithmetic is inconsistent or this sentence is unprovable. But if it’s unprovable, it’s true. So, there are true but unprovable statements in arithmetic… unless arithmetic is inconsistent! This discovery shook the world of mathematics.

Here I’m being quite sloppy, just to get the idea across.

For one thing, when I’m saying ‘provable’, I mean provable given some specific axioms for arithmetic, like the Peano axioms. If we change our axioms, different statements will be provable.

For another, the concept of ‘true’ statements in arithmetic is often shunned by logicians. That may sound shocking, but there are many reasons for this: for example, Tarski showed that the truth of statements about arithmetic is undefinable in arithmetic. ‘Provability’ is much easier to deal with.

So, a better way of thinking about Gödel’s result is that he constructed a statement that is neither provable nor disprovable from Peano’s axioms of arithmetic, unless those axioms are inconsistent (in which case we can prove everything, but it’s all worthless).

Furthermore, this result applies not just to Peano’s axioms but to any stronger set of axioms, as long as you can write a computer program to list those axioms.

In 1952, the logician Leon Henkin flipped Gödel’s idea around and asked about a sentence in the language of arithmetic that says:

This sentence is provable.

He asked: is this provable or not? The answer is much less obvious than for Gödel’s sentence. Play around with it and see what I mean.

But in 1954, Martin Hugo Löb showed that Henkin’s sentence is provable!

And Henkin noticed something amazing: Löb’s proof shows much more.

At this point it pays to become a bit more precise. Let us write $\mathrm{PA} \vdash P$ to mean the statement $P$ is provable from the Peano axioms of arithmetic. Gödel figured out how to encode statements in arithmetic as numbers, so let’s write $\# P$ for the Gödel number of any statement $P.$ And Gödel figured out how to write a statement in arithmetic, say

$\mathrm{Provable}(n)$

which says that the statement with Gödel number $n$ is provable using the Peano axioms.

Using this terminology, what Henkin originally did was find a number $n$ such that the sentence

$\mathrm{Provable}(n)$

has Gödel number $n.$ So, this sentence says

This sentence is provable from the Peano axioms of arithmetic.

What Löb did was show

$\mathrm{PA} \vdash \mathrm{Provable}(n)$

In other words, he showed that Henkin sentence really is provable from the Peano axioms!

What Henkin then did is prove that for any sentence $P$ in the language of arithmetic, if

$\mathrm{PA} \vdash \mathrm{Provable}(\# P) \implies P$

then

$\mathrm{PA} \vdash P$

In other words, suppose we can prove that the provability of $P$ implies $P.$ Then we can prove $P$!

At first this merely sounds nightmarishly complicated. But if you think about it long enough, you’ll see it’s downright terrifying! For example, suppose $P$ is some famous open question in arithmetic, like the Twin Prime Conjecture. You might hope to prove

The provability of the Twin Prime Conjecture implies the Twin Prime Conjecture.

Indeed, that seems like a perfectly reasonable thing to want. But it turns out that proving this is as hard as proving the Twin Prime Conjecture! Why? Because if we can prove the boldface sentence above, Löb and Henkin’s work instantly gives us a proof of Twin Prime Conjecture!

What does all this have to do with artificial intelligence?

Well, what I just said is true not only for Peano arithmetic, but any set of axioms including Peano arithmetic that a computer program can list. Suppose your highly logical AI mathematician has some such set of axioms, say $\mathrm{AI}.$ You might want it to trust itself. In other words, you might want

$\mathrm{AI} \vdash \mathrm{Provable}(\# P) \implies P$

for every sentence $P.$ This says, roughly, that whatever the AI can prove it can prove, it can prove.

But then Löb’s theorem would kick in and give

$\mathrm{AI} \vdash P$

for every sentence $P.$ And this would be disastrous: our AI would be inconsistent, because it could prove everything!

This is just the beginning of the problems. It gets more involved when we consider AI’s that spawn new AI’s and want to trust them. For more see:

• Eliezer Yudkowsky and Marcello Herreshoff, Tiling agents for self-modifying AI, and the Löbian obstacle.

At workshop various people made progress on this issue, which is recorded in these summaries:

• Eliezer Yudkowsky, The procrastination paradox.

Abstract. A theorem by Marcello Herresho , Benja Fallenstein, and Stuart Armstrong shows that if there exists an infinite series of theories $T_i$ extending $\mathrm{PA}$ where each $T_i$ proves the soundness of $T_{i+1}$, then all the $T_i$ must have only nonstandard models. We call this the Procrastination Theorem for reasons which will become apparent.

Here Fallenstein constructs a di fferent sequence of theories $T_i$ extending Peano arithmetic such that each $T_i$ proves the consistency of $T_{i+1},$ and all the theories are sound for $\Pi_1$ sentences—that is, sentences with only one $\forall$ quantifier outside the rest of the stuff.

The following summaries would take more work to explain:

• Nate Soares, Fallenstein’s monster.

• Nisan Stiennon, Recursively-defined logical theories are well-defined.

• Benja Fallenstein, The 5-and-10 problem and the tiling agents formalism.

## Quantum Network Theory (Part 2)

13 August, 2013

guest post by Tomi Johnson

Last time I told you how a random walk called the ‘uniform escape walk’ could be used to analyze a network. In particular, Google uses it to rank nodes. For the case of an undirected network, the steady state of this random walk tells us the degrees of the nodes—that is, how many edges come out of each node.

Now I’m going to prove this to you. I’ll also exploit the connection between this random walk and a quantum walk, also introduced last time. In particular, I’ll connect the properties of this quantum walk to the degrees of a network by exploiting its relationship with the random walk.

This is pretty useful, considering how tricky these quantum walks can be. As the parts of the world that we model using quantum mechanics get bigger and have more complicated structures, like biological network, we need all the help in understanding quantum walks that we can get. So I’d better start!

### Flashback

Starting with any (simple, connected) graph, we can get an old-fashioned ‘stochastic’ random walk on this graph, but also a quantum walk. The first is the uniform escape stochastic walk, where the walker has an equal probability per time of walking along any edge leaving the node they are standing at. The second is the related quantum walk we’re going to study now. These two walks are generated by two matrices, which we called $S$ and $Q.$ The good thing is that these matrices are similar, in the technical sense.

We studied this last time, and everything we learned is summarized here:

where:

$G$ is a simple graph that specifies

$A$ the adjacency matrix (the generator of a quantum walk) with elements $A_{i j}$ equal to unity if nodes $i$ and $j$ are connected, and zero otherwise ($A_{i i} = 0$), which subtracted from

$D$ the diagonal matrix of degrees $D_{i i} = \sum_j A_{i j}$ gives

$L = D - A$ the symmetric Laplacian (generator of stochastic and quantum walks), which when normalized by $D$ returns both

$S = L D^{-1}$ the generator of the uniform escape stochastic walk and

$Q = D^{-1/2} L D^{-1/2}$ the quantum walk generator to which it is similar!

Now I hope you remember where we are. Next I’ll talk you through the mathematics of the uniform escape stochastic walk $S$ and how it connects to the degrees of the nodes in the large-time limit. Then I’ll show you how this helps us solve aspects of the quantum walk generated by $Q.$

### Stochastic walk

The uniform escape stochastic walk generated by $S$ is popular because it has a really useful stationary state.

To recap from Part 20 of the network theory series, a stationary state of a stochastic walk is one that does not change in time. By the master equation

$\displaystyle{ \frac{d}{d t} \psi(t) = -S \psi(t)}$

the stationary state must be an eigenvector of $S$ with eigenvalue $0.$

A fantastic pair of theorems hold:

• There is always a unique (up to multiplication by a positive number) positive eigenvector $\pi$ of $S$ with eigenvalue $0.$ That is, there is a unique stationary state $\pi.$

• Regardless of the initial state $\psi(0),$ any solution of the master equation approaches this stationary state $\pi$ in the large-time limit:

$\displaystyle{ \lim_{t \rightarrow \infty} \psi(t) = \pi }$

To find this unique stationary state, consider the Laplacian $L,$ which is both infinitesimal stochastic and symmetric. Among other things, this means the rows of $L$ sum to zero:

$\displaystyle{ \sum_j L_{i j} = 0 }$

Thus, the ‘all ones’ vector $\mathbf{1}$ is an eigenvector of $L$ with zero eigenvalue:

$L \mathbf{1} = 0$

Inserting the identity $I = D^{-1} D$ into this equation we then find $D \mathbf{1}$ is a zero eigenvector of $S$:

$L \mathbf{1} = ( L D^{-1} ) ( D \mathbf{1} ) = S ( D \mathbf{1} ) = 0$

Therefore we just need to normalize this to get the large-time stationary state of the walk:

$\displaystyle{ \pi = \frac{D \mathbf{1}}{\sum_i D_{i i}} }$

If we write $i$ for the basis vector that is zero except at the ith node of our graph, and 1 at that node, the inner product $\langle i , \pi \rangle$ is large-time probability of finding a walker at that node. The equation above implies this is proportional to the degree $D_{i i}$ of node $i.$

We can check this for the following graph:

We find that $\pi$ is

$\displaystyle{ \left( \begin{matrix} 1/6 \\ 1/6 \\ 1/4 \\ 1/4 \\ 1/6 \end{matrix} \right) }$

which implies large-time probability $1/6$ for nodes $1,$ $2$ and $5,$ and $1/4$ for nodes $3$ and $4.$ Comparing this to the original graph, this exactly reflects the arrangement of degrees, as we knew it must.

Math works!

### The quantum walk

Next up is the quantum walk generated by $Q.$ Not a lot is known about quantum walks on networks of arbitrary geometry, but below we’ll see some analytical results are obtained by exploiting the similarity of $S$ and $Q.$

Where to start? Well, let’s start at the bottom, what quantum physicists call the ground state. In contrast to stochastic walks, for a quantum walk every eigenvector $\phi_k$ of $Q$ is a stationary state of the quantum walk. (In Puzzle 5, at the bottom of this page, I ask you to prove this). The stationary state $\phi_0$ is of particular interest physically and mathematically. Physically, since eigenvectors of the $Q$ correspond to states of well-defined energy equal to the associated eigenvalue, $\phi_0$ is the state of lowest energy, energy zero, hence the name ‘ground state’. (In Puzzle 3, I ask you to prove that all eigenvalues of $Q$ are non-negative, so zero really does correspond to the ground state.)

Mathematically, the relationship between eigenvectors implied by the similarity of $S$ and $Q$ means

$\phi_0 \propto D^{-1/2} \pi \propto D^{1/2} \mathbf{1}$

So in the ground state, the probability of our quantum walker being found at node $i$ is

$| \langle i , \phi_0 \rangle |^2 \propto | \langle i , D^{1/2} \rangle \mathbf{1} |^2 = D_{i i}$

Amazingly, this probability is proportional to the degree and so is exactly the same as $\langle i , \pi \rangle,$ the probability in the stationary state $\pi$ of the stochastic walk!

In short: a zero energy quantum walk $Q$ leads to exactly the same distribution of the walker over the nodes as in the large-time limit of the uniform escape stochastic walk $S.$ The classically important notion of degree distribution also plays a role in quantum walks!

This is already pretty exciting. What else can we say? If you are someone who feels faint at the sight of quantum mechanics, well done for getting this far, but watch out for what’s coming next.

What if the walker starts in some other initial state? Is there some quantum walk analogue of the unique large-time state of a stochastic walk?

In fact, the quantum walk in general does not converge to a stationary state. But there is a probability distribution that can be thought to characterize the quantum walk in the same way as the large-time state characterizes the stochastic walk. It’s the large-time average probability vector $P.$

If you didn’t know the time that had passed since the beginning of a quantum walk, then the best estimate for the probability of your measuring the walker to be at node $i$ would be the large-time average probability

$\displaystyle{ \langle i , P \rangle = \lim_{T \rightarrow \infty} \frac{1}{T} \int_0^T | \psi_i (t) |^2 d t }$

There’s a bit that we can do to simplify this expression. As usual in quantum mechanics, let’s start with the trick of diagonalizing $Q.$ This amounts to writing

$\displaystyle{ Q= \sum_k \epsilon_k \Phi_k }$

where $\Phi_k$ are projectors onto the eigenvectors $\phi_k$ of $Q,$ and $\epsilon_k$ are the corresponding eigenvalues of $Q.$ If we insert this equation into

$\psi(t) = e^{-Q t} \psi(0)$

we get

$\displaystyle{ \psi(t) = \sum_k e^{-\epsilon_k t} \Phi_k \psi(0) }$

and thus

$\displaystyle{ \langle i , P \rangle = \lim_{T \rightarrow \infty} \frac{1}{T} \int_0^T | \sum_k e^{-i \epsilon_k t} \langle i, \Phi_k \psi (0) \rangle |^2 d t }$

Due to the integral over all time, the interference between terms corresponding to different eigenvalues averages to zero, leaving:

$\displaystyle{ \langle i , P \rangle = \sum_k | \langle i, \Phi_k \psi(0) \rangle |^2 }$

The large-time average probability is then the sum of terms contributed by the projections of the initial state onto each eigenspace.

So we have a distribution that characterizes a quantum walk for a general initial state, but it’s a complicated beast. What can we say about it?

Our best hope of understanding the large-time average probability is through the term $| \langle i, \Phi_0 \psi (0) \rangle |^2$ associated with the zero energy eigenspace, since we know everything about this space.

For example, we know the zero energy eigenspace is one-dimensional and spanned by the eigenvector $\phi_0.$ This means that the projector is just the usual outer product

$\Phi_0 = | \phi_0 \rangle \langle \phi_0 | = \phi_0 \phi_0^\dagger$

where we have normalized $\phi_0$ according to the inner product $\langle \phi_0, \phi_0\rangle = 1.$ (If you’re wondering why I’m using all these angled brackets, well, they’re a notation named after Dirac that is adored by quantum physicists.)

The zero eigenspace contribution to the large-time average probability then breaks nicely into two:

$\begin{array}{ccl} | \langle i, \Phi_0 \psi (0) \rangle |^2 &=& | \langle i, \phi_0\rangle \; \langle \phi_0, \psi (0) \rangle |^2 \\ \\ &=& | \langle i, \phi_0\rangle |^2 \; | \langle \phi_0 , \psi (0) \rangle |^2 \\ \\ &=& \langle i , \pi \rangle \; | \langle \phi_0 , \psi (0) \rangle |^2 \end{array}$

This is just the product of two probabilities:

• first, the probability $\langle i , \pi \rangle$ for a quantum state in the zero energy eigenspace to be at node $i,$ as we found above,

and

• second, the probability $| \langle \phi_0, \psi (0)\rangle |^2$ of being in this eigenspace to begin with. (Remember, in quantum mechanics the probability of measuring the system to have an energy is the modulus squared of the projection of the state onto the associated eigenspace, which for the one-dimensional zero energy eigenspace means just the inner product with the ground state.)

This is all we need to say something interesting about the large-time average probability for all states. We’ve basically shown that we can break the large-time probability vector $P$ into a sum of two normalized probability vectors:

$P = (1- \eta) \pi + \eta \Omega$

the first $\pi$ being the stochastic stationary state associated with the zero energy eigenspace, and the second $\Omega$ associated with the higher energy eigenspaces, with

$\displaystyle{ \langle i , \Omega \rangle = \frac{ \sum_{k\neq 0} | \langle i, \Phi_k \psi (0) \rangle |^2 }{ \eta} }$

The weight of each term is governed by the parameter

$\eta = 1 - | \langle \phi_0, \psi (0)\rangle |^2$

which you could think of as the quantumness of the result. This is one minus the probability of the walker being in the zero energy eigenspace, or equivalently the probability of the walker being outside the zero energy eigenspace.

So even if we don’t know $\Omega,$ we know its importance is controlled by a parameter $\eta$ that governs how close the large-time average distribution $P$ of the quantum walk is to the corresponding stochastic stationary distribution $\pi.$

What do we mean by ‘close’? Find out for yourself:

Puzzle 1. Show, using a triangle inequality, that the trace distance between the two characteristic stochastic and quantum distributions $\{ \langle i , P \rangle \}_i$ and $\{ \langle i , \pi \rangle \}_i$ is upper-bounded by $2 \eta.$

Can we say anything physical about when the quantumness $\eta$ is big or small?

Because the eigenvalues of $Q$ have a physical interpretation in terms of energy, the answer is yes. The quantumness $\eta$ is the probability of being outside the zero energy state. Call the next lowest eigenvalue $\Delta = \min_{k \neq 0} \epsilon_k$ the energy gap. If the quantum walk is not in the zero energy eigenspace then it must be in an eigenspace of energy greater or equal to $\Delta.$ Therefore the expected energy $E$ of the quantum walker must bound the quantumness $E \ge \eta \Delta.$

This tells us that a quantum walk with a low energy is similar to a stochastic walk in the large-time limit. We already knew this was exactly true in the zero energy limit, but this result goes further.

So little is known about quantum walks on networks of arbitrary geometry that we were very pleased to find this result. It says there is a special case in which the walk is characterized by the degree distribution of the network, and a clear physical parameter that bounds how far the walk is from this special case.

Also, in finding it we learned that the difficulties of the initial state dependence, enhanced by the lack of convergence to a stationary state, could be overcome for a quantum walk, and that the relationships between quantum and stochastic walks extend beyond those with shared generators.

### What next?

That’s all for the latest bit of idea sharing at the interface between stochastic and quantum systems.

Other questions we have include: What holds analytically about the form of the quantum correction? Numerically it is known that the so-called quantum correction $\Omega$ tends to enhance the probability of being found on nodes of low degree compared to $\pi.$ Can someone explain why? What happens if a small amount of stochastic noise is added to a quantum walk? Or a lot of noise?

It’s difficult to know who is best placed to answer these questions: experts in quantum physics, graph theory, complex networks or stochastic processes? I suspect it’ll take a bit of help from everyone.

A couple of textbooks with comprehensive sections on non-negative matrices and continuous-time stochastic processes are:

• Peter Lancaster and Miron Tismenetsky, The Theory of Matrices: with Applications, 2nd edition, Academic Press, San Diego, 1985.

• James R. Norris, Markov Chains, Cambridge University Press, Cambridge, 1997.

There is, of course, the book that arose from the Azimuth network theory series, which considers several relationships between quantum and stochastic processes on networks:

• John Baez and Jacob Biamonte, A Course on Quantum Techniques for Stochastic Mechanics, 2012.

Another couple of books on complex networks are:

• Mark Newman, Networks: An Introduction, Oxford University Press, Oxford, 2010.

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford University Press, Oxford, 2011. Note that the first chapter is available free online.

There are plenty more useful references in our article on this topic:

• Mauro Faccin, Tomi Johnson, Jacob Biamonte, Sabre Kais and Piotr Migdał, Degree distribution in quantum walks on complex networks.

### Puzzles for the enthusiastic

Sadly I didn’t have space to show proofs of all the theorems I used. So here are a few puzzles that guide you to doing the proofs for yourself:

#### Stochastic walks and stationary states

Puzzle 2. (For the hard core.) Prove there is always a unique positive eigenvector for a stochastic walk generated by $S.$ You’ll need the assumption that the graph $G$ is connected. It’s not simple, and you’ll probably need help from a book, perhaps one of those above by Lancaster and Tismenetsky, and Norris.

Puzzle 3. Show that the eigenvalues of $S$ (and therefore $Q$) are non-negative. A good way to start this proof is to apply the Perron-Frobenius theorem to the non-negative matrix $M = - S + I \max_i S_{i i}.$ This implies that $M$ has a positive eigenvalue $r$ equal to its spectral radius

$r = \max_k | \lambda_k |$

where $\lambda_k$ are the eigenvalues of $M,$ and the associated eigenvector $v$ is positive. Since $S = - M + I \max_i S_{i i},$ it follows that $S$ shares the eigenvectors of $M$ and the associated eigenvalues are related by inverted translation:

$\epsilon_k = - \lambda_k + \max_i S_{i i}$

Puzzle 4. Prove that regardless of the initial state $\psi(0),$ the zero eigenvector $\pi$ is obtained in the large-time limit $\lim_{t \rightarrow \infty} \psi(t) = \pi$ of the walk generated by $S.$ This breaks down into two parts:

(a) Using the approach from Puzzle 5, to show that $S v = \epsilon_v v,$ the positivity of $v$ and the infinitesimal stochastic property $\sum_i S_{i j} = 0$ imply that $\epsilon_v = \epsilon_0 = 0$ and thus $v = \pi$ is actually the unique zero eigenvector and stationary state of $S$ (its uniqueness follows from puzzle 4, you don’t need to re-prove it).

(b) By inserting the decomposition $S = \sum_k \epsilon_k \Pi_k$ into $e^{-S t}$ and using the result of puzzle 5, complete the proof.

(Though I ask you to use the diagonalizability of $S,$ the main results still hold if the generator is irreducible but not diagonalizable.)

#### Quantum walks

Here are a couple of extra puzzles for those of you interested in quantum mechanics:

Puzzle 5. In quantum mechanics, probabilities are given by the moduli squared of amplitudes, so multiplying a state by a number of modulus unity has no physical effect. By inserting

$\displaystyle{ Q= \sum_k \epsilon_k \Phi_k }$

into the quantum time evolution matrix $e^{-Q t},$ show that if

$\psi(0) = \phi_k$

then

$\psi(t) = e^{ - i \epsilon_k t} \psi(0)$

hence $\phi_k$ is a stationary state in the quantum sense, as probabilities don’t change in time.

Puzzle 6. By expanding the initial state $\psi(0)$ in terms of the complete orthogonal basis vectors $\phi_k$ show that for a quantum walk $\psi(t)$ never converges to a stationary state unless it began in one.