## Resource Convertibility (Part 2)

10 April, 2015

guest post by Tobias Fritz

In Part 1, I introduced ordered commutative monoids as a mathematical formalization of resources and their convertibility. Today I’m going to say something about what to do with this formalization. Let’s start with a quick recap!

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

• $+$ and $0$ equip $A$ with the structure of a commutative monoid;

• $\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Recall also that we think of the $x,y\in A$ as resource objects such that $x+y$ represents the object consisting of $x$ and $y$ together, and $x\geq y$ means that the resource object $x$ can be converted into $y.$

When confronted with an abstract definition like this, many people ask: so what is it useful for? The answer to this is twofold: first, it provides a language which we can use to guide our thoughts in any application context. Second, the definition itself is just the very start: we can now also prove theorems about ordered commutative monoids, which can be instantiated in any particular application context. So the theory of ordered commutative monoids will provide a useful toolbox for talking about concrete resource theories and studying them. In the remainder of this post, I’d like to say a bit about what this toolbox contains. For more, you’ll have to read the paper!

To start, let’s consider catalysis as one of the resource-theoretic phenomena neatly captured by ordered commutative monoids. Catalysis is the phenomenon that certain conversions become possible only due to the presence of a catalyst, which is an additional resource object which does not get consumed in the process of the conversion. For example, we have

$\text{timber + nails}\not\geq \text{table},$

$\text{timber + nails + saw + hammer} \geq \text{table + saw + hammer}$

because making a table from timber and nails requires a saw and a hammer as tools. So in this example, ‘saw $+$ hammer’ is a catalyst for the conversion of ‘timber $+$ nails’ into ‘table’. In mathematical language, catalysis occurs precisely when the ordered commutative monoid is not cancellative, which means that $x + z\geq y + z$ sometimes holds even though $x\geq y$ does not. So, the notion of catalysis perfectly matches up with a very natural and familiar notion from algebra.
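To make this concrete, here is a minimal computational sketch of a resource theory as a multiset-rewriting system: resource objects are multisets of item names, $+$ is multiset union, and $x \geq y$ is declared to hold whenever $y$ can be reached from $x$ by applying conversion rules. The rule set, the helper names and the whole setup are my own toy illustration (in Python), not anything from the paper.

```python
from collections import Counter

# One toy conversion rule: it consumes the tools along with timber and nails,
# but also produces them again, so the tools act as a catalyst.
RULES = [
    (Counter({'timber': 1, 'nails': 1, 'saw': 1, 'hammer': 1}),
     Counter({'table': 1, 'saw': 1, 'hammer': 1})),
]

def applies(state, consumed):
    return all(state[item] >= n for item, n in consumed.items())

def apply_rule(state, consumed, produced):
    new = state.copy()
    new.subtract(consumed)
    new.update(produced)
    return +new                      # drop items whose count has hit zero

def convertible(x, y, max_steps=10):
    """Breadth-first search: is the multiset y reachable from the multiset x?"""
    frontier, seen = [x], {frozenset(x.items())}
    for _ in range(max_steps + 1):
        if any(state == y for state in frontier):
            return True
        next_frontier = []
        for state in frontier:
            for consumed, produced in RULES:
                if applies(state, consumed):
                    new = apply_rule(state, consumed, produced)
                    key = frozenset(new.items())
                    if key not in seen:
                        seen.add(key)
                        next_frontier.append(new)
        if not next_frontier:
            return False
        frontier = next_frontier
    return False

timber_nails = Counter({'timber': 1, 'nails': 1})
table        = Counter({'table': 1})
tools        = Counter({'saw': 1, 'hammer': 1})

print(convertible(timber_nails, table))                  # False: x >= y fails
print(convertible(timber_nails + tools, table + tools))  # True:  x + z >= y + z holds
```

The two print statements are exactly the failure of cancellativity described above: adding the catalyst $z = \text{saw} + \text{hammer}$ to both sides turns an impossible conversion into a possible one.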

One can continue along these lines and study those ordered commutative monoids which are cancellative. It turns out that every ordered commutative monoid can be made cancellative in a universal way; in the resource-theoretic interpretation, this boils down to replacing the convertibility relation by catalytic convertibility, in which $x$ is declared to be convertible into $y$ as soon as there exists a catalyst which achieves this conversion. Making an ordered commutative monoid cancellative like this is a kind of ‘regularization’: it leads to a mathematically more well-behaved structure. As it turns out, there are several additional steps of regularization that can be performed, and all of these are both mathematically natural and have an appealing resource-theoretic interpretation. These regularizations successively take us from the world of ordered commutative monoids to the realm of linear algebra and functional analysis, where powerful theorems are available. For now, let me not go into the details, but only try to summarize one of the consequences of this development. This requires a bit of preparation.

In many situations, it is not just of interest to convert a single copy of some resource object $x$ into a single copy of some $y;$ instead, one may be interested in converting many copies of $x$ into many copies of $y$ all together, and thereby maximizing (or minimizing) the ratio of the resulting number of $y$‘s compared to the number of $x$‘s that get consumed. This ratio is measured by the maximal rate:

$\displaystyle{ R_{\mathrm{max}}(x\to y) = \sup \left\{ \frac{m}{n} \:|\: nx \geq my \right\} }$

Here, $m$ and $n$ are natural numbers, and $nx$ stands for the $n$-fold sum $x+\cdots+x,$ and similarly for $my.$ So this maximal rate quantifies how many $y$’s we can get out of one copy of $x,$ when working in a ‘mass production’ setting. There is also a notion of regularized rate, which has a slightly more complicated definition that I don’t want to spell out here, but is similar in spirit. The toolbox of ordered commutative monoids now provides the following result:

Rate Theorem: If $x\geq 0$ and $y\geq 0$ in an ordered commutative monoid $A$ which satisfies a mild technical assumption, then the maximal regularized rate from $x$ to $y$ can be computed like this:

$\displaystyle{ R_{\mathrm{max}}^{\mathrm{reg}}(x\to y) = \inf_f \frac{f(x)}{f(y)} }$

where $f$ ranges over all functionals on $A$ with $f(y)\neq 0.$

Wait a minute, what’s a ‘functional’? It’s defined to be a map $f:A\to\mathbb{R}$ which is monotone,

$x\geq y \:\Rightarrow\: f(x)\geq f(y)$

and additive:

$f(x+y) = f(x) + f(y)$

In economic terms, we can think of a functional as a consistent assignment of prices to all resource objects. If $x$ is at least as useful as $y,$ then the price of $x$ should be at least as high as the price of $y$; and the price of two objects together should be the sum of their individual prices. So the $f$ in the rate formula above ranges over all ‘markets’ on which resource objects can be ‘traded’ at consistent prices. The term ‘functional’ is supposed to hint at a relation to functional analysis. In fact, the proof of the theorem crucially relies on the Hahn–Banach Theorem.
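Here is a toy illustration of both the maximal rate and the role of these price functionals—my own example, not one from the paper. Take the ordered commutative monoid generated by two items $A$ and $B$ with the single conversion rule $2A \geq 3B$. Brute force gives lower bounds on $R_{\mathrm{max}}(A \to B)$, while any consistent price assignment $f$ with $2 f(A) \geq 3 f(B)$ gives the upper bound $f(A)/f(B)$:

```python
from fractions import Fraction

# Toy monoid: items A and B, one rule turning 2 copies of A into 3 copies of B.
def max_b_from(n_a):
    """Greatest number of B's obtainable from n_a copies of A."""
    return 3 * (n_a // 2)

# Lower bounds on the maximal rate R_max(A -> B) = sup { m/n : nA >= mB }.
best = Fraction(0)
for n in range(1, 101):
    best = max(best, Fraction(max_b_from(n), n))
print("best rate found by brute force:", best)        # 3/2

# A price functional must be additive and monotone, so 2 f(A) >= 3 f(B).
f_A, f_B = 3, 2                                       # one admissible set of prices
assert 2 * f_A >= 3 * f_B
print("upper bound f(A)/f(B):", Fraction(f_A, f_B))   # 3/2
```

In this simple example the two bounds already meet at $3/2$, which is the kind of agreement that the Rate Theorem guarantees (for the regularized rate, and under its technical assumption) in general.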

The mild technical assumption mentioned in the Rate Theorem is that the ordered commutative monoid needs to have a generating pair. This turns out to hold in the applications that I have considered so far, and I hope that it will turn out to hold in most others as well. For the full gory details, see the paper.

So this provides some idea of what kinds of gadgets one can find in the toolbox of ordered commutative monoids. Next time, I’ll show some applications to graph theory and zero-error communication and say a bit about where this project might be going next.

## Resource Convertibility (Part 1)

7 April, 2015

guest post by Tobias Fritz

Hi! I am Tobias Fritz, a mathematician at the Perimeter Institute for Theoretical Physics in Waterloo, Canada. I like to work on all sorts of mathematical structures which pop up in probability theory, information theory, and other sorts of applied math. Today I would like to tell you about my latest paper:

It should be of interest to Azimuth readers as it forms part of what John likes to call ‘green mathematics’. So let’s get started!

Resources and their management are an essential part of our everyday life. We deal with the management of time or money pretty much every day. We also consume natural resources in order to afford food and amenities for (some of) the 7 billion people on our planet. Many of the objects that we deal with in science and engineering can be considered as resources. For example, a communication channel is a resource for sending information from one party to another. But for now, let’s stick with a toy example: timber and nails constitute a resource for making a table. In mathematical notation, this looks like so:

$\mathrm{timber} + \mathrm{nails} \geq \mathrm{table}$

We interpret this inequality as saying that “given timber and nails, we can make a table”. I like to write it as an inequality like this, which I think of as stating that having timber and nails is at least as good as having a table, because the timber and nails can always be turned into a table whenever one needs a table.

To be more precise, we should also take into account that making the table requires some tools. These tools do not get consumed in the process, so we also get them back out:

$\text{timber} + \text{nails} + \text{saw} + \text{hammer} \geq \text{table} + \text{hammer} + \text{saw}$

Notice that this kind of equation is analogous to a chemical reaction equation like this:

$2 \mathrm{H}_2 + \mathrm{O}_2 \geq 2 \mathrm{H}_2\mathrm{O}$

So given two hydrogen molecules and an oxygen molecule, we can let them react so as to form two molecules of water. In chemistry, this kind of equation would usually be written with an arrow ‘$\rightarrow$’ instead of an ordering symbol ‘$\geq$’, but here we interpret the equation slightly differently. As with the timber and nails above, the inequality says that if we have two hydrogen molecules and an oxygen molecule, then we can let them react to form two molecules of water, but we don’t have to. In this sense, having two hydrogen molecules and an oxygen molecule is at least as good as having two molecules of water.

So what’s going on here, mathematically? In all of the above equations, we have a bunch of stuff on each side and an inequality ‘$\geq$’ in between. The stuff on each side consists of a bunch of objects tacked together via ‘$+$’ . With respect to these two pieces of structure, the collection of all our resource objects forms an ordered commutative monoid:

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

• $+$ and $0$ equip $A$ with the structure of a commutative monoid;

• $\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Here, the third axiom is the most important, since it tells us how the additive structure interacts with the ordering structure.

Ordered commutative monoids are the mathematical formalization of resource convertibility and combinability as follows. The elements $x,y\in A$ are the resource objects, corresponding to the ‘collections of stuff’ in our earlier examples, such as $x = \text{timber} + \text{nails}$ or $y = 2 \text{H}_2 + \text{O}_2.$ Then the addition operation simply joins up collections of stuff into bigger collections of stuff. The ordering relation $\geq$ is what formalizes resource convertibility, as in the examples above. The third axiom states that if we can convert $x$ into $y,$ then we can also convert $x$ together with $z$ into $y$ together with $z$ for any $z,$ for example by doing nothing to $z.$

A mathematically minded reader might object that requiring $A$ to form a partially ordered set under $\geq$ is too strong a requirement, since it requires two resource objects to be equal as soon as they are mutually interconvertible: $x \geq y$ and $y \geq x$ implies $x = y.$ However, I think that this is not an essential restriction, because we can regard this implication as the definition of equality: ‘$x = y$’ is just a shorthand notation for ‘$x\geq y$ and $y\geq x$’ which formalizes the perfect interconvertibility of resource objects.

We could now go back to the original examples and try to model carpentry and chemistry in terms of ordered commutative monoids. But as a mathematician, I needed to start out with something mathematically precise and rigorous as a testing ground for the formalism. This helps ensure that the mathematics is sensible and useful before diving into real-world applications. So, the main example in my paper is the ordered commutative monoid of graphs, which has a resource-theoretic interpretation in terms of zero-error information theory. As graph theory is a difficult and traditional subject, this application constitutes the perfect training camp for the mathematics of ordered commutative monoids. I will get to this in Part 3.

In Part 2, I will say something about what one can do with ordered commutative monoids. In the meantime, I’d be curious to know what you think about what I’ve said so far!

## Thermodynamics with Continuous Information Flow

21 March, 2015

guest post by Blake S. Pollard

Over a century ago James Clerk Maxwell created a thought experiment that has helped shape our understanding of the Second Law of Thermodynamics: the law that says entropy can never decrease.

Maxwell’s proposed experiment was simple. Suppose you had a box filled with an ideal gas at equilibrium at some temperature. You stick in an insulating partition, splitting the box into two halves. These two halves are isolated from one another except for one important caveat: somewhere along the partition resides a being capable of opening and closing a door, allowing gas particles to flow between the two halves. This being is also capable of observing the velocities of individual gas particles. Every time a particularly fast molecule is headed towards the door the being opens it, letting it fly into the other half of the box. When a slow particle heads towards the door the being keeps it closed. After some time, fast molecules would build up on one side of the box, meaning half the box would heat up! To an observer it would seem like the box, originally at a uniform temperature, would for some reason start splitting up into a hot half and a cold half. This seems to violate the Second Law (as well as all our experience with boxes of gas).

Of course, this apparent violation probably has something to do with positing the existence of intelligent microscopic doormen. This being, and the thought experiment itself, are typically referred to as Maxwell’s demon.

Photo credit: Peter MacDonald, Edmonds, UK

When people cook up situations that seem to violate the Second Law there is typically a simple resolution: you have to consider the whole system! In the case of Maxwell’s demon, while the entropy of the box decreases, the entropy of the system as a whole, demon included, goes up. Precisely quantifying how Maxwell’s demon doesn’t violate the Second Law has led people to a better understanding of the role of information in thermodynamics.

At the American Physical Society March Meeting in San Antonio, Texas, I had the pleasure of hearing some great talks on entropy, information, and the Second Law. Jordan Horowitz, a postdoc at Boston University, gave a talk on his work with Massimiliano Esposito, a researcher at the University of Luxembourg, on how one can understand situations like Maxwell’s demon (and a whole lot more) by analyzing the flow of information between subsystems.

Consider a system made up of two parts, $X$ and $Y$. Each subsystem has a discrete set of states, and each subsystem makes transitions among these discrete states. These dynamics can be modeled as Markov processes. Horowitz and Esposito are interested in modeling the thermodynamics of information flow between subsystems. To this end they consider a bipartite system, meaning that either $X$ transitions or $Y$ transitions, never both at the same time. The probability distribution $p(x,y)$ of the whole system evolves according to the master equation:

$\displaystyle{ \frac{dp(x,y)}{dt} = \sum_{x', y'} H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}p(x,y) }$

where $H_{x,x'}^{y,y'}$ is the rate at which the system transitions from $(x',y') \to (x,y).$ The ‘bipartite’ condition means that $H$ has the form

$H_{x,x'}^{y,y'} = \left\{ \begin{array}{cc} H_{x,x'}^y & x \neq x'; y=y' \\ H_x^{y,y'} & x=x'; y \neq y' \\ 0 & \text{otherwise.} \end{array} \right.$

The joint system is an open system that satisfies the second law of thermodynamics:

$\displaystyle{ \frac{dS_i}{dt} = \frac{dS_{XY}}{dt} + \frac{dS_e}{dt} \geq 0 }$

where

$\displaystyle{ S_{XY} = - \sum_{x,y} p(x,y) \ln ( p(x,y) ) }$

is the Shannon entropy of the system, satisfying

$\displaystyle{ \frac{dS_{XY} }{dt} = \sum_{x,y,x',y'} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{p(x',y')}{p(x,y)} \right) }$

and

$\displaystyle{ \frac{dS_e}{dt} = \sum_{x,y,x',y'} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{ H_{x,x'}^{y,y'} } {H_{x',x}^{y',y} } \right) }$

is the entropy change of the environment.

We want to investigate how the entropy production of the whole system relates to entropy production in the bipartite pieces $X$ and $Y$. To this end they define a new flow, the information flow, as the time rate of change of the mutual information

$\displaystyle{ I = \sum_{x,y} p(x,y) \ln \left( \frac{p(x,y)}{p(x)p(y)} \right) }$

Its time derivative can be split up as

$\displaystyle{ \frac{dI}{dt} = \frac{dI^X}{dt} + \frac{dI^Y}{dt}}$

where

$\displaystyle{ \frac{dI^X}{dt} = \sum_{x,x',y} \left[ H_{x,x'}^{y} p(x',y) - H_{x',x}^{y}p(x,y) \right] \ln \left( \frac{ p(y|x) }{p(y|x')} \right) }$

and

$\displaystyle{ \frac{dI^Y}{dt} = \sum_{x,y,y'} \left[ H_{x}^{y,y'}p(x,y') - H_{x}^{y',y}p(x,y) \right] \ln \left( \frac{p(x|y)}{p(x|y')} \right) }$

are the information flows associated with the subsystems $X$ and $Y$ respectively.
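Here is a small numerical sketch (in Python; the two-level subsystems and the random rates are my own toy choices, not an example from their paper) that builds a bipartite rate matrix, integrates the master equation, and checks that the change of the mutual information is accounted for by the two flows, computing $dI^X/dt$ and $dI^Y/dt$ from the net flux over each pair of states:

```python
import numpy as np

# Two binary subsystems X and Y; joint states (x, y) with x, y in {0, 1}.
# hx[y, x, xp] = rate for X to jump xp -> x while Y sits at y   (H^y_{x,x'})
# hy[x, y, yp] = rate for Y to jump yp -> y while X sits at x   (H_x^{y,y'})
rng = np.random.default_rng(1)
hx = rng.uniform(0.5, 2.0, size=(2, 2, 2))
hy = rng.uniform(0.5, 2.0, size=(2, 2, 2))

def idx(x, y):                      # flatten (x, y) into a single index 0..3
    return 2 * x + y

# Generator W with W[i, j] = rate from joint state j to joint state i (i != j);
# diagonal entries make each column sum to zero, so probability is conserved.
W = np.zeros((4, 4))
for x in range(2):
    for y in range(2):
        W[idx(x, y), idx(1 - x, y)] += hx[y, x, 1 - x]   # X-jump, Y fixed
        W[idx(x, y), idx(x, 1 - y)] += hy[x, y, 1 - y]   # Y-jump, X fixed
W -= np.diag(W.sum(axis=0))

def mutual_information(p):
    p = p.reshape(2, 2)
    px, py = p.sum(axis=1), p.sum(axis=0)
    return float(np.sum(p * np.log(p / np.outer(px, py))))

def info_flows(p):
    """dI^X/dt and dI^Y/dt from the net flux over each pair of states."""
    p = p.reshape(2, 2)
    px, py = p.sum(axis=1), p.sum(axis=0)
    dIX = sum((hx[y, 1, 0] * p[0, y] - hx[y, 0, 1] * p[1, y])
              * np.log((p[1, y] / px[1]) / (p[0, y] / px[0])) for y in range(2))
    dIY = sum((hy[x, 1, 0] * p[x, 0] - hy[x, 0, 1] * p[x, 1])
              * np.log((p[x, 1] / py[1]) / (p[x, 0] / py[0])) for x in range(2))
    return dIX, dIY

# Integrate dp/dt = W p with Euler steps; at a few times, compare a numerical
# dI/dt (tiny probe step) with dI^X/dt + dI^Y/dt. They agree up to O(probe).
p, dt, probe = np.array([0.7, 0.1, 0.1, 0.1]), 0.05, 1e-6
for step in range(41):
    if step % 20 == 0:
        dIX, dIY = info_flows(p)
        p_probe = p + probe * (W @ p)
        dI_numeric = (mutual_information(p_probe) - mutual_information(p)) / probe
        print(f"t = {step * dt:4.2f}   dI/dt = {dI_numeric:+.5f}   "
              f"dI^X/dt + dI^Y/dt = {dIX + dIY:+.5f}")
    p = p + dt * (W @ p)
```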

When

$\displaystyle{ \frac{dI^X}{dt} > 0}$

a transition in $X$ increases the mutual information $I,$ meaning that $X$ ‘knows’ more about $Y$ and vice versa.

We can rewrite the entropy production entering into the second law in terms of these information flows as

$\displaystyle{ \frac{dS_i}{dt} = \frac{dS_i^X}{dt} + \frac{dS_i^Y}{dt} }$

where

$\displaystyle{ \frac{dS_i^X}{dt} = \sum_{x,x',y} \left[ H_{x,x'}^y p(x',y) - H_{x',x}^y p(x,y) \right] \ln \left( \frac{H_{x,x'}^y p(x',y) } {H_{x',x}^y p(x,y) } \right) \geq 0 }$

and similarly for $\frac{dS_i^Y}{dt}$. This gives the following decomposition of entropy production in each subsystem:

$\displaystyle{ \frac{dS_i^X}{dt} = \frac{dS^X}{dt} + \frac{dS^X_e}{dt} - \frac{dI^X}{dt} \geq 0 }$

$\displaystyle{ \frac{dS_i^Y}{dt} = \frac{dS^Y}{dt} + \frac{dS^Y_e}{dt} - \frac{dI^Y}{dt} \geq 0},$

where the inequalities hold for each subsystem. To see this, write out the left hand side of each inequality: you will find that each is a sum of terms of the form

$\displaystyle{ (a-b) \ln \left( \frac{a}{b} \right) }$

and such a term is non-negative whenever $a, b \geq 0$.

The interaction between the subsystems is contained entirely in the information flow terms. Neglecting these terms gives rise to situations like Maxwell’s demon where a subsystem seems to violate the second law.

Lots of Markov processes have boring equilibria $\frac{dp}{dt} = 0$ where there is no net flow among the states. Markov processes also admit non-equilibrium steady states, where there may be some constant flow of information. In this steady state all explicit time derivatives are zero, including the net information flow:

$\displaystyle{ \frac{dI}{dt} = 0 }$

which implies that $\frac{dI^X}{dt} = - \frac{dI^Y}{dt}.$ In this situation the above inequalities become

$\displaystyle{ \frac{dS^X_i}{dt} = \frac{dS_e^X}{dt} - \frac{dI^X}{dt} \geq 0 }$

and

$\displaystyle{ \frac{dS^Y_i}{dt} = \frac{dS_e^Y}{dt} + \frac{dI^X}{dt} \geq 0 }.$

If

$\displaystyle{ \frac{dI^X}{dt} > 0 }$

then $X$ is learning something about $Y$ or acting as a sensor. The first inequality

$\frac{dS_e^X}{dt} \geq \frac{dI^X}{dt}$ quantifies the minimum amount of energy $X$ must supply to do this sensing. Similarly $-\frac{dS_e^Y}{dt} \leq \frac{dI^X}{dt}$ bounds the amount of useful energy that is available to $Y$ as a result of this information transfer.
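To see the sensor bound in numbers, here is a self-contained continuation of the toy model sketched above (same made-up rates): it finds the steady state as the null eigenvector of the generator and checks that $dS_e^X/dt \geq dI^X/dt$ there.

```python
import numpy as np

# Same toy bipartite model as above: two binary subsystems with random rates.
rng = np.random.default_rng(1)
hx = rng.uniform(0.5, 2.0, size=(2, 2, 2))   # hx[y, x, xp]: X jumps xp -> x at fixed y
hy = rng.uniform(0.5, 2.0, size=(2, 2, 2))   # hy[x, y, yp]: Y jumps yp -> y at fixed x

def idx(x, y):
    return 2 * x + y

W = np.zeros((4, 4))
for x in range(2):
    for y in range(2):
        W[idx(x, y), idx(1 - x, y)] += hx[y, x, 1 - x]
        W[idx(x, y), idx(x, 1 - y)] += hy[x, y, 1 - y]
W -= np.diag(W.sum(axis=0))

# Steady state: the eigenvector of W with eigenvalue 0, normalized to sum to 1.
vals, vecs = np.linalg.eig(W)
p = np.real(vecs[:, np.argmin(np.abs(vals))])
p = (p / p.sum()).reshape(2, 2)
px = p.sum(axis=1)

# Information flow into X and the entropy X dumps into the environment,
# both computed from the net flux of X-jumps at fixed y. Note that dI^X/dt
# may have either sign (if it is negative, Y is playing the sensor instead);
# the bound dS_e^X/dt >= dI^X/dt holds at the steady state regardless.
dIX = dSeX = 0.0
for y in range(2):
    J = hx[y, 1, 0] * p[0, y] - hx[y, 0, 1] * p[1, y]
    dIX += J * np.log((p[1, y] / px[1]) / (p[0, y] / px[0]))
    dSeX += J * np.log(hx[y, 1, 0] / hx[y, 0, 1])

print(f"dI^X/dt   = {dIX:+.5f}")
print(f"dS_e^X/dt = {dSeX:+.5f}  (should be >= dI^X/dt at the steady state)")
```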

In their paper Horowitz and Esposito explore a few other examples and show the utility of this simple breakup of a system into two interacting subsystems in explaining various interesting situations in which the flow of information has thermodynamic significance.

For the whole story, read their paper!

• Jordan Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Phys. Rev. X 4 (2014), 031015.

## Planets in the Fourth Dimension

17 March, 2015

You probably know that planets go around the sun in elliptical orbits. But do you know why?

In fact, they’re moving in circles in 4 dimensions. But when these circles are projected down to 3-dimensional space, they become ellipses!

This animation by Greg Egan shows the idea:

The plane here represents 2 of the 3 space dimensions we live in. The vertical direction is the mysterious fourth dimension. The planet goes around in a circle in 4-dimensional space. But down here in 3 dimensions, its ‘shadow’ moves in an ellipse!

What’s this fourth dimension I’m talking about here? It’s a lot like time. But it’s not exactly time. It’s the difference between ordinary time and another sort of time, which flows at a rate inversely proportional to the distance between the planet and the sun.

The movie uses this other sort of time. Relative to this other time, the planet is moving at constant speed around a circle in 4 dimensions. But in ordinary time, its shadow in 3 dimensions moves faster when it’s closer to the sun.

All this sounds crazy, but it’s not some new physics theory. It’s just a different way of thinking about Newtonian physics!

Physicists have known about this viewpoint at least since 1980, thanks to a paper by the mathematical physicist Jürgen Moser. Some parts of the story are much older. A lot of papers have been written about it.

But I only realized how simple it is when I got this paper in my email, from someone I’d never heard of before:

• Jesper Göransson, Symmetries of the Kepler problem, 8 March 2015.

I get a lot of papers by crackpots in my email, but the occasional gem from someone I don’t know makes up for all those.

The best thing about Göransson’s 4-dimensional description of planetary motion is that it gives a clean explanation of an amazing fact. You can take any elliptical orbit, apply a rotation of 4-dimensional space, and get another valid orbit!

Of course we can rotate an elliptical orbit about the sun in the usual 3-dimensional way and get another elliptical orbit. The interesting part is that we can also do 4-dimensional rotations. This can make a round ellipse look skinny: when we tilt a circle into the fourth dimension, its ‘shadow’ in 3-dimensional space becomes thinner!

In fact, you can turn any elliptical orbit into any other elliptical orbit with the same energy by a 4-dimensional rotation of this sort. All elliptical orbits with the same energy are really just circular orbits on the same sphere in 4 dimensions!

Jesper Göransson explains how this works in a terse and elegant way. But I can’t resist summarizing the key results.

### The Kepler problem

Suppose we have a particle moving in an inverse square force law. Its equation of motion is

$\displaystyle{ m \ddot{\mathbf{r}} = - \frac{k \mathbf{r}}{r^3} }$

where $\mathbf{r}$ is its position as a function of time, $r$ is its distance from the origin, $m$ is its mass, and $k$ says how strong the force is. From this we can derive the law of conservation of energy, which says

$\displaystyle{ \frac{m \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{k}{r} = E }$

for some constant $E$ that depends on the particle’s orbit, but doesn’t change with time.

Let’s consider an attractive force, so $k > 0,$ and elliptical orbits, so $E < 0.$ Let's call the particle a 'planet'. It's a planet moving around the sun, where we treat the sun as so heavy that it remains perfectly fixed at the origin.

I only want to study orbits of a single fixed energy $E.$ This frees us to choose units of mass, length and time in which

$m = 1, \;\; k = 1, \;\; E = -\frac{1}{2}$

This will reduce the clutter of letters and let us focus on the key ideas. If you prefer an approach that keeps in the units, see Göransson’s paper.

Now the equation of motion is

$\displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} }$

and conservation of energy says

$\displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} }$

The big idea, apparently due to Moser, is to switch from our ordinary notion of time to a new notion of time! We’ll call this new time $s,$ and demand that

$\displaystyle{ \frac{d s}{d t} = \frac{1}{r} }$

This new kind of time ticks more slowly as you get farther from the sun. So, using this new time speeds up the planet’s motion when it’s far from the sun. If that seems backwards, just think about it. For a planet very far from the sun, one day of this new time could equal a week of ordinary time. So, measured using this new time, a planet far from the sun might travel in one day what would normally take a week.

This compensates for the planet’s ordinary tendency to move slower when it’s far from the sun. In fact, with this new kind of time, a planet moves just as fast when it’s farthest from the sun as when it’s closest.

Amazing stuff happens with this new notion of time!

To see this, first rewrite conservation of energy using this new notion of time. I’ve been using a dot for the ordinary time derivative, following Newton. Let’s use a prime for the derivative with respect to $s.$ So, for example, we have

$\displaystyle{ t' = \frac{dt}{ds} = r }$

and

$\displaystyle{ \mathbf{r}' = \frac{d\mathbf{r}}{ds} = \frac{dt}{ds}\frac{d\mathbf{r}}{dt} = r \dot{\mathbf{r}} }$

Using this new kind of time derivative, Göransson shows that conservation of energy can be written as

$\displaystyle{ (t' - 1)^2 + \mathbf{r}' \cdot \mathbf{r}' = 1 }$

This is the equation of a sphere in 4-dimensional space!

I’ll prove that conservation of energy can be written this way later. First let’s talk about what it means. To understand it, we should treat the ordinary time coordinate $t$ and the space coordinates $(x,y,z)$ on an equal footing. The point

$(t,x,y,z)$

moves around in 4-dimensional space as the parameter $s$ changes. What we’re seeing is that the velocity of this point, namely

$\mathbf{v} = (t',x',y',z')$

moves around on a sphere in 4-dimensional space! It’s a sphere of radius one centered at the point

$(1,0,0,0)$

With some further calculation we can show some other wonderful facts:

$\mathbf{r}''' = -\mathbf{r}'$

and

$t''' = -(t' - 1)$

These are the usual equations for a harmonic oscillator, but with an extra derivative!

I’ll prove these wonderful facts later. For now let’s just think about what they mean. We can state both of them in words as follows: the 4-dimensional velocity $\mathbf{v}$ carries out simple harmonic motion about the point $(1,0,0,0).$

That’s nice. But since $\mathbf{v}$ also stays on the unit sphere centered at this point, we can conclude something even better: $\mathbf{v}$ must move along a great circle on this sphere, at constant speed!

This implies that the spatial components of the 4-dimensional velocity have mean $0,$ while the $t$ component has mean $1.$

The first part here makes a lot of sense: our planet doesn’t drift ever farther from the Sun, so its mean velocity must be zero. The second part is a bit subtler, but it also makes sense: the ordinary time $t$ moves forward at speed 1 on average with respect to the new time parameter $s$, but its rate of change oscillates in a sinusoidal way.

If we integrate both sides of

$\mathbf{r}''' = -\mathbf{r}'$

we get

$\mathbf{r}'' = -\mathbf{r} + \mathbf{a}$

for some constant vector $\mathbf{a}.$ This says that the position $\mathbf{r}$ oscillates harmonically about a point $\mathbf{a}.$ Since $\mathbf{a}$ doesn’t change with time, it’s a conserved quantity: it’s called the Runge–Lenz vector.
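Here is a quick numerical check of these claims (a Python sketch; the specific initial conditions are my own choice, arbitrary apart from having energy $-\frac{1}{2}$). It integrates $\ddot{\mathbf{r}} = -\mathbf{r}/r^3$ in ordinary time and verifies along the orbit that the 4-velocity $(t', \mathbf{r}') = (r, r\dot{\mathbf{r}})$ stays on the unit sphere about $(1,0,0,0)$, and that the vector $\mathbf{a} = \mathbf{r}'' + \mathbf{r}$ stays put:

```python
import numpy as np

# Integrate the Kepler problem (units with m = k = 1, E = -1/2) with RK4 in
# ordinary time, and check two claims along the orbit:
#   (1) (t' - 1)^2 + |r'|^2 = (r - 1)^2 + |r * rdot|^2 stays equal to 1;
#   (2) a = r'' + r, written in ordinary-time quantities as
#       a = (rdot . r) rdot - r/|r| + r, stays constant (the Runge-Lenz vector).

def acceleration(x):
    return -x / np.linalg.norm(x) ** 3

def rk4_step(x, v, dt):
    k1x, k1v = v, acceleration(x)
    k2x, k2v = v + 0.5 * dt * k1v, acceleration(x + 0.5 * dt * k1x)
    k3x, k3v = v + 0.5 * dt * k2v, acceleration(x + 0.5 * dt * k2x)
    k4x, k4v = v + dt * k3v, acceleration(x + dt * k3x)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6)

# Initial conditions with energy v^2/2 - 1/r = -1/2 (an eccentric ellipse).
x = np.array([1.5, 0.0, 0.0])
speed = np.sqrt(2.0 / np.linalg.norm(x) - 1.0)
v = np.array([0.0, speed, 0.0])

dt, steps = 1e-3, 20000
for step in range(steps):
    if step % 5000 == 0:
        r = np.linalg.norm(x)
        sphere = (r - 1.0) ** 2 + np.dot(r * v, r * v)     # should stay 1
        a = np.dot(v, x) * v - x / r + x                   # should stay constant
        print(f"t = {step * dt:6.1f}   (t'-1)^2 + |r'|^2 = {sphere:.6f}   a = {a.round(6)}")
    x, v = rk4_step(x, v, dt)
```

With these initial conditions the orbit is an ellipse of eccentricity $1/2$, and $\mathbf{a}$ should keep printing as $(0.5, 0, 0)$.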

Often people start with the inverse square force law, show that angular momentum and the Runge–Lenz vector are conserved, and use these 6 conserved quantities and Noether’s theorem to show there’s a 6-dimensional group of symmetries. For solutions with negative energy, this turns out to be the group of rotations in 4 dimensions, $\mathrm{SO}(4).$ With more work, we can see how the Kepler problem is related to a harmonic oscillator in 4 dimensions. Doing this involves reparametrizing time.

I like Göransson’s approach better in many ways, because it starts by biting the bullet and reparametrizing time. This lets him rather efficiently show that the planet’s elliptical orbit is a projection to 3-dimensional space of a circular orbit in 4d space. The 4d rotational symmetry is then evident!

Göransson actually carries out his argument for an inverse square law in $n$-dimensional space; it’s no harder. The elliptical orbits in $n$ dimensions are projections of circular orbits in $n+1$ dimensions. Angular momentum is a bivector in $n$ dimensions; together with the Runge–Lenz vector it forms a bivector in $n+1$ dimensions. This is the conserved quantity associated to the $(n+1)$-dimensional rotational symmetry of the problem.

He also carries out the analogous argument for positive-energy orbits, which are hyperbolas, and zero-energy orbits, which are parabolas. The hyperbolic case has the Lorentz group symmetry and the zero-energy case has Euclidean group symmetry! This was already known, but it’s nice to see how easily Göransson’s calculations handle all three cases.

### Mathematical details

Checking all this is a straightforward exercise in vector calculus, but it takes a bit of work, so let me do some here. There will still be details left to fill in, and I urge that you give it a try, because this is the sort of thing that’s more interesting to do than to watch.

There are a lot of equations coming up, so I’ll put boxes around the important ones. The basic ones are the force law, conservation of energy, and the change of variables that gives

$\boxed{ t' = r , \qquad \mathbf{r}' = r \dot{\mathbf{r}} }$

We start with conservation of energy:

$\boxed{ \displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} } }$

and then use

$\displaystyle{ \dot{\mathbf{r}} = \frac{d\mathbf{r}/dt}{dt/ds} = \frac{\mathbf{r}'}{t'} }$

to obtain

$\displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{2 t'^2} - \frac{1}{t'} = -\frac{1}{2} }$

With a little algebra this gives

$\boxed{ \displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1} }$

This shows that the ‘4-velocity’

$\mathbf{v} = (t',x',y',z')$

stays on the unit sphere centered at $(1,0,0,0).$

The next step is to take the equation of motion

$\boxed{ \displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} } }$

and rewrite it using primes ($s$ derivatives) instead of dots ($t$ derivatives). We start with

$\displaystyle{ \dot{\mathbf{r}} = \frac{\mathbf{r}'}{r} }$

and differentiate again to get

$\ddot{\mathbf{r}} = \displaystyle{ \frac{1}{r} \left(\frac{\mathbf{r}'}{r}\right)' } = \displaystyle{ \frac{1}{r} \left( \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^2} \right) } = \displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} }$

Now we use our other equation for $\ddot{\mathbf{r}}$ and get

$\displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} = - \frac{\mathbf{r}}{r^3} }$

or

$r \mathbf{r}'' - r' \mathbf{r}' = -\mathbf{r}$

so

$\boxed{ \displaystyle{ \mathbf{r}'' = \frac{r' \mathbf{r}' - \mathbf{r}}{r} } }$

To go further, it’s good to get a formula for $r''$ as well. First we compute

$r' = \displaystyle{ \frac{d}{ds} (\mathbf{r} \cdot \mathbf{r})^{\frac{1}{2}} } = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}}{r} }$

and then differentiating again,

$r'' = \displaystyle{\frac{d}{ds} \frac{\mathbf{r}' \cdot \mathbf{r}}{r} } = \displaystyle{ \frac{r \mathbf{r}'' \cdot \mathbf{r} + r \mathbf{r}' \cdot \mathbf{r}' - r' \mathbf{r}' \cdot \mathbf{r}}{r^2} }$

Plugging in our formula for $\mathbf{r}''$, some wonderful cancellations occur and we get

$r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 }$

But we can do better! Remember, conservation of energy says

$\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1}$

and we know $t' = r.$ So,

$\mathbf{r}' \cdot \mathbf{r}' = 1 - (r - 1)^2 = 2r - r^2$

and

$r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 } = 1 - r$

So, we see

$\boxed{ r'' = 1 - r }$

Can you get here more elegantly?

Since $t' = r$ this instantly gives

$\boxed{ t''' = 1 - t' }$

as desired.

Next let’s get a similar formula for $\mathbf{r}'''.$ We start with

$\displaystyle{ \mathbf{r}'' = \frac{r' \mathbf{r}' - \mathbf{r}}{r} }$

and differentiate both sides to get

$\displaystyle{ \mathbf{r}''' = \frac{r r'' \mathbf{r}' + r r' \mathbf{r}'' - r \mathbf{r}' - r'(r' \mathbf{r}' - \mathbf{r})}{r^2} }$

Then plug in our formulas for $r''$ and $\mathbf{r}''.$ Some truly miraculous cancellations occur and we get

$\boxed{ \mathbf{r}''' = -\mathbf{r}' }$

I could show you how it works—but to really believe it you have to do it yourself. It’s just algebra. Again, I’d like a better way to see why this happens!

Integrating both sides—which is a bit weird, since we got this equation by differentiating both sides of another one—we get

$\boxed{ \mathbf{r}'' = -\mathbf{r} + \mathbf{a} }$

for some fixed vector $\mathbf{a},$ the Runge–Lenz vector. This says $\mathbf{r}$ undergoes harmonic motion about $\mathbf{a}.$ It’s quite remarkable that both $\mathbf{r}$ and its norm $r$ undergo harmonic motion! At first I thought this was impossible, but it’s just a very special circumstance.

The quantum version of a planetary orbit is a hydrogen atom. Everything we just did has a quantum version! For more on that, see

• Greg Egan, The ellipse and the atom.

For more of the history of this problem, see:

• John Baez, Mysteries of the gravitational 2-body problem.

This also treats quantum aspects, connections to supersymmetry and Jordan algebras, and more! Someday I’ll update it to include the material in this blog post.

## Quantum Superposition

13 March, 2015

guest post by Piotr Migdał

In this blog post I will introduce some basics of quantum mechanics, with the emphasis on why a particle being in a few places at once behaves measurably differently from a particle whose position we just don’t know. It’s a kind of continuation of the “Quantum Network Theory” series (Part 1, Part 2) by Tomi Johnson about our work in Jake Biamonte’s group at the ISI Foundation in Turin. My goal is to explain quantum community detection. Before that, I need to introduce the relevant basics of quantum mechanics, and of classical community detection.

But before I start, let me introduce myself, as it’s my first post to Azimuth.

I just finished my quantum optics theory Ph.D. in Maciej Lewenstein’s group at The Institute of Photonic Sciences in Castelldefels, a beach near Barcelona. My scientific interests range from quantum physics, through complex networks, to a data-driven approach to pretty much anything—and now I work as a data science freelancer. I enjoy doing data visualizations (for example of relations between topics in mathematics), I am a big fan of Rényi entropy (largely thanks to Azimuth), and I’m a believer in open science. If you think that there are too many off-topic side projects here, you are absolutely right!

In my opinion, quantum mechanics is easy. Based on my experience with gifted education, it takes roughly 9 intense hours to introduce entanglement to students having only a very basic linear algebra background. Even more, I believe that it is possible to get familiar with quantum mechanics by just playing with it—so I am developing a Quantum Game!

### Quantum weirdness

In quantum mechanics a particle can be in a few places at once. It sounds strange. So strange, that some pioneers of quantum mechanics (including, famously, Albert Einstein) didn’t want to believe in it: not because of any disagreement with experiment, not because of any lack of mathematical beauty, just because it didn’t fit their philosophical view of physics.

It went further: in the Soviet Union the idea that an electron can be in many places (resonance bonds) was considered to oppose materialism. Later, in California, hippies investigated quantum mechanics as a basis for parapsychology—which, arguably, gave birth to the field of quantum information.

As Griffiths put it in his Introduction to Quantum Mechanics (Chapter 4.4.1):

To the layman, the philosopher, or the classical physicist, a statement of the form “this particle doesn’t have a well-defined position” [...] sounds vague, incompetent, or (worst of all) profound. It is none of these.

In this guest blog post I will try to show that not only can a particle be in many places at once, but also that if it were not in many places at once then it would cause problems. That is, phenomena as fundamental as atoms forming chemical bonds, or a particle moving in a vacuum, require it.

As in many other cases, the simplest non-trivial case is perfect for explaining the idea, as it covers the most important phenomena while being easy to analyze, visualize and comprehend. Quantum mechanics is not an exception—let us start with a system of two states.

### A two state system

Let us study a simplified model of the hydrogen molecular ion $\mathrm{H}_2^+$, that is, a system of two protons and one electron (see Feynman Lectures on Physics, Vol. III, Chapter 10.1). Since the protons are heavy and slow, we treat them as fixed. We focus on the electron moving in the electric field created by the protons.

In quantum mechanics we describe the state of a system using a complex vector. In simple terms, this is a list of complex numbers called ‘probability amplitudes’. For an electron that can be near one proton or another, we use a list of two numbers:

$|\psi\rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

In this state the electron is near the first proton with probability $|\alpha|^2,$ and near the second one with probability $|\beta|^2.$

Note that

$|\psi \rangle = \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

So, we say the electron is in a ‘linear combination’ or ‘superposition’ of the two states

$|1\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

(where it’s near the first proton) and the state

$|2\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(where it’s near the second proton).

Why do we denote unit vectors in strange brackets looking like

$| \mathrm{something} \rangle$ ?

Well, this is called Dirac notation (or bra-ket notation) and it is immensely useful in quantum mechanics. We won’t go into it in detail here; merely note that $| \cdot \rangle$ stands for a column vector and $\langle \cdot |$ stands for a row vector, while $\psi$ is a traditional symbol for a quantum state.

Amplitudes can be thought of as ‘square roots’ of probabilities. We can force an electron to localize by performing a classical measurement, for example by moving the protons apart and measuring which of them has neutral charge (because the electron is attached to it). Then, we get probability $| \alpha|^2$ of finding it near the first proton and $|\beta|^2$ of finding it near the second. So, we require that

$|\alpha|^2 + |\beta|^2 = 1$

Note that as amplitudes are complex, for a given probability there are many possible amplitudes. For example

$1 = |1|^2 = |-1|^2 = |i|^2 = \left| \tfrac{1+i}{\sqrt{2}} \right|^2 = \cdots$

where $i$ is the imaginary unit, with $i^2 = -1.$

We will now show that the electron ‘wants’ to be spread out. Electrons don’t really have desires, so this is physics slang for saying that the electron will have less energy if its probability of being near the first proton is equal to its probability of being near the second proton: namely, 50%.

In quantum mechanics, a Hamiltonian is a matrix that describes the relation between the energy and evolution (i.e. how the state changes in time). The expected value of the energy of any state $| \psi \rangle$ is

$E = \langle \psi | H | \psi \rangle$

Here the row vector $\langle \psi |$ is the column vector $| \psi\rangle$ after transposition and complex conjugation (i.e. changing $i$ to $-i$), and

$\langle \psi | H | \psi \rangle$

means we are doing matrix multiplication on $\langle \psi |,$ $H$ and $| \psi \rangle$ to get a number.

For the electron in the $\mathrm{H}_2^+$ molecule the Hamiltonian can be written as the following $2 \times 2$ matrix with real, positive entries:

$H = \begin{bmatrix} E_0 & \Delta \\ \Delta & E_0 \end{bmatrix},$

where $E_0$ is the energy of the electron being either in state $|1\rangle$ or state $|2\rangle$, and $\Delta$ is the ‘tunneling amplitude’, which describes how easy it is for the electron to move from the neighborhood of one proton to that of the other.

The expected value—physicists call it the ‘expectation value’—of the energy of a given state $|\psi\rangle$ is:

$E = \langle \psi | H | \psi \rangle \equiv \begin{bmatrix} \alpha^* & \beta^* \end{bmatrix} \begin{bmatrix} E_0 & \Delta \\ \Delta & E_0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$

The star symbol denotes the complex conjugation. If you are unfamiliar with complex numbers, just work with real numbers on which this operation does nothing.

Exercise 1. Find $\alpha$ and $\beta$ with

$|\alpha|^2 + |\beta|^2 = 1$

that minimize or maximize the expectation value of energy $\langle \psi | H | \psi \rangle$ for

$|\psi\rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

Exercise 2. What’s the expectation value of the energy for the states $| 1 \rangle$ and $| 2 \rangle$?

Or if you are lazy, just read the answer! It is straightforward to check that

$E = (\alpha^* \alpha + \beta^* \beta) E_0 + (\alpha^* \beta + \beta^* \alpha) \Delta$

The coefficient of $E_0$ is 1, while the coefficient of $\Delta$, namely $\alpha^* \beta + \beta^* \alpha = 2\,\mathrm{Re}(\alpha^* \beta)$, ranges over $[-1,1]$ for normalized states. So the minimal energy is $E_0 - \Delta$ and the maximal energy is $E_0 + \Delta$. The states achieving these energies are spread out:

$| \psi_- \rangle = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}, \quad \text{with} \quad \quad E = E_0 - \Delta$

and

$| \psi_+ \rangle = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad \text{with} \quad \quad E = E_0 + \Delta$

The energies of these states are below and above the energy $E_0,$ and $\Delta$ says how much.
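If you would rather let a computer do Exercises 1 and 2, here is a minimal NumPy check (the numbers $E_0 = 1$ and $\Delta = 0.2$ are made up for illustration) that the extreme expectation values are $E_0 \pm \Delta$ and that they are reached by the two spread-out states:

```python
import numpy as np

E0, Delta = 1.0, 0.2                       # made-up numbers for illustration
H = np.array([[E0, Delta],
              [Delta, E0]])

def energy(psi):
    """Expectation value <psi|H|psi>."""
    return np.real(np.conj(psi) @ H @ psi)

localized_1 = np.array([1.0, 0.0])                      # |1>
localized_2 = np.array([0.0, 1.0])                      # |2>
psi_minus   = np.array([1.0, -1.0]) / np.sqrt(2)        # bonding state
psi_plus    = np.array([1.0,  1.0]) / np.sqrt(2)        # antibonding state

for name, psi in [("|1>", localized_1), ("|2>", localized_2),
                  ("|psi->", psi_minus), ("|psi+>", psi_plus)]:
    print(f"{name:7s} energy = {energy(psi):.3f}")

# The eigenvalues of H are exactly the minimal and maximal expectation values.
print("eigenvalues of H:", np.linalg.eigvalsh(H))       # [E0 - Delta, E0 + Delta]
```

The last line uses the fact that the minimum and maximum of $\langle \psi | H | \psi \rangle$ over normalized states are the smallest and largest eigenvalues of $H$.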

So, the electron is ‘happier’ (electrons don’t have moods either) to be in the state $|\psi_-\rangle$ than to be localized near only one of the protons. In other words—and this is Chemistry 101—atoms like to share electrons, and this is what bonds them. Also, they like to share electrons in a particular and symmetric way.

For reference, $|\psi_+ \rangle$ is called the ‘antibonding state’. If the electron is in this state, the atoms will repel each other—and so much for the molecule!

### How to classically add quantum things

How can we tell the difference between an electron being in a superposition between two states, and just not knowing its ‘real’ position? Well, first we need to devise a way to describe probabilistic mixtures.

It looks simple—if we have an electron in the state $|1\rangle$ or $|2\rangle$ with probabilities $1/2$, we may be tempted to write

$|\psi\rangle = \tfrac{1}{\sqrt{2}} |1\rangle + \tfrac{1}{\sqrt{2}} |2\rangle$

We’re getting the right probabilities, so it looks legit. But there is something strange about the energy. We have obtained the state $|\psi_+\rangle$ with energy $E_0+\Delta$ by mixing two states with the energy $E_0$!

Moreover, we could have used different amplitudes such that $|\alpha|^2=|\beta|^2=1/2$ and gotten different energies. So, we need to devise a way to avoid guessing amplitudes. All in all, we used quotation marks for ‘square roots’ for a reason!

It turns out that to describe statistical mixtures we can use density matrices.

The states we’d been looking at before are described by vectors like this:

$| \psi \rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

These are called ‘pure states’. For a pure state, here is how we create a density matrix:

$\rho = | \psi \rangle \langle \psi | \equiv \begin{bmatrix} \alpha \alpha^* & \alpha \beta^*\\ \beta \alpha^* & \beta \beta^* \end{bmatrix}$

On the diagonal we get probabilities ($|\alpha|^2$ and $|\beta|^2$), whereas the off-diagonal terms ($\alpha \beta^*$ and its complex conjugate) are related to the presence of quantum effects. For example, for $|\psi_-\rangle$ we get

$\rho = \begin{bmatrix} 1/2 & -1/2\\ -1/2 & 1/2 \end{bmatrix}$

For an electron in the state $|1\rangle$ we get

$\rho = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}.$

To calculate the energy, the recipe is the following:

$E = \mathrm{tr}[H \rho]$

where $\mathrm{tr}$ is the ‘trace’: the sum of the diagonal entries. For an $n \times n$ square matrix with entries $A_{ij}$ its trace is

$\mathrm{tr}(A) = A_{11} + A_{22} + \ldots + A_{nn}$

Exercise 3. Show that this formula for energy, and the previous one, give the same result on pure states.

I advertised that density matrices allow us to mix quantum states. How do they do that? Very simple: just by adding density matrices, multiplied by the respective probabilities:

$\rho = p_1 \rho_1 + p_2 \rho_2 + \cdots + p_n \rho_n$

It is exactly how we would mix probability vectors. Indeed, the diagonals are probability vectors!

So, let’s say that our co-worker was drunk and we are not sure if (s)he said that the state is $|\psi_-\rangle$ or $|1\rangle$. However, we think that the probabilities are $1/3$ and $2/3.$ We get the density matrix:

$\rho = \begin{bmatrix} 5/6 & -1/6\\ -1/6 & 1/6 \end{bmatrix}$

Exercise 4. Show that calculating energy using density matrix gives the same result as averaging energy over component pure states.

I may have given the impression that density matrix is an artificial thing, at best—a practical trick, and what we ‘really’ have are pure states (vectors), each with a given probability. If so, the next exercise is for you:

Exercise 5. Show that a 50%-50% mixture of $|1\rangle$ and $|2\rangle$ is the same as a 50%-50% mixture of $|\psi_+\rangle$ and $|\psi_-\rangle$.

This is different than statistical mechanics, or statistics, where we can always think about probability distributions as uniquely defined statistical mixtures of possible states. Here, as we see, it can be a bit more tricky.
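Here is a small NumPy sketch (again with made-up values of $E_0$ and $\Delta$) that checks Exercise 4 on the drunken co-worker example, and also the claim of Exercise 5 that the two very different-looking 50%–50% mixtures give one and the same density matrix:

```python
import numpy as np

E0, Delta = 1.0, 0.2                                    # made-up numbers
H = np.array([[E0, Delta], [Delta, E0]])

def dm(psi):                                            # density matrix |psi><psi|
    return np.outer(psi, np.conj(psi))

one       = np.array([1.0, 0.0])
two       = np.array([0.0, 1.0])
psi_minus = np.array([1.0, -1.0]) / np.sqrt(2)
psi_plus  = np.array([1.0,  1.0]) / np.sqrt(2)

# Exercise 4: the 1/3 - 2/3 mixture of |psi-> and |1>.
rho = (1/3) * dm(psi_minus) + (2/3) * dm(one)
print(rho)                                              # [[5/6, -1/6], [-1/6, 1/6]]
print("tr(H rho)              =", np.trace(H @ rho))
print("weighted pure energies =", (1/3) * (E0 - Delta) + (2/3) * E0)

# Exercise 5: two very different-looking 50%-50% mixtures, one density matrix.
rho_a = 0.5 * dm(one) + 0.5 * dm(two)
rho_b = 0.5 * dm(psi_plus) + 0.5 * dm(psi_minus)
print(np.allclose(rho_a, rho_b))                        # True: both are I/2
```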

As we said, for the diagonals things work as for classical probabilities. But there is more—at the same time as adding probabilities we also add the off-diagonal terms, which, depending on their signs, can cancel each other out. That’s why mixing quantum states may make them lose their quantum properties.

The value of the off-diagonal term is related to so-called ‘coherence’ between the states $|1\rangle$ and $|2\rangle$. Its value is bounded by the respective probabilities:

$\left| \rho_{12} \right| \leq \sqrt{\rho_{11}\rho_{22}} = \sqrt{p_1 p_2}$

where for pure states we get equality.

If the value is zero, there are no quantum effects between two positions: this means that the electron is sure to be at one place or the other, though we might be uncertain at which place. This is fundamentally different from a superposition (non-zero $\rho_{12}$), where we are uncertain at which site a particle is, but it can no longer be thought to be at one site or the other: it must be in some way associated with both simultaneously.

Exercise 6. For each $c \in [-1,1]$ propose how to obtain a mixed state described by density matrix

$\rho = \begin{bmatrix} 1/2 & c/2\\ c/2 & 1/2 \end{bmatrix}$

by mixing pure states of your choice.

### A spatial wavefunction

A similar thing works for position. Instead of a two-level system let’s take a particle in one dimension. The analogue of a state vector is a wavefunction, a complex-valued function on a line:

$\psi(x)$

In this continuous variant, $p(x) = |\psi(x)|^2$ is the probability density of finding the particle at a given place.

We construct the density matrix (or rather: ‘density operator’) in a way that is analogous to what we did for the two-level system:

$\rho(x, x') = \psi(x) \psi^*(x')$

Instead of a 2×2 matrix, it is a complex function of two real variables. The probability density can be described by its diagonal values, i.e.

$p(x) = \rho(x,x)$

Again, we may wonder if the particle energetically favors being in many places at once. Well, it does.

Density matrices for a classical and quantum state. They yield the same probability distributions (for positions). However, their off-diagonal values (i.e. $x\neq x’$) are different. The classical state is just a probabilistic mixture of a particle being in a particular place.

What would happen if we had a mixture of perfectly localized particles? Due to Heisenberg’s uncertainty principle we have

$\Delta x \Delta p \geq \frac{\hbar}{2},$

that is, that the product of standard deviations of position and momentum is at least some value.

If we exactly know the position, then the uncertainty of momentum goes to infinity. (The same thing holds if we don’t know the position, but it could be known, even in principle. Quantum mechanics couldn’t care less whether the particle’s position is known by us, by our friend, by our detector or by a dust particle.)

The Hamiltonian represents energy. The energy of a free particle in a continuous system is

$H=p^2/(2m)$

where $m$ is its mass, and $p$ is its momentum: that is, mass times velocity. So, if the particle is completely localized:

• its energy is infinite,
• its velocity is infinite, so in no time its wavefunction will spread everywhere.

Infinite energies sometimes happen in physics. But if we get infinite velocities we see that there is something wrong. So a particle needs to be spread out, or ‘delocalized’, to some degree, to have finite energy.

As a side note, to consider high energies we would need to employ special relativity. In fact, one cannot localize a massive particle that much, as it will create a soup of particles and antiparticles, once its energy related to momentum uncertainty is as much as the energy related to its mass; see the Darwin term in the fine structure.

Moreover, depending on the degree of its delocalization its behavior is different. For example, a statistical mixture of highly localized particles would spread a lot faster than the same $p(x)$ derived from a single wavefunction. The density matrix of the former would be in between that of the pure state (a ‘circular’ Gaussian function) and the classical state (a ‘linear’ Gaussian). That is, it would be an ‘oval’ Gaussian, with off-diagonal values being smaller than for the pure state.

Let us look at two Gaussian wavefunctions, with a varying level of coherent superposition between them. That is, each Gaussian is already a superposition, but when we combine two we let ourselves use a superposition, or a mixture, or something in between. For a perfect superposition of the Gaussians, we would have the density matrix

$\rho(x,x') = \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \right) \left( \phi(x'+\tfrac{d}{2}) + \phi(x'-\tfrac{d}{2}) \right)$

where $\phi(x)$ is a normalized Gaussian function. For a statistical mixture between these Gaussians split by a distance of $d,$ we would have:

$\rho(x,x') = \frac{1}{2} \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) + \frac{1}{2} \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})$

And in general,

$\begin{array}{ccl} \rho(x,x') &=& \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})\right) + \\ \\ && \frac{c}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'-\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) \right) \end{array}$

for some $|c| \leq 1.$
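To see the difference concretely, here is a short NumPy sketch (the grid, the width and the separation $d$ are arbitrary choices of mine) that builds $\rho(x,x')$ on a grid for a few values of $c$. The diagonal, i.e. the probability density, comes out essentially the same in all cases because the two humps barely overlap, while the far off-diagonal entry $\rho(d/2, -d/2)$ and the purity $\mathrm{tr}(\rho^2)$ track the coherence $c$:

```python
import numpy as np

# Discretize the line and build rho(x, x') for two Gaussian humps a distance d
# apart, with coherence parameter c in [-1, 1].
xs = np.linspace(-8.0, 8.0, 801)
dx = xs[1] - xs[0]
d = 4.0                                                   # separation of the humps

def phi(x):                                               # normalized Gaussian wavefunction
    return np.exp(-x**2 / 2.0) / np.pi**0.25

L = phi(xs + d / 2)                                       # left hump
R = phi(xs - d / 2)                                       # right hump
i = np.argmin(np.abs(xs - d / 2))                         # grid index of x =  d/2
j = np.argmin(np.abs(xs + d / 2))                         # grid index of x = -d/2

for c in (-1.0, 0.0, 1.0):
    rho = 0.5 * (np.outer(L, L) + np.outer(R, R)) \
        + 0.5 * c * (np.outer(L, R) + np.outer(R, L))
    purity = np.sum(rho * rho.T) * dx * dx                # discretized tr(rho^2)
    print(f"c = {c:+.1f}   rho(d/2, d/2) = {rho[i, i]:.4f}   "
          f"rho(d/2, -d/2) = {rho[i, j]:+.4f}   purity = {purity:.3f}")
```

The purity comes out close to $1$ for $c = \pm 1$ (a pure superposition) and close to $1/2$ for $c = 0$ (a plain statistical mixture), while the diagonal entry hardly changes.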

PICTURE: Two Gaussian wavefunctions (centered at -2 and +2) in a coherent superposition with each other (the first and the last plot) and a statistical mixture (the middle plot); the 2nd and 4th plot show intermediary states. Superposition can be with different phase, much like the hydrogen example. Color represents absolute value and hue phase; here red is for positive numbers and teal is for negative.


### Conclusion

We have learnt the difference between a quantum superposition and a statistical mixture of states. In particular, while both of these descriptions may give the same probabilities, their predictions on the physical properties of states differ. For example, we need an electron to be delocalized in a specific way to describe chemical bonds; and we need delocalization of any particle to predict its movement.

We used density matrices to express both quantum superposition and (classical) lack of knowledge on an equal footing. We have identified the off-diagonal terms as the ones related to quantum coherence.

But what if there were not only two states, but many? So, instead of $\mathrm{H}_2^+$ (we were not even considering the full hydrogen molecule, but only its ionized version), how about an electronic excitation of something bigger? Not even $\mathrm{C}_2\mathrm{H}_5\mathrm{OH}$ or some sugar, but a protein complex!

So, this will be your homework (cf. this homework on topology). Just joking, there will be another blog post.

## Trends in Reaction Network Theory (Part 1)

27 January, 2015

For those who have been following the posts on reaction networks, this workshop should be interesting! I hope to see you there.

Workshop on Mathematical Trends in Reaction Network Theory, 1-3 July 2015, Department of Mathematical Sciences, University of Copenhagen. Organized by Elisenda Feliu and Carsten Wiuf.

### Description

This workshop focuses on current and new trends in mathematical reaction network theory, which we consider broadly to be the theory describing the behaviour of systems of (bio)chemical reactions. In recent years, new interesting approaches using theory from dynamical systems, stochastics, algebra and beyond, have appeared. We aim to provide a forum for discussion of these new approaches and to bring together researchers from different communities.

### Structure

The workshop starts in the morning of Wednesday, July 1st, and finishes at lunchtime on Friday, July 3rd. In the morning there will be invited talks, followed by contributed talks in the afternoon. There will be a reception and poster session Wednesday in the afternoon, and a conference dinner Thursday. For those participants staying Friday afternoon, a sightseeing event will be arranged.

### Organization

The workshop is organized by the research group on Mathematics of Reaction Networks at the Department of Mathematical Sciences, University of Copenhagen. The event is sponsored by the Danish Research Council, the Department of Mathematical Sciences and the Dynamical Systems Interdisciplinary Network, which is part of the UCPH Excellence Programme for Interdisciplinary Research.

### Confirmed invited speakers

• Nikki Meskhat (North Carolina State University, US)

• Alan D. Rendall (Johannes Gutenberg Universität Mainz, Germany)

• János Tóth (Budapest University of Technology and Economics, Hungary)

• Sebastian Walcher (RWTH Aachen, Germany)

• Gheorghe Craciun (University of Wisconsin, Madison, US)

• David Doty (California Institute of Technology, US)

• Manoj Gopalkrishnan (Tata Institute of Fundamental Research, India)

• Michal Komorowski (Institute of Fundamental Technological Research, Polish Academy of Sciences, Poland)

• John Baez (University of California, Riverside, US)

### Important dates

Abstract submission for posters and contributed talks: March 15, 2015.

Notification of acceptance: March 26, 2015.

Conference: July 1-3, 2015.

### The organizers

The organizers are Elisenda Feliu and Carsten Wiuf at the Department of Mathematical Sciences of the University of Copenhagen.

They’ve written some interesting papers on reaction networks, including some that discuss chemical reactions with more than one stationary state. This is a highly nonlinear regime that’s very important in biology:

• Elisenda Feliu and Carsten Wiuf, A computational method to preclude multistationarity in networks of interacting species, Bioinformatics 29 (2013), 2327-2334.

Motivation. Modeling and analysis of complex systems are important aspects of understanding systemic behavior. In the lack of detailed knowledge about a system, we often choose modeling equations out of convenience and search the (high-dimensional) parameter space randomly to learn about model properties. Qualitative modeling sidesteps the issue of choosing specific modeling equations and frees the inference from specific properties of the equations. We consider classes of ordinary differential equation (ODE) models arising from interactions of species/entities, such as (bio)chemical reaction networks or ecosystems. A class is defined by imposing mild assumptions on the interaction rates. In this framework, we investigate whether there can be multiple positive steady states in some ODE models in a given class.

Results. We have developed and implemented a method to decide whether any ODE model in a given class cannot have multiple steady states. The method runs efficiently on models of moderate size. We tested the method on a large set of models for gene silencing by sRNA interference and on two publicly available databases of biological models, KEGG and Biomodels. We recommend that this method is used as (i) a pre-screening step for selecting an appropriate model and (ii) for investigating the robustness of non-existence of multiple steady state for a given ODE model with respect to variation in interaction rates.

Availability and Implementation. Scripts and examples in Maple are available in the Supplementary Information.

• Elisenda Feliu, Injectivity, multiple zeros, and multistationarity in reaction networks, Proceedings of the Royal Society A.

Abstract. Polynomial dynamical systems are widely used to model and study real phenomena. In biochemistry, they are the preferred choice for modelling the concentration of chemical species in reaction networks with mass-action kinetics. These systems are typically parameterised by many (unknown) parameters. A goal is to understand how properties of the dynamical systems depend on the parameters. Qualitative properties relating to the behaviour of a dynamical system are locally inferred from the system at steady state. Here we focus on steady states that are the positive solutions to a parameterised system of generalised polynomial equations. In recent years, methods from computational algebra have been developed to understand these solutions, but our knowledge is limited: for example, we cannot efficiently decide how many positive solutions the system has as a function of the parameters. Even deciding whether there is one or more solutions is non-trivial. We present a new method, based on so-called injectivity, to preclude or assert that multiple positive solutions exist. The results apply to generalised polynomials and variables can be restricted to the linear, parameter-independent first integrals of the dynamical system. The method has been tested in a wide range of systems.

You can see more of their papers on their webpages.

## A Second Law for Open Markov Processes

15 November, 2014

guest post by Blake Pollard

What comes to mind when you hear the term ‘random process’? Do you think of Brownian motion? Do you think of particles hopping around? Do you think of a drunkard staggering home?

Today I’m going to tell you about a version of the drunkard’s walk with a few modifications. Firstly, we don’t have just one drunkard: we can have any positive real number of drunkards. Secondly, our drunkards have no memory; where they go next doesn’t depend on where they’ve been. Thirdly, there are special places, such as entrances to bars, where drunkards magically appear and disappear.

The second condition says that our drunkards satisfy the Markov property, making their random walk into a Markov process. The third condition is really what I want to tell you about, because it makes our Markov process into a more general ‘open Markov process’.

There is a collection of places the drunkards can be, for example:

$V= \{ \text{bar},\text{sidewalk}, \text{street}, \text{taco truck}, \text{home} \}$

We call this set $V$ the set of states. There are certain probabilities associated with traveling between these places. We call these transition rates. For example, it is more likely for a drunkard to go from the bar to the taco truck than from the bar to home, so the transition rate from the bar to the taco truck should be greater than the transition rate from the bar to home. Sometimes you can’t get from one place to another without passing through intermediate places. In reality the drunkard can’t go directly from the bar to the taco truck: he or she has to go from the bar to the sidewalk to the taco truck.

This information can all be summarized by drawing a directed graph where the positive numbers labelling the edges are the transition rates:

For simplicity we draw only three states: home, bar, taco truck. Drunkards go from home to the bar and back, but they never go straight from home to the taco truck.

We can keep track of where all of our drunkards are using a vector with 3 entries:

$\displaystyle{ p(t) = \left( \begin{array}{c} p_h(t) \\ p_b(t) \\ p_{tt}(t) \end{array} \right) \in \mathbb{R}^3 }$

We call this our population distribution. The first entry $p_h$ is the number of drunkards that are at home, the second $p_b$ is how many are at the bar, and the third $p_{tt}$ is how many are at the taco truck.

There is a set of coupled, linear, first-order differential equations we can write down using the information in our graph that tells us how the number of drunkards in each place change with time. This is called the master equation:

$\displaystyle{ \frac{d p}{d t} = H p }$

where $H$ is a 3×3 matrix which we call the Hamiltonian. The off-diagonal entries are nonnegative:

$H_{ij} \geq 0, i \neq j$

and the columns sum to zero:

$\sum_i H_{ij}=0$

We call a matrix satisfying these conditions infinitesimal stochastic. Stochastic matrices have columns that sum to one, and if we take the exponential of an infinitesimal stochastic matrix we get a stochastic matrix: hence the label ‘infinitesimal’.

The Hamiltonian for the graph above is

$H = \left( \begin{array}{ccc} -2 & 5 & 10 \\ 2 & -12 & 0 \\ 0 & 7 & -10 \end{array} \right)$
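If you like, you can check these conditions on a computer. Here is a minimal sketch in Python (NumPy and SciPy are just a convenient choice of tools here): it verifies that $H$ is infinitesimal stochastic and that its exponential has columns summing to one.

```python
import numpy as np
from scipy.linalg import expm

# Hamiltonian for the graph above, with states ordered (home, bar, taco truck).
H = np.array([[ -2.,   5.,  10.],
              [  2., -12.,   0.],
              [  0.,   7., -10.]])

# Off-diagonal entries are nonnegative...
assert np.all(H[~np.eye(3, dtype=bool)] >= 0)
# ...and each column sums to zero, so H is infinitesimal stochastic.
assert np.allclose(H.sum(axis=0), 0)

# Exponentiating an infinitesimal stochastic matrix gives a stochastic matrix:
U = expm(0.5 * H)        # time-evolution operator exp(tH) at t = 0.5
print(U.sum(axis=0))     # each column sum should be (numerically) 1.0
```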

John has written a lot about Markov processes and infinitesimal stochastic Hamiltonians in previous posts.

Given two vectors $p,q \in \mathbb{R}^3$ describing the populations of drunkards which obey the same master equation, we can calculate the relative entropy of $p$ with respect to $q$:

$\displaystyle{ S(p,q) = \sum_{ i \in V} p_i \ln \left( \frac{p_i}{q_i} \right) }$

This is an example of a ‘divergence’. In statistics, a divergence is a way of measuring the distance between probability distributions; it may not be symmetric and may not even obey the triangle inequality.

The relative entropy is important because it decreases monotonically with time, making it a Lyapunov function for Markov processes. Indeed, it is a well known fact that

$\displaystyle{ \frac{dS(p(t),q(t) ) } {dt} \leq 0 }$

This is true for any two population distributions which evolve according to the same master equation, though you have to allow infinity as a possible value for the relative entropy and negative infinity for its time derivative.

Why is entropy decreasing? Doesn’t the Second Law of Thermodynamics say entropy increases?

Don’t worry: the reason is that I have not put a minus sign in my definition of relative entropy. Put one in if you like, and then it will increase. Sometimes without the minus sign it’s called the Kullback–Leibler divergence. This decreases with the passage of time, saying that any two population distributions $p(t)$ and $q(t)$ get ‘closer together’ as they get randomized with the passage of time.
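To see this decrease concretely, here is a small numerical illustration, again just a sketch in Python: it evolves two population distributions with the same Hamiltonian, using $p(t) = e^{tH} p(0),$ and prints $S(p(t),q(t))$ at a few times.

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[ -2.,   5.,  10.],    # states ordered (home, bar, taco truck)
              [  2., -12.,   0.],
              [  0.,   7., -10.]])

def relative_entropy(p, q):
    """S(p, q) = sum_i p_i ln(p_i / q_i)."""
    return np.sum(p * np.log(p / q))

p0 = np.array([10.0, 1.0, 1.0])   # most of these drunkards start at home
q0 = np.array([1.0, 10.0, 1.0])   # most of these start at the bar

for t in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]:
    U = expm(t * H)                             # p(t) = exp(tH) p(0) solves the master equation
    print(t, relative_entropy(U @ p0, U @ q0))  # the printed values should be non-increasing
```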

That itself is a nice result, but I want to tell you what happens when you allow drunkards to appear and disappear at certain states. Drunkards appear at the bar once they’ve had enough to drink, and once they’ve been home long enough they can disappear. The set $B$ of places where drunkards can appear or disappear is called the set of boundary states. So for the above process

$B = \{ \text{home},\text{bar} \}$

is the set of boundary states. This changes the way in which the population of drunkards changes with time!

The drunkards at the taco truck obey the master equation. For them,

$\displaystyle{ \frac{dp_{tt}}{dt} = 7p_b -10 p_{tt} }$

still holds. But because the populations can appear or disappear at the boundary states the master equation no longer holds at those states! Instead it is useful to define the flow of drunkards into the $i^{th}$ state by

$\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt}-\sum_j H_{ij} p_j}$

This quantity describes how much the rate of change of the population at the $i^{th}$ state differs from the rate given by the master equation; at the internal states it vanishes, since there the master equation holds.
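As a concrete illustration, here is a little sketch computing these flows for the drunkards’ walk, under the simplifying assumption that the outside world holds the boundary populations fixed, so that $dp_i/dt = 0$ for $i \in B.$ That is just one possible choice of boundary behaviour, made here for illustration:

```python
import numpy as np

H = np.array([[ -2.,   5.,  10.],    # states ordered (home, bar, taco truck)
              [  2., -12.,   0.],
              [  0.,   7., -10.]])

boundary = [0, 1]                 # B = {home, bar}
p = np.array([6.0, 4.0, 2.0])     # current populations

dp_dt = H @ p                     # what the master equation alone would give
dp_dt[boundary] = 0.0             # clamped boundary populations do not change
flow = dp_dt - H @ p              # D p_i / D t = dp_i/dt - sum_j H_ij p_j

print(flow)   # nonzero only at the boundary states; zero at the taco truck
```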

The reason why we are interested in open Markov processes is because you can take two open Markov processes and glue them together along some subset of their boundary states to get a new open Markov process! This allows us to build up or break down complicated Markov processes using open Markov processes as the building blocks.

For example we can draw the graph corresponding to the drunkards’ walk again, only now we will distinguish boundary states from internal states by coloring internal states blue and having boundary states be white:

Consider another open Markov process with states

$V=\{ \text{home},\text{work},\text{bar} \}$

where

$B=\{ \text{home}, \text{bar}\}$

are the boundary states, leaving

$I=\{\text{work}\}$

as an internal state:

Since the boundary states of this process overlap with the boundary states of the first process we can compose the two to form a new Markov process:

Notice the boundary states are now internal states. I hope any Markov process that could approximately model your behavior has more interesting nodes! There is a nice way to figure out the Hamiltonian of the composite from the Hamiltonians of the pieces, but we will leave that for another time.

We can ask ourselves, how does relative entropy change with time in open Markov processes? You can read my paper for the details, but here is the punchline:

$\displaystyle{ \frac{dS(p(t),q(t))}{dt} \leq \sum_{i \in B} \left( \frac{Dp_i}{Dt}\frac{\partial S}{\partial p_i} + \frac{Dq_i}{Dt}\frac{\partial S}{\partial q_i} \right) }$

This is a version of the Second Law of Thermodynamics for open Markov processes.

It is important to notice that the sum is only over the boundary states! This inequality tells us that relative entropy still decreases inside our process, but depending on the flow of populations through the boundary states, the relative entropy of the whole process could either increase or decrease! This inequality will be important when we study how the relative entropy changes in different parts of a bigger, more complicated process.
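Here is a rough numerical sanity check of the inequality, under the same illustrative assumption as before: the populations of both $p$ and $q$ are held fixed at the boundary states while the taco truck evolves by the master equation. This is only a sketch, not code from the paper below.

```python
import numpy as np

H = np.array([[ -2.,   5.,  10.],    # states ordered (home, bar, taco truck)
              [  2., -12.,   0.],
              [  0.,   7., -10.]])
boundary = np.array([True, True, False])   # B = {home, bar}

p = np.array([6.0, 4.0, 2.0])
q = np.array([3.0, 5.0, 4.0])

def time_derivative(x):
    """Master equation at internal states, populations clamped at boundary states."""
    dx = H @ x
    dx[boundary] = 0.0
    return dx

dp, dq = time_derivative(p), time_derivative(q)

# Partial derivatives of S(p, q) = sum_i p_i ln(p_i / q_i).
dS_dp = np.log(p / q) + 1.0
dS_dq = -p / q

# Left-hand side: dS/dt along the actual trajectory (chain rule over all states).
lhs = np.sum(dS_dp * dp + dS_dq * dq)

# Right-hand side: boundary flows paired with the partial derivatives of S,
# summed over the boundary states only.
Dp = dp - H @ p
Dq = dq - H @ q
rhs = np.sum((dS_dp * Dp + dS_dq * Dq)[boundary])

print(lhs, rhs, lhs <= rhs)   # the Second Law above says lhs <= rhs
```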

That is all for now, but I leave it as an exercise for you to imagine a Markov process that describes your life. How many states does it have? What are the relative transition rates? Are there states you would like to spend more or less time in? Are there states you would like to visit?

Here is my paper, which proves the above inequality:

• Blake Pollard, A Second Law for open Markov processes.

If you have comments or corrections, let me know!