## Categories in Control

23 April, 2015

To understand ecosystems, ultimately will be to understand networks. – B. C. Patten and M. Witkamp

A while back I decided one way to apply my math skills to help save the planet was to start pushing toward green mathematics: a kind of mathematics that can interact with biology and ecology just as fruitfully as traditional mathematics interacts with physics. As usual with math, the payoffs will come slowly, but they may be large. It’s not a substitute for doing other, more urgent things—but if mathematicians don’t do this, who will?

As a first step in this direction, I decided to study networks.

This May, a small group of mathematicians is meeting in Turin for a workshop on the categorical foundations of network theory, organized by Jacob Biamonte. I’m trying to get us mentally prepared for this. We all have different ideas, yet they should fit together somehow.

Tobias Fritz, Eugene Lerman and David Spivak have all written articles here about their work, though I suspect Eugene will have a lot of completely new things to say, too. Now it’s time for me to say what my students and I have been doing.

Despite my ultimate aim of studying biological and ecological networks, I decided to start by clarifying the math of networks that appear in chemistry and engineering, since these are simpler, better understood, useful in their own right, and probably a good warmup for the grander goal. I’ve been working with Brendan Fong on electrical ciruits, and with Jason Erbele on control theory. Let me talk about this paper:

• John Baez and Jason Erbele, Categories in control.

Control theory is the branch of engineering that focuses on manipulating open systems—systems with inputs and outputs—to achieve desired goals. In control theory, signal-flow diagrams are used to describe linear ways of manipulating signals, for example smooth real-valued functions of time. Here’s a real-world example; click the picture for more details:

For a category theorist, at least, it is natural to treat signal-flow diagrams as string diagrams in a symmetric monoidal category. This forces some small changes of perspective, which I’ll explain, but more important is the question: which symmetric monoidal category?

We argue that the answer is: the category $\mathrm{FinRel}_k$ of finite-dimensional vector spaces over a certain field $k,$ but with linear relations rather than linear maps as morphisms, and direct sum rather than tensor product providing the symmetric monoidal structure. We use the field $k = \mathbb{R}(s)$ consisting of rational functions in one real variable $s.$ This variable has the meaning of differentation. A linear relation from $k^m$ to $k^n$ is thus a system of linear constant-coefficient ordinary differential equations relating $m$ ‘input’ signals and $n$ ‘output’ signals.

Our main goal in this paper is to provide a complete ‘generators and relations’ picture of this symmetric monoidal category, with the generators being familiar components of signal-flow diagrams. It turns out that the answer has an intriguing but mysterious connection to ideas that are familiar in the diagrammatic approach to quantum theory! Quantum theory also involves linear algebra, but it uses linear maps between Hilbert spaces as morphisms, and the tensor product of Hilbert spaces provides the symmetric monoidal structure.

We hope that the category-theoretic viewpoint on signal-flow diagrams will shed new light on control theory. However, in this paper we only lay the groundwork.

### Signal flow diagrams

There are several basic operations that one wants to perform when manipulating signals. The simplest is multiplying a signal by a scalar. A signal can be amplified by a constant factor:

$f \mapsto cf$

where $c \in \mathbb{R}.$ We can write this as a string diagram:

Here the labels $f$ and $c f$ on top and bottom are just for explanatory purposes and not really part of the diagram. Control theorists often draw arrows on the wires, but this is unnecessary from the string diagram perspective. Arrows on wires are useful to distinguish objects from their
duals, but ultimately we will obtain a compact closed category where each object is its own dual, so the arrows can be dropped. What we really need is for the box denoting scalar multiplication to have a clearly defined input and output. This is why we draw it as a triangle. Control theorists often use a rectangle or circle, using arrows on wires to indicate which carries the input $f$ and which the output $c f.$

A signal can also be integrated with respect to the time variable:

$f \mapsto \int f$

Mathematicians typically take differentiation as fundamental, but engineers sometimes prefer integration, because it is more robust against small perturbations. In the end it will not matter much here. We can again draw integration as a string diagram:

Since this looks like the diagram for scalar multiplication, it is natural to extend $\mathbb{R}$ to $\mathbb{R}(s),$ the field of rational functions of a variable $s$ which stands for differentiation. Then differentiation becomes a special case of scalar multiplication, namely multiplication by $s,$ and integration becomes multiplication by $1/s.$ Engineers accomplish the same effect with Laplace transforms, since differentiating a signal $f$ is equivalent to multiplying its Laplace transform

$\displaystyle{ (\mathcal{L}f)(s) = \int_0^\infty f(t) e^{-st} \,dt }$

by the variable $s.$ Another option is to use the Fourier transform: differentiating $f$ is equivalent to multiplying its Fourier transform

$\displaystyle{ (\mathcal{F}f)(\omega) = \int_{-\infty}^\infty f(t) e^{-i\omega t}\, dt }$

by $-i\omega.$ Of course, the function $f$ needs to be sufficiently well-behaved to justify calculations involving its Laplace or Fourier transform. At a more basic level, it also requires some work to treat integration as the two-sided inverse of differentiation. Engineers do this by considering signals that vanish for $t < 0,$ and choosing the antiderivative that vanishes under the same condition. Luckily all these issues can be side-stepped in a formal treatment of signal-flow diagrams: we can simply treat signals as living in an unspecified vector space over the field $\mathbb{R}(s).$ The field $\mathbb{C}(s)$ would work just as well, and control theory relies heavily on complex analysis. In our paper we work over an arbitrary field $k.$

The simplest possible signal processor is a rock, which takes the 'input' given by the force $F$ on the rock and produces as 'output' the rock's position $q.$ Thanks to Newton's second law $F=ma,$ we can describe this using a signal-flow diagram:

Here composition of morphisms is drawn in the usual way, by attaching the output wire of one morphism to the input wire of the next.

To build more interesting machines we need more building blocks, such as addition:

$+ : (f,g) \mapsto f + g$

and duplication:

$\Delta : f \mapsto (f,f)$

When these linear maps are written as matrices, their matrices are transposes of each other. This is reflected in the string diagrams for addition and duplication:

The second is essentially an upside-down version of the first. However, we draw addition as a dark triangle and duplication as a light one because we will later want another way to ‘turn addition upside-down’ that does not give duplication. As an added bonus, a light upside-down triangle resembles the Greek letter $\Delta,$ the usual symbol for duplication.

While they are typically not considered worthy of mention in control theory, for completeness we must include two other building blocks. One is the zero map from the zero-dimensional vector space $\{0\}$ to our field $k,$ which we denote as $0$ and draw as follows:

The other is the zero map from $k$ to $\{0\},$ sometimes called ‘deletion’, which we denote as $!$ and draw thus:

Just as the matrices for addition and duplication are transposes of each other, so are the matrices for zero and deletion, though they are rather degenerate, being $1 \times 0$ and $0 \times 1$ matrices, respectively. Addition and zero make $k$ into a commutative monoid, meaning that the following relations hold:

The equation at right is the commutative law, and the crossing of strands is the braiding:

$B : (f,g) \mapsto (g,f)$

by which we switch two signals. In fact this braiding is a symmetry, so it does not matter which strand goes over which:

Dually, duplication and deletion make $k$ into a cocommutative comonoid. This means that if we reflect the equations obeyed by addition and zero across the horizontal axis and turn dark operations into light ones, we obtain another set of valid equations:

There are also relations between the monoid and comonoid operations. For example, adding two signals and then duplicating the result gives the same output as duplicating each signal and then adding the results:

This diagram is familiar in the theory of Hopf algebras, or more generally bialgebras. Here it is an example of the fact that the monoid operations on $k$ are comonoid homomorphisms—or equivalently, the comonoid operations are monoid homomorphisms.

We summarize this situation by saying that $k$ is a bimonoid. These are all the bimonoid laws, drawn as diagrams:

The last equation means we can actually make the diagram at left disappear, since it equals the identity morphism on the 0-dimensional vector space, which is drawn as nothing.

So far all our string diagrams denote linear maps. We can treat these as morphisms in the category $\mathrm{FinVect}_k,$ where objects are finite-dimensional vector spaces over a field $k$ and morphisms are linear maps. This category is equivalent to the category where the only objects are vector spaces $k^n$ for $n \ge 0,$ and then morphisms can be seen as $n \times m$ matrices. The space of signals is a vector space $V$ over $k$ which may not be finite-dimensional, but this does not cause a problem: an $n \times m$ matrix with entries in $k$ still defines a linear map from $V^n$ to $V^m$ in a functorial way.

In applications of string diagrams to quantum theory, we make $\mathrm{FinVect}_k$ into a symmetric monoidal category using the tensor product of vector spaces. In control theory, we instead make $\mathrm{FinVect}_k$ into a symmetric monoidal category using the direct sum of vector spaces. In Lemma 1 of our paper we prove that for any field $k,$ $\mathrm{FinVect}_k$ with direct sum is generated as a symmetric monoidal category by the one object $k$ together with these morphisms:

where $c \in k$ is arbitrary.

However, these generating morphisms obey some unexpected relations! For example, we have:

Thus, it is important to find a complete set of relations obeyed by these generating morphisms, thus obtaining a presentation of $\mathrm{FinVect}_k$ as a symmetric monoidal category. We do this in Theorem 2. In brief, these relations say:

(1) $(k, +, 0, \Delta, !)$ is a bicommutative bimonoid;

(2) the rig operations of $k$ can be recovered from the generating morphisms;

(3) all the generating morphisms commute with scalar multiplication.

Here item (2) means that $+, \cdot, 0$ and $1$ in the field $k$ can be expressed in terms of signal-flow diagrams as follows:

Multiplicative inverses cannot be so expressed, so our signal-flow diagrams so far do not know that $k$ is a field. Additive inverses also cannot be expressed in this way. So, we expect that a version of Theorem 2 will hold whenever $k$ is a mere rig: that is, a ‘ring without negatives’, like the natural numbers. The one change is that instead of working with vector spaces, we should work with finitely presented free $k$-modules.

Item (3), the fact that all our generating morphisms commute with scalar multiplication, amounts to these diagrammatic equations:

While Theorem 2 is a step towards understanding the category-theoretic underpinnings of control theory, it does not treat signal-flow diagrams that include ‘feedback’. Feedback is one of the most fundamental concepts in control theory because a control system without feedback may be highly sensitive to disturbances or unmodeled behavior. Feedback allows these uncontrolled behaviors to be mollified. As a string diagram, a basic feedback system might look schematically like this:

The user inputs a ‘reference’ signal, which is fed into a controller, whose output is fed into a system, which control theorists call a ‘plant’, which in turn produces its own output. But then the system’s output is duplicated, and one copy is fed into a sensor, whose output is added (or if we prefer, subtracted) from the reference signal.

In string diagrams—unlike in the usual thinking on control theory—it is essential to be able to read any diagram from top to bottom as a composite of tensor products of generating morphisms. Thus, to incorporate the idea of feedback, we need two more generating morphisms. These are the ‘cup’:

and ‘cap’:

These are not maps: they are relations. The cup imposes the relation that its two inputs be equal, while the cap does the same for its two outputs. This is a way of describing how a signal flows around a bend in a wire.

To make this precise, we use a category called $\mathrm{FinRel}_k.$ An object of this category is a finite-dimensional vector space over $k,$ while a morphism from $U$ to $V,$ denoted $L : U \rightharpoonup V,$ is a linear relation, meaning a linear subspace

$L \subseteq U \oplus V$

In particular, when $k = \mathbb{R}(s),$ a linear relation $L : k^m \to k^n$ is just an arbitrary system of constant-coefficient linear ordinary differential equations relating $m$ input variables and $n$ output variables.

Since the direct sum $U \oplus V$ is also the cartesian product of $U$ and $V,$ a linear relation is indeed a relation in the usual sense, but with the property that if $u \in U$ is related to $v \in V$ and $u' \in U$ is related to $v' \in V$ then $cu + c'u'$ is related to $cv + c'v'$ whenever $c,c' \in k.$

We compose linear relations $L : U \rightharpoonup V$ and $L' : V \rightharpoonup W$ as follows:

$L'L = \{(u,w) \colon \; \exists\; v \in V \;\; (u,v) \in L \textrm{ and } (v,w) \in L'\}$

Any linear map $f : U \to V$ gives a linear relation $F : U \rightharpoonup V,$ namely the graph of that map:

$F = \{ (u,f(u)) : u \in U \}$

Composing linear maps thus becomes a special case of composing linear relations, so $\mathrm{FinVect}_k$ becomes a subcategory of $\mathrm{FinRel}_k.$ Furthermore, we can make $\mathrm{FinRel}_k$ into a monoidal category using direct sums, and it becomes symmetric monoidal using the braiding already present in $\mathrm{FinVect}_k.$

In these terms, the cup is the linear relation

$\cup : k^2 \rightharpoonup \{0\}$

given by

$\cup \; = \; \{ (x,x,0) : x \in k \} \; \subseteq \; k^2 \oplus \{0\}$

while the cap is the linear relation

$\cap : \{0\} \rightharpoonup k^2$

given by

$\cap \; = \; \{ (0,x,x) : x \in k \} \; \subseteq \; \{0\} \oplus k^2$

These obey the zigzag relations:

Thus, they make $\mathrm{FinRel}_k$ into a compact closed category where $k,$ and thus every object, is its own dual.

Besides feedback, one of the things that make the cap and cup useful is that they allow any morphism $L : U \rightharpoonup V$ to be ‘plugged in backwards’ and thus ‘turned around’. For instance, turning around integration:

we obtain differentiation. In general, using caps and cups we can turn around any linear relation $L : U \rightharpoonup V$ and obtain a linear relation $L^\dagger : V \rightharpoonup U,$ called the adjoint of $L,$ which turns out to given by

$L^\dagger = \{(v,u) : (u,v) \in L \}$

For example, if $c \in k$ is nonzero, the adjoint of scalar multiplication by $c$ is multiplication by $c^{-1}$:

Thus, caps and cups allow us to express multiplicative inverses in terms of signal-flow diagrams! One might think that a problem arises when when $c = 0,$ but no: the adjoint of scalar multiplication by $0$ is

$\{(0,x) : x \in k \} \subseteq k \oplus k$

In Lemma 3 we show that $\mathrm{FinRel}_k$ is generated, as a symmetric monoidal category, by these morphisms:

where $c \in k$ is arbitrary.

In Theorem 4 we find a complete set of relations obeyed by these generating morphisms,thus giving a presentation of $\mathrm{FinRel}_k$ as a symmetric monoidal category. To describe these relations, it is useful to work with adjoints of the generating morphisms. We have already seen that the adjoint of scalar multiplication by $c$ is scalar multiplication by $c^{-1},$ except when $c = 0.$ Taking adjoints of the other four generating morphisms of $\mathrm{FinVect}_k,$ we obtain four important but perhaps unfamiliar linear relations. We draw these as ‘turned around’ versions of the original generating morphisms:

Coaddition is a linear relation from $k$ to $k^2$ that holds when the two outputs sum to the input:

$+^\dagger : k \rightharpoonup k^2$

$+^\dagger = \{(x,y,z) : \; x = y + z \} \subseteq k \oplus k^2$

Cozero is a linear relation from $k$ to $\{0\}$ that holds when the input is zero:

$0^\dagger : k \rightharpoonup \{0\}$

$0^\dagger = \{ (0,0)\} \subseteq k \oplus \{0\}$

Coduplication is a linear relation from $k^2$ to $k$ that holds when the two inputs both equal the output:

$\Delta^\dagger : k^2 \rightharpoonup k$

$\Delta^\dagger = \{(x,y,z) : \; x = y = z \} \subseteq k^2 \oplus k$

Codeletion is a linear relation from $\{0\}$ to $k$ that holds always:

$!^\dagger : \{0\} \rightharpoonup k$

$!^\dagger = \{(0,x) \} \subseteq \{0\} \oplus k$

Since $+^\dagger,0^\dagger,\Delta^\dagger$ and $!^\dagger$ automatically obey turned-around versions of the relations obeyed by $+,0,\Delta$ and $!,$ we see that $k$ acquires a second bicommutative bimonoid structure when considered as an object in $\mathrm{FinRel}_k.$

Moreover, the four dark operations make $k$ into a Frobenius monoid. This means that $(k,+,0)$ is a monoid, $(k,+^\dagger, 0^\dagger)$ is a comonoid, and the Frobenius relation holds:

All three expressions in this equation are linear relations saying that the sum of the two inputs equal the sum of the two outputs.

The operation sending each linear relation to its adjoint extends to a contravariant functor

$\dagger : \mathrm{FinRel}_k\ \to \mathrm{FinRel}_k$

which obeys a list of properties that are summarized by saying that $\mathrm{FinRel}_k$ is a †-compact category. Because two of the operations in the Frobenius monoid $(k, +,0,+^\dagger,0^\dagger)$ are adjoints of the other two, it is a †-Frobenius monoid.

This Frobenius monoid is also special, meaning that
comultiplication (in this case $+^\dagger$) followed by multiplication (in this case $+$) equals the identity:

This Frobenius monoid is also commutative—and cocommutative, but for Frobenius monoids this follows from commutativity.

Starting around 2008, commutative special †-Frobenius monoids have become important in the categorical foundations of quantum theory, where they can be understood as ‘classical structures’ for quantum systems. The category $\mathrm{FinHilb}$ of finite-dimensional Hilbert spaces and linear maps is a †-compact category, where any linear map $f : H \to K$ has an adjoint $f^\dagger : K \to H$ given by

$\langle f^\dagger \phi, \psi \rangle = \langle \phi, f \psi \rangle$

for all $\psi \in H, \phi \in K .$ A commutative special †-Frobenius monoid in $\mathrm{FinHilb}$ is then the same as a Hilbert space with a chosen orthonormal basis. The reason is that given an orthonormal basis $\psi_i$ for a finite-dimensional Hilbert space $H,$ we can make $H$ into a commutative special †-Frobenius monoid with multiplication $m : H \otimes H \to H$ given by

$m (\psi_i \otimes \psi_j ) = \left\{ \begin{array}{cl} \psi_i & i = j \\ 0 & i \ne j \end{array}\right.$

and unit $i : \mathbb{C} \to H$ given by

$i(1) = \sum_i \psi_i$

The comultiplication $m^\dagger$ duplicates basis states:

$m^\dagger(\psi_i) = \psi_i \otimes \psi_i$

Conversely, any commutative special †-Frobenius monoid in $\mathrm{FinHilb}$ arises this way.

Considerably earlier, around 1995, commutative Frobenius monoids were recognized as important in topological quantum field theory. The reason, ultimately, is that the free symmetric monoidal category on a commutative Frobenius monoid is $2\mathrm{Cob},$ the category with 2-dimensional oriented cobordisms as morphisms. But the free symmetric monoidal category on a commutative special Frobenius monoid was worked out even earlier: it is the category with finite sets as objects, where a morphism $f : X \to Y$ is an isomorphism class of cospans

$X \longrightarrow S \longleftarrow Y$

This category can be made into a †-compact category in an obvious way, and then the 1-element set becomes a commutative special †-Frobenius monoid.

For all these reasons, it is interesting to find a commutative special †-Frobenius monoid lurking at the heart of control theory! However, the Frobenius monoid here has yet another property, which is more unusual. Namely, the unit $0 : \{0\} \rightharpoonup k$ followed by the counit $0^\dagger : k \rightharpoonup \{0\}$ is the identity:

We call a special Frobenius monoid that also obeys this extra law extra-special. One can check that the free symmetric monoidal category on a commutative extra-special Frobenius monoid is the category with finite sets as objects, where a morphism $f : X \to Y$ is an equivalence relation on the disjoint union $X \sqcup Y,$ and we compose $f : X \to Y$ and $g : Y \to Z$ by letting $f$ and $g$ generate an equivalence relation on $X \sqcup Y \sqcup Z$ and then restricting this to $X \sqcup Z.$

As if this were not enough, the light operations share many properties with the dark ones. In particular, these operations make $k$ into a commutative extra-special †-Frobenius monoid in a second way. In summary:

$(k, +, 0, \Delta, !)$ is a bicommutative bimonoid;

$(k, \Delta^\dagger, !^\dagger, +^\dagger, 0^\dagger)$ is a bicommutative bimonoid;

$(k, +, 0, +^\dagger, 0^\dagger)$ is a commutative extra-special †-Frobenius monoid;

$(k, \Delta^\dagger, !^\dagger, \Delta, !)$ is a commutative extra-special †-Frobenius monoid.

It should be no surprise that with all these structures built in, signal-flow diagrams are a powerful method of designing processes.

However, it is surprising that most of these structures are present in a seemingly very different context: the so-called ZX calculus, a diagrammatic formalism for working with complementary observables in quantum theory. This arises naturally when one has an $n$-dimensional Hilbert space $H$ with two orthonormal bases $\psi_i, \phi_i$ that are mutually unbiased, meaning that

$|\langle \psi_i, \phi_j \rangle|^2 = \displaystyle{\frac{1}{n}}$

for all $1 \le i, j \le n.$ Each orthonormal basis makes $H$ into commutative special †-Frobenius monoid in $\mathrm{FinHilb}.$ Moreover, the multiplication and unit of either one of these Frobenius monoids fits together with the comultiplication and counit of the other to form a bicommutative bimonoid. So, we have all the structure present in the list above—except that these Frobenius monoids are only extra-special if $H$ is 1-dimensional.

The field $k$ is also a 1-dimensional vector space, but this is a red herring: in $\mathrm{FinRel}_k$ every finite-dimensional vector space naturally acquires all four structures listed above, since addition, zero, duplication and deletion are well-defined and obey all the relations we have discussed. Jason and I focus on $k$ in our paper simply because it generates all the objects $\mathrm{FinRel}_k$ via direct sum.

Finally, in $\mathrm{FinRel}_k$ the cap and cup are related to the light and dark operations as follows:

Note the curious factor of $-1$ in the second equation, which breaks some of the symmetry we have seen so far. This equation says that two elements $x, y \in k$ sum to zero if and only if $-x = y.$ Using the zigzag relations, the two equations above give

We thus see that in $\mathrm{FinRel}_k,$ both additive and multiplicative inverses can be expressed in terms of the generating morphisms used in signal-flow diagrams.

Theorem 4 of our paper gives a presentation of $\mathrm{FinRel}_k$ based on the ideas just discussed. Briefly, it says that $\mathrm{FinRel}_k$ is equivalent to the symmetric monoidal category generated by an object $k$ and these morphisms:

• addition $+: k^2 \rightharpoonup k$
• zero $0 : \{0\} \rightharpoonup k$
• duplication $\Delta: k\rightharpoonup k^2$
• deletion $! : k \rightharpoonup 0$
• scalar multiplication $c: k\rightharpoonup k$ for any $c\in k$
• cup $\cup : k^2 \rightharpoonup \{0\}$
• cap $\cap : \{0\} \rightharpoonup k^2$

obeying these relations:

(1) $(k, +, 0, \Delta, !)$ is a bicommutative bimonoid;

(2) $\cap$ and $\cup$ obey the zigzag equations;

(3) $(k, +, 0, +^\dagger, 0^\dagger)$ is a commutative extra-special †-Frobenius monoid;

(4) $(k, \Delta^\dagger, !^\dagger, \Delta, !)$ is a commutative extra-special †-Frobenius monoid;

(5) the field operations of $k$ can be recovered from the generating morphisms;

(6) the generating morphisms (1)-(4) commute with scalar multiplication.

Note that item (2) makes $\mathrm{FinRel}_k$ into a †-compact category, allowing us to mention the adjoints of generating morphisms in the subsequent relations. Item (5) means that $+, \cdot, 0, 1$ and also additive and multiplicative inverses in the field $k$ can be expressed in terms of signal-flow diagrams in the manner we have explained.

So, we have a good categorical understanding of the linear algebra used in signal flow diagrams!

Now Jason is moving ahead to apply this to some interesting problems… but that’s another story, for later.

## Resource Convertibility (Part 3)

13 April, 2015

guest post by Tobias Fritz

In Part 1 and Part 2, we learnt about ordered commutative monoids and how they formalize theories of resource convertibility and combinability. In this post, I would like to say a bit about the applications that have been explored so far. First, the study of resource theories has become a popular subject in quantum information theory, and many of the ideas in my paper actually originate there. I’ll list some references at the end. So I hope that the toolbox of ordered commutative monoids will turn out to be useful for this. But here I would like to talk about an example application that is much easier to understand, but no less difficult to analyze: graph theory and the resource theory of zero-error communication.

A graph consists of a bunch of nodes connected by a bunch of edges, for example like this:

This particular graph is the pentagon graph or 5-cycle. To give it some resource-theoretic interpretation, think of it as the distinguishability graph of a communication channel, where the nodes are the symbols that can be sent across the channel, and two symbols share an edge if and only if they can be unambiguously decoded. For example, the pentagon graph roughly corresponds to the distinguishability graph of my handwriting, when restricted to five letters only:

So my ‘w’ is distinguishable from my ‘u’, but it may be confused for my ‘m’. In order to communicate unambiguously, it looks like I should restrict myself to using only two of those letters in writing, since any third of them may be mistaken for one of the other three. But alternatively, I could use a block code to create context around each letter which allows for perfect disambiguation. This is what happens in practice: I write in natural language, where an entire word is usually not ambiguous.

One can now also consider graph homomorphisms, which are maps like this:

The numbers on the nodes indicate where each node on the left gets mapped to. Formally, a graph homomorphism is a function taking nodes to nodes such that adjacent nodes get mapped to adjacent nodes. If a homomorphism $G\to H$ exists between graphs $G$ and $H,$ then we also write $H\geq G$; in terms of communication channels, we can interpret this as saying that $H$ simulates $G,$ since the homomorphism provides a map between the symbols which preserves distinguishability. A ‘code’ for a communication channel is then just a homomorphism from the complete graph in which all nodes share an edge to the graph which describes the channel. With this ordering structure, the collection of all finite graphs forms an ordered set. This ordered set has an intricate structure which is intimately related to some big open problems in graph theory.

We can also combine two communication channels to form a compound one. Going back to the handwriting example, we can consider the new channel in which the symbols are pairs of letters. Two such pairs are distinguishable if and only if either the first letters of each pair are distinguishable or the second letters are,

$(a,b) \sim (a',b') \:\Leftrightarrow\: a\sim a' \:\lor\: b\sim b'$

When generalized to arbitrary graphs, this yields the definition of disjunctive product of graphs. It is not hard to show that this equips the ordered set of graphs with a binary operation compatible with the ordering, so that we obtain an ordered commutative monoid denoted Grph. It mathematically formalizes the resource theory of zero-error communication.

Using the toolbox of ordered commutative monoids combined with some concrete computations on graphs, one can show that Grph is not cancellative: if $K_{11}$ is the complete graph on 11 nodes, then $3C_5\not\geq K_{11},$ but there exists a graph $G$ such that

$3 C_5 + G \geq K_{11} + G$

The graph $G$ turns out to have 136 nodes. This result seems to be new. But if you happen to have seen something like this before, please let me know!

Last time, we also talked about rates of conversion. In Grph, it turns out that some of these correspond to famous graph invariants! For example, the rate of conversion from a graph $G$ to the single-edge graph $K_2$ is Shannon capacity $\Theta(\overline{G}),$ where $\overline{G}$ is the complement graph. This is of no surprise since $\Theta$ was originally defined by Shannon with precisely this rate in mind, although he did not use the language of ordered commutative monoids. In any case, the Shannon capacity $\Theta(\overline{G})$ is a graph invariant notorious for its complexity: it is not known whether there exists an algorithm to compute it! But an application of the Rate Theorem from Part 2 gives us a formula for the Shannon capacity:

$\Theta(\overline{G}) = \inf_f f(G)$

where $f$ ranges over all graph invariants which are monotone under graph homomorphisms, multiplicative under disjunctive product, and normalized such that $f(K_2) = 2.$ Unfortunately, this formula still not produce an algorithm for computing $\Theta.$ But it nonconstructively proves the existence of many new graph invariants $f$ which approximate the Shannon capacity from above.

Although my story ends here, I also feel that the whole project has barely started. There are lots of directions to explore! For example, it would be great to fit Shannon’s noisy channel coding theorem into this framework, but this has turned out be technically challenging. If you happen to be familiar with rate-distortion theory and you want to help out, please get in touch!

#### References

Here is a haphazard selection of references on resource theories in quantum information theory and related fields:

• Igor Devetak, Aram Harrow and Andreas Winter, A resource framework for quantum Shannon theory.

• Gilad Gour, Markus P. Müller, Varun Narasimhachar, Robert W. Spekkens and Nicole Yunger Halpern, The resource theory of informational nonequilibrium in thermodynamics.

• Fernando G.S.L. Brandão, Michał Horodecki, Nelly Huei Ying Ng, Jonathan Oppenheim and Stephanie Wehner, The second laws of quantum thermodynamics.

• Iman Marvian and Robert W. Spekkens, The theory of manipulations of pure state asymmetry: basic tools and equivalence classes of states under symmetric operations.

• Elliott H. Lieb and Jakob Yngvason, The physics and mathematics of the second law of thermodynamics.

## Resource Convertibility (Part 2)

10 April, 2015

guest post by Tobias Fritz

In Part 1, I introduced ordered commutative monoids as a mathematical formalization of resources and their convertibility. Today I’m going to say something about what to do with this formalization. Let’s start with a quick recap!

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

$+$ and $0$ equip $A$ with the structure of a commutative monoid;

$\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Recall also that we think of the $x,y\in A$ as resource objects such that $x+y$ represents the object consisting of $x$ and $y$ together, and $x\geq y$ means that the resource object $x$ can be converted into $y.$

When confronted with an abstract definition like this, many people ask: so what is it useful for? The answer to this is twofold: first, it provides a language which we can use to guide our thoughts in any application context. Second, the definition itself is just the very start: we can now also prove theorems about ordered commutative monoids, which can be instantiated in any particular application context. So the theory of ordered commutative monoids will provide a useful toolbox for talking about concrete resource theories and studying them. In the remainder of this post, I’d like to say a bit about what this toolbox contains. For more, you’ll have to read the paper!

To start, let’s consider catalysis as one of the resource-theoretic phenomena neatly captured by ordered commutative monoids. Catalysis is the phenomenon that certain conversions become possible only due to the presence of a catalyst, which is an additional resource object which does not get consumed in the process of the conversion. For example, we have

$\text{timber + nails}\not\geq \text{table},$

$\text{timber + nails + saw + hammer} \geq \text{table + saw + hammer}$

because making a table from timber and nails requires a saw and a hammer as tools. So in this example, ‘saw $+$ hammer’ is a catalyst for the conversion of ‘timber $+$ nails’ into ‘table’. In mathematical language, catalysis occurs precisely when the ordered commutative monoid is not cancellative, which means that $x + z\geq y + z$ sometimes holds even though $x\geq y$ does not. So, the notion of catalysis perfectly matches up with a very natural and familiar notion from algebra.

One can continue along these lines and study those ordered commutative monoids which are cancellative. It turns out that every ordered commutative monoid can be made cancellative in a universal way; in the resource-theoretic interpretation, this boils down to replacing the convertibility relation by catalytic convertibility, in which $x$ is declared to be convertible into $y$ as soon as there exists a catalyst which achieves this conversion. Making an ordered commutative monoid cancellative like this is a kind of ‘regularization’: it leads to a mathematically more well-behaved structure. As it turns out, there are several additional steps of regularization that can be performed, and all of these are both mathematically natural and have an appealing resource-theoretic interpretation. These regularizations successively take us from the world of ordered commutative monoids to the realm of linear algebra and functional analysis, where powerful theorems are available. For now, let me not go into the details, but only try to summarize one of the consequences of this development. This requires a bit of preparation.

In many situations, it is not just of interest to convert a single copy of some resource object $x$ into a single copy of some $y;$ instead, one may be interested in converting many copies of $x$ into many copies of $y$ all together, and thereby maximizing (or minimizing) the ratio of the resulting number of $y$‘s compared to the number of $x$‘s that get consumed. This ratio is measured by the maximal rate:

$\displaystyle{ R_{\mathrm{max}}(x\to y) = \sup \left\{ \frac{m}{n} \:|\: nx \geq my \right\} }$

Here, $m$ and $n$ are natural numbers, and $nx$ stands for the $n$-fold sum $x+\cdots+x,$ and similarly for $my.$ So this maximal rate quantifies how many $y$’ s we can get out of one copy of $x,$ when working in a ‘mass production’ setting. There is also a notion of regularized rate, which has a slightly more complicated definition that I don’t want to spell out here, but is similar in spirit. The toolbox of ordered commutative monoids now provides the following result:

Rate Theorem: If $x\geq 0$ and $y\geq 0$ in an ordered commutative monoid $A$ which satisfies a mild technical assumption, then the maximal regularized rate from $x$ to $y$ can be computed like this:

$\displaystyle{ R_{\mathrm{max}}^{\mathrm{reg}}(x\to y) = \inf_f \frac{f(y)}{f(x)} }$

where $f$ ranges over all functionals on $A$ with $f(y)\neq 0.$

Wait a minute, what’s a ‘functional’? It’s defined to be a map $f:A\to\mathbb{R}$ which is monotone,

$x\geq y \:\Rightarrow\: f(x)\geq f(y)$

$f(x+y) = f(x) + f(y)$

In economic terms, we can think of a functional as a consistent assignment of prices to all resource objects. If $x$ is at least as useful as $y,$ then the price of $x$ should be at least as high as the price of $y$; and the price of two objects together should be the sum of their individual prices. So the $f$ in the rate formula above ranges over all ‘markets’ on which resource objects can be ‘traded’ at consistent prices. The term ‘functional’ is supposed to hint at a relation to functional analysis. In fact, the proof of the theorem crucially relies on the Hahn–Banach Theorem.

The mild technical mentioned in the Rate Theorem is that the ordered commutative monoid needs to have a generating pair. This turns out to hold in the applications that I have considered so far, and I hope that it will turn out to hold in most others as well. For the full gory details, see the paper.

So this provides some idea of what kinds of gadgets one can find in the toolbox of ordered commutative monoids. Next time, I’ll show some applications to graph theory and zero-error communication and say a bit about where this project might be going next.

## Resource Convertibility (Part 1)

7 April, 2015

guest post by Tobias Fritz

Hi! I am Tobias Fritz, a mathematician at the Perimeter Institute for Theoretical Physics in Waterloo, Canada. I like to work on all sorts of mathematical structures which pop up in probability theory, information theory, and other sorts of applied math. Today I would like to tell you about my latest paper:

It should be of interest to Azimuth readers as it forms part of what John likes to call ‘green mathematics’. So let’s get started!

Resources and their management are an essential part of our everyday life. We deal with the management of time or money pretty much every day. We also consume natural resources in order to afford food and amenities for (some of) the 7 billion people on our planet. Many of the objects that we deal with in science and engineering can be considered as resources. For example, a communication channel is a resource for sending information from one party to another. But for now, let’s stick with a toy example: timber and nails constitute a resource for making a table. In mathematical notation, this looks like so:

$\mathrm{timber} + \mathrm{nails} \geq \mathrm{table}$

We interpret this inequality as saying that “given timber and nails, we can make a table”. I like to write it as an inequality like this, which I think of as stating that having timber and nails is at least as good as having a table, because the timber and nails can always be turned into a table whenever one needs a table.

To be more precise, we should also take into account that making the table requires some tools. These tools do not get consumed in the process, so we also get them back out:

$\text{timber} + \text{nails} + \text{saw} + \text{hammer} \geq \text{table} + \text{hammer} + \text{saw}$

Notice that this kind of equation is analogous to a chemical reaction equation like this:

$2 \mathrm{H}_2 + \mathrm{O}_2 \geq \mathrm{H}_2\mathrm{O}$

So given a hydrogen molecules and an oxygen molecule, we can let them react such as to form a molecule of water. In chemistry, this kind of equation would usually be written with an arrow ‘$\rightarrow$’ instead of an ordering symbol ‘$\geq$’ , but here we interpret the equation slightly differently. As with the timber and the nails and nails above, the inequality says that if we have two hydrogen atoms and an oxygen atom, then we can let them react to a molecule of water, but we don’t have to. In this sense, having two hydrogen atoms and an oxygen atom is at least as good as having a molecule of water.

So what’s going on here, mathematically? In all of the above equations, we have a bunch of stuff on each side and an inequality ‘$\geq$’ in between. The stuff on each side consists of a bunch of objects tacked together via ‘$+$’ . With respect to these two pieces of structure, the collection of all our resource objects forms an ordered commutative monoid:

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

$+$ and $0$ equip $A$ with the structure of a commutative monoid;

$\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Here, the third axiom is the most important, since it tells us how the additive structure interacts with the ordering structure.

Ordered commutative monoids are the mathematical formalization of resource convertibility and combinability as follows. The elements $x,y\in A$ are the resource objects, corresponding to the ‘collections of stuff’ in our earlier examples, such as $x = \text{timber} + \text{nails}$ or $y = 2 \text{H}_2 + \text{O}_2.$ Then the addition operation simply joins up collections of stuff into bigger collections of stuff. The ordering relation $\geq$ is what formalizes resource convertibility, as in the examples above. The third axiom states that if we can convert $x$ into $y,$ then we can also convert $x$ together with $z$ into $y$ together with $z$ for any $z,$ for example by doing nothing to $z.$

A mathematically minded reader might object that requiring $A$ to form a partially ordered set under $\geq$ is too strong a requirement, since it requires two resource objects to be equal as soon as they are mutually interconvertible: $x \geq y$ and $y \geq x$ implies $x = y.$ However, I think that this is not an essential restriction, because we can regard this implication as the definition of equality: ‘$x = y$’ is just a shorthand notation for ‘$x\geq y$ and $y\geq x$’ which formalizes the perfect interconvertibility of resource objects.

We could now go back to the original examples and try to model carpentry and chemistry in terms of ordered commutative monoids. But as a mathematician, I needed to start out with something mathematically precise and rigorous as a testing ground for the formalism. This helps ensure that the mathematics is sensible and useful before diving into real-world applications. So, the main example in my paper is the ordered commutative monoid of graphs, which has a resource-theoretic interpretation in terms of zero-error information theory. As graph theory is a difficult and traditional subject, this application constitutes the perfect training camp for the mathematics of ordered commutative monoids. I will get to this in Part 3.

In Part 2, I will say something about what one can do with ordered commutative monoids. In the meantime, I’d be curious to know what you think about what I’ve said so far!

## Thermodynamics with Continuous Information Flow

21 March, 2015

guest post by Blake S. Pollard

Over a century ago James Clerk Maxwell created a thought experiment that has helped shape our understanding of the Second Law of Thermodynamics: the law that says entropy can never decrease.

Maxwell’s proposed experiment was simple. Suppose you had a box filled with an ideal gas at equilibrium at some temperature. You stick in an insulating partition, splitting the box into two halves. These two halves are isolated from one another except for one important caveat: somewhere along the partition resides a being capable of opening and closing a door, allowing gas particles to flow between the two halves. This being is also capable of observing the velocities of individual gas particles. Every time a particularly fast molecule is headed towards the door the being opens it, letting fly into the other half of the box. When a slow particle heads towards the door the being keeps it closed. After some time, fast molecules would build up on one side of the box, meaning half the box would heat up! To an observer it would seem like the box, originally at a uniform temperature, would for some reason start splitting up into a hot half and a cold half. This seems to violate the Second Law (as well as all our experience with boxes of gas).

Of course, this apparent violation probably has something to do with positing the existence of intelligent microscopic doormen. This being, and the thought experiment itself, are typically referred to as Maxwell’s demon.

Photo credit: Peter MacDonald, Edmonds, UK

When people cook up situations that seem to violate the Second Law there is typically a simple resolution: you have to consider the whole system! In the case of Maxwell’s demon, while the entropy of the box decreases, the entropy of the system as a whole, demon include, goes up. Precisely quantifying how Maxwell’s demon doesn’t violate the Second Law has led people to a better understanding of the role of information in thermodynamics.

At the American Physical Society March Meeting in San Antonio, Texas, I had the pleasure of hearing some great talks on entropy, information, and the Second Law. Jordan Horowitz, a postdoc at Boston University, gave a talk on his work with Massimiliano Esposito, a researcher at the University of Luxembourg, on how one can understand situations like Maxwell’s demon (and a whole lot more) by analyzing the flow of information between subsystems.

Consider a system made up of two parts, $X$ and $Y$. Each subsystem has a discrete set of states. Each systems makes transitions among these discrete states. These dynamics can be modeled as Markov processes. They are interested in modeling the thermodynamics of information flow between subsystems. To this end they consider a bipartite system, meaning that either $X$ transitions or $Y$ transitions, never both at the same time. The probability distribution $p(x,y)$ of the whole system evolves according to the master equation:

$\displaystyle{ \frac{dp(x,y)}{dt} = \sum_{x', y'} H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}p(x,y) }$

where $H_{x,x'}^{y,y'}$ is the rate at which the system transitions from $(x',y') \to (x,y).$ The ‘bipartite’ condition means that $H$ has the form

$H_{x,x'}^{y,y'} = \left\{ \begin{array}{cc} H_{x,x'}^y & x \neq x'; y=y' \\ H_x^{y,y'} & x=x'; y \neq y' \\ 0 & \text{otherwise.} \end{array} \right.$

The joint system is an open system that satisfies the second law of thermodynamics:

$\displaystyle{ \frac{dS_i}{dt} = \frac{dS_{XY}}{dt} + \frac{dS_e}{dt} \geq 0 }$

where

$\displaystyle{ S_{XY} = - \sum_{x,y} p(x,y) \ln ( p(x,y) ) }$

is the Shannon entropy of the system, satisfying

$\displaystyle{ \frac{dS_{XY} }{dt} = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{p(x',y')}{p(x,y)} \right) }$

and

$\displaystyle{ \frac{dS_e}{dt} = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{ H_{x,x'}^{y,y'} } {H_{x',x}^{y',y} } \right) }$

is the entropy change of the environment.

We want to investigate how the entropy production of the whole system relates to entropy production in the bipartite pieces $X$ and $Y$. To this end they define a new flow, the information flow, as the time rate of change of the mutual information

$\displaystyle{ I = \sum_{x,y} p(x,y) \ln \left( \frac{p(x,y)}{p(x)p(y)} \right) }$

Its time derivative can be split up as

$\displaystyle{ \frac{dI}{dt} = \frac{dI^X}{dt} + \frac{dI^Y}{dt}}$

where

$\displaystyle{ \frac{dI^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^{y} p(x',y) - H_{x',x}^{y}p(x,y) \right] \ln \left( \frac{ p(y|x) }{p(y|x')} \right) }$

and

$\displaystyle{ \frac{dI^Y}{dt} = \sum_{x,y} \left[ H_{x}^{y,y'}p(x,y') - H_{x}^{y',y}p(x,y) \right] \ln \left( \frac{p(x|y)}{p(x|y')} \right) }$

are the information flows associated with the subsystems $X$ and $Y$ respectively.

When

$\displaystyle{ \frac{dI^X}{dt} > 0}$

a transition in $X$ increases the mutual information $I,$ meaning that $X$ ‘knows’ more about $Y$ and vice versa.

We can rewrite the entropy production entering into the second law in terms of these information flows as

$\displaystyle{ \frac{dS_i}{dt} = \frac{dS_i^X}{dt} + \frac{dS_i^Y}{dt} }$

where

$\displaystyle{ \frac{dS_i^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^y p(x',y) - H_{x',x}^y p(x,y) \right] \ln \left( \frac{H_{x,x'}^y p(x',y) } {H_{x',x}^y p(x,y) } \right) \geq 0 }$

and similarly for $\frac{dS_Y}{dt}$. This gives the following decomposition of entropy production in each subsystem:

$\displaystyle{ \frac{dS_i^X}{dt} = \frac{dS^X}{dt} + \frac{dS^X_e}{dt} - \frac{dI^X}{dt} \geq 0 }$

$\displaystyle{ \frac{dS_i^Y}{dt} = \frac{dS^Y}{dt} + \frac{dS^X_e}{dt} - \frac{dI^Y}{dt} \geq 0},$

where the inequalities hold for each subsystem. To see this, if you write out the left hand side of each inequality you will find that they are both of the form

$\displaystyle{ \sum_{x,y} \left[ x-y \right] \ln \left( \frac{x}{y} \right) }$

which is non-negative for $x,y \geq 0$.

The interaction between the subsystems is contained entirely in the information flow terms. Neglecting these terms gives rise to situations like Maxwell’s demon where a subsystem seems to violate the second law.

Lots of Markov processes have boring equilibria $\frac{dp}{dt} = 0$ where there is no net flow among the states. Markov processes also admit non-equilibrium steady states, where there may be some constant flow of information. In this steady state all explicit time derivatives are zero, including the net information flow:

$\displaystyle{ \frac{dI}{dt} = 0 }$

which implies that $\frac{dI^X}{dt} = - \frac{dI^Y}{dt}.$ In this situation the above inequalities become

$\displaystyle{ \frac{dS^X_i}{dt} = \frac{dS_e^X}{dt} - \frac{dI^X}{dt} }$

and

$\displaystyle{ \frac{dS^Y_i}{dt} = \frac{dS_e^X}{dt} + \frac{dI^X}{dt} }.$

If

$\displaystyle{ \frac{dI^X}{dt} > 0 }$

then $X$ is learning something about $Y$ or acting as a sensor. The first inequality

$\frac{dS_e^X}{dt} \geq \frac{dI^X}{dt}$ quantifies the minimum amount of energy $X$ must supply to do this sensing. Similarly $-\frac{dS_e^Y}{dt} \leq \frac{dI^X}{dt}$ bounds the amount of useful energy is available to $Y$ as a result of this information transfer.

In their paper Horowitz and Esposito explore a few other examples and show the utility of this simple breakup of a system into two interacting subsystems in explaining various interesting situations in which the flow of information has thermodynamic significance.

For the whole story, read their paper!

• Jordan Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Phys. Rev. X 4 (2014), 031015.

## Planets in the Fourth Dimension

17 March, 2015

You probably that planets go around the sun in elliptical orbits. But do you know why?

In fact, they’re moving in circles in 4 dimensions. But when these circles are projected down to 3-dimensional space, they become ellipses!

This animation by Greg Egan shows the idea:

The plane here represents 2 of the 3 space dimensions we live in. The vertical direction is the mysterious fourth dimension. The planet goes around in a circle in 4-dimensional space. But down here in 3 dimensions, its ‘shadow’ moves in an ellipse!

What’s this fourth dimension I’m talking about here? It’s a lot like time. But it’s not exactly time. It’s the difference between ordinary time and another sort of time, which flows at a rate inversely proportional to the distance between the planet and the sun.

The movie uses this other sort of time. Relative to this other time, the planet is moving at constant speed around a circle in 4 dimensions. But in ordinary time, its shadow in 3 dimensions moves faster when it’s closer to the sun.

All this sounds crazy, but it’s not some new physics theory. It’s just a different way of thinking about Newtonian physics!

Physicists have known about this viewpoint at least since 1980, thanks to a paper by the mathematical physicist Jürgen Moser. Some parts of the story are much older. A lot of papers have been written about it.

But I only realized how simple it is when I got this paper in my email, from someone I’d never heard of before:

• Jesper Göransson, Symmetries of the Kepler problem, 8 March 2015.

I get a lot of papers by crackpots in my email, but the occasional gem from someone I don’t know makes up for all those.

The best thing about Göransson’s 4-dimensional description of planetary motion is that it gives a clean explanation of an amazing fact. You can take any elliptical orbit, apply a rotation of 4-dimensional space, and get another valid orbit!

Of course we can rotate an elliptical orbit about the sun in the usual 3-dimensional way and get another elliptical orbit. The interesting part is that we can also do 4-dimensional rotations. This can make a round ellipse look skinny: when we tilt a circle into the fourth dimension, its ‘shadow’ in 3-dimensional space becomes thinner!

In fact, you can turn any elliptical orbit into any other elliptical orbit with the same energy by a 4-dimensional rotation of this sort. All elliptical orbits with the same energy are really just circular orbits on the same sphere in 4 dimensions!

Jesper Göransson explains how this works in a terse and elegant way. But I can’t resist summarizing the key results.

### The Kepler problem

Suppose we have a particle moving in an inverse square force law. Its equation of motion is

$\displaystyle{ m \ddot{\mathbf{r}} = - \frac{k \mathbf{r}}{r^3} }$

where $\mathbf{r}$ is its position as a function of time, $r$ is its distance from the origin, $m$ is its mass, and $k$ says how strong the force is. From this we can derive the law of conservation of energy, which says

$\displaystyle{ \frac{m \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{k}{r} = E }$

for some constant $E$ that depends on the particle’s orbit, but doesn’t change with time.

Let’s consider an attractive force, so $k > 0,$ and elliptical orbits, so $E < 0.$ Let's call the particle a 'planet'. It's a planet moving around the sun, where we treat the sun as so heavy that it remains perfectly fixed at the origin.

I only want to study orbits of a single fixed energy $E.$ This frees us to choose units of mass, length and time in which

$m = 1, \;\; k = 1, \;\; E = -\frac{1}{2}$

This will reduce the clutter of letters and let us focus on the key ideas. If you prefer an approach that keeps in the units, see Göransson’s paper.

Now the equation of motion is

$\displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} }$

and conservation of energy says

$\displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} }$

The big idea, apparently due to Moser, is to switch from our ordinary notion of time to a new notion of time! We’ll call this new time $s,$ and demand that

$\displaystyle{ \frac{d s}{d t} = \frac{1}{r} }$

This new kind of time ticks more slowly as you get farther from the sun. So, using this new time speeds up the planet’s motion when it’s far from the sun. If that seems backwards, just think about it. For a planet very far from the sun, one day of this new time could equal a week of ordinary time. So, measured using this new time, a planet far from the sun might travel in one day what would normally take a week.

This compensates for the planet’s ordinary tendency to move slower when it’s far from the sun. In fact, with this new kind of time, a planet moves just as fast when it’s farthest from the sun as when it’s closest.

Amazing stuff happens with this new notion of time!

To see this, first rewrite conservation of energy using this new notion of time. I’ve been using a dot for the ordinary time derivative, following Newton. Let’s use a prime for the derivative with respect to $s.$ So, for example, we have

$\displaystyle{ t' = \frac{dt}{ds} = r }$

and

$\displaystyle{ \mathbf{r}' = \frac{dr}{ds} = \frac{dt}{ds}\frac{dr}{dt} = r \dot{\mathbf{r}} }$

Using this new kind of time derivative, Göransson shows that conservation of energy can be written as

$\displaystyle{ (t' - 1)^2 + \mathbf{r}' \cdot \mathbf{r}' = 1 }$

This is the equation of a sphere in 4-dimensional space!

I’ll prove that conservation of energy can be written this way later. First let’s talk about what it means. To understand it, we should treat the ordinary time coordinate $t$ and the space coordinates $(x,y,z)$ on an equal footing. The point

$(t,x,y,z)$

moves around in 4-dimensional space as the parameter $s$ changes. What we’re seeing is that the velocity of this point, namely

$\mathbf{v} = (t',x',y',z')$

moves around on a sphere in 4-dimensional space! It’s a sphere of radius one centered at the point

$(1,0,0,0)$

With some further calculation we can show some other wonderful facts:

$\mathbf{r}''' = -\mathbf{r}'$

and

$t''' = -(t' - 1)$

These are the usual equations for a harmonic oscillator, but with an extra derivative!

I’ll prove these wonderful facts later. For now let’s just think about what they mean. We can state both of them in words as follows: the 4-dimensional velocity $\mathbf{v}$ carries out simple harmonic motion about the point $(1,0,0,0).$

That’s nice. But since $\mathbf{v}$ also stays on the unit sphere centered at this point, we can conclude something even better: $v$ must move along a great circle on this sphere, at constant speed!

This implies that the spatial components of the 4-dimensional velocity have mean $0,$ while the $t$ component has mean $1.$

The first part here makes a lot of sense: our planet doesn’t drift ever farther from the Sun, so its mean velocity must be zero. The second part is a bit subtler, but it also makes sense: the ordinary time $t$ moves forward at speed 1 on average with respect to the new time parameter $s$, but its rate of change oscillates in a sinusoidal way.

If we integrate both sides of

$\mathbf{r}''' = -\mathbf{r}'$

we get

$\mathbf{r}'' = -\mathbf{r} + \mathbf{a}$

for some constant vector $\mathbf{a}.$ This says that the position $\mathbf{r}$ oscillates harmonically about a point $\mathbf{a}.$ Since $\mathbf{a}$ doesn’t change with time, it’s a conserved quantity: it’s called the Runge–Lenz vector.

Often people start with the inverse square force law, show that angular momentum and the Runge–Lenz vector are conserved, and use these 6 conserved quantities and Noether’s theorem to show there’s a 6-dimensional group of symmetries. For solutions with negative energy, this turns out to be the group of rotations in 4 dimensions, $mathrm{SO}(4).$ With more work, we can see how the Kepler problem is related to a harmonic oscillator in 4 dimensions. Doing this involves reparametrizing time.

I like Göransson’s approach better in many ways, because it starts by biting the bullet and reparametrizing time. This lets him rather efficiently show that the planet’s elliptical orbit is a projection to 3-dimensional space of a circular orbit in 4d space. The 4d rotational symmetry is then evident!

Göransson actually carries out his argument for an inverse square law in n-dimensional space; it’s no harder. The elliptical orbits in n dimensions are projections of circular orbits in n+1 dimensions. Angular momentum is a bivector in n dimensions; together with the Runge–Lenz vector it forms a bivector in n+1 dimensions. This is the conserved quantity associated to the (n+1) dimensional rotational symmetry of the problem.

He also carries out the analogous argument for positive-energy orbits, which are hyperbolas, and zero-energy orbits, which are parabolas. The hyperbolic case has the Lorentz group symmetry and the zero-energy case has Euclidean group symmetry! This was already known, but it’s nice to see how easily Göransson’s calculations handle all three cases.

### Mathematical details

Checking all this is a straightforward exercise in vector calculus, but it takes a bit of work, so let me do some here. There will still be details left to fill in, and I urge that you give it a try, because this is the sort of thing that’s more interesting to do than to watch.

There are a lot of equations coming up, so I’ll put boxes around the important ones. The basic ones are the force law, conservation of energy, and the change of variables that gives

$\boxed{ t' = r , qquad \mathbf{r}' = r \dot{\mathbf{r}} }$

$\boxed{ \displaystyle{ \frac{dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} } }$

and then use

$\displaystyle{ \dot{\mathbf{r}} = \frac{d\mathbf{r}/dt}{dt/ds} = \frac{\mathbf{r}'}{t'} }$

to obtain

$\displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{2 t'^2} - \frac{1}{t'} = -\frac{1}{2} }$

With a little algebra this gives

$\boxed{ \displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1} }$

This shows that the ‘4-velocity’

$\mathbf{v} = (t',x',y',z')$

stays on the unit sphere centered at $(1,0,0,0).$

The next step is to take the equation of motion

$\boxed{ \displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} } }$

and rewrite it using primes ($s$ derivatives) instead of dots ($t$ derivatives). We start with

$\displaystyle{ \dot{\mathbf{r}} = \frac{\mathbf{r}'}{r} }$

and differentiate again to get

$\ddot{\mathbf{r}} = \displaystyle{ \frac{1}{r} \left(\frac{\mathbf{r}'}{r}\right)' } = \displaystyle{ \frac{1}{r} \left( \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^2} \right) } = \displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} }$

Now we use our other equation for $\ddot{\mathbf{r}}$ and get

$\displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} = - \frac{\mathbf{r}}{r^3} }$

or

$r \mathbf{r}'' - r' \mathbf{r}' = -\mathbf{r}$

so

$\boxed{ \displaystyle{ \mathbf{r}'' = \frac{r' \mathbf{r}' - \mathbf{r}}{r} } }$

To go further, it’s good to get a formula for $r''$ as well. First we compute

$r' = \displaystyle{ \frac{d}{ds} (\mathbf{r} \cdot \mathbf{r})^{\frac{1}{2}} } = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}}{r} }$

and then differentiating again,

$r'' = \displaystyle{\frac{d}{ds} \frac{\mathbf{r}' \cdot \mathbf{r}}{r} } = \displaystyle{ \frac{r \mathbf{r}'' \cdot \mathbf{r} + r \mathbf{r}' \cdot \mathbf{r}' - r' \mathbf{r}' \cdot \mathbf{r}}{r^2} }$

Plugging in our formula for $\mathbf{r}''$, some wonderful cancellations occur and we get

$r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 }$

But we can do better! Remember, conservation of energy says

$\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1}$

and we know $t' = r.$ So,

$\mathbf{r}' \cdot \mathbf{r}' = 1 - (r - 1)^2 = 2r - r^2$

and

$r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 } = 1 - r$

So, we see

$\boxed{ r'' = 1 - r }$

Can you get here more elegantly?

Since $t' = r$ this instantly gives

$\boxed{ t''' = 1 - t' }$

as desired.

Next let’s get a similar formula for $\mathbf{r}'''.$ We start with

$\displaystyle{ \mathbf{r}'' = \frac{r' \mathbf{r}' - \mathbf{r}}{r} }$

and differentiate both sides to get

$\displaystyle{ \mathbf{r}''' = \frac{r r'' \mathbf{r}' + r r' \mathbf{r}'' - r \mathbf{r}' - r'}{r^2} }$

Then plug in our formulas for $r''$ and $\mathbf{r}''.$ Some truly miraculous cancellations occur and we get

$\boxed{ \mathbf{r}''' = -\mathbf{r}' }$

I could show you how it works—but to really believe it you have to do it yourself. It’s just algebra. Again, I’d like a better way to see why this happens!

Integrating both sides—which is a bit weird, since we got this equation by differentiating both sides of another one—we get

$\boxed{ \mathbf{r}'' = -\mathbf{r} + \mathbf{a} }$

for some fixed vector $\mathbf{a},$ the Runge–Lenz vector. This says $\mathbf{r}$ undergoes harmonic motion about $\mathbf{a}.$ It’s quite remarkable that both $\mathbf{r}$ and its norm $r$ undergo harmonic motion! At first I thought this was impossible, but it’s just a very special circumstance.

The quantum version of a planetary orbit is a hydrogen atom. Everything we just did has a quantum version! For more on that, see

• Greg Egan, The ellipse and the atom.

For more of the history of this problem, see:

• John Baez, Mysteries of the gravitational 2-body problem.

This also treats quantum aspects, connections to supersymmetry and Jordan algebras, and more! Someday I’ll update it to include the material in this blog post.

## Quantum Superposition

13 March, 2015

guest post by Piotr Migdał

In this blog post I will introduce some basics of quantum mechanics, with the emphasis on why a particle being in a few places at once behaves measurably differently from a particle whose position we just don’t know. It’s a kind of continuation of the “Quantum Network Theory” series (Part 1, Part 2) by Tomi Johnson about our work in Jake Biamonte’s group at the ISI Foundation in Turin. My goal is to explain quantum community detection. Before that, I need to introduce the relevant basics of quantum mechanics, and of the classical community detection.

But before I start, let me introduce myself, as it’s my first post to Azimuth.

I just finished my quantum optics theory Ph.D in Maciej Lewenstein’s group at The Institute of Photonic Sciences in Castelldefels, a beach near Barcelona. My scientific interests range from quantum physics, through complex networks, to data-driven approach…. to pretty much anything—and now I work as a data science freelancer. I enjoy doing data visualizations (for example of relations between topics in mathematics), I am a big fan of Rényi entropy (largely thanks to Azimuth), and I’m a believer in open science. If you think that there are too many off-topic side projects here, you are absolutely right!

In my opinion, quantum mechanics is easy. Based on my gifted education experience it takes roughly 9 intense hours to introduce entanglement to students having only a very basic linear algebra background. Even more, I believe that it is possible to get familiar with quantum mechanics by just playing with it—so I am developing a Quantum Game!

### Quantum weirdness

In quantum mechanics a particle can be in a few places at once. It sounds strange. So strange, that some pioneers of quantum mechanics (including, famously, Albert Einstein) didn’t want to believe in it: not because of any disagreement with experiment, not because of any lack of mathematical beauty, just because it didn’t fit their philosophical view of physics.

It went further: in the Soviet Union the idea that electron can be in many places (resonance bonds) was considered to oppose materialism. Later, in California, hippies investigated quantum mechanics as a basis for parapsychology—which, arguably, gave birth to the field of quantum information.

As Griffiths put it in his Introduction to Quantum Mechanics (Chapter 4.4.1):

To the layman, the philosopher, or the classical physicist, a statement of the form “this particle doesn’t have a well-defined position” [...] sounds vague, incompetent, or (worst of all) profound. It is none of these.

In this guest blog post I will try to show that not only can a particle be in many places at once, but also that if it were not in many places at once then it would cause problems. That is, as fundamental phenomena as atoms forming chemical bonds, or particle moving in the vacuum, require it.

As in many other cases, the simplest non-trivial case is perfect for explaining idea, as it covers the most important phenomena, while being easy to analyze, visualize and comprehend. Quantum mechanics is not an exception—let us start with a system of two states.

### A two state system

Let us study a simplified model of the hydrogen molecular ion $\mathrm{H}_2^+$, that is, a system of two protons and one electron (see Feynman Lectures on Physics, Vol. III, Chapter 10.1). Since the protons are heavy and slow, we treat them as fixed. We focus on the electron moving in the electric field created by protons.

In quantum mechanics we describe the state of a system using a complex vector. In simple terms, this is a list of complex numbers called ‘probability amplitudes’. For an electron that can be near one proton or another, we use a list of two numbers:

$|\psi\rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

In this state the electron is near the first proton with probability $|\alpha|^2,$ and near the second one with probability $|\beta|^2.$

Note that

$|\psi \rangle = \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

So, we say the electron is in a ‘linear combination’ or ‘superposition’ of the two states

$|1\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

(where it’s near the first proton) and the state

$|2\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(where it’s near the second proton).

Why do we denote unit vectors in strange brackets looking like

$| \mathrm{something} \rangle$ ?

Well, this is called Dirac notation (or bra-ket notation) and it is immensely useful in quantum mechanics. We won’t go into it in detail here; merely note that $| \cdot \rangle$ stands for a column vector and $\langle \cdot |$ stands for a row vector, while $\psi$ is a traditional symbol for a quantum state.).

Amplitudes can be thought as ‘square roots’ of probabilities. We can force an electron to localize by performing a classical measurement, for example by moving protons away and measuring which of them has neutral charge (for being coupled with the electron). Then, we get probability $| \alpha|^2$ of finding it near the first proton and $|\beta|^2$ of finding it near the second. So, we require that

$|\alpha|^2 + |\beta|^2 = 1$

Note that as amplitudes are complex, for a given probability there are many possible amplitudes. For example

$1 = |1|^2 = |-1|^2 = |i|^2 = \left| \tfrac{1+i}{\sqrt{2}} \right|^2 = \cdots$

where $i$ is the imaginary unit, with $i^2 = -1.$

We will now show that the electron ‘wants’ to be spread out. Electrons don’t really have desires, so this is physics slang for saying that the electron will have less energy if its probability of being near the first proton is equal to its probability of being near the second proton: namely, 50%.

In quantum mechanics, a Hamiltonian is a matrix that describes the relation between the energy and evolution (i.e. how the state changes in time). The expected value of the energy of any state $| \psi \rangle$ is

$E = \langle \psi | H | \psi \rangle$

Here the row vector $\langle \psi |$ is the column vector $| \psi\rangle$ after transposition and complex conjugation (i.e. changing $i$ to $-i$), and

$\langle \psi | H | \psi \rangle$

means we are doing matrix multiplication on $\langle \psi |,$ $H$ and $| \psi \rangle$ to get a number.

For the electron in the $\mathrm{H}_2^+$ molecule the Hamiltonian can be written as the following $2 \times 2$ matrix with real, positive entries:

$H = \begin{bmatrix} E_0 & \Delta \\ \Delta & E_0 \end{bmatrix},$

where $E_0$ is the energy of the electron being either in state $|1\rangle$ or state $|2\rangle$, and $\Delta$ is the ‘tunneling amplitude’, which describes how easy it is for the electron to move from neighborhood of one proton to that of the other.

The expected value—physicists call it the ‘expectation value’—of the energy of a given state $|\psi\rangle$ is:

$E = \langle \psi | H | \psi \rangle \equiv \begin{bmatrix} \alpha^* & \beta^* \end{bmatrix} \begin{bmatrix} E_0 & \Delta \\ \Delta & E_0 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$

The star symbol denotes the complex conjugation. If you are unfamiliar with complex numbers, just work with real numbers on which this operation does nothing.

Exercise 1. Find $\alpha$ and $\beta$ with

$|\alpha|^2 + |\beta|^2 = 1$

that minimize or maximize the expectation value of energy $\langle \psi | H | \psi \rangle$ for

$|\psi\rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

Exercise 2. What’s the expectation value value of the energy for the states $| 1 \rangle$ and $| 2 \rangle$?

Or if you are lazy, just read the answer! It is straightforward to check that

$E = (\alpha^* \alpha + \beta^* \beta) E_0 + (\alpha^* \beta + \beta^* \alpha) \Delta$

The coefficient of $E_0$ is 1, so the minimal energy is $E_0 - \Delta$ and the maximal energy is $E_0 + \Delta$. The states achieving these energies are spread out:

$| \psi_- \rangle = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}, \quad \text{with} \quad \quad E = E_0 - \Delta$

and

$| \psi_+ \rangle = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad \text{with} \quad \quad E = E_0 + \Delta$

The energies of these states are below and above the energy $E_0,$ and $\Delta$ says how much.

So, the electron is ‘happier’ (electrons don’t have moods either) to be in the state $|\psi_-\rangle$ than to be localized near only one of the protons. In other words—and this is Chemistry 101—atoms like to share electrons and it bonds them. Also, they like to share electrons in a particular and symmetric way.

For reference, $|\psi_+ \rangle$ is called ‘antibonding state’. If the electron is in this state, the atoms will get repelled from each other—and so much for the molecule!

### How to classically add quantum things

How can we tell a difference between an electron being in a superposition between two states, and just not knowing its ‘real’ position? Well, first we need to devise a way to describe probabilistic mixtures.

It looks simple—if we have an electron in the state $|1\rangle$ or $|2\rangle$ with probabilities $1/2$, we may be tempted to write

$|\psi\rangle = \tfrac{1}{\sqrt{2}} |1\rangle + \tfrac{1}{\sqrt{2}} |2\rangle$

We’re getting the right probabilities, so it looks legit. But there is something strange about the energy. We have obtained the state $|\psi_+\rangle$ with energy $E_0+\Delta$ by mixing two states with the energy $E_0$!

Moreover, we could have used different amplitudes such that $|\alpha|^2=|\beta|^2=1/2$ and gotten different energies. So, we need to devise a way to avoid guessing amplitudes. All in all, we used quotation marks for ‘square roots’ for a reason!

It turns out that to describe statistical mixtures we can use density matrices.

The states we’d been looking at before are described by vectors like this:

$| \psi \rangle = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

These are called ‘pure states’. For a pure state, here is how we create a density matrix:

$\rho = | \psi \rangle \langle \psi | \equiv \begin{bmatrix} \alpha \alpha^* & \alpha \beta^*\\ \beta \alpha^* & \beta \beta^* \end{bmatrix}$

On the diagonal we get probabilities ($|\alpha|^2$ and $|\beta|^2$), whereas the off-diagonal terms ($\alpha \beta^*$ and its complex conjugate) are related to the presence of quantum effects. For example, for $|\psi_-\rangle$ we get

$\rho = \begin{bmatrix} 1/2 & -1/2\\ -1/2 & 1/2 \end{bmatrix}$

For an electron in the state $|1\rangle$ we get

$\rho = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}.$

To calculate the energy, the recipe is the following:

$E = \mathrm{tr}[H \rho]$

where $\mathrm{tr}$ is the ‘trace‘: the sum of the diagonal entries. For a $n \times n$ square matrix with entries $A_{ij}$ its trace is

$\mathrm{tr}(A) = A_{11} + A_{22} + \ldots + A_{nn}$

Exercise 3. Show that this formula for energy, and the previous one, give the same result on pure states.

I advertised that density matrices allow us to mix quantum states. How do they do that? Very simple: just by adding density matrices, multiplied by the respective probabilities:

$\rho = p_1 \rho_1 + p_2 \rho_2 + \cdots + p_n \rho_n$

It is exactly how we would mix probability vectors. Indeed, the diagonals are probability vectors!

So, let’s say that our co-worker was drunk and we are not sure if (s)he said that the state is $|\psi_-\rangle$ or $|1\rangle$. However, we think that the probabilities are $1/3$ and $2/3.$ We get the density matrix:

$\rho = \begin{bmatrix} 5/6 & -1/6\\ -1/6 & 1/6 \end{bmatrix}$

Exercise 4. Show that calculating energy using density matrix gives the same result as averaging energy over component pure states.

I may have given the impression that density matrix is an artificial thing, at best—a practical trick, and what we ‘really’ have are pure states (vectors), each with a given probability. If so, the next exercise is for you:

Exercise 5. Show that a 50%-50% mixture of $|1\rangle$ and $|2\rangle$ is the same as a 50%-50% mixture of $|\psi_+\rangle$ and $|\psi_-\rangle$.

This is different than statistical mechanics, or statistics, where we can always think about probability distributions as uniquely defined statistical mixtures of possible states. Here, as we see, it can be a bit more tricky.

As we said, for the diagonals things work as for classical probabilities. But there is more—at the same time as adding probabilities we also add the off-diagonal terms, which can add up to cancel, depending on their signs. It’s why it’s mixing quantum states may make them losing their quantum properties.

The value of the off-diagonal term is related to so-called ‘coherence’ between the states $|1\rangle$ and $|2\rangle$. Its value is bounded by the respective probabilities:

$\left| \rho_{12} \right| \leq \sqrt{\rho_{11}\rho_{22}} = \sqrt{p_1 p_2}$

where for pure states we get equality.

If the value is zero, there are no quantum effects between two positions: this means that the electron is sure to be at one place or the other, though we might be uncertain at which place. This is fundamentally different from a superposition (non-zero $\rho_{12}$), where we are uncertain at which site a particle is, but it can no longer be thought to be at one site or the other: it must be in some way associated with both simultaneously.

Exercise 6. For each $c \in [-1,1]$ propose how to obtain a mixed state described by density matrix

$\rho = \begin{bmatrix} 1/2 & c/2\\ c/2 & 1/2 \end{bmatrix}$

by mixing pure states of your choice.

### A spatial wavefunction

A similar thing works for position. Instead of a two-level system let’s take a particle in one dimension. The analogue of a state vector is a wavefunction, a complex-valued function on a line:

$\psi(x)$

In this continuous variant, $p(x) = |\psi(x)|^2$ is the probability density of finding particle in one place.

We construct the density matrix (or rather: ‘density operator’) in an way that is analogous to what we did for the two-level system:

$\rho(x, x') = \psi(x) \psi^*(x')$

Instead of a 2×2 matrix matrix, it is a complex function of two real variables. The probability density can be described by its diagonal values, i.e.

$p(x) = \rho(x,x)$

Again, we may wonder if the particle energetically favors being in many places at once. Well, it does.

Density matrices for a classical and quantum state. They yield the same probability distributions (for positions). However, their off-diagonal values (i.e. $x\neq x’$) are different. The classical state is just a probabilistic mixture of a particle being in a particular place.

What would happen if we had a mixture of perfectly localized particles? Due to Heisenberg’s uncertainly principle we have

$\Delta x \Delta p \geq \frac{\hbar}{2},$

that is, that the product of standard deviations of position and momentum is at least some value.

If we exactly know the position, then the uncertainty of momentum goes to infinity. (The same thing holds if we don’t know position, but it can be known, even in principle. Quantum mechanics couldn’t care less if the particle’s position is known by us, by our friend, by our detector or by a dust particle.)

The Hamiltonian represents energy, the energy of a free particle in continuous system is

$H=p^2/(2m)$

where $m$ is its mass, and $p$ is its momentum: that is, mass times velocity. So, if the particle is completely localized:

• its energy is infinite,
• its velocity are infinite, so in no time its wavefunction will spread everywhere.

Infinite energies sometimes happen if physics. But if we get infinite velocities we see that there is something wrong. So a particle needs to be spread out, or ‘delocalized’, to some degree, to have finite energy.

As a side note, to consider high energies we would need to employ special relativity. In fact, one cannot localize a massive particle that much, as it will create a soup of particles and antiparticles, once its energy related to momentum uncertainty is as much as the energy related to its mass; see the Darwin term in the fine structure.

Moreover, depending on the degree of its delocalization its behavior is different. For example, a statistical mixture of highly localized particles would spread a lot faster than the same $p(x)$ but derived from a single wavefunction. The density matrix of the former would be in between of that of pure state (a ‘circular’ Gaussian function) and the classical state (a ‘linear’ Gaussian). That is, it would be an ‘oval’ Gaussian, with off-diagonal values being smaller than for the pure state.

Let us look at two Gaussian wavefunctions, with varying level of coherent superposition between them. That is, each Gaussian is already a superposition, but when we combine two we let ourselves use a superposition, or a mixture, or something in between. For a perfect superposition of Gaussian, we would have the density matrix

$\rho(x,x') = \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \right) \left( \phi(x'+\tfrac{d}{2}) + \phi(x'-\tfrac{d}{2}) \right)$

where $\phi(x)$ is a normalized Gaussian function. For a statistical mixture between these Gaussians split by a distance of $d,$ we would have:

$\rho(x,x') = \frac{1}{2} \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) + \frac{1}{2} \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})$

And in general,

$\begin{array}{ccl} \rho(x,x') &=& \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})\right) + \\ \\ && \frac{c}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'-\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) \right) \end{array}$

for some $|c| \leq 1.$

PICTURE: Two Gaussian wavefunctions (centered at -2 and +2) in a coherent superposition with each other (the first and the last plot) and a statistical mixture (the middle plot); the 2nd and 4th plot show intermediary states. Superposition can be with different phase, much like the hydrogen example. Color represents absolute value and hue phase; here red is for positive numbers and teal is for negative.

(Click to enlarge.)

### Conclusion

We have seen learnt the difference between the quantum superposition and the statistical mixture of states. In particular, while both of these descriptions may give the same probabilities, their predictions on the physical properties of states differ. For example, we need an electron to be delocalized in a specific way to describe chemical bonds; and we need delocalization of any particle to predict its movement.

We used density matrices to express both quantum superposition and (classical) lack of knowledge on the same ground. We have identified its off-diagonal terms as ones related to the quantum coherence.

But what if there were not only two states, but many? So, instead of $\mathrm{H}_2^+$ (we were not even considering the full hydrogen atom, but only its ionized version), how about electric excitation on something bigger? Not even $\mathrm{C}_2\mathrm{H}_5\mathrm{OH}$ or some sugar, but a protein complex!

So, this will be your homework (cf. this homework on topology). Just joking, there will be another blog post.