Applied Category Theory Course

26 March, 2018

It just became a lot easier to learn about applied category theory, thanks to this free book:

• Brendan Fong and David Spivak, Seven Sketches in Compositionality: An Invitation to Applied Category Theory.

I’ve started an informal online course based on this book on the Azimuth Forum. I’m getting pretty sick of the superficial quality of my interactions on social media. This could be a way to do something more interesting.

The idea is that you can read chapters of this book, discuss them, try the exercises in the book, ask and answer questions, and maybe team up to create software that implements some of the ideas. I’ll try to keep things moving forward. For example, I’ll explain some stuff and try to help answer questions that people are stuck on. I may also give some talks or run discussions on Google Hangouts or similar software—but only when I have time: I’m more of a text-based guy. I may get really busy at times, and leave the rest of you alone for a while. But I like writing about math for at least 15 minutes a day, and more when I have time. Furthermore, I’m obsessed with applied category theory and plan to stay that way for at least a few more years.

If this sounds interesting, let me know here—and please visit the Azimuth Forum and register! Use your full real name as your username, with no spaces. I will add spaces and that will become your username. Use a real working email address. If you don’t, the registration process may not work.

Over 70 people have registered so far, so this process will take a while.

The main advantage of the Forum over this blog is that you can initiate new threads and edit your comments. Like here, you can write equations in LaTeX. And like here, that ability is severely limited: for example, you can’t define macros, and you can’t use TikZ. (Maybe someone could fix that.) But equations are better typeset over there—and more importantly, the ability to edit comments makes it a lot easier to correct errors in your LaTeX.

Please let me know what you think.

What follows is the preface to Fong and Spivak’s book, just so you can get an idea of what it’s like.

Preface

Category theory is becoming a central hub for all of pure mathematics. It is unmatched in its ability to organize and layer abstractions, to find commonalities between structures of all sorts, and to facilitate communication between different mathematical communities. But it has also been branching out into science, informatics, and industry. We believe that it has the potential to be a major cohesive force in the world, building rigorous bridges between disparate worlds, both theoretical and practical. The motto at MIT is mens et manus, Latin for mind and hand. We believe that category theory—and pure math in general—has stayed in the realm of mind for too long; it is ripe to be brought to hand.

Purpose and audience

The purpose of this book is to offer a self-contained tour of applied category theory. It is an invitation to discover advanced topics in category theory through concrete real-world examples. Rather than try to give a comprehensive treatment of these topics—which include adjoint functors, enriched categories, proarrow equipments, toposes, and much more—we merely provide a taste. We want to give readers some insight into how it feels to work with these structures as well as some ideas about how they might show up in practice.

The audience for this book is quite diverse: anyone who finds the above description intriguing. This could include a motivated high school student who hasn’t seen calculus yet but has loved reading a weird book on mathematical logic they found at the library. Or a machine learning researcher who wants to understand what vector spaces, design theory, and dynamical systems could possibly have in common. Or a pure mathematician who wants to imagine what sorts of applications their work might have. Or a recently-retired programmer who’s always had an eerie feeling that category theory is what they’ve been looking for to tie it all together, but who’s found the usual books on the subject impenetrable.

For example, we find it something of a travesty that in 2018 there seems to be no introductory material available on monoidal categories. Even beautiful modern introductions to category theory, e.g. by Riehl or Leinster, do not include anything on this rather central topic. The basic idea is certainly not too abstract; modern human intuition seems to include a pre-theoretical understanding of monoidal categories that is just waiting to be formalized. Is there anyone who wouldn’t correctly understand the basic idea being communicated in the following diagram?

Many applied category theory topics seem to take monoidal categories as their jumping off point. So one aim of this book is to provide a reference—even if unconventional—for this important topic.

We hope this book inspires both new visions and new questions. We intend it to be self-contained in the sense that it is approachable with minimal prerequisites, but not in the sense that the complete story is told here. On the contrary, we hope that readers use this as an invitation to further reading, to orient themselves in what is becoming a large literature, and to discover new applications for themselves.

This book is, unashamedly, our take on the subject. While the abstract structures we explore are important to any category theorist, the specific topics have simply been chosen to our personal taste. Our examples are ones that we find simple but powerful, concrete but representative, entertaining but in a way that feels important and expansive at the same time. We hope our readers will enjoy themselves and learn a lot in the process.

How to read this book

The basic idea of category theory—which threads through every chapter—is that if one pays careful attention to structures and coherence, the resulting systems will be extremely reliable and interoperable. For example, a category involves several structures: a collection of objects, a collection of morphisms relating objects, and a formula for combining any chain of morphisms into a morphism. But these structures need to cohere or work together in a simple commonsense way: a chain of chains is a chain, so combining a chain of chains should be the same as combining the chain. That’s it!

We will see structures and coherence come up in pretty much every definition we give: “here are some things and here are how they fit together.” We ask the reader to be on the lookout for structures and coherence as they read the book, and to realize that as we layer abstraction on abstraction, it is the coherence that makes everything function like a well-oiled machine.

Each chapter in this book is motivated by a real-world topic, such as electrical circuits, control theory, cascade failures, information integration, and hybrid systems. These motivations lead us into and through various sorts of category-theoretic concepts.

We generally have one motivating idea and one category-theoretic purpose per chapter, and this forms the title of the chapter, e.g. Chapter 4 is “Collaborative design: profunctors, categorification, and monoidal categories.” In many math books, the difficulty is roughly a monotonically-increasing function of the page number. In this book, this occurs in each chapter, but not so much in the book as a whole. Each chapter starts out fairly easy and progresses in difficulty.

The upshot is that if you find the end of a chapter very difficult, hope is certainly not lost: you can start on the next one and make good progress. This format lends itself to giving you a first taste now, but also leaving open the opportunity for you to come back at a later date and get more deeply into it. But by all means, if you have the gumption to work through each chapter to its end, we very much encourage that!

We include many exercises throughout the text. Usually these exercises are fairly straightforward; the only thing they demand is that the reader’s mind changes state from passive to active, rereads the previous paragraphs with intent, and puts the pieces together. A reader becomes a student when they work the exercises; until then they are more of a tourist, riding on a bus and listening off and on to the tour guide. Hey, there’s nothing wrong with that, but we do encourage you to get off the bus and make contact with the natives as often as you can.


Coarse-Graining Open Markov Processes

4 March, 2018

Kenny Courser and I have been working hard on this paper for months:

• John Baez and Kenny Courser, Coarse-graining open Markov processes.

It may be almost done. So, it would be great if people here could take a look and comment on it! It’s a cool mix of probability theory and double categories. I’ve posted a similar but non-isomorphic article on the n-Category Café, where people know a lot about double categories. But maybe some of you here know more about Markov processes!

‘Coarse-graining’ is a standard method of extracting a simple Markov process from a more complicated one by identifying states. We extend coarse-graining to open Markov processes. An ‘open’ Markov process is one where probability can flow in or out of certain states called ‘inputs’ and ‘outputs’. One can build up an ordinary Markov process from smaller open pieces in two basic ways:

• composition, where we identify the outputs of one open Markov process with the inputs of another,

and

• tensoring, where we set two open Markov processes side by side.

A while back, Brendan Fong, Blake Pollard and I showed that these constructions make open Markov processes into the morphisms of a symmetric monoidal category:

• A compositional framework for Markov processes, Azimuth, January 12, 2016.

Here Kenny and I go further by constructing a symmetric monoidal double category where the 2-morphisms include ways of coarse-graining open Markov processes. We also extend the previously defined ‘black-boxing’ functor from the category of open Markov processes to this double category.

But before you dive into the paper, let me explain all this stuff a bit more….

Very roughly speaking, a ‘Markov process’ is a stochastic model describing a sequence of transitions between states in which the probability of a transition depends only on the current state. But the only Markov processes I’ll talk about here are continuous-time Markov processes with a finite set of states. These can be drawn as labeled graphs:

where the number labeling each edge describes the probability per time of making a transition from one state to another.

An ‘open’ Markov process is a generalization in which probability can also flow in or out of certain states designated as ‘inputs’ and ‘outputs’:

Open Markov processes can be seen as morphisms in a category, since we can compose two open Markov processes by identifying the outputs of the first with the inputs of the second. Composition lets us build a Markov process from smaller open parts—or conversely, analyze the behavior of a Markov process in terms of its parts.

In this paper, Kenny and I extend the study of open Markov processes to include coarse-graining. ‘Coarse-graining’ is a widely studied method of simplifying a Markov process by mapping its set of states X onto some smaller set X' in a manner that respects the dynamics. Here we introduce coarse-graining for open Markov processes. And we show how to extend this notion to the case of maps p: X \to X' that are not surjective, obtaining a general concept of morphism between open Markov processes.

Since open Markov processes are already morphisms in a category, it is natural to treat morphisms between them as morphisms between morphisms, or ‘2-morphisms’. We can do this using double categories!

Double categories were first introduced by Ehresmann around 1963. Since then, they’ve been used in topology and other branches of pure math—and more recently, to study open dynamical systems and open discrete-time Markov chains. So, it should not be surprising that they are also useful for open Markov processes.

A 2-morphism in a double category looks like this:

While a mere category has only objects and morphisms, here we have a few more types of things. We call A, B, C and D ‘objects’, f and g ‘vertical 1-morphisms’, M and N ‘horizontal 1-cells’, and \alpha a ‘2-morphism’. We can compose vertical 1-morphisms to get new vertical 1-morphisms and compose horizontal 1-cells to get new horizontal 1-cells. We can compose the 2-morphisms in two ways: horizontally by setting squares side by side, and vertically by setting one on top of the other. The ‘interchange law’ relates vertical and horizontal composition of 2-morphisms.

In a ‘strict’ double category all these forms of composition are associative. In a ‘pseudo’ double category, horizontal 1-cells compose in a weakly associative manner: that is, the associative law holds only up to an invertible 2-morphism, the ‘associator’, which obeys a coherence law. All this is just a sketch; for details on strict and pseudo double categories try the paper by Grandis and Paré.

Kenny and I construct a double category \mathbb{M}\mathbf{ark} with:

  1. finite sets as objects,
  2. maps between finite sets as vertical 1-morphisms,
  3. open Markov processes as horizontal 1-cells,
  4. morphisms between open Markov processes as 2-morphisms.

I won’t give the definition of item 4 here; you gotta read our paper for that! Composition of open Markov processes is only weakly associative, so \mathbb{M}\mathbf{ark} is a pseudo double category.

This is how our paper goes. In Section 2 we define open Markov processes and steady state solutions of the open master equation. In Section 3 we introduce coarse-graining first for Markov processes and then open Markov processes. In Section 4 we construct the double category \mathbb{M}\mathbf{ark} described above. We prove this is a symmetric monoidal double category in the sense defined by Mike Shulman. This captures the fact that we can not only compose open Markov processes but also ‘tensor’ them by setting them side by side.

For example, if we compose this open Markov process:

with the one I showed you before:

we get this open Markov process:

But if we tensor them, we get this:

As compared with an ordinary Markov process, the key new feature of an open Markov process is that probability can flow in or out. To describe this we need a generalization of the usual master equation for Markov processes, called the ‘open master equation’.

This is something that Brendan, Blake and I came up with earlier. In this equation, the probabilities at input and output states are arbitrary specified functions of time, while the probabilities at other states obey the usual master equation. As a result, the probabilities are not necessarily normalized. We interpret this by saying probability can flow either in or out at both the input and the output states.

If we fix constant probabilities at the inputs and outputs, there typically exist solutions of the open master equation with these boundary conditions that are constant as a function of time. These are called ‘steady states’. Often these are nonequilibrium steady states, meaning that there is a nonzero net flow of probabilities at the inputs and outputs. For example, probability can flow through an open Markov process at a constant rate in a nonequilibrium steady state. It’s like a bathtub where water is flowing in from the faucet, and flowing out of the drain, but the level of the water isn’t changing.
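
Concretely, once an open Markov process is described by its Hamiltonian H (an ‘infinitesimal stochastic’ matrix, which I explain near the end of this post), finding a steady state with fixed boundary probabilities is just a linear solve. Here is a minimal sketch in Python; the 3-state chain and all its rate constants are invented for illustration:

```python
# A minimal sketch of finding a steady state with fixed boundary probabilities.
# The 3-state example and its rates are made up; states 0 and 2 are the
# boundary (inputs/outputs), state 1 is internal.
import numpy as np

# Infinitesimal stochastic Hamiltonian H: columns sum to zero,
# off-diagonal entries are nonnegative transition rates.
H = np.array([
    [-2.0,  1.0,  0.0],
    [ 2.0, -3.0,  4.0],
    [ 0.0,  2.0, -4.0],
])

boundary, internal = [0, 2], [1]
p_boundary = np.array([0.3, 0.5])   # held constant by the environment

# Steady state: (H p)_i = 0 at each internal state i, boundary entries fixed.
A = H[np.ix_(internal, internal)]
b = -H[np.ix_(internal, boundary)] @ p_boundary
p = np.zeros(3)
p[boundary] = p_boundary
p[internal] = np.linalg.solve(A, b)

print("steady state p =", p)
print("net flows at boundary =", (H @ p)[boundary])  # generally nonzero!
```
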

Brendan, Blake and I studied the relation between probabilities and flows at the inputs and outputs that holds in steady state. We called the process of extracting this relation from an open Markov process ‘black-boxing’, since it gives a way to forget the internal workings of an open system and remember only its externally observable behavior. We showed that black-boxing is compatible with composition and tensoring. In other words, we showed that black-boxing is a symmetric monoidal functor.

In Section 5 of our new paper, Kenny and I show that black-boxing is compatible with morphisms between open Markov processes. To make this idea precise, we prove that black-boxing gives a map from the double category \mathbb{M}\mathbf{ark} to another double category, called \mathbb{L}\mathbf{inRel}, which has:

  1. finite-dimensional real vector spaces U,V,W,\dots as objects,
  2. linear maps f : V \to W as vertical 1-morphisms from V to W,
  3. linear relations R \subseteq V \oplus W as horizontal 1-cells from V to W,
  4. squares

    obeying (f \oplus g)R \subseteq S as 2-morphisms.

Here a ‘linear relation’ from a vector space V to a vector space W is a linear subspace R \subseteq V \oplus W. Linear relations can be composed in the usual way we compose relations. The double category \mathbb{L}\mathbf{inRel} becomes symmetric monoidal using direct sum as the tensor product, but unlike \mathbb{M}\mathbf{ark} it is a strict double category: that is, composition of linear relations is associative.
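
Everything in \mathbb{L}\mathbf{inRel} is concrete linear algebra, so it’s easy to experiment with on a computer. Here’s a sketch of how one might do it: represent a linear relation by a matrix whose columns span the subspace, compose using a null space computation, and test the 2-morphism condition (f \oplus g)R \subseteq S with a rank comparison. The encoding, the function names and the example matrices are my own choices, just for illustration:

```python
# Toy computations in LinRel: a linear relation R ⊆ V ⊕ W is stored as a
# matrix whose columns span it. Composition uses a null space; the
# 2-morphism condition (f ⊕ g)R ⊆ S is a rank test.
import numpy as np
from scipy.linalg import null_space

def compose(R, S, dimU, dimV, dimW):
    """Compose R ⊆ U ⊕ V with S ⊆ V ⊕ W."""
    A1, A2 = R[:dimU], R[dimU:]        # U-parts and V-parts of R's generators
    B1, B2 = S[:dimV], S[dimV:]        # V-parts and W-parts of S's generators
    K = null_space(np.hstack([A2, -B1]))   # coefficient pairs agreeing on V
    x, y = K[:R.shape[1]], K[R.shape[1]:]
    return np.vstack([A1 @ x, B2 @ y])     # spans {(u,w) : ∃v (u,v)∈R, (v,w)∈S}

def contains(S, v):
    """Is the vector v in the column span of S?"""
    return np.linalg.matrix_rank(np.column_stack([S, v])) == np.linalg.matrix_rank(S)

def is_2morphism(f, g, R, S):
    """Check (f ⊕ g)R ⊆ S for R ⊆ V ⊕ W, S ⊆ V' ⊕ W', f: V → V', g: W → W'."""
    dimV = f.shape[1]
    return all(contains(S, np.concatenate([f @ col[:dimV], g @ col[dimV:]]))
               for col in R.T)

# The graph of a linear map is a linear relation:
M = np.array([[1.0, 2.0]])            # a linear map R^2 -> R^1
R = np.vstack([np.eye(2), M])         # its graph, a relation from R^2 to R^1
N = np.vstack([np.eye(1), [[3.0]]])   # graph of multiplication by 3, R^1 -> R^1
print(compose(R, N, 2, 1, 1))         # spans the graph of the composite map 3M
print(is_2morphism(2*np.eye(2), 2*np.eye(1), R, R))  # True: scaling fixes the graph
```
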

Our main result, Theorem 5.5, says that black-boxing gives a symmetric monoidal double functor

\blacksquare : \mathbb{M}\mathbf{ark} \to \mathbb{L}\mathbf{inRel}

As you’ll see if you check out our paper, there’s a lot of nontrivial content hidden in this short statement! The proof requires a lot of linear algebra and also a reasonable amount of category theory. For example, we needed this fact: if you’ve got a commutative cube in the category of finite sets:

and the top and bottom faces are pushouts, and the two left-most faces are pullbacks, and the two left-most arrows on the bottom face are monic, then the two right-most faces are pullbacks. I think it’s cool that this is relevant to Markov processes!

Finally, in Section 6 we state a conjecture. First we use a technique invented by Mike Shulman to construct symmetric monoidal bicategories \mathbf{Mark} and \mathbf{LinRel} from the symmetric monoidal double categories \mathbb{M}\mathbf{ark} and \mathbb{L}\mathbf{inRel}. We conjecture that our black-boxing double functor determines a functor between these symmetric monoidal bicategories. This has got to be true. However, double categories seem to be a simpler framework for coarse-graining open Markov processes.

Finally, let me talk a bit about some related work. As I already mentioned, Brendan, Blake and I constructed a symmetric monoidal category where the morphisms are open Markov processes. However, we formalized such Markov processes in a slightly different way than Kenny and I do now. We defined a Markov process to be one of the pictures I’ve been showing you: a directed multigraph where each edge is assigned a positive number called its ‘rate constant’. In other words, we defined it to be a diagram

s, t : E \to X, \qquad r : E \to (0,\infty)

where X is a finite set of vertices or ‘states’, E is a finite set of edges or ‘transitions’ between states, the functions s,t : E \to X give the source and target of each edge, and r : E \to (0,\infty) gives the rate constant for each transition. We explained how from this data one can extract a matrix of real numbers (H_{i j})_{i,j \in X} called the ‘Hamiltonian’ of the Markov process, with two properties that are familiar in this game:

H_{i j} \geq 0 if i \neq j,

\sum_{i \in X} H_{i j} = 0 for all j \in X.

A matrix with these properties is called ‘infinitesimal stochastic’, since these conditions are equivalent to \exp(t H) being stochastic for all t \ge 0.

In our new paper, Kenny and I skip the directed multigraphs and work directly with the Hamiltonians! In other words, we define a Markov process to be a finite set X together with an infinitesimal stochastic matrix (H_{ij})_{i,j \in X}. This allows us to work more directly with the Hamiltonian and the all-important ‘master equation’

\displaystyle{    \frac{d p(t)}{d t} = H p(t)  }

which describes the evolution of a time-dependent probability distribution

p(t) : X \to \mathbb{R}
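
To make this concrete, here’s a minimal sketch that builds the Hamiltonian from the graph description (X, E, s, t, r), checks that it’s infinitesimal stochastic, and solves the master equation with a matrix exponential. The little 3-state example is invented for illustration:

```python
# Building the Hamiltonian of a Markov process from graph data and evolving
# the master equation dp/dt = Hp.
import numpy as np
from scipy.linalg import expm

n = 3                                            # states X = {0, 1, 2}
edges = [(0, 1, 1.5), (1, 2, 0.7), (2, 0, 0.2)]  # (source, target, rate constant)

H = np.zeros((n, n))
for s, t, r in edges:
    H[t, s] += r    # probability flows from state s to state t at rate r...
    H[s, s] -= r    # ...and is lost from state s at the same rate

# H is infinitesimal stochastic: nonnegative off-diagonal, columns sum to 0.
assert np.allclose(H.sum(axis=0), 0)
assert all(H[i, j] >= 0 for i in range(n) for j in range(n) if i != j)

# Equivalently, exp(tH) is stochastic for all t >= 0:
U = expm(2.0 * H)
print(np.allclose(U.sum(axis=0), 1), (U >= 0).all())   # True True

# Solve the master equation: p(t) = exp(tH) p(0).
p0 = np.array([1.0, 0.0, 0.0])
print(expm(5.0 * H) @ p0)   # total probability stays 1
```
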

Clerc, Humphrey and Panangaden have constructed a bicategory with finite sets as objects, ‘open discrete labeled Markov processes’ as morphisms, and ‘simulations’ as 2-morphisms. They use the word ‘open’ in a pretty similar way to me. But their open discrete labeled Markov processes are also equipped with a set of ‘actions’ which represent interactions between the Markov process and the environment, such as an outside entity acting on a stochastic system. A ‘simulation’ is then a function between the state spaces that maps the inputs, outputs and set of actions of one open discrete labeled Markov process to the inputs, outputs and set of actions of another.

Another compositional framework for Markov processes was discussed by de Francesco Albasini, Sabadini and Walters. They constructed an algebra of ‘Markov automata’. A Markov automaton is a family of matrices with non-negative real coefficients, indexed by elements of a product of two sets: one set represents the ‘signals on the left interface’ of the Markov automaton, while the other represents the signals on the right interface.

So, double categories are gradually invading the theory of Markov processes… as part of the bigger trend toward applied category theory. They’re natural things; scientists should use them.


The Kepler Problem (Part 1)

7 January, 2018

Johannes Kepler loved geometry, so of course he was fascinated by Platonic solids. His early work Mysterium Cosmographicum, written in 1596, includes pictures showing how the 5 Platonic solids correspond to the 5 elements:



Five elements? Yes, besides earth, air, water and fire, he includes a fifth element that doesn’t feel the Earth’s gravitational pull: the ‘quintessence’, or ‘aether’, from which heavenly bodies are made.

In the same book he also tried to use the Platonic solids to explain the orbits of the planets:



The six planets are Mercury, Venus, Earth, Mars, Jupiter and Saturn. And the tetrahedron and cube, in case you’re wondering, sit outside the largest sphere shown above. You can see them in another picture from Kepler’s book:

These ideas may seem goofy now, but studying the exact radii of the planets’ orbits led him to discover that these orbits aren’t circular: they’re ellipses! By 1619 this led him to what we call Kepler’s laws of planetary motion. And those, in turn, helped Newton verify Hooke’s hunch that the force of gravity goes as the inverse square of the distance between bodies!

In honor of this, the problem of a particle orbiting in an inverse square force law is called the Kepler problem.

So, I’m happy that Greg Egan, Layra Idarani and I have come across a solid mathematical connection between the Platonic solids and the Kepler problem.

But this involves a detour into the 4th dimension!

It’s a remarkable fact that the Kepler problem has not just the expected conserved quantities—energy and the 3 components of angular momentum—but also 3 more: the components of the Runge–Lenz vector. To understand those extra conserved quantities, go here:

• Greg Egan, The ellipse and the atom.

Noether proved that conserved quantities come from symmetries. Energy comes from time translation symmetry. Angular momentum comes from rotation symmetry. Since the group of rotations in 3 dimensions, called SO(3), is itself 3-dimensional, it gives 3 conserved quantities, which are the 3 components of angular momentum.

None of this is really surprising. But if we take the angular momentum together with the Runge–Lenz vector, we get 6 conserved quantities—and these turn out to come from the group of rotations in 4 dimensions, SO(4), which is itself 6-dimensional. The obvious symmetries in this group just rotate a planet’s elliptical orbit, while the unobvious ones can also squash or stretch it, changing the eccentricity of the orbit.

(To be precise, all this is true only for the ‘bound states’ of the Kepler problem: the circular and elliptical orbits, not the parabolic or hyperbolic ones, which work in a somewhat different way. I’ll only be talking about bound states in this post!)

Why should the Kepler problem have symmetries coming from rotations in 4 dimensions? This is a fascinating puzzle—we know a lot about it, but I doubt the last word has been spoken. For an overview, go here:

• John Baez, Mysteries of the gravitational 2-body problem.

This SO(4) symmetry applies not only to the classical mechanics of the inverse square force law, but also the quantum mechanics! Nobody cares much about the quantum mechanics of two particles attracting gravitationally via an inverse square force law—but people care a lot about the quantum mechanics of hydrogen atoms, where the electron and proton attract each other via their electric field, which also obeys an inverse square force law.

So, let’s talk about hydrogen. And to keep things simple, let’s pretend the proton stays fixed while the electron orbits it. This is a pretty good approximation, and experts will know how to do things exactly right. It requires only a slight correction.

It turns out that wavefunctions for bound states of hydrogen can be reinterpreted as functions on the 3-sphere, S³. The sneaky SO(4) symmetry then becomes obvious: it just rotates this sphere! And the Hamiltonian of the hydrogen atom is closely connected to the Laplacian on the 3-sphere. The Laplacian has eigenspaces of dimensions n² where n = 1,2,3,…, and these correspond to the eigenspaces of the hydrogen atom Hamiltonian. The number n is called the principal quantum number, and the hydrogen atom’s energy is proportional to −1/n².
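
In hydrogen-atom terms, the dimension n² is the familiar degeneracy of the nth energy level: the orbital angular momentum ℓ ranges over 0, …, n−1, and each ℓ contributes 2ℓ+1 states. Here’s a two-line check:

```python
# The n-th level of hydrogen holds sum of (2l+1) for l = 0, ..., n-1 states,
# which equals n^2: the dimension of the Laplacian's eigenspace on the 3-sphere.
for n in range(1, 7):
    print(n, sum(2*l + 1 for l in range(n)), n**2)
```
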

If you don’t know all this jargon, don’t worry! All you need to know is this: if we find an eigenfunction of the Laplacian on the 3-sphere, it will give a state where the hydrogen atom has a definite energy. And if this eigenfunction is invariant under some subgroup of SO(4), the corresponding state of the hydrogen atom will be too!

The biggest finite subgroup of SO(4) is the rotational symmetry group of the 600-cell, a wonderful 4-dimensional shape with 120 vertices and 600 tetrahedral cells. The rotational symmetry group of this shape has a whopping 7,200 elements! And here is a marvelous moving image, made by Greg Egan, of an eigenfunction of the Laplacian on S³ that’s invariant under this 7,200-element group:


We’re seeing the wavefunction on a moving slice of the 3-sphere, which is a 2-sphere. This wavefunction is actually real-valued. Blue regions are where this function is positive, yellow regions where it’s negative—or maybe the other way around—and black is where it’s almost zero. When the image fades to black, our moving slice is passing through a 2-sphere where the wavefunction is almost zero.

For a full explanation, go here:

• Greg Egan, In the chambers with seven thousand symmetries, 2 January 2018.

Layra Idarani has come up with a complete classification of all eigenfunctions of the Laplacian on S³ that are invariant under this group… or more generally, eigenfunctions of the Laplacian on a sphere of any dimension that are invariant under the even part of any Coxeter group. For the details, go here:

• Layra Idarani, SG-invariant polynomials, 4 January 2018.

All that is a continuation of a story whose beginning is summarized here:

• John Baez, Quantum mechanics and the dodecahedron.

So, there’s a lot of serious math under the hood. But right now I just want to marvel at the fact that we’ve found a wavefunction for the hydrogen atom that not only has a well-defined energy, but is also invariant under this 7,200-element group. This group includes the usual 60 rotational symmetries of a dodecahedron, but also other much less obvious symmetries.

I don’t have a good picture of what these less obvious symmetries do to the wavefunction of a hydrogen atom. I understand them a bit better classically—where, as I said, they squash or stretch an elliptical orbit, changing its eccentricity while not changing its energy.

We can have fun with this using the old quantum theory—the approach to quantum mechanics that Bohr and Sommerfeld developed between 1913 and 1925, before Schrödinger introduced wavefunctions.

In the old Bohr–Sommerfeld approach to the hydrogen atom, the quantum states with specified energy, total angular momentum and angular momentum about a fixed axis were drawn as elliptical orbits. In this approach, the symmetries that squash or stretch elliptical orbits are a bit easier to visualize:



This picture by Pieter Kuiper shows some orbits at the 5th energy level, n = 5: namely, those with different eigenvalues of the total angular momentum, ℓ.

While the old quantum theory was superseded by the approach using wavefunctions, it’s possible to make it mathematically rigorous for the hydrogen atom. So, we can draw elliptical orbits that rigorously correspond to a basis of wavefunctions for the hydrogen atom. In particular, I believe we can draw the orbits corresponding to the basis elements whose linear combination gives the wavefunction shown as a function on the 3-sphere in Greg’s picture above!

We should get a bunch of ellipses forming a complicated picture with dodecahedral symmetry. This would make Kepler happy.

As a first step in this direction, Greg drew the collection of orbits that results when we take a circle and apply all the symmetries of the 600-cell:

For more details, read this:

• Greg Egan, Kepler orbits with the symmetries of the 600-cell.

Postscript

To do this really right, one should learn a bit about ‘old quantum theory’. I believe people have been getting it a bit wrong for quite a while—starting with Bohr and Sommerfeld!

If you look at the ℓ = 0 orbit in the picture above, it’s a long skinny ellipse. But I believe it really should be a line segment straight through the proton: that’s what an orbit with no angular momentum looks like.

There’s a paper about this:

• Manfred Bucher, Rise and fall of the old quantum theory.

Matt McIrvin had some comments on this:

This paper from 2008 is a kind of thing I really like: an exploration of an old, incomplete theory that takes it further than anyone actually did at the time.

It has to do with the Bohr-Sommerfeld “old quantum theory”, in which electrons followed definite orbits in the atom, but these were quantized–not all orbits were permitted. Bohr managed to derive the hydrogen spectrum by assuming circular orbits, then Sommerfeld did much more by extending the theory to elliptical orbits with various shapes and orientations. But there were some problems that proved maddeningly intractable with this analysis, and it eventually led to the abandonment of the “orbit paradigm” in favor of Heisenberg’s matrix mechanics and Schrödinger’s wave mechanics, what we know as modern quantum theory.

The paper argues that the old quantum theory was abandoned prematurely. Many of the problems Bohr and Sommerfeld had came not from the orbit paradigm per se, but from a much simpler bug in the theory: namely, their rejection of orbits in which the electron moves entirely radially and goes right through the nucleus! Sommerfeld called these orbits “unphysical”, but they actually correspond to the s orbital states in the full quantum theory, with zero angular momentum. And, of course, in the full theory the electron in these states does have some probability of being inside the nucleus.

So Sommerfeld’s orbital angular momenta were always off by one unit. The hydrogen spectrum came out right anyway because of the happy accident of the energy degeneracy of certain orbits in the Coulomb potential.

I guess the states they really should have been rejecting as “unphysical” were Bohr’s circular orbits: no radial motion would correspond to a certain zero radial momentum in the full theory, and we can’t have that for a confined electron because of the uncertainty principle.


Quantum Mechanics and the Dodecahedron

31 December, 2017

This is an expanded version of my G+ post, which was a watered-down version of Greg Egan’s G+ post and the comments on that. I’ll start out slow, and pick up speed as I go.

Quantum mechanics meets the dodecahedron

In quantum mechanics, the position of a particle is not a definite thing: it’s described by a ‘wavefunction’. This says how probable it is to find the particle at any location… but it also contains other information, like how probable it is to find the particle moving at any velocity.

Take a hydrogen atom, and look at the wavefunction of the electron.

Question 1. Can we make the electron’s wavefunction have all the rotational symmetries of a dodecahedron—that wonderful Platonic solid with 12 pentagonal faces?

Yes! In fact it’s too easy: you can make the wavefunction look like whatever you want.

So let’s make the question harder. Like everything else in quantum mechanics, angular momentum can be uncertain. In fact you can never make all 3 components of angular momentum take definite values simultaneously! However, there are lots of wavefunctions where the magnitude of an electron’s angular momentum is completely definite.

This leads naturally to the next question, which was first posed by Gerard Westendorp:

Question 2. Can an electron’s wavefunction have a definite magnitude for its angular momentum while having all the rotational symmetries of a dodecahedron?

Yes! And there are infinitely many ways for this to happen! This is true even if we neglect the radial dependence of the wavefunction—that is, how it depends on the distance from the proton. Henceforth I’ll always do that, which lets us treat the wavefunction as a function on a sphere. And by the way, I’m also ignoring the electron’s spin! So, whenever I say ‘angular momentum’ I mean orbital angular momentum: the part that depends only on the electron’s position and velocity.

Question 2 has a trivial solution that’s too silly to bother with. It’s the spherically symmetric wavefunction! That’s invariant under all rotations. The real challenge is to figure out the simplest nontrivial solution. Egan figured it out, and here’s what it looks like:


The rotation here is just an artistic touch. Really the solution should be just sitting there, or perhaps changing colors while staying the same shape.

In what sense is this the simplest nontrivial solution? Well, the magnitude of the angular momentum is equal to

\hbar \sqrt{\ell(\ell+1)}

where the number \ell is quantized: it can only take values 0, 1, 2, 3,… and so on.

The trivial solution to Question 2 has \ell = 0. The first nontrivial solution has \ell = 6. Why 6? That’s where things get interesting. We can get it using the 6 lines connecting opposite faces of the dodecahedron!

I’ll explain later how this works. For now, let’s move straight on to a harder question:

Question 3. What’s the smallest choice of \ell where we can find two linearly independent wavefunctions that both have the same \ell and both have all the rotational symmetries of a dodecahedron?

It turns out to be \ell = 30. And Egan created an image of a wavefunction oscillating between these two possibilities!

But we can go a lot further:

Question 4. For each \ell, how many linearly independent functions on the sphere have that value of \ell and all the rotational symmetries of a dodecahedron?

For \ell ranging from 0 to 29 there are either none or one. There are none for these numbers:

1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 17, 19, 23, 29

and one for these numbers:

0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28

The pattern continues as follows. For \ell ranging from 30 to 59 there are either one or two. There is one for these numbers:

31, 32, 33, 34, 35, 37, 38, 39, 41, 43, 44, 47, 49, 53, 59

and two for these numbers:

30, 36, 40, 42, 45, 46, 48, 50, 51, 52, 54, 55, 56, 57, 58

The numbers in these two lists are just 30 more than the numbers in the first two lists! And it continues on like this forever: there’s always one more linearly independent solution for \ell + 30 than there is for \ell.

Question 5. What’s special about these numbers from 0 to 29?

0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28

You don’t need to know tons of math to figure this out—but I guess it’s a sort of weird pattern-recognition puzzle unless you know which patterns are likely to be important here. So I’ll give away the answer.

Here’s the answer: these are the numbers below 30 that can be written as sums of the numbers 6, 10 and 15.
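
If you want to check this claim against the lists above, a few lines of Python suffice:

```python
# Which numbers below 30 are sums of 6's, 10's and 15's?
reachable = {n for n in range(30)
             if any(6*q + 10*r + 15*s == n
                    for q in range(6) for r in range(4) for s in range(3))}
print(sorted(reachable))
# [0, 6, 10, 12, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28]
```
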

But the real question is why? Also: what’s so special about the number 30?

The short, cryptic answer is this. The dodecahedron has 6 axes connecting the centers of opposite faces, 10 axes connecting opposite vertices, and 15 axes connecting the centers of opposite edges. The least common multiple of these numbers is 30.

But this requires more explanation!

For this, we need more math. You may want to get off here. But first, let me show you the solutions for \ell = 6, \ell = 10, and \ell = 15, as drawn by Greg Egan. I’ve already showed you \ell = 6, which we could call the quantum dodecahedron:


Here is \ell = 10, which looks like a quantum icosahedron:


And here is \ell = 15:

Maybe this deserves to be called a quantum Coxeter complex, since the Coxeter complex for the group of rotations and reflections of the dodecahedron looks like this:



Functions with icosahedral symmetry

The dodecahedron and icosahedron have the same symmetries, but for some reason people talk about the icosahedron when discussing symmetry groups, so let me do that.

So far we’ve been looking at the rotational symmetries of the icosahedron. These form a group called \mathrm{A}_5, or \mathrm{I} for short, with 60 elements. We’ve been looking for certain functions on the sphere that are invariant under the action of this group. To get them all, we’ll first get ahold of all polynomials on \mathbb{R}^3 that are invariant under the action of this group. Then we’ll restrict these to the sphere.

To save time, we’ll use the work of Claude Chevalley. He looked at rotation and reflection symmetries of the icosahedron. These form the group \mathrm{I} \times \mathbb{Z}/2, also known as \mathrm{H}_3, but let’s call it \hat{\mathrm{I}} for short. It has 120 elements, but never confuse it with two other groups with 120 elements: the symmetric group on 5 letters, and the binary icosahedral group.

Chevalley found all polynomials on \mathbb{R}^3 that are invariant under the action of this bigger group \hat{\mathrm{I}}. These invariant polynomials form an algebra, and Chevalley showed that this algebra is freely generated by 3 homogeneous polynomials:

P(x,y,z) = x^2 + y^2 + z^2, of degree 2.

Q(x,y,z), of degree 6. To get this we take the dot product of (x,y,z) with each of the 6 vectors joining antipodal vertices of the icosahedron, and multiply them together.

R(x,y,z), of degree 10. To get this we take the dot product of (x,y,z) with each of the 10 vectors joining antipodal face centers of the icosahedron, and multiply them together.

So, linear combinations of products of these give all polynomials on \mathbb{R}^3 invariant under all rotation and reflection symmetries of the icosahedron.

But we want the polynomials that are invariant under just rotational symmetries of the icosahedron! To get all these, we need an extra generator:

S(x,y,z), of degree 15. To get this we take the dot product of (x,y,z) with each of the 15 vectors joining antipodal edge centers of the icosahedron, and multiply them together.

You can check that this is invariant under rotational symmetries of the icosahedron. But unlike our other polynomials, this one is not invariant under reflection symmetries! Because 15 is an odd number, S switches sign under ‘total inversion’—that is, replacing (x,y,z) with -(x,y,z). This is a product of three reflection symmetries of the icosahedron.
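
If you don’t feel like checking by hand, here’s a numerical sketch for the degree-6 invariant Q: it builds Q from the 6 vertex axes and tests invariance under a 5-fold rotation of the icosahedron. The coordinates (vertices at cyclic permutations of (0, ±1, ±φ)) are one standard choice, assumed here just for illustration:

```python
# Numerically checking the degree-6 invariant Q: the product of dot products
# of (x,y,z) with the 6 vertex axes of the icosahedron.
import numpy as np
from scipy.spatial.transform import Rotation

phi = (1 + np.sqrt(5)) / 2

# One representative vector per pair of antipodal icosahedron vertices.
vertex_axes = np.array([
    [0, 1,  phi], [0, 1, -phi],
    [1,  phi, 0], [1, -phi, 0],
    [phi, 0, 1], [phi, 0, -1],
])

def Q(v):
    """Product of dot products of v with the 6 vertex axes (degree 6 in v)."""
    return np.prod(vertex_axes @ v)

# Rotation by 2*pi/5 about the axis through the vertex (0, 1, phi)
# is a rotational symmetry of the icosahedron.
axis = np.array([0, 1, phi])
g = Rotation.from_rotvec(2 * np.pi / 5 * axis / np.linalg.norm(axis)).as_matrix()

rng = np.random.default_rng(0)
v = rng.normal(size=3)
print(np.isclose(Q(g @ v), Q(v)))   # True: Q is rotation-invariant
print(np.isclose(Q(-v), Q(v)))      # True: even degree, so inversion-invariant
```
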

Thanks to Egan’s extensive computations, I’m completely convinced that P,Q,R and S generate the algebra of all \mathrm{I}-invariant polynomials on \mathbb{R}^3. I’ll take this as a fact, even though I don’t have a clean, human-readable proof. But someone must have proved it already—do you know where?

Since we now have 4 polynomials in just 3 variables, they must obey a relation. Egan figured it out:

S^2 = 500 P^9 Q^2 - 2275 P^6 Q^3 + 3440 P^3 Q^4 - 1728 Q^5 + 200 P^7 Q R - 795 P^4 Q^2 R + 720 P Q^3 R + 4 P^5 R^2 - 65 P^2 Q R^2 - R^3

The exact coefficients depend on some normalization factors used in defining Q,R and S. Luckily the details don’t matter much. All we’ll really need is that this relation expresses S^2 in terms of the other generators. And this fact is easy to see without any difficult calculations!

How? Well, we’ve seen S is unchanged by rotations, while it changes sign under total inversion. So, the most any rotation or reflection symmetry of the icosahedron can do to S is change its sign. This means that S^2 is invariant under all these symmetries. So, by Chevalley’s result, it must be a polynomial in P, Q, and R.

So, we now have a nice description of the \mathrm{I}-invariant polynomials on \mathbb{R}^3, in terms of generators and relations. Each of these gives an \mathrm{I}-invariant function on the sphere. And Leo Stein, a postdoc at Caltech who has a great blog on math and physics, has kindly created some images of these.

The polynomial P is spherically symmetric so it’s too boring to draw. The polynomial Q, of degree 6, looks like this when restricted to the sphere:


Since it was made by multiplying linear functions, one for each axis connecting opposite vertices of an icosahedron, it shouldn’t be surprising that we see blue blobs centered at these vertices.

The polynomial R, of degree 10, looks like this:


Here the blue blobs are centered on the icosahedron’s 20 faces.

Finally, here’s S, of degree 15:


This time the blue blobs are centered on the icosahedron’s 30 edges.

Now let’s think a bit about functions on the sphere that arise from polynomials on \mathbb{R}^3. Let’s call them algebraic functions on the sphere. They form an algebra, and it’s just the algebra of polynomials on \mathbb{R}^3 modulo the relation P = 1, since the sphere is the set \{P = 1\}.

It makes no sense to talk about the ‘degree’ of an algebraic function on the sphere, since the relation P = 1 equates polynomials of different degree. What makes sense is the number \ell that I was talking about earlier!

The group \mathrm{SO}(3) acts by rotation on the space of algebraic functions on the sphere, and we can break this space up into irreducible representations of \mathrm{SO}(3). It’s a direct sum of irreps, one of each ‘spin’ \ell = 0, 1, 2, \dots.

So, we can’t talk about the degree of a function on the sphere, but we can talk about its \ell value. On the other hand, it’s very convenient to work with homogeneous polynomials on \mathbb{R}^3, which have a definite degree—and these restrict to functions on the sphere. How can we relate the degree and the quantity \ell?

Here’s one way. The polynomials on \mathbb{R}^3 form a graded algebra. That means it’s a direct sum of vector spaces consisting of homogeneous polynomials of fixed degree, and if we multiply two homogeneous polynomials their degrees add. But the algebra of polynomials restricted to the sphere is merely a filtered algebra.

What does this mean? Let F be the algebra of all algebraic functions on the sphere, and let F_\ell \subset F consist of those that are restrictions of polynomials of degree \le \ell. Then:

1) F_\ell \subseteq F_{\ell + 1}

and

2) \displaystyle{ F = \bigcup_{\ell = 0}^\infty F_\ell }

and

3) if we multiply a function in F_\ell by one in F_m, we get one in F_{\ell + m}.

That’s what a filtered algebra amounts to.

But starting from a filtered algebra, we can get a graded algebra! It’s called the associated graded algebra.

To do this, we form

G_\ell = F_\ell / F_{\ell - 1}

and let

\displaystyle{ G = \bigoplus_{\ell = 0}^\infty G_\ell }

Then G has a product where multiplying a guy in G_\ell and one in G_m gives one in G_{\ell + m}. So, it’s indeed a graded algebra! For the details, see Wikipedia, which manages to make it look harder than it is. The basic idea is that we multiply in F and then ‘ignore terms of lower degree’. That’s what G_\ell = F_\ell / F_{\ell - 1} is all about.

Now I want to use two nice facts. First, G_\ell is the spin-\ell representation of \mathrm{SO}(3). Second, there’s a natural map from any filtered algebra to its associated graded algebra, which is an isomorphism of vector spaces (though not of algebras). So, we get a natural isomorphism of vector spaces

\displaystyle{  F \cong G = \bigoplus_{\ell = 0}^\infty G_\ell }

from the algebraic functions on the sphere to the direct sum of all the spin-\ell representations!

Now to the point: because this isomorphism is natural, it commutes with symmetries, so we can also use it to study algebraic functions on the sphere that are invariant under a group of linear transformations of \mathbb{R}^3.

Before tackling the group we’re really interested in, let’s try the group of rotation and reflection symmetries of the icosahedron, \hat{\mathrm{I}}. As I mentioned, Chevalley worked out the algebra of polynomials on \mathbb{R}^3 that are invariant under this bigger group. It’s a graded commutative algebra, and it’s free on three generators: P of degree 2, Q of degree 6, and R of degree 10.

Starting from here, to get the algebra of \hat{\mathrm{I}}-invariant algebraic functions on the sphere, we mod out by the relation P = 1. This gives a filtered algebra which I’ll call F^{\hat{\mathrm{I}}}. (It’s common to use a superscript with the name of a group to indicate that we’re talking about the stuff that’s invariant under some action of that group.) From this we can form the associated graded algebra

\displaystyle{ G^{\hat{\mathrm{I}}} = \bigoplus_{\ell = 0}^\infty G_\ell^{\hat{\mathrm{I}}} }

where

G_\ell^{\hat{\mathrm{I}}} = F_\ell^{\hat{\mathrm{I}}} / F_{\ell - 1}^{\hat{\mathrm{I}}}

If you’ve understood everything I’ve been trying to explain, you’ll see that G_\ell^{\hat{\mathrm{I}}} is the space of all functions on the sphere that transform in the spin-\ell representation and are invariant under the rotation and reflection symmetries of the icosahedron.

But now for the fun part: what is this space like? By the work of Chevalley, the algebra F^{\hat{\mathrm{I}}} is spanned by products

P^p Q^q R^r

but since we have the relation P = 1, and no other relations, it has a basis given by products

Q^q R^r

So, the space F_\ell^{\hat{\mathrm{I}}} has a basis of products like this whose degree is \le \ell, meaning

6 q + 10 r \le \ell

Thus, the space we’re really interested in:

G_\ell^{\hat{\mathrm{I}}} = F_\ell^{\hat{\mathrm{I}}} / F_{\ell - 1}^{\hat{\mathrm{I}}}

has a basis consisting of equivalence classes

[Q^q R^r]

where

6 q + 10 r = \ell

So, we get:

Theorem 1. The dimension of the space of functions on the sphere that lie in the spin-\ell representation of \mathrm{SO}(3) and are invariant under the rotation and reflection symmetries of the icosahedron equals the number of ways of writing \ell as an unordered sum of 6’s and 10’s.
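
Here’s a quick brute-force sketch of this count in Python; its output matches the case-by-case list below:

```python
# Number of ways to write l as an unordered sum of 6's and 10's, i.e. the
# number of pairs (q, r) with 6q + 10r = l.
def dim_invariant(l):
    return sum(1 for q in range(l // 6 + 1)
                 for r in range(l // 10 + 1) if 6*q + 10*r == l)

print([dim_invariant(l) for l in range(31)])
# dimension 1 at l = 0, 6, 10, 12, 16, 18, ...; the first 2 appears at l = 30
```
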

Let’s see how this goes:

\ell = 0: dimension 1, with basis [1]

\ell = 1: dimension 0

\ell = 2: dimension 0

\ell = 3: dimension 0

\ell = 4: dimension 0

\ell = 5: dimension 0

\ell = 6: dimension 1, with basis [Q]

\ell = 7: dimension 0

\ell = 8: dimension 0

\ell = 9: dimension 0

\ell = 10: dimension 1, with basis [R]

\ell = 11: dimension 0

\ell = 12: dimension 1, with basis [Q^2]

\ell = 13: dimension 0

\ell = 14: dimension 0

\ell = 15: dimension 0

\ell = 16: dimension 1, with basis [Q R]

\ell = 17: dimension 0

\ell = 18: dimension 1, with basis [Q^3]

\ell = 19: dimension 0

\ell = 20: dimension 1, with basis [R^2]

\ell = 21: dimension 0

\ell = 22: dimension 1, with basis [Q^2 R]

\ell = 23: dimension 0

\ell = 24: dimension 1, with basis [Q^4]

\ell = 25: dimension 0

\ell = 26: dimension 1, with basis [Q R^2]

\ell = 27: dimension 0

\ell = 28: dimension 1, with basis [Q^3 R]

\ell = 29: dimension 0

\ell = 30: dimension 2, with basis [Q^5], [R^3]

So, the story starts out boring, with long gaps. The odd numbers are completely uninvolved. But it heats up near the end, and reaches a thrilling climax at \ell = 30. At this point we get two linearly independent solutions, because 30 is the least common multiple of the degrees of Q and R.

It’s easy to see that from here on the story ‘repeats’ with period 30, with the dimension growing by 1 each time:

\mathrm{dim}(G_{\ell+30}^{\hat{\mathrm{I}}}) = \mathrm{dim}(G_{\ell}^{\hat{\mathrm{I}}}) + 1

Now, finally, we are ready to tackle Question 4 from the first part of this post: for each \ell, how many linearly independent functions on the sphere have that value of \ell and all the rotational symmetries of a dodecahedron?

We just need to repeat our analysis with \mathrm{I}, the group of rotational symmetries of the dodecahedron, replacing the bigger group \hat{\mathrm{I}}.

We start with the algebra of polynomials on \mathbb{R}^3 that are invariant under \mathrm{I}. As we’ve seen, this is a graded commutative algebra with four generators: P,Q,R as before, but also S of degree 15. To make up for this extra generator there’s an extra relation, which expresses S^2 in terms of the other generators.

Starting from here, to get the algebra of \mathrm{I}-invariant algebraic functions on the sphere, we mod out by the relation P = 1. This gives a filtered algebra I’ll call F^{\mathrm{I}}. Then we form the associated graded algebra

\displaystyle{ G^{\mathrm{I}} = \bigoplus_{\ell = 0}^\infty G_\ell^{\mathrm{I}} }

where

G_\ell^{\mathrm{I}} = F_\ell^{\mathrm{I}} / F_{\ell - 1}^{\mathrm{I}}

What we really want to know is the dimension of G_\ell^{\mathrm{I}}, since this is the space of functions on the sphere that transform in the spin-\ell representation and are invariant under the rotational symmetries of the icosahedron.

So, what’s this space like? The algebra F^{\mathrm{I}} is spanned by products

P^p Q^q R^r S^t

but since we have the relation P = 1, and a relation expressing S^2 in terms of other generators, it has a basis given by products

Q^q R^r S^s where s = 0, 1

So, the space F_\ell^{\mathrm{I}} has a basis of products like this whose degree is \le \ell, meaning

6 q + 10 r + 15 s \le \ell and s = 0, 1

Thus, the space we’re really interested in:

G_\ell^{\mathrm{I}} = F_\ell^{\mathrm{I}} / F_{\ell - 1}^{\mathrm{I}}

has a basis consisting of equivalence classes

[Q^q R^r S^s]

where

6 q + 10 r + 15 s = \ell and s = 0, 1

So, we get:

Theorem 2. The dimension of the space of functions on the sphere that lie in the spin-\ell representation of \mathrm{SO}(3) and are invariant under the rotational symmetries of the icosahedron equals the number of ways of writing \ell as an unordered sum of 6’s, 10’s and at most one 15.
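
The brute-force count from Theorem 1 adapts easily, with an extra factor s \in \{0, 1\}; it also confirms the period-30 pattern we’ll see below:

```python
# Number of ways to write l as a sum of 6's, 10's and at most one 15:
# triples (q, r, s) with 6q + 10r + 15s = l and s in {0, 1}.
def dim_invariant(l):
    return sum(1 for q in range(l // 6 + 1) for r in range(l // 10 + 1)
                 for s in (0, 1) if 6*q + 10*r + 15*s == l)

print([dim_invariant(l) for l in range(31)])
# now dimension 1 also appears at l = 15, 21, 25, 27; still 2 at l = 30

# The period-30 pattern: one more solution each time we add 30.
assert all(dim_invariant(l + 30) == dim_invariant(l) + 1 for l in range(60))
```
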

Let’s work out these dimensions explicitly, and see how the extra generator S changes the story! Since it has degree 15, it contributes some solutions for odd values of \ell. But when we reach the magic number 30, this extra generator loses its power: S^2 has degree 30, but it’s a linear combination of other things.

\ell = 0: dimension 1, with basis [1]

\ell = 1: dimension 0

\ell = 2: dimension 0

\ell = 3: dimension 0

\ell = 4: dimension 0

\ell = 5: dimension 0

\ell = 6: dimension 1, with basis [Q]

\ell = 7: dimension 0

\ell = 8: dimension 0

\ell = 9: dimension 0

\ell = 10: dimension 1, with basis [R]

\ell = 11: dimension 0

\ell = 12: dimension 1, with basis [Q^2]

\ell = 13: dimension 0

\ell = 14: dimension 0

\ell = 15: dimension 1, with basis [S]

\ell = 16: dimension 1, with basis [Q R]

\ell = 17: dimension 0

\ell = 18: dimension 1, with basis [Q^3]

\ell = 19: dimension 0

\ell = 20: dimension 1, with basis [R^2]

\ell = 21: dimension 1, with basis [Q S]

\ell = 22: dimension 1, with basis [Q^2 R]

\ell = 23: dimension 0

\ell = 24: dimension 1, with basis [Q^4]

\ell = 25: dimension 1, with basis [R S]

\ell = 26: dimension 1, with basis [Q R^2]

\ell = 27: dimension 1, with basis [Q^2 S]

\ell = 28: dimension 1, with basis [Q^3 R]

\ell = 29: dimension 0

\ell = 30: dimension 2, with basis [Q^5], [R^3]

From here on the story ‘repeats’ with period 30, with the dimension growing by 1 each time:

\mathrm{dim}(G_{\ell+30}^{\mathrm{I}}) = \mathrm{dim}(G_{\ell}^{\mathrm{I}}) + 1

So, we’ve more or less proved everything that I claimed in the first part. So we’re done!

Postscript

But I can’t resist saying a bit more.

First, there’s a very different and somewhat easier way to compute the dimensions in Theorems 1 and 2. It uses the theory of characters, and Egan explained it in a comment on the blog post on which this is based.

Second, if you look in these comments, you’ll also see a lot of material about harmonic polynomials on \mathbb{R}^3—that is, those obeying the Laplace equation. These polynomials are very nice when you’re trying to decompose the space of functions on the sphere into irreps of \mathrm{SO}(3). The reason is that the harmonic homogeneous polynomials of degree \ell, when restricted to the sphere, give you exactly the spin-\ell representation!

If you take all homogeneous polynomials of degree \ell and restrict them to the sphere you get a lot of ‘redundant junk’. You get the spin-\ell rep, plus the spin-(\ell-2) rep, plus the spin-(\ell-4) rep, and so on. The reason is the polynomial

P = x^2 + y^2 + z^2

and its powers: if you have a polynomial living in the spin-\ell rep and you multiply it by P, you get another one living in the spin-\ell rep, but you’ve boosted the degree by 2.
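
Here’s a quick check of this dimension bookkeeping: homogeneous polynomials of degree \ell in three variables form a space of dimension (\ell+1)(\ell+2)/2, and that’s exactly the sum of the dimensions of the spin-\ell, spin-(\ell-2), spin-(\ell-4), … representations:

```python
# Homogeneous degree-l polynomials in x, y, z restrict to the sphere as the
# spin-l, spin-(l-2), spin-(l-4), ... representations; the dimensions agree.
for l in range(8):
    homog = (l + 1) * (l + 2) // 2                  # dim of degree-l homogeneous polys
    spins = sum(2*k + 1 for k in range(l, -1, -2))  # dims of spin-l, spin-(l-2), ...
    print(l, homog, spins)                          # the two columns agree
```
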

Layra Idarani pointed out that this is part of a nice general theory. But I found all this stuff slightly distracting when I was trying to prove Theorems 1 and 2 assuming that we had explicit presentations of the algebras of \hat{\mathrm{I}}– and \mathrm{I}-invariant polynomials on \mathbb{R}^3. So, instead of introducing facts about harmonic polynomials, I decided to use the ‘associated graded algebra’ trick. This is a more algebraic way to ‘eliminate the redundant junk’ in the algebra of polynomials and chop the space of functions on the sphere into irreps of \mathrm{SO}(3).

Also, Egan and Idarani went ahead and considered what happens when we replace the icosahedron by another Platonic solid. It’s enough to consider the cube and tetrahedron. These cases are actually subtler than the icosahedron! For example, when we take the dot product of (x,y,z) with each of the 3 vectors joining antipodal face centers of the cube, and multiply them together, we get a polynomial that’s not invariant under rotations of the cube! Up to a constant it’s just x y z, and this changes sign under some rotations.

People call this sort of quantity, which gets multiplied by a number under transformations instead of staying the same, a semi-invariant. The reason we run into semi-invariants for the cube and tetrahedron is that their rotational symmetry groups, \mathrm{S}_4 and \mathrm{A}_4, have nontrivial abelianizations, namely \mathbb{Z}/2 and \mathbb{Z}/3. The abelianization of \mathrm{I} \cong \mathrm{A}_5 is trivial.

Egan summarized the story as follows:

Just to sum things up for the cube and the tetrahedron, since the good stuff has ended up scattered over many comments:

For the cube, we define:

A of degree 4 from the cube’s vertex-axes, a full invariant
B of degree 6 from the cube’s edge-centre-axes, a semi-invariant
C of degree 3 from the cube’s face-centre-axes, a semi-invariant

We have full invariants:

A of degree 4
C² of degree 6
BC of degree 9

B² can be expressed in terms of A, C and P, so we never use it, and we use BC at most once.

So the number of copies of the trivial rep of the rotational symmetry group of the cube in spin ℓ is the number of ways to write ℓ as an unordered sum of 4, 6 and at most one 9.

For the tetrahedron, we embed its vertices as four vertices of the cube. We then define:

V of degree 4 from the tet’s vertices, a full invariant
E of degree 3 from the tet’s edge-centre axes, a full invariant

And the B we defined for the embedding cube serves as a full invariant of the tet, of degree 6.

B² can be expressed in terms of V, E and P, so we use B at most once.

So the number of copies of the trivial rep of the rotational symmetry group of the tetrahedron in spin ℓ is the number of ways to write ℓ as a sum of 3, 4 and at most one 6.

All of this stuff reminds me of a baby version of the theory of modular forms. For example, the algebra of modular forms is graded by ‘weight’, and it’s the free commutative algebra on a guy of weight 4 and a guy of weight 6. So, the dimension of the space of modular forms of weight k is the number of ways of writing k as an unordered sum of 4’s and 6’s. Since the least common multiple of 4 and 6 is 12, we get a pattern that ‘repeats’, in a certain sense, mod 12. Here I’m talking about the simplest sort of modular forms, based on the group \mathrm{SL}_2(\mathbb{Z}). But there are lots of variants, and I have the feeling that this post is secretly about some sort of variant based on finite subgroups of \mathrm{SL}(2,\mathbb{C}) instead of infinite discrete subgroups.

There’s a lot more to say about all this, but I have to stop or I’ll never stop. Please ask questions, and let me know if you want me to say more!


Excitonium

10 December, 2017

In certain crystals you can knock an electron out of its favorite place and leave a hole: a place with a missing electron. Sometimes these holes can move around like particles. And naturally these holes attract electrons, since they are places an electron would want to be.

Since an electron and a hole attract each other, they can orbit each other. An orbiting electron-hole pair is a bit like a hydrogen atom, where an electron orbits a proton. All of this is quantum-mechanical, of course, so you should be imagining smeared-out wavefunctions, not little dots moving around. But imagine dots if it’s easier.

An orbiting electron-hole pair is called an exciton, because while it acts like a particle in its own right, it’s really just a special kind of ‘excited’ electron—an electron with extra energy, not in its lowest energy state where it wants to be.

An exciton usually doesn’t last long: the orbiting electron and hole spiral towards each other, the electron finds the hole it’s been seeking, and it settles down.

But excitons can last long enough to do interesting things. In 1978 the Russian physicist Abrikosov wrote a short and very creative paper in which he raised the possibility that excitons could form a crystal in their own right! He called this new state of matter excitonium.

In fact his reasoning was very simple.

Just as electrons have a mass, so do holes. That sounds odd, since a hole is just a vacant spot where an electron would like to be. But such a hole can move around. It has more energy when it moves faster, and it takes force to accelerate it—so it acts just like it has a mass! The precise mass of a hole depends on the nature of the substance we’re dealing with.

Now imagine a substance with very heavy holes.

When a hole is much heavier than an electron, it will stand almost still when an electron orbits it. So, they form an exciton that’s very similar to a hydrogen atom, where we have an electron orbiting a much heavier proton.

Hydrogen comes in different forms: gas, liquid, solid… and at extreme pressures, like in the core of Jupiter, hydrogen becomes metallic. So, we should expect that excitons can come in all these different forms too!

We should be able to create an exciton gas… an exciton liquid… an exciton solid… and under the right circumstances, a metallic crystal of excitons. Abrikosov called this metallic excitonium.

People have been trying to create this stuff for a long time. Some claim to have succeeded. But a new paper claims to have found something else: a Bose–Einstein condensate of excitons:

• Anshul Kogar, Melinda S. Rak, Sean Vig, Ali A. Husain, Felix Flicker, Young Il Joe, Luc Venema, Greg J. MacDougall, Tai C. Chiang, Eduardo Fradkin, Jasper van Wezel and Peter Abbamonte, Signatures of exciton condensation in a transition metal dichalcogenide, Science 358 (2017), 1314–1317.

A lone electron acts like a fermion, so I guess a hole does too, and if so that means an exciton acts approximately like a boson. When it’s cold, a gas of bosons will ‘condense’, with a significant fraction of them settling into the lowest energy states available. I guess excitons have been seen to do this!

There’s a fairly good simplified explanation at the University of Illinois website:

• Siv Schwink, Physicists excited by discovery of new form of matter, excitonium, 7 December 2017.

However, the picture on this page, which I used above, shows domain walls moving through crystallized excitonium. I think that’s different from a Bose–Einstein condensate!

I urge you to look at Abrikosov’s paper. It’s short and beautiful:

• Alexei Alexeyevich Abrikosov, A possible mechanism of high temperature superconductivity, Journal of the Less Common Metals 62 (1978), 451–455.

(Cool journal title. Is there a journal of the more common metals?)

In this paper, Abrikosov points out that previous authors had the idea of metallic excitonium. Maybe his new idea was that this might be a superconductor—and that this might explain high-temperature superconductivity. The reason for his guess is that metallic hydrogen, too, is widely suspected to be a superconductor.

Later, Abrikosov won the Nobel prize for some other ideas about superconductors. I think I should read more of his papers. He seems like one of those physicists with great intuitions.

Puzzle 1. If a crystal of excitons conducts electricity, what is actually going on? That is, which electrons are moving around, and how?

This is a fun puzzle because an exciton crystal is a kind of abstract crystal created by the motion of electrons in another, ordinary, crystal. And that leads me to another puzzle, that I don’t know the answer to:

Puzzle 2. Is it possible to create a hole in excitonium? If so, is it possible to create an exciton in excitonium? If so, is it possible to create meta-excitonium: a crystal of excitons in excitonium?


Wigner Crystals

7 December, 2017

I’d like to explain a conjecture about Wigner crystals, which we came up with in a discussion on Google+. It’s a purely mathematical conjecture that’s pretty simple to state, motivated by the picture above. But let me start at the beginning.

Electrons repel each other, so they don’t usually form crystals. But if you trap a bunch of electrons in a small space, and cool them down a lot, they will try to get as far away from each other as possible—and they can do this by forming a crystal!

This is sometimes called an electron crystal. It’s also called a Wigner crystal, because the great physicist Eugene Wigner predicted in 1934 that this would happen.

Only since the late 1980s have we been able to make electron crystals in the lab. Such a crystal can only form if the electron density is low enough. The reason is that even at absolute zero, a gas of electrons has kinetic energy. At absolute zero the gas will minimize its energy. But it can’t do this by putting all the electrons in a state with zero momentum, since you can’t put two electrons in the same state, thanks to the Pauli exclusion principle. So, higher momentum states need to be occupied, and this means there’s kinetic energy. And there’s more of it when the density is high: if there’s less room in position space, the electrons are forced to occupy more room in momentum space.

When the density is high, this prevents the formation of a crystal: instead, we have lots of electrons whose wavefunctions are ‘sitting almost on top of each other’ in position space, but with different momenta. They’ll have lots of kinetic energy, so minimizing kinetic energy becomes more important than minimizing potential energy.

When the density is low, this effect becomes unimportant, and the electrons mainly try to minimize potential energy. So, they form a crystal with each electron avoiding the rest. It turns out they form a body-centered cubic: a crystal lattice formed of cubes, with an extra electron in the middle of each cube.

To know whether a uniform electron gas at zero temperature forms a crystal or not, you need to work out its so-called Wigner–Seitz radius. This is the average inter-particle spacing measured in units of the Bohr radius. The Bohr radius is the unit of length you can cook up from the electron mass, the electron charge and Planck’s constant:

\displaystyle{ a_0=\frac{\hbar^2}{m_e e^2} }

It’s mainly famous as the average distance between the electron and the proton in a hydrogen atom in its lowest energy state: about 5.3 \times 10^{-11} meters.

Simulations show that a 3-dimensional uniform electron gas crystallizes when the Wigner–Seitz radius is at least 106. The picture, however, shows an electron crystal in 2 dimensions, formed by electrons trapped on a thin film shaped like a disk. In 2 dimensions, Wigner crystals form when the Wigner–Seitz radius is at least 31. In the picture, the density is so low that we can visualize the electrons as points with well-defined positions.
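To get a feel for these numbers, here’s a rough Python conversion from density to Wigner–Seitz radius. I’m using the common convention where r_s a_0 is the radius of a sphere containing one electron on average; that differs from ‘average inter-particle spacing’ by a constant factor close to 1, so take the exact threshold with a grain of salt:

import math

a0 = 5.29177210903e-11   # Bohr radius in meters

def wigner_seitz_radius(n):
    """Dimensionless r_s for a 3D electron gas with number density n (per m^3),
    where r_s * a0 is the radius of a sphere containing one electron on average."""
    return (3.0 / (4.0 * math.pi * n)) ** (1.0 / 3.0) / a0

# the crystallization threshold r_s = 106 corresponds to the density
n_crit = 3.0 / (4.0 * math.pi * (106 * a0) ** 3)
print(f"{n_crit:.2e} electrons per cubic meter")   # about 1.4e+24

Compare that to roughly 10^{28}–10^{29} conduction electrons per cubic meter in an ordinary metal: the gas really has to be dilute.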

So, the picture simply shows a bunch of points x_i trying to minimize the potential energy, which is proportional to

\displaystyle{ \sum_{i \ne j} \frac{1}{\|x_i - x_j\|} }

The lines between the dots are just to help you see what’s going on. They’re showing the Delaunay triangulation: divide the plane into regions closer to one electron than to any other (the Voronoi cells), and then connect two electrons with an edge whenever their regions share a boundary. This triangulation is the dual of the Voronoi subdivision.

Thanks to energy minimization, this triangulation wants to be a lattice of equilateral triangles. But since such a triangular lattice doesn’t fit neatly into a disk, we also see some ‘defects’:

Most electrons have 6 neighbors. But there are also some red defects, which are electrons with 5 neighbors, and blue defects, which are electrons with 7 neighbors.

Note that there are 6 clusters of defects. In each cluster there is one more red defect than blue defect. I think this is not a coincidence.

Conjecture. When we choose a sufficiently large number of points x_i on a disk in such a way that

\displaystyle{ \sum_{i \ne j} \frac{1}{\|x_i - x_j\|} }

is minimized, and draw the Delaunay triangulation, there will be 6 more vertices with 5 neighbors than vertices with 7 neighbors.

Here’s a bit of evidence for this, which is not at all conclusive. Take a sphere and triangulate it in such a way that each vertex has 5, 6 or 7 neighbors. Then here’s a cool fact: there must be 12 more vertices with 5 neighbors than vertices with 7 neighbors.

Puzzle. Prove this fact.

If we think of the picture above as the top half of a triangulated sphere, then each vertex in this triangulated sphere has 5, 6 or 7 neighbors. So, there must be 12 more vertices on the sphere with 5 neighbors than with 7 neighbors. So, it makes some sense that the top half of the sphere will contain 6 more vertices with 5 neighbors than with 7 neighbors. But this is not a proof.
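In case you get stuck on the puzzle, here’s the skeleton of the standard argument, via Euler’s formula. Write n_k for the number of vertices with k neighbors. Counting edge-ends and edge-face incidences,

\displaystyle{ V = n_5 + n_6 + n_7, \qquad 2E = 5 n_5 + 6 n_6 + 7 n_7, \qquad 3F = 2E }

since each edge has two endpoints and each triangle has three edges, each shared by two triangles. So Euler’s formula V - E + F = 2 becomes V - \frac{E}{3} = 2, that is 6V - 2E = 12, and substituting the first two equations gives n_5 - n_7 = 12.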

I have a feeling this energy minimization problem has been studied with various numbers of points. So, there may already be a lot of evidence for my conjecture, or some counterexamples that will force me to refine it. The picture shows what happens with 600 points on the disk. Maybe something dramatically different happens with 599! Maybe someone has even proved theorems about this. I just haven’t had time to look for such work.
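Meanwhile, here’s a sketch of the sort of numerical experiment that could test the conjecture, using numpy and scipy. The soft wall is a crude stand-in for the hard disk constraint, and vertices near the rim of the disk have artificially low neighbor counts, so a careful test would restrict attention to interior defects; treat this as a starting point, not a verdict:

import numpy as np
from scipy.optimize import minimize
from scipy.spatial import Delaunay

N = 100
rng = np.random.default_rng(0)

def energy(flat):
    """Coulomb energy of N points, plus a soft wall near the unit circle
    (a crude stand-in for the hard constraint to the disk)."""
    pts = flat.reshape(N, 2)
    i, j = np.triu_indices(N, k=1)
    coulomb = np.sum(1.0 / np.linalg.norm(pts[i] - pts[j], axis=1))
    overshoot = np.clip(np.linalg.norm(pts, axis=1) - 1.0, 0.0, None)
    return coulomb + 1e5 * np.sum(overshoot ** 2)

# a local minimum is enough for a first look; the true ground state needs more care
res = minimize(energy, rng.uniform(-0.5, 0.5, 2 * N), method="L-BFGS-B")
pts = res.x.reshape(N, 2)

tri = Delaunay(pts)
neighbors = [set() for _ in range(N)]
for simplex in tri.simplices:
    for a in simplex:
        for b in simplex:
            if a != b:
                neighbors[a].add(b)
deg = np.array([len(s) for s in neighbors])
print("5-neighbor vertices:", np.sum(deg == 5))
print("7-neighbor vertices:", np.sum(deg == 7))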

The picture here was drawn by Arunas.rv and placed on Wikimedia Commons under a Creative Commons Attribution-Share Alike 3.0 Unported license.


Information Processing in Chemical Networks (Part 1)

4 January, 2017

There’s a workshop this summer:

• Dynamics, Thermodynamics and Information Processing in Chemical Networks, 13–16 June 2017, Complex Systems and Statistical Mechanics Group, University of Luxembourg. Organized by Massimiliano Esposito and Matteo Polettini.

They write, “The idea of the workshop is to bring in contact a small number of high-profile research groups working at the frontier between physics and biochemistry, with particular emphasis on the role of Chemical Networks.”

The speakers may include John Baez, Sophie de Buyl, Massimiliano Esposito, Arren Bar-Even, Christoph Flamm, Ronan Fleming, Christian Gaspard, Daniel Merkle, Philippe Nge, Thomas Ouldridge, Luca Peliti, Matteo Polettini, Hong Qian, Stefan Schuster, Alexander Skupin, Pieter Rein ten Wolde. I believe attendance is by invitation only, so I’ll endeavor to make some of the ideas presented available here at this blog.

Some of the people involved

I’m looking forward to this, in part because there will be a mix of speakers I’ve met, speakers I know but haven’t met, and speakers I don’t know yet. I feel like reminiscing a bit, and I hope you’ll forgive me these reminiscences, since if you try the links you’ll get an introduction to the interface between computation and chemical reaction networks.

In part 25 of the network theory series here, I imagined an arbitrary chemical reaction network and said:

We could try to use these reactions to build a ‘chemical computer’. But how powerful can such a computer be? I don’t know the answer.

Luca Cardelli answered my question in part 26. This was just my first introduction to the wonderful world of chemical computing. Erik Winfree has a DNA and Natural Algorithms Group at Caltech, practically next door to Riverside, and the people there do a lot of great work on this subject. David Soloveichik, now at U. T. Austin, is an alumnus of this group.

In 2014 I met all three of these folks, and many other cool people working on this theme, at a workshop I tried to summarize here:

• Programming with chemical reaction networks, Azimuth, 23 March 2014.

• The computational power of chemical reaction networks, Azimuth, 10 June 2014.

• Chemical reaction network talks, Azimuth, 26 June 2014.

I met Matteo Polettini about a year later, at a really big workshop on chemical reaction networks run by Elisenda Feliu and Carsten Wiuf:

• Trends in reaction network theory (part 1), Azimuth, 27 January 2015.

• Trends in reaction network theory (part 2), Azimuth, 1 July 2015.

Polettini has his own blog, very much worth visiting. For example, you can see his view of the same workshop here:

• Matteo Polettini, Mathematical trends in reaction network theory: part 1 and part 2, Out of Equilibrium, 1 July 2015.

Finally, I met Massimiliano Esposito and Christoph Flamm recently at the Santa Fe Institute, at a workshop summarized here:

• Information processing and biology, Azimuth, 7 November 2016.

So, I’ve gradually become educated in this area, and I hope that by June I’ll be ready to say something interesting about the semantics of chemical reaction networks. Blake Pollard and I are writing a paper about this now.