A Networked World (Part 2)

30 March, 2015

guest post by David Spivak

Creating a knowledge network

In 2007, I asked myself: as mathematically as possible, what can formally ground meaningful information, including both its successful communication and its role in decision-making? I believed that category theory could be useful in formalizing the type of object that we call information, and the type of relationship that we call communication.

Over the next few years, I worked on this project. I tried to understand what information is, how it is stored, and how it can be transferred between entities that think differently. Since databases store information, I wanted to understand databases category-theoretically. I eventually decided that databases are basically just categories \mathcal{C}, corresponding to a collection of meaningful concepts and connections between them, and that these categories are equipped with functors \mathcal{C}\to\mathbf{Set}. Such a functor assigns to each meaningful concept a set of examples and connects them as dictated by the morphisms of \mathbf{Set}. I later found out that this “databases as categories” idea was not original; it is due to Rosebrugh and others. My view on the subject has matured a bit since then, but I still like this basic conception of databases.

If we model a person’s knowledge as a database (interconnected tables of examples of things and relationships), then the network of knowledgeable humans could be conceptualized as a simplicial complex equipped with a sheaf of databases. Here, a vertex represents an individual, with her database of knowledge. An edge represents a pair of individuals and a common ground database relating their individual databases. For example, you and your brother have a database of concepts and examples from your history. The common-ground database is like the intersection of the two databases, but it could be smaller (if the people don’t yet know they agree on something). In a simplicial complex, there are not only vertices and edges, but also triangles (and so on). These would represent databases held in common between three or more people.

I wanted “regular people” to actually make such a knowledge network, i.e., to share their ideas in the form of categories and link them together with functors. Of course, most people don’t know categories and functors, so I thought I’d make things easier for them by equipping categories with linguistic structures: text boxes for objects, labeled arrows for morphisms. For example, “a person has a mother” would be a morphism from the “person” object, to the “mother” object. I called such a linguistic category an olog, playing on the word blog. The idea (originally inspired during a conversation with my friend Ralph Hutchison) was that I wanted people, especially scientists, to blog their ontologies, i.e., to write “onto-logs” like others make web-logs.

Ologs codify knowledge. They are like concept webs, except with more rules that allow them to simultaneously serve as database schemas. By introducing ologs, I hoped I could get real people to upload their ideas into what is now called the cloud, and make the necessary knowledge network. I tried to write my papers to engage an audience of intelligent lay-people rather than for an audience of mathematicians. It was a risk, but to me it was the only honest approach to the larger endeavor.

(For students who might want to try going out on a limb like this, you should know that I was offered zero jobs after my first postdoc at University of Oregon. The risk was indeed risky, and one has to be ok with that. I personally happened to be the beneficiary of good luck and was offered a grant, out of the clear blue sky, by a former PhD in algebraic geometry, who worked at the Office of Naval Research at the time. That, plus the helping hands of Haynes Miller and many other brilliant and wonderful people, can explain how I lived to tell the tale.)

So here’s how the simplicial complex of ologs would ideally help humanity steer. Suppose we say that in order for one person to learn from another, the two need to find a common language and align some ideas. This kind of (usually tacit) agreement on, or alignment of, an initial common-ground vocabulary and concept-set is important to get their communication onto a proper footing.

For two vertices in such a simplicial network, the richer their common-ground olog (i.e., the database corresponding to the edge between them) is, the more quickly and accurately the vertices can share new ideas. As ideas are shared over a simplex, all participating databases can be updated, hence making the communication between them richer. In around 2010, Mathieu Anel and I worked out a formal way this might occur; however, we have not yet written it up. The basic idea can be found here.

In this setup, the simplicial complex of human knowledge should grow organically. Scientists, business people, and other people might find benefit in ologging their ideas and conceptions, and using them to learn from their peers. I imagined a network organizing itself, where simplices of like-minded people could share information with neighboring groups across common faces.

I later wrote a book called Category Theory for the Sciences, available free online, to help scientists learn how category theory could apply to familiar situations like taxonomies, graphs, and symmetries. Category theory, simply explained, becomes a wonderful key to the whole world of pure mathematics. It’s the closest thing we have to a universal language of thought, and therefore an appropriate language for forming connections.

My working hypothesis for the knowledge network was this. The information held by people whose worldview is more true—more accurate—would have better predictive power, i.e., better results. This is by definition: I define ones knowledge to be accurate as the extent to which, when he uses this knowledge to direct his actions, he has good luck handling his worldly affairs. As Louis Pasteur said, “Luck favors the prepared mind.” It follows that if someone has a track record of success, others will value finding broad connections into his olog. However, to link up with someone you must find a part of your olog that aligns with his—a functorial connection—and you can only receive meaningful information from him to the extent that you’ve found such common ground.

Thus, people who like to live in fiction worlds would find it difficult to connect, except to other like-minded “Obama’s a Christian”-type people. To the extent you are imbedded in a fictional—less accurate, less predictive—part of the network, you will find it difficult to establish functorial connections to regions of more accurate knowledge, and therefore you can’t benefit from the predictive and conceptual value of this knowledge.

In other words, people would be naturally inclined to try to align their understanding with people that are better informed. I felt hope that this kind of idea could lead to a system in which honesty and accuracy were naturally rewarded. At the very least, those who used it could share information much more effectively than they do now. This was my plan; I just had to make it real.

I had a fun idea for publicizing ologs. The year was in 2008, and I remember thinking it would be fantastic if I could olog the political platform and worldview of Barack Obama and of Sarah Palin. I wished I could sit down with them and other politicians and help them write ologs about what they believed and wanted for the country. I imagined that some politicians might have ologs that look like a bunch of disconnected text boxes—like a brain with neurons but no synapses—a collection of talking points but no real substantive ideas.

Anyway, there I was, trying to understand everything this way: all information was categories (or perhaps sketches) and presheaves. I would work with interested people from any academic discipline, such as materials science, to make ologs about whatever information they wanted to record category-theoretically. Ologs weren’t a theory of everything, but instead, as Jack Morava put it, a theory of anything.

One day I was working on a categorical sketch to model processes within processes, but somehow it really wasn’t working properly. The idea was simple: each step in a recipe is a mini-recipe of its own. Like chopping the carrots means getting out a knife and cutting board, putting a carrot on there, and bringing the knife down successively along it. You can keep zooming into any of these and see it as its own process. So there is some kind of nested, fractal-like behavior here. The olog I made could model the idea of steps in a recipe, but I found it difficult to encode the fact that each step was itself a recipe.

This nesting thing seemed like an idea that mathematics should treat beautifully, and ologs weren’t doing it justice. It was then when I finally admitted that there might be other fish in the mathematical sea.

A Networked World (Part 1)

27 March, 2015

guest post by David Spivak

The problem

The idea that’s haunted me, and motivated me, for the past seven years or so came to me while reading a book called The Moment of Complexity: our Emerging Network Culture, by Mark C. Taylor. It was a fascinating book about how our world is becoming increasingly networked—wired up and connected—and that this is leading to a dramatic increase in complexity. I’m not sure if it was stated explicitly there, but I got the idea that with the advent of the World Wide Web in 1991, a new neural network had been born. The lights had been turned on, and planet earth now had a brain.

I wondered how far this idea could be pushed. Is the world alive, is it a single living thing? If it is, in the sense I meant, then its primary job is to survive, and to survive it’ll have to make decisions. So there I was in my living room thinking, “oh my god, we’ve got to steer this thing!”

Taylor pointed out that as complexity increases, it’ll become harder to make sense of what’s going on in the world. That seemed to me like a big problem on the horizon, because in order to make good decisions, we need to have a good grasp on what’s occurring. I became obsessed with the idea of helping my species through this time of unprecedented complexity. I wanted to understand what was needed in order to help humanity make good decisions.

What seemed important as a first step is that we humans need to unify our understanding—to come to agreement—on matters of fact. For example, humanity still doesn’t know whether global warming is happening. Sure almost all credible scientists have agreed that it is happening, but does that steer money into programs that will slow it or mitigate its effects? This isn’t an issue of what course to take to solve a given problem; it’s about whether the problem even exists! It’s like when people were talking about Obama being a Muslim, born in Kenya, etc., and some people were denying it, saying he was born in Hawaii. If that’s true, why did he repeatedly refuse to show his birth certificate?

It is important, as a first step, to improve the extent to which we agree on the most obvious facts. This kind of “sanity check” is a necessary foundation for discussions about what course we should take. If we want to steer the ship, we have to make committed choices, like “we’re turning left now,” and we need to do so as a group. That is, there needs to be some amount of agreement about the way we should steer, so we’re not fighting ourselves.

Luckily there are a many cases of a group that needs to, and is able to, steer itself as a whole. For example as a human, my neural brain works with my cells to steer my body. Similarly, corporations steer themselves based on boards of directors, and based on flows of information, which run bureaucratically and/or informally between different parts of the company. Note that in neither case is there any suggestion that each part—cell, employee, or corporate entity—is “rational”; they’re all just doing their thing. What we do see in these cases is that the group members work together in a context where information and internal agreement is valued and often attained.

It seemed to me that intelligent, group-directed steering is possible. It does occur. But what’s the mechanism by which it happens, and how can we think about it? I figured that the way we steer, i.e., make decisions, is by using information.

I should be clear: whenever I say information, I never mean it “in the sense of Claude Shannon”. As beautiful as Shannon’s notion of information is, he’s not talking about the kind of information I mean. He explicitly said in his seminal paper that information in his sense is not concerned with meaning:

Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages.

In contrast, I’m interested in the semantic stuff, which flows between humans, and which makes possible decisions about things like climate change. Shannon invented a very useful quantitative measure of meaningless probability distributions.

That’s not the kind of information I’m talking about. When I say “I want to know what information is”, I’m saying I want to formulate the notion of human-usable semantic meaning, in as mathematical a way as possible.

Back to my problem: we need to steer the ship, and to do so we need to use information properly. Unfortunately, I had no idea what information is, nor how it’s used to make decisions (let alone to make good ones), nor how it’s obtained from our interaction with the world. Moreover, I didn’t have a clue how the minute information-handling at the micro-level, e.g., done by cells inside a body or employees inside a corporation, would yield information-handling at the macro (body or corporate) level.

I set out to try to understand what information is and how it can be communicated. What kind of stuff is information? It seems to follow rules: facts can be put together to form new facts, but only in certain ways. I was once explaining this idea to Dan Kan, and he agreed saying, “Yes, information is inherently a combinatorial affair.” What is the combinatorics of information?

Communication is similarly difficult to understand, once you dig into it. For example, my brain somehow enables me to use information and so does yours. But our brains are wired up in personal and ad hoc ways, when you look closely, a bit like a fingerprint or retinal scan. I found it fascinating that two highly personalized semantic networks could interface well enough to effectively collaborate.

There are two issues that I wanted to understand, and by to understand I mean to make mathematical to my own satisfaction. The first is what information is, as structured stuff, and what communication is, as a transfer of structured stuff. The second is how communication at micro-levels can create, or be, understanding at macro-levels, i.e., how a group can steer as a singleton.

Looking back on this endeavor now, I remain concerned. Things are getting increasingly complex, in the sorts of ways predicted by Mark C. Taylor in his book, and we seem to be losing some control: of the NSA, of privacy, of people 3D printing guns or germs, of drones, of big financial institutions, etc.

Can we expect or hope that our species as a whole will make decisions that are healthy, like keeping the temperature down, given the information we have available? Are we in the driver’s seat, or is our ship currently in the process of spiraling out of our control?

Let’s assume that we don’t want to panic but that we do want to participate in helping the human community to make appropriate decisions. A possible first step could be to formalize the notion of “using information well”. If we could do this rigorously, it would go a long way toward helping humanity get onto a healthy course. Further, mathematics is one of humanity’s best inventions. Using this tool to improve our ability to use information properly is a non-partisan approach to addressing the issue. It’s not about fighting, it’s about figuring out what’s happening, and weighing all our options in an informed way.

So, I ask: What kind of mathematics might serve as a formal ground for the notion of meaningful information, including both its successful communication and its role in decision-making?

Stationary Stability in Finite Populations

24 March, 2015

guest post by Marc Harper

A while back, in the article Relative entropy minimization in evolutionary dynamics, we looked at extensions of the information geometry / evolutionary game theory story to more general time-scales, incentives, and geometries. Today we’ll see how to make this all work in finite populations!

Let’s recall the basic idea from last time, which John also described in his information geometry series. The main theorem is this: when there’s an evolutionarily stable state for a given fitness landscape, the relative entropy between the stable state and the population distribution decreases along the population trajectories as they converge to the stable state. In short, relative entropy is a Lyapunov function. This is a nice way to look at the action of a population under natural selection, and it has interesting analogies to Bayesian inference.

The replicator equation is a nice model from an intuitive viewpoint, and it’s mathematically elegant. But it has some drawbacks when it comes to modeling real populations. One major issue is that the replicator equation implicitly assumes that the population proportions of each type are differentiable functions of time, obeying a differential equation. This only makes sense in the limit of large populations. Other closely related models, such as the Lotka-Volterra model, focus on the number of individuals of each type (e.g. predators and prey) instead of the proportion. But they often assume that the number of individuals is a differentiable function of time, and a population of 3.5 isn’t very realistic either.

Real populations of replicating entities are not infinitely large; in fact they are often relatively small and of course have whole numbers of each type, at least for large biological replicators (like animals). They take up space and only so many can interact meaningfully. There are quite a few models of evolution that handle finite populations and some predate the replicator equation. Models with more realistic assumptions typically have to leave the realm of derivatives and differential equations behind, which means that the analysis of such models is more difficult, but the behaviors of the models are often much more interesting. Hopefully by the end of this post, you’ll see how all of these diagrams fit together:

One of the best-known finite population models is the Moran process, which is a Markov chain on a finite population. This is the quintessential birth-death process. For a moment consider a population of just two types A and B. The state of the population is given by a pair of nonnegative integers (a,b) with a+b=N, the total number of replicators in the population, and a and b the number of individuals of type A and B respectively. Though it may artificial to fix the population size N, this often turns out not to be that big of a deal, and you can assume the population is at its carrying capacity to make the assumption realistic. (Lots of people study populations that can change size and that have replicators spatially distributed say on a graph, but we’ll assume they can all interact with each whenever they want for now).

A Markov model works by transitioning from state to state in each round of the process, so we need to define the transitions probabilities to complete the model. Let’s put a fitness landscape on the population, given by two functions f_A and f_B of the population state (a,b). Now we choose an individual to reproduce proportionally to fitness, e.g. we choose an A individual to reproduce with probability

\displaystyle{ \frac{a f_A}{a f_A + b f_B} }

since there are a individuals of type A and they each have fitness f_A. This is analogous to the ratio of fitness to mean fitness from the discrete replicator equation, since

\displaystyle{ \frac{a f_A}{a f_A + b f_B} =  \frac{\frac{a}{N} f_A}{\frac{a}{N} f_A + \frac{b}{N} f_B} \to \frac{x_i f_i(x)}{\overline{f(x)}} }

and the discrete replicator equation is typically similar to the continuous replicator equation (this can be made precise), so the Moran process captures the idea of natural selection in a similar way. Actually there is a way to recover the replicator equation from the Moran process in large populations—details at the end!

We’ll assume that the fitnesses are nonnegative and that the total fitness (the denominator) is never zero; if that seems artificial, some people prefer to transform the fitness landscape by e^{\beta f(x)}, which gives a ratio reminiscent of the Boltzmann or Fermi distribution from statistical physics, with the parameter \beta playing the role of intensity of selection rather than inverse temperature. This is sometimes called Fermi selection.

That takes care of the birth part. The death part is easier: we just choose an individual at random (uniformly) to be replaced. Now we can form the transition probabilities of moving between population states. For instance the probability of moving from state (a,b) to (a+1, b-1) is given by the product of the birth and death probabilities, since they are independent:

\displaystyle{ T_a^{a+1} = \frac{a f_A}{a f_A + b f_B} \frac{b}{N} }

since we have to chose a replicator of type A to reproduce and one of type B to be replaced. Similarly for (a,b) to (a-1, b+1) (switch all the a’s and b’s), and we can write the probability of staying in the state (a, N-a) as

T_a^{a} = 1 - T_{a}^{a+1} - T_{a}^{a-1}

Since we only replace one individual at a time, this covers all the possible transitions, and keeps the population constant.

We’d like to analyze this model and many people have come up with clever ways to do so, computing quantities like fixation probabilities (also known as absorption probabilities), indicating the chance that the population will end up with one type completely dominating, i.e. in state (0, N) or (N,0). If we assume that the fitness of type A is constant and simply equal to 1, and the fitness of type B is r \neq 1, we can calculate the probability that a single mutant of type B will take over a population of type A using standard Markov chain methods:

\displaystyle{\rho = \frac{1 - r^{-1}}{1 - r^{-N}} }

For neutral relative fitness (r=1), \rho = 1/N, which is the probability a neutral mutant invades by drift alone since selection is neutral. Since the two boundary states (0, N) or (N,0) are absorbing (no transitions out), in the long run every population ends up in one of these two states, i.e. the population is homogeneous. (This is the formulation referred to by Matteo Smerlak in The mathematical origins of irreversibility.)

That’s a bit different flavor of result than what we discussed previously, since we had stable states where both types were present, and now that’s impossible, and a bit disappointing. We need to make the population model a bit more complex to have more interesting behaviors, and we can do this in a very nice way by adding the effects of mutation. At the time of reproduction, we’ll allow either type to mutate into the other with probability \mu. This changes the transition probabilities to something like

\displaystyle{ T_a^{a+1} = \frac{a (1-\mu) f_A + b \mu f_B}{a f_A + b f_B} \frac{b}{N} }

Now the process never stops wiggling around, but it does have something known as a stationary distribution, which gives the probability that the population is in any given state in the long run.

For populations with more than two types the basic ideas are the same, but there are more neighboring states that the population could move to, and many more states in the Markov process. One can also use more complicated mutation matrices, but this setup is good enough to typically guarantee that no one species completely takes over. For interesting behaviors, typically \mu = 1/N is a good choice (there’s some biological evidence that mutation rates are typically inversely proportional to genome size).

Without mutation, once the population reached (0,N) or (N,0), it stayed there. Now the population bounces between states, either because of drift, selection, or mutation. Based on our stability theorems for evolutionarily stable states, it’s reasonable to hope that for small mutation rates and larger populations (less drift), the population should spend most of its time near the evolutionarily stable state. This can be measured by the stationary distribution which gives the long run probabilities of a process being in a given state.

Previous work by Claussen and Traulsen:

• Jens Christian Claussen and Arne Traulsen, Non-Gaussian fluctuations arising from finite populations: exact results for the evolutionary Moran process, Physical Review E 71 (2005), 025101.

suggested that the stationary distribution is at least sometimes maximal around evolutionarily stable states. Specifically, they showed that for a very similar model with fitness landscape given by

\left(\begin{array}{c} f_A \\ f_B \end{array}\right)  = \left(\begin{array}{cc} 1 & 2\\ 2&1 \end{array}\right)  \left(\begin{array}{c} a\\ b \end{array}\right)

the stationary state is essentially a binomial distribution centered at (N/2, N/2).

Unfortunately, the stationary distribution can be very difficult to compute for an arbitrary Markov chain. While it can be computed for the Markov process described above without mutation, and in the case studied by Claussen and Traulsen, there’s no general analytic formula for the process with mutation, nor for more than two types, because the processes are not reversible. Since we can’t compute the stationary distribution analytically, we’ll have to find another way to show that the local maxima of the stationary distribution are “evolutionarily stable”. We can approximate the stationary distribution fairly easily with a computer, so it’s easy to plot the results for just about any landscape and reasonable population size (e.g. N \approx 100).

It turns out that we can use a relative entropy minimization approach, just like for the continuous replicator equation! But how? We lack some essential ingredients such as deterministic and differentiable trajectories. Here’s what we do:

• We show that the local maxima and minima of the stationary distribution satisfy a complex balance criterion.

• We then show that these states minimize an expected relative entropy.

• This will mean that the current state and the expected next state are ‘close’.

• Lastly, we show that these states satisfy an analogous definition of evolutionary stability (now incorporating mutation).

The relative entropy allows us to measure how close the current state is to the expected next state, which captures the idea of stability in another way. This ports the relative minimization Lyapunov result to some more realistic Markov chain models. The only downside is that we’ll assume the populations are “sufficiently large”, but in practice for populations of three types, N=20 is typically enough for common fitness landscapes (there are lots of examples here for N=80, which are prettier than the smaller populations). The reason for this is that the population state (a,b) needs enough “resolution” (a/N, b/N) to get sufficiently close to the stable state, which is not necessarily a ratio of integers. If you allow some wiggle room, smaller populations are still typically pretty close.

Evolutionarily stable states are closely related to Nash equilibria, which have a nice intuitive description in traditional game theory as “states that no player has an incentive to deviate from”. But in evolutionary game theory, we don’t use a game matrix to compute e.g. maximum payoff strategies, rather the game matrix defines a fitness landscape which then determines how natural selection unfolds.

We’re going to see this idea again in a moment, and to help get there let’s introduce an function called an incentive that encodes how a fitness landscape is used for selection. One way is to simply replace the quantities a f_A(a,b) and b f_B(a,b) in the fitness-proportionate selection ratio above, which now becomes (for two population types):

\displaystyle{ \frac{\varphi_A(a,b)}{\varphi_A(a,b) + \varphi_B(a,b)} }

Here \varphi_A(a,b) and \varphi_B(a,b) are the incentive function components that determine how the fitness landscape is used for natural selection (if at all). We have seen two examples above:

\varphi_A(a,b) = a f_A(a, b)

for the Moran process and fitness-proportionate selection, and

\varphi_A(a,b) = a e^{\beta f_A(a, b)}

for an alternative that incorporates a strength of selection term \beta, preventing division by zero for fitness landscapes defined by zero-sum game matrices, such as a rock-paper-scissors game. Using an incentive function also simplifies the transition probabilities and results as we move to populations of more than two types. Introducing mutation, we can describe the ratio for incentive-proportion selection with mutation for the ith population type when the population is in state x=(a,b,\ldots) / N as

\displaystyle{ p_i(x) = \frac{\sum_{k=1}^{n}{\varphi_k(x) M_{i k} }}{\sum_{k=1}^{n}{\varphi_k(x)}} }

for some matrix of mutation probabilities M. This is just the probability that we get a new individual of the ith type (by birth and/or mutation). A common choice for the mutation matrix is to use a single mutation probability \mu and spread it out over all the types, such as letting

M_{ij} = \mu / (n-1)


M_{ii} = 1 - \mu

Now we are ready to define the expected next state for the population and see how it captures a notion of stability. For a given state population x in a multitype population, using x to indicate the normalized population state (a,b,\ldots) / N, consider all the neighboring states y that the population could move to in one step of the process (one birth-death cycle). These neighboring states are the result of increasing a population type by one (birth) and decreasing another by one (death, possibly the same type), of course excluding cases on the boundary where the number of individuals of any type drops below zero or rises above N. Now we can define the expected next state as the sum of neighboring states weighted by the transition probabilities

E(x) = \sum_{y}{y T_x^{y}}

with transition probabilities given by

T_{x}^{y} = p_{i}(x) x_{j}

for states y that differ in 1/N at the ith coordinate and -1/N at jth coordinate from x. Here x_j is just the probability of the random death of an individual of the jth type, so the transition probabilities are still just birth (with mutation) and death as for the Moran process we started with.

Skipping some straightforward algebraic manipulations, we can show that

\displaystyle{ E(x) = \sum_{y}{y T_x^{y}} = \frac{N-1}{N}x + \frac{1}{N}p(x)}

Then it’s easy to see that E(x) = x if and only if x = p(x), and that x = p(x) if and only if x_i = \varphi_i(x). So we have a nice description of ‘stability’ in terms of fixed points of the expected next state function and the incentive function

x = E(x) = p(x) = \varphi(x),

and we’ve gotten back to “no one has an incentive to deviate”. More precisely, for the Moran process

\varphi_i(x) = x_i f_i(x)

and we get back f_i(x) = f_j(x) for every type. So we take x = \varphi(x) as our analogous condition to an evolutionarily stable state, though it’s just the ‘no motion’ part and not also the ‘stable’ part. That’s what we need the stationary distribution for!

To turn this into a useful number that measures stability, we use the relative entropy of the expected next state and the current state, in analogy with the Lyapunov theorem for the replicator equation. The relative entropy

\displaystyle{ D(x, y) = \sum_i x_i \ln(x_i) - y_i \ln(x_i) }

has the really nice property that D(x,y) = 0 if and only if x = y, so we can use the relative entropy D(E(x), x) as a measure of how close to stable any particular state is! Here the expected next state takes the place of the ‘evolutionarily stable state’ in the result described last time for the replicator equation.

Finally, we need to show that the maxima (and minima) of of the stationary distribution are these fixed points by showing that these states minimize the expected relative entropy.

Seeing that local maxima and minima of the stationary distribution minimize the expected relative entropy is a more involved, so let’s just sketch the details. In general, these Markov processes are not reversible, so they don’t satisfy the detailed-balance condition, but the stationary probabilities do satisfy something called the global balance condition, which says that for the stationary distribution s we have that

s_x \sum_{x}{T_x^{y}} = \sum_{y}{s_y T_y^{x}}

When the stationary distribution is at a local maximum (or minimum), we can show essentially that this implies (up to an \epsilon, for a large enough population) that

\displaystyle{\sum_{x}{T_x^{y}} = \sum_{y}{T_y^{x}} }

a sort of probability inflow-outflow equation, which is very similar to the condition of complex balanced equilibrium described by Manoj Gopalkrishnan in this Azimuth post. With some algebraic manipulation, we can show that these states have E(x)=x.

Now let’s look again at the figures from the start. The first shows the vector field of the replicator equation:

You can see rest points at the center, on the center of each boundary edge, and on the corner points. The center point is evolutionarily stable, the center points of the boundary are semi-stable (but stable when the population is restricted to a boundary simplex), and the corner points are unstable.

This one shows the stationary distribution for a finite population model with a Fermi incentive on the same landscape, for a population of size 80:

A fixed population size gives a partitioning of the simplex, and each triangle of the partition is colored by the value of the stationary distribution. So you can see that there are local maxima in the center and on the centers of the triangle boundary edges. In this case, the size of the mutation probability determines how much of the stationary distribution is concentrated on the center of the simplex.

This shows one-half of the Euclidean distance squared between the current state and the expected next state:

And finally, this shows the same thing but with the relative entropy as the ‘distance function':

As you can see, the Euclidean distance is locally minimal at each of the local maxima and minima of the stationary distribution (including the corners); the relative entropy is only guaranteed so on the interior states (because the relative entropy doesn’t play nicely with the boundary, and unlike the replicator equation, the Markov process can jump on and off the boundary). It turns out that the relative Rényi entropies for q between 0 and 1 also work just fine, but for the large population limit (the replicator dynamic), the relative entropy is the somehow the right choice for the replicator equation (has the derivative that easily gives Lyapunov stability), which is due to the connections between relative entropy and Fisher information in the information geometry of the simplex. The Euclidean distance is the q=0 case and the ordinary relative entropy is q=1.

As it turns out, something very similar holds for another popular finite population model, the Wright–Fisher process! This model is more complicated, so if you are interested in the details, check out our paper, which has many nice examples and figures. We also define a process that bridges the gap between the atomic nature of the Moran process and the generational nature of the Wright–Fisher process, and prove the general result for that model.

Finally, let’s see how the Moran process relates back to the replicator equation (see also the appendix in this paper), and how we recover the stability theory of the replicator equation. We can use the transition probabilities of the Moran process to define a stochastic differential equation (called a Langevin equation) with drift and diffusion terms that are essentially (for populations with two types:

\mathrm{Drift}(x) = T^{+}(x) - T^{-}(x)

\displaystyle{ \mathrm{Diffusion}(x) = \sqrt{\frac{T^{+}(x) + T^{-}(x)}{N}} }

As the population size gets larger, the diffusion term drops out, and the stochastic differential equation becomes essentially the replicator equation. For the stationary distribution, the variance (e.g. for the binomial example above) also has an inverse dependence on N, so the distribution limits to a delta-function that is zero except for at the evolutionarily stable state!

What about the relative entropy? Loosely speaking, as the population size gets larger, the iteration of the expected next state also becomes deterministic. Then the evolutionarily stable states is a fixed point of the expected next state function, and the expected relative entropy is essentially the same as the ordinary relative entropy, at least in a neighborhood of the evolutionarily stable state. This is good enough to establish local stability.

Earlier I said both the local maxima and minima minimize the expected relative entropy. Dash and I haven’t proven that the local maxima always correspond to evolutionarily stable states (and the minima to unstable states). That’s because the generalization of evolutionarily stable state we use is really just a ‘no motion’ condition, and isn’t strong enough to imply stability in a neighborhood for the deterministic replicator equation. So for now we are calling the local maxima stationary stable states.

We’ve also tried a similar approach to populations evolving on networks, which is a popular topic in evolutionary graph theory, and the results are encouraging! But there are many more ‘states’ in such a process, since the configuration of the network has to be taken into account, and whether the population is clustered together or not. See the end of our paper for an interesting example of a population on a cycle.

Thermodynamics with Continuous Information Flow

21 March, 2015

guest post by Blake S. Pollard

Over a century ago James Clerk Maxwell created a thought experiment that has helped shape our understanding of the Second Law of Thermodynamics: the law that says entropy can never decrease.

Maxwell’s proposed experiment was simple. Suppose you had a box filled with an ideal gas at equilibrium at some temperature. You stick in an insulating partition, splitting the box into two halves. These two halves are isolated from one another except for one important caveat: somewhere along the partition resides a being capable of opening and closing a door, allowing gas particles to flow between the two halves. This being is also capable of observing the velocities of individual gas particles. Every time a particularly fast molecule is headed towards the door the being opens it, letting fly into the other half of the box. When a slow particle heads towards the door the being keeps it closed. After some time, fast molecules would build up on one side of the box, meaning half the box would heat up! To an observer it would seem like the box, originally at a uniform temperature, would for some reason start splitting up into a hot half and a cold half. This seems to violate the Second Law (as well as all our experience with boxes of gas).

Of course, this apparent violation probably has something to do with positing the existence of intelligent microscopic doormen. This being, and the thought experiment itself, are typically referred to as Maxwell’s demon.

Photo credit: Peter MacDonald, Edmonds, UK

When people cook up situations that seem to violate the Second Law there is typically a simple resolution: you have to consider the whole system! In the case of Maxwell’s demon, while the entropy of the box decreases, the entropy of the system as a whole, demon include, goes up. Precisely quantifying how Maxwell’s demon doesn’t violate the Second Law has led people to a better understanding of the role of information in thermodynamics.

At the American Physical Society March Meeting in San Antonio, Texas, I had the pleasure of hearing some great talks on entropy, information, and the Second Law. Jordan Horowitz, a postdoc at Boston University, gave a talk on his work with Massimiliano Esposito, a researcher at the University of Luxembourg, on how one can understand situations like Maxwell’s demon (and a whole lot more) by analyzing the flow of information between subsystems.

Consider a system made up of two parts, X and Y. Each subsystem has a discrete set of states. Each systems makes transitions among these discrete states. These dynamics can be modeled as Markov processes. They are interested in modeling the thermodynamics of information flow between subsystems. To this end they consider a bipartite system, meaning that either X transitions or Y transitions, never both at the same time. The probability distribution p(x,y) of the whole system evolves according to the master equation:

\displaystyle{ \frac{dp(x,y)}{dt} = \sum_{x', y'} H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}p(x,y) }

where H_{x,x'}^{y,y'} is the rate at which the system transitions from (x',y') \to (x,y). The ‘bipartite’ condition means that H has the form

H_{x,x'}^{y,y'} = \left\{ \begin{array}{cc} H_{x,x'}^y & x \neq x'; y=y' \\   H_x^{y,y'} & x=x'; y \neq y' \\  0 & \text{otherwise.} \end{array} \right.

The joint system is an open system that satisfies the second law of thermodynamics:

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_{XY}}{dt} + \frac{dS_e}{dt} \geq 0 }


\displaystyle{ S_{XY} = - \sum_{x,y} p(x,y) \ln ( p(x,y) ) }

is the Shannon entropy of the system, satisfying

\displaystyle{ \frac{dS_{XY} }{dt} = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}   p(x,y) \right] \ln \left( \frac{p(x',y')}{p(x,y)} \right) }


\displaystyle{ \frac{dS_e}{dt}  = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{ H_{x,x'}^{y,y'} } {H_{x',x}^{y',y} } \right) }

is the entropy change of the environment.

We want to investigate how the entropy production of the whole system relates to entropy production in the bipartite pieces X and Y. To this end they define a new flow, the information flow, as the time rate of change of the mutual information

\displaystyle{ I = \sum_{x,y} p(x,y) \ln \left( \frac{p(x,y)}{p(x)p(y)} \right) }

Its time derivative can be split up as

\displaystyle{ \frac{dI}{dt} = \frac{dI^X}{dt} + \frac{dI^Y}{dt}}


\displaystyle{ \frac{dI^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^{y} p(x',y) - H_{x',x}^{y}p(x,y) \right] \ln \left( \frac{ p(y|x) }{p(y|x')} \right) }


\displaystyle{ \frac{dI^Y}{dt} = \sum_{x,y} \left[ H_{x}^{y,y'}p(x,y') - H_{x}^{y',y}p(x,y) \right] \ln \left( \frac{p(x|y)}{p(x|y')} \right) }

are the information flows associated with the subsystems X and Y respectively.


\displaystyle{ \frac{dI^X}{dt} > 0}

a transition in X increases the mutual information I, meaning that X ‘knows’ more about Y and vice versa.

We can rewrite the entropy production entering into the second law in terms of these information flows as

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_i^X}{dt} + \frac{dS_i^Y}{dt} }


\displaystyle{ \frac{dS_i^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^y p(x',y) - H_{x',x}^y p(x,y) \right] \ln \left( \frac{H_{x,x'}^y p(x',y) } {H_{x',x}^y p(x,y) } \right) \geq 0 }

and similarly for \frac{dS_Y}{dt} . This gives the following decomposition of entropy production in each subsystem:

\displaystyle{ \frac{dS_i^X}{dt} = \frac{dS^X}{dt} + \frac{dS^X_e}{dt} - \frac{dI^X}{dt} \geq 0 }

\displaystyle{ \frac{dS_i^Y}{dt} = \frac{dS^Y}{dt} + \frac{dS^X_e}{dt} - \frac{dI^Y}{dt} \geq 0},

where the inequalities hold for each subsystem. To see this, if you write out the left hand side of each inequality you will find that they are both of the form

\displaystyle{ \sum_{x,y} \left[ x-y \right] \ln \left( \frac{x}{y} \right) }

which is non-negative for x,y \geq 0.

The interaction between the subsystems is contained entirely in the information flow terms. Neglecting these terms gives rise to situations like Maxwell’s demon where a subsystem seems to violate the second law.

Lots of Markov processes have boring equilibria \frac{dp}{dt} = 0 where there is no net flow among the states. Markov processes also admit non-equilibrium steady states, where there may be some constant flow of information. In this steady state all explicit time derivatives are zero, including the net information flow:

\displaystyle{ \frac{dI}{dt} = 0 }

which implies that \frac{dI^X}{dt} = - \frac{dI^Y}{dt}. In this situation the above inequalities become

\displaystyle{ \frac{dS^X_i}{dt} = \frac{dS_e^X}{dt} - \frac{dI^X}{dt} }


\displaystyle{ \frac{dS^Y_i}{dt} = \frac{dS_e^X}{dt} + \frac{dI^X}{dt} }.


\displaystyle{ \frac{dI^X}{dt} > 0 }

then X is learning something about Y or acting as a sensor. The first inequality

\frac{dS_e^X}{dt} \geq \frac{dI^X}{dt} quantifies the minimum amount of energy X must supply to do this sensing. Similarly -\frac{dS_e^Y}{dt} \leq \frac{dI^X}{dt} bounds the amount of useful energy is available to Y as a result of this information transfer.

In their paper Horowitz and Esposito explore a few other examples and show the utility of this simple breakup of a system into two interacting subsystems in explaining various interesting situations in which the flow of information has thermodynamic significance.

For the whole story, read their paper!

• Jordan Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Phys. Rev. X 4 (2014), 031015.

Planets in the Fourth Dimension

17 March, 2015

You probably that planets go around the sun in elliptical orbits. But do you know why?

In fact, they’re moving in circles in 4 dimensions. But when these circles are projected down to 3-dimensional space, they become ellipses!

This animation by Greg Egan shows the idea:

The plane here represents 2 of the 3 space dimensions we live in. The vertical direction is the mysterious fourth dimension. The planet goes around in a circle in 4-dimensional space. But down here in 3 dimensions, its ‘shadow’ moves in an ellipse!

What’s this fourth dimension I’m talking about here? It’s a lot like time. But it’s not exactly time. It’s the difference between ordinary time and another sort of time, which flows at a rate inversely proportional to the distance between the planet and the sun.

The movie uses this other sort of time. Relative to this other time, the planet is moving at constant speed around a circle in 4 dimensions. But in ordinary time, its shadow in 3 dimensions moves faster when it’s closer to the sun.

All this sounds crazy, but it’s not some new physics theory. It’s just a different way of thinking about Newtonian physics!

Physicists have known about this viewpoint at least since 1980, thanks to a paper by the mathematical physicist Jürgen Moser. Some parts of the story are much older. A lot of papers have been written about it.

But I only realized how simple it is when I got this paper in my email, from someone I’d never heard of before:

• Jesper Göransson, Symmetries of the Kepler problem, 8 March 2015.

I get a lot of papers by crackpots in my email, but the occasional gem from someone I don’t know makes up for all those.

The best thing about Göransson’s 4-dimensional description of planetary motion is that it gives a clean explanation of an amazing fact. You can take any elliptical orbit, apply a rotation of 4-dimensional space, and get another valid orbit!

Of course we can rotate an elliptical orbit about the sun in the usual 3-dimensional way and get another elliptical orbit. The interesting part is that we can also do 4-dimensional rotations. This can make a round ellipse look skinny: when we tilt a circle into the fourth dimension, its ‘shadow’ in 3-dimensional space becomes thinner!

In fact, you can turn any elliptical orbit into any other elliptical orbit with the same energy by a 4-dimensional rotation of this sort. All elliptical orbits with the same energy are really just circular orbits on the same sphere in 4 dimensions!

Jesper Göransson explains how this works in a terse and elegant way. But I can’t resist summarizing the key results.

The Kepler problem

Suppose we have a particle moving in an inverse square force law. Its equation of motion is

\displaystyle{ m \ddot{\mathbf{r}} = - \frac{k \mathbf{r}}{r^3} }

where \mathbf{r} is its position as a function of time, r is its distance from the origin, m is its mass, and k says how strong the force is. From this we can derive the law of conservation of energy, which says

\displaystyle{ \frac{m \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{k}{r} = E }

for some constant E that depends on the particle’s orbit, but doesn’t change with time.

Let’s consider an attractive force, so k > 0, and elliptical orbits, so E < 0. Let’s call the particle a ‘planet’. It’s a planet moving around the sun, where we treat the sun as so heavy that it remains perfectly fixed at the origin.

I only want to study orbits of a single fixed energy E. This frees us to choose units of mass, length and time in which

m = 1, \;\; k = 1, \;\; E = -\frac{1}{2}

This will reduce the clutter of letters and let us focus on the key ideas. If you prefer an approach that keeps in the units, see Göransson’s paper.

Now the equation of motion is

\displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} }

and conservation of energy says

\displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} - \frac{1}{r} = -\frac{1}{2} }

The big idea, apparently due to Moser, is to switch from our ordinary notion of time to a new notion of time! We’ll call this new time s, and demand that

\displaystyle{ \frac{d s}{d t} = \frac{1}{r} }

This new kind of time ticks more slowly as you get farther from the sun. So, using this new time speeds up the planet’s motion when it’s far from the sun. If that seems backwards, just think about it. For a planet very far from the sun, one day of this new time could equal a week of ordinary time. So, measured using this new time, a planet far from the sun might travel in one day what would normally take a week.

This compensates for the planet’s ordinary tendency to move slower when it’s far from the sun. In fact, with this new kind of time, a planet moves just as fast when it’s farthest from the sun as when it’s closest.

Amazing stuff happens with this new notion of time!

To see this, first rewrite conservation of energy using this new notion of time. I’ve been using a dot for the ordinary time derivative, following Newton. Let’s use a prime for the derivative with respect to s. So, for example, we have

\displaystyle{ t' = \frac{dt}{ds} = r }


\displaystyle{ \mathbf{r}' = \frac{dr}{ds} = \frac{dt}{ds}\frac{dr}{dt} = r \dot{\mathbf{r}} }

Using this new kind of time derivative, Göransson shows that conservation of energy can be written as

\displaystyle{ (t' - 1)^2 + \mathbf{r}' \cdot \mathbf{r}' = 1 }

This is the equation of a sphere in 4-dimensional space!

I’ll prove this later. First let’s talk about what it means. To understand it, we should treat the ordinary time coordinate t and the space coordinates (x,y,z) on an equal footing. The point


moves around in 4-dimensional space as the parameter s changes. What we’re seeing is that the velocity of this point, namely

\mathbf{v} = (t',x',y',z')

moves around on a sphere in 4-dimensional space! It’s a sphere of radius one centered at the point


With some further calculation we can show some other wonderful facts:

\mathbf{r}''' = -\mathbf{r}'


t''' = -(t' - 1)

These are the usual equations for a harmonic oscillator, but with an extra derivative!

I’ll prove these wonderful facts later. For now let’s just think about what they mean. We can state both of them in words as follows: the 4-dimensional velocity \mathbf{v} carries out simple harmonic motion about the point (1,0,0,0).

That’s nice. But since \mathbf{v} also stays on the unit sphere centered at this point, we can conclude something even better: v must move along a great circle on this sphere, at constant speed!

This implies that the spatial components of the 4-dimensional velocity have mean 0, while the t component has mean 1.

The first part here makes a lot of sense: our planet doesn’t drift ever farther from the Sun, so its mean velocity must be zero. The second part is a bit subtler, but it also makes sense: the ordinary time t moves forward at speed 1 on average with respect to the new time parameter s, but its rate of change oscillates in a sinusoidal way.

If we integrate both sides of

\mathbf{r}''' = -\mathbf{r}'

we get

\mathbf{r}'' = -\mathbf{r} + \mathbf{a}

for some constant vector \mathbf{a}. This says that the position \mathbf{r} oscillates harmonically about a point \mathbf{a}. Since \mathbf{a} doesn’t change with time, it’s a conserved quantity: it’s called the Runge–Lenz vector.

Often people start with the inverse square force law, show that angular momentum and the Runge–Lenz vector are conserved, and use these 6 conserved quantities and Noether’s theorem to show there’s a 6-dimensional group of symmetries. For solutions with negative energy, this turns out to be the group of rotations in 4 dimensions, \mathrm{SO}(4). With more work, we can see how the Kepler problem is related to a harmonic oscillator in 4 dimensions. Doing this involves reparametrizing time.

I like Göransson’s approach better in many ways, because it starts by biting the bullet and reparametrizing time. This lets him rather efficiently show that the planet’s elliptical orbit is a projection to 3-dimensional space of a circular orbit in 4d space. The 4d rotational symmetry is then evident!

Göransson actually carries out his argument for an inverse square law in n-dimensional space; it’s no harder. The elliptical orbits in n dimensions are projections of circular orbits in n+1 dimensions. Angular momentum is a bivector in n dimensions; together with the Runge–Lenz vector it forms a bivector in n+1 dimensions. This is the conserved quantity associated to the (n+1) dimensional rotational symmetry of the problem.

He also carries out the analogous argument for positive-energy orbits, which are hyperbolas, and zero-energy orbits, which are parabolas. The hyperbolic case has the Lorentz group symmetry and the zero-energy case has Euclidean group symmetry! This was already known, but it’s nice to see how easily Göransson’s calculations handle all three cases.

Mathematical details

Checking all this is a straightforward exercise in vector calculus, but it takes a bit of work, so let me do some here. There will still be details left to fill in, and I urge that you give it a try, because this is the sort of thing that’s more interesting to do than to watch.

There are a lot of equations coming up, so I’ll put boxes around the important ones. The basic ones are the force law, conservation of energy, and the change of variables that gives

\boxed{  t' = r , \qquad  \mathbf{r}' = r \dot{\mathbf{r}} }

We start with conservation of energy:

\boxed{ \displaystyle{ \frac{\dot{\mathbf{r}} \cdot \dot{\mathbf{r}}}{2} -  \frac{1}{r}  = -\frac{1}{2} } }

and then use

\displaystyle{ \dot{\mathbf{r}} = \frac{d\mathbf{r}/dt}{dt/ds} = \frac{\mathbf{r}'}{t'} }

to obtain

\displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{2 t'^2}  - \frac{1}{t'} = -\frac{1}{2} }

With a little algebra this gives

\boxed{ \displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1} }

This shows that the ‘4-velocity’

\mathbf{v} = (t',x',y',z')

stays on the unit sphere centered at (1,0,0,0).

The next step is to take the equation of motion

\boxed{ \displaystyle{\ddot{\mathbf{r}} = - \frac{\mathbf{r}}{r^3} } }

and rewrite it using primes (s derivatives) instead of dots (t derivatives). We start with

\displaystyle{ \dot{\mathbf{r}} = \frac{\mathbf{r}'}{r} }

and differentiate again to get

\ddot{\mathbf{r}} = \displaystyle{ \frac{1}{r} \left(\frac{\mathbf{r}'}{r}\right)' }  = \displaystyle{ \frac{1}{r} \left( \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^2} \right) } = \displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} }

Now we use our other equation for \ddot{\mathbf{r}} and get

\displaystyle{ \frac{r \mathbf{r}'' - r' \mathbf{r}'}{r^3} = - \frac{\mathbf{r}}{r^3} }


r \mathbf{r}'' - r' \mathbf{r}' = -\mathbf{r}


\boxed{ \displaystyle{ \mathbf{r}'' =  \frac{r' \mathbf{r}' - \mathbf{r}}{r} } }

To go further, it’s good to get a formula for r'' as well. First we compute

r' = \displaystyle{ \frac{d}{ds} (\mathbf{r} \cdot \mathbf{r})^{\frac{1}{2}} } = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}}{r} }

and then differentiating again,

r'' = \displaystyle{\frac{d}{ds} \frac{\mathbf{r}' \cdot \mathbf{r}}{r} } = \displaystyle{ \frac{r \mathbf{r}'' \cdot \mathbf{r} + r \mathbf{r}' \cdot \mathbf{r}' - r' \mathbf{r}' \cdot \mathbf{r}}{r^2} }

Plugging in our formula for \mathbf{r}'', some wonderful cancellations occur and we get

r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 }

But we can do better! Remember, conservation of energy says

\displaystyle{ \mathbf{r}' \cdot \mathbf{r}' + (t' - 1)^2 = 1}

and we know t' = r. So,

\mathbf{r}' \cdot \mathbf{r}' = 1 - (r - 1)^2 = 2r - r^2


r'' = \displaystyle{ \frac{\mathbf{r}' \cdot \mathbf{r}'}{r} - 1 } = 1 - r

So, we see

\boxed{ r'' = 1 - r }

Can you get here more elegantly?

Since t' = r this instantly gives

\boxed{ t''' = 1 - t' }

as desired.

Next let’s get a similar formula for \mathbf{r}'''. We start with

\displaystyle{ \mathbf{r}'' =  \frac{r' \mathbf{r}' - \mathbf{r}}{r} }

and differentiate both sides to get

\displaystyle{ \mathbf{r}''' = \frac{r r'' \mathbf{r}' + r r' \mathbf{r}'' - r \mathbf{r}' - r'}{r^2} }

Then plug in our formulas for r'' and \mathbf{r}''. Some truly miraculous cancellations occur and we get

\boxed{  \mathbf{r}''' = -\mathbf{r}' }

I could show you how it works—but to really believe it you have to do it yourself. It’s just algebra. Again, I’d like a better way to see why this happens!

Integrating both sides—which is a bit weird, since we got this equation by differentiating both sides of another one—we get

\boxed{ \mathbf{r}'' = -\mathbf{r} + \mathbf{a} }

for some fixed vector \mathbf{a}, the Runge–Lenz vector. This says \mathbf{r} undergoes harmonic motion about \mathbf{a}. It’s quite remarkable that both \mathbf{r} and its norm r undergo harmonic motion! At first I thought this was impossible, but it’s just a very special circumstance.

The quantum version of a planetary orbit is a hydrogen atom. Everything we just did has a quantum version! For more on that, see

• Greg Egan, The ellipse and the atom.

For more of the history of this problem, see:

• John Baez, Mysteries of the gravitational 2-body problem.

This also treats quantum aspects, connections to supersymmetry and Jordan algebras, and more! Someday I’ll update it to include the material in this blog post.

Quantum Superposition

13 March, 2015

guest post by Piotr Migdał

In this blog post I will introduce some basics of quantum mechanics, with the emphasis on why a particle being in a few places at once behaves measurably differently from a particle whose position we just don’t know. It’s a kind of continuation of the “Quantum Network Theory” series (Part 1, Part 2) by Tomi Johnson about our work in Jake Biamonte’s group at the ISI Foundation in Turin. My goal is to explain quantum community detection. Before that, I need to introduce the relevant basics of quantum mechanics, and of the classical community detection.

But before I start, let me introduce myself, as it’s my first post to Azimuth.

I just finished my quantum optics theory Ph.D in Maciej Lewenstein’s group at The Institute of Photonic Sciences in Castelldefels, a beach near Barcelona. My scientific interests range from quantum physics, through complex networks, to data-driven approach…. to pretty much anything—and now I work as a data science freelancer. I enjoy doing data visualizations (for example of relations between topics in mathematics), I am a big fan of Rényi entropy (largely thanks to Azimuth), and I’m a believer in open science. If you think that there are too many off-topic side projects here, you are absolutely right!

In my opinion, quantum mechanics is easy. Based on my gifted education experience it takes roughly 9 intense hours to introduce entanglement to students having only a very basic linear algebra background. Even more, I believe that it is possible to get familiar with quantum mechanics by just playing with it—so I am developing a Quantum Game!

Quantum weirdness

In quantum mechanics a particle can be in a few places at once. It sounds strange. So strange, that some pioneers of quantum mechanics (including, famously, Albert Einstein) didn’t want to believe in it: not because of any disagreement with experiment, not because of any lack of mathematical beauty, just because it didn’t fit their philosophical view of physics.

It went further: in the Soviet Union the idea that electron can be in many places (resonance bonds) was considered to oppose materialism. Later, in California, hippies investigated quantum mechanics as a basis for parapsychology—which, arguably, gave birth to the field of quantum information.

As Griffiths put it in his Introduction to Quantum Mechanics (Chapter 4.4.1):

To the layman, the philosopher, or the classical physicist, a statement of the form “this particle doesn’t have a well-defined position” [...] sounds vague, incompetent, or (worst of all) profound. It is none of these.

In this guest blog post I will try to show that not only can a particle be in many places at once, but also that if it were not in many places at once then it would cause problems. That is, as fundamental phenomena as atoms forming chemical bonds, or particle moving in the vacuum, require it.

As in many other cases, the simplest non-trivial case is perfect for explaining idea, as it covers the most important phenomena, while being easy to analyze, visualize and comprehend. Quantum mechanics is not an exception—let us start with a system of two states.

A two state system

Let us study a simplified model of the hydrogen molecular ion \mathrm{H}_2^+, that is, a system of two protons and one electron (see Feynman Lectures on Physics, Vol. III, Chapter 10.1). Since the protons are heavy and slow, we treat them as fixed. We focus on the electron moving in the electric field created by protons.

In quantum mechanics we describe the state of a system using a complex vector. In simple terms, this is a list of complex numbers called ‘probability amplitudes’. For an electron that can be near one proton or another, we use a list of two numbers:

|\psi\rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

In this state the electron is near the first proton with probability |\alpha|^2, and near the second one with probability |\beta|^2.

Note that

|\psi \rangle = \alpha \begin{bmatrix}          1 \\ 0      \end{bmatrix} + \beta \begin{bmatrix}          0 \\ 1      \end{bmatrix}

So, we say the electron is in a ‘linear combination’ or ‘superposition’ of the two states

|1\rangle =      \begin{bmatrix}          1 \\ 0      \end{bmatrix}

(where it’s near the first proton) and the state

|2\rangle =      \begin{bmatrix}          0 \\ 1      \end{bmatrix}

(where it’s near the second proton).

Why do we denote unit vectors in strange brackets looking like

| \mathrm{something} \rangle ?

Well, this is called Dirac notation (or bra-ket notation) and it is immensely useful in quantum mechanics. We won’t go into it in detail here; merely note that | \cdot \rangle stands for a column vector and \langle \cdot | stands for a row vector, while \psi is a traditional symbol for a quantum state.).

Amplitudes can be thought as ‘square roots’ of probabilities. We can force an electron to localize by performing a classical measurement, for example by moving protons away and measuring which of them has neutral charge (for being coupled with the electron). Then, we get probability | \alpha|^2 of finding it near the first proton and |\beta|^2 of finding it near the second. So, we require that

|\alpha|^2 + |\beta|^2 = 1

Note that as amplitudes are complex, for a given probability there are many possible amplitudes. For example

1 = |1|^2 = |-1|^2 = |i|^2 = \left| \tfrac{1+i}{\sqrt{2}} \right|^2 = \cdots

where i is the imaginary unit, with i^2 = -1.

We will now show that the electron ‘wants’ to be spread out. Electrons don’t really have desires, so this is physics slang for saying that the electron will have less energy if its probability of being near the first proton is equal to its probability of being near the second proton: namely, 50%.

In quantum mechanics, a Hamiltonian is a matrix that describes the relation between the energy and evolution (i.e. how the state changes in time). The expected value of the energy of any state | \psi \rangle is

E = \langle \psi | H | \psi \rangle

Here the row vector \langle \psi | is the column vector | \psi\rangle after transposition and complex conjugation (i.e. changing i to -i), and

\langle \psi | H | \psi \rangle

means we are doing matrix multiplication on \langle \psi |, H and | \psi \rangle to get a number.

For the electron in the \mathrm{H}_2^+ molecule the Hamiltonian can be written as the following 2 \times 2 matrix with real, positive entries:

H =      \begin{bmatrix}          E_0 & \Delta \\          \Delta & E_0      \end{bmatrix},

where E_0 is the energy of the electron being either in state |1\rangle or state |2\rangle, and \Delta is the ‘tunneling amplitude’, which describes how easy it is for the electron to move from neighborhood of one proton to that of the other.

The expected value—physicists call it the ‘expectation value’—of the energy of a given state |\psi\rangle is:

E = \langle \psi | H | \psi \rangle \equiv      \begin{bmatrix}          \alpha^* & \beta^*      \end{bmatrix}      \begin{bmatrix}          E_0 & \Delta \\          \Delta & E_0      \end{bmatrix}      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}.

The star symbol denotes the complex conjugation. If you are unfamiliar with complex numbers, just work with real numbers on which this operation does nothing.

Exercise 1. Find \alpha and \beta with

|\alpha|^2 + |\beta|^2 = 1

that minimize or maximize the expectation value of energy \langle \psi | H | \psi \rangle for

|\psi\rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

Exercise 2. What’s the expectation value value of the energy for the states | 1 \rangle and | 2 \rangle?

Or if you are lazy, just read the answer! It is straightforward to check that

E = (\alpha^* \alpha + \beta^* \beta) E_0 + (\alpha^* \beta + \beta^* \alpha) \Delta

The coefficient of E_0 is 1, so the minimal energy is E_0 - \Delta and the maximal energy is E_0 + \Delta. The states achieving these energies are spread out:

| \psi_- \rangle =      \begin{bmatrix}          1/\sqrt{2} \\ -1/\sqrt{2}      \end{bmatrix},      \quad \text{with} \quad      \quad E = E_0 - \Delta


| \psi_+ \rangle =      \begin{bmatrix}          1/\sqrt{2} \\ 1/\sqrt{2}      \end{bmatrix},      \quad \text{with} \quad      \quad E = E_0 + \Delta

The energies of these states are below and above the energy E_0, and \Delta says how much.

So, the electron is ‘happier’ (electrons don’t have moods either) to be in the state |\psi_-\rangle than to be localized near only one of the protons. In other words—and this is Chemistry 101—atoms like to share electrons and it bonds them. Also, they like to share electrons in a particular and symmetric way.

For reference, |\psi_+ \rangle is called ‘antibonding state’. If the electron is in this state, the atoms will get repelled from each other—and so much for the molecule!

How to classically add quantum things

How can we tell a difference between an electron being in a superposition between two states, and just not knowing its ‘real’ position? Well, first we need to devise a way to describe probabilistic mixtures.

It looks simple—if we have an electron in the state |1\rangle or |2\rangle with probabilities 1/2, we may be tempted to write

|\psi\rangle = \tfrac{1}{\sqrt{2}} |1\rangle + \tfrac{1}{\sqrt{2}} |2\rangle

We’re getting the right probabilities, so it looks legit. But there is something strange about the energy. We have obtained the state |\psi_+\rangle with energy E_0+\Delta by mixing two states with the energy E_0!

Moreover, we could have used different amplitudes such that |\alpha|^2=|\beta|^2=1/2 and gotten different energies. So, we need to devise a way to avoid guessing amplitudes. All in all, we used quotation marks for ‘square roots’ for a reason!

It turns out that to describe statistical mixtures we can use density matrices.

The states we’d been looking at before are described by vectors like this:

| \psi \rangle =      \begin{bmatrix}          \alpha \\ \beta      \end{bmatrix}

These are called ‘pure states’. For a pure state, here is how we create a density matrix:

\rho = | \psi \rangle \langle \psi |      \equiv      \begin{bmatrix}          \alpha \alpha^* & \alpha \beta^*\\          \beta \alpha^* & \beta \beta^*       \end{bmatrix}

On the diagonal we get probabilities (|\alpha|^2 and |\beta|^2), whereas the off-diagonal terms (\alpha \beta^* and its complex conjugate) are related to the presence of quantum effects. For example, for |\psi_-\rangle we get

\rho =      \begin{bmatrix}          1/2 & -1/2\\          -1/2 & 1/2       \end{bmatrix}

For an electron in the state |1\rangle we get

\rho =      \begin{bmatrix}          1 & 0\\          0 & 0       \end{bmatrix}.

To calculate the energy, the recipe is the following:

E = \mathrm{tr}[H \rho]

where \mathrm{tr} is the ‘trace‘: the sum of the diagonal entries. For a n \times n square matrix with entries A_{ij} its trace is

\mathrm{tr}(A) = A_{11} + A_{22} + \ldots + A_{nn}

Exercise 3. Show that this formula for energy, and the previous one, give the same result on pure states.

I advertised that density matrices allow us to mix quantum states. How do they do that? Very simple: just by adding density matrices, multiplied by the respective probabilities:

\rho = p_1 \rho_1 + p_2 \rho_2 + \cdots + p_n \rho_n

It is exactly how we would mix probability vectors. Indeed, the diagonals are probability vectors!

So, let’s say that our co-worker was drunk and we are not sure if (s)he said that the state is |\psi_-\rangle or |1\rangle. However, we think that the probabilities are 1/3 and 2/3. We get the density matrix:

\rho =      \begin{bmatrix}          5/6 & -1/6\\          -1/6 & 1/6       \end{bmatrix}

So, how about its energy?

Exercise 4. Show that calculating energy using density matrix gives the same result as averaging energy over component pure states.

I may have given the impression that density matrix is an artificial thing, at best—a practical trick, and what we ‘really’ have are pure states (vectors), each with a given probability. If so, the next exercise is for you:

Exercise 5. Show that a 50%-50% mixture of |1\rangle and |2\rangle is the same as a 50%-50% mixture of |\psi_+\rangle and |\psi_-\rangle.

This is different than statistical mechanics, or statistics, where we can always think about probability distributions as uniquely defined statistical mixtures of possible states. Here, as we see, it can be a bit more tricky.

As we said, for the diagonals things work as for classical probabilities. But there is more—at the same time as adding probabilities we also add the off-diagonal terms, which can add up to cancel, depending on their signs. It’s why it’s mixing quantum states may make them losing their quantum properties.

The value of the off-diagonal term is related to so-called ‘coherence’ between the states |1\rangle and |2\rangle. Its value is bounded by the respective probabilities:

\left| \rho_{12} \right| \leq \sqrt{\rho_{11}\rho_{22}} = \sqrt{p_1 p_2}

where for pure states we get equality.

If the value is zero, there are no quantum effects between two positions: this means that the electron is sure to be at one place or the other, though we might be uncertain at which place. This is fundamentally different from a superposition (non-zero \rho_{12}), where we are uncertain at which site a particle is, but it can no longer be thought to be at one site or the other: it must be in some way associated with both simultaneously.

Exercise 6. For each c \in [-1,1] propose how to obtain a mixed state described by density matrix

\rho =       \begin{bmatrix}          1/2 & c/2\\          c/2 & 1/2       \end{bmatrix}

by mixing pure states of your choice.

A spatial wavefunction

A similar thing works for position. Instead of a two-level system let’s take a particle in one dimension. The analogue of a state vector is a wavefunction, a complex-valued function on a line:


In this continuous variant, p(x) = |\psi(x)|^2 is the probability density of finding particle in one place.

We construct the density matrix (or rather: ‘density operator’) in an way that is analogous to what we did for the two-level system:

\rho(x, x') = \psi(x) \psi^*(x')

Instead of a 2×2 matrix matrix, it is a complex function of two real variables. The probability density can be described by its diagonal values, i.e.

p(x) = \rho(x,x)

Again, we may wonder if the particle energetically favors being in many places at once. Well, it does.

Density matrices for a classical and quantum state. They yield the same probability distributions (for positions). However, their off-diagonal values (i.e. $x\neq x’$) are different. The classical state is just a probabilistic mixture of a particle being in a particular place.

What would happen if we had a mixture of perfectly localized particles? Due to Heisenberg’s uncertainly principle we have

\Delta x \Delta p \geq \frac{\hbar}{2},

that is, that the product of standard deviations of position and momentum is at least some value.

If we exactly know the position, then the uncertainty of momentum goes to infinity. (The same thing holds if we don’t know position, but it can be known, even in principle. Quantum mechanics couldn’t care less if the particle’s position is known by us, by our friend, by our detector or by a dust particle.)

The Hamiltonian represents energy, the energy of a free particle in continuous system is


where m is its mass, and p is its momentum: that is, mass times velocity. So, if the particle is completely localized:

• its energy is infinite,
• its velocity are infinite, so in no time its wavefunction will spread everywhere.

Infinite energies sometimes happen if physics. But if we get infinite velocities we see that there is something wrong. So a particle needs to be spread out, or ‘delocalized’, to some degree, to have finite energy.

As a side note, to consider high energies we would need to employ special relativity. In fact, one cannot localize a massive particle that much, as it will create a soup of particles and antiparticles, once its energy related to momentum uncertainty is as much as the energy related to its mass; see the Darwin term in the fine structure.

Moreover, depending on the degree of its delocalization its behavior is different. For example, a statistical mixture of highly localized particles would spread a lot faster than the same $p(x)$ but derived from a single wavefunction. The density matrix of the former would be in between of that of pure state (a ‘circular’ Gaussian function) and the classical state (a ‘linear’ Gaussian). That is, it would be an ‘oval’ Gaussian, with off-diagonal values being smaller than for the pure state.

Let us look at two Gaussian wavefunctions, with varying level of coherent superposition between them. That is, each Gaussian is already a superposition, but when we combine two we let ourselves use a superposition, or a mixture, or something in between. For a perfect superposition of Gaussian, we would have the density matrix

\rho(x,x') = \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) + \phi(x-\tfrac{d}{2}) \right) \left( \phi(x'+\tfrac{d}{2}) + \phi(x'-\tfrac{d}{2}) \right)

where \phi(x) is a normalized Gaussian function. For a statistical mixture between these Gaussians split by a distance of d, we would have:

\rho(x,x') = \frac{1}{2} \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2})  +  \frac{1}{2} \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})

And in general,

\begin{array}{ccl}  \rho(x,x') &=& \frac{1}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'+\tfrac{d}{2})  +  \phi(x-\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})\right) + \\ \\  && \frac{c}{2} \left( \phi(x+\tfrac{d}{2}) \phi(x'-\tfrac{d}{2})  + \phi(x-\tfrac{d}{2}) \phi(x'+\tfrac{d}{2}) \right)  \end{array}

for some |c| \leq 1.

PICTURE: Two Gaussian wavefunctions (centered at -2 and +2) in a coherent superposition with each other (the first and the last plot) and a statistical mixture (the middle plot); the 2nd and 4th plot show intermediary states. Superposition can be with different phase, much like the hydrogen example. Color represents absolute value and hue phase; here red is for positive numbers and teal is for negative.

(Click to enlarge.)


We have seen learnt the difference between the quantum superposition and the statistical mixture of states. In particular, while both of these descriptions may give the same probabilities, their predictions on the physical properties of states differ. For example, we need an electron to be delocalized in a specific way to describe chemical bonds; and we need delocalization of any particle to predict its movement.

We used density matrices to express both quantum superposition and (classical) lack of knowledge on the same ground. We have identified its off-diagonal terms as ones related to the quantum coherence.

But what if there were not only two states, but many? So, instead of \mathrm{H}_2^+ (we were not even considering the full hydrogen atom, but only its ionized version), how about electric excitation on something bigger? Not even \mathrm{C}_2\mathrm{H}_5\mathrm{OH} or some sugar, but a protein complex!

So, this will be your homework (cf. this homework on topology). Just joking, there will be another blog post.

Melting Permafrost (Part 4)

4 March, 2015


Russian scientists have recently found more new craters in Siberia, apparently formed by explosions of methane. Three were found last summer. They looked for more using satellite photos… and found more!

“What I think is happening here is, the permafrost has been acting as a cap or seal on the ground, through which gas can’t permeate,” says Paul Overduin, a permafrost expert at the Alfred Wegener Institute in Germany. “And it reaches a particular temperature where there’s not enough ice in it to act that way anymore. And then gas can rush out.”

It’s rather dramatic. Some Russian villagers have even claimed to see flashes in the distance when these explosions occur. But how bad is it?

The Siberian Times

An English-language newspaper called The Siberian Times has a good article about these craters, which I’ll quote extensively:

• Anna Liesowska Dozens of new craters suspected in northern Russia, The Siberian Times, 23 February 2015.

B1 – famous Yamal hole in 30 kilometers from Bovanenkovo, spotted in 2014 by helicopter pilots. Picture: Marya Zulinova, Yamal regional government press service.

Respected Moscow scientist Professor Vasily Bogoyavlensky has called for ‘urgent’ investigation of the new phenomenon amid safety fears.

Until now, only three large craters were known about in northern Russia with several scientific sources speculating last year that heating from above the surface due to unusually warm climatic conditions, and from below, due to geological fault lines, led to a huge release of gas hydrates, so causing the formation of these craters in Arctic regions.

Two of the newly-discovered large craters—also known as funnels to scientists—have turned into lakes, revealed Professor Bogoyavlensky, deputy director of the Moscow-based Oil and Gas Research Institute, part of the Russian Academy of Sciences.

Examination using satellite images has helped Russian experts understand that the craters are more widespread than was first realised, with one large hole surrounded by as many as 20 mini-craters, The Siberian Times can reveal.

Four Arctic craters: B1 – famous Yamal hole in 30 kilometers from Bovanenkovo, B2 – recently detected crater in 10 kilometers to the south from Bovanenkovo, B3 – crater located in 90 kilometers from Antipayuta village, B4 – crater located near Nosok village, on the north of Krasnoyarsk region, near Taimyr Peninsula. Picture: Vasily Bogoyavlensky.

‘We know now of seven craters in the Arctic area,’ he said. ‘Five are directly on the Yamal peninsula, one in Yamal Autonomous district, and one is on the north of the Krasnoyarsk region, near the Taimyr peninsula.

‘We have exact locations for only four of them. The other three were spotted by reindeer herders. But I am sure that there are more craters on Yamal, we just need to search for them.

‘I would compare this with mushrooms: when you find one mushroom, be sure there are few more around. I suppose there could be 20 to 30 craters more.’

He is anxious to investigate the craters further because of serious concerns for safety in these regions.

The study of satellite images showed that near the famous hole, located in 30 kilometres from Bovanenkovo are two potentially dangerous objects, where the gas emission can occur at any moment.

Satellite image of the site before the forming of the Yamal hole (B1). K1 and the red outline show the hillock (pingo) formed before the gas emission. Yellow outlines show the potentially dangerous objects. Picture: Vasily Bogoyavlensky.

He warned: ‘These objects need to be studied, but it is rather dangerous for the researchers. We know that there can occur a series of gas emissions over an extended period of time, but we do not know exactly when they might happen.

‘For example, you all remember the magnificent shots of the Yamal crater in winter, made during the latest expedition in Novomber 2014. But do you know that Vladimir Pushkarev, director of the Russian Centre of Arctic Exploration, was the first man in the world who went down the crater of gas emission?

‘More than this, it was very risky, because no one could guarantee there would not be new emissions.’

Professor Bogoyavlensky told The Siberian Times: ‘One of the most interesting objects here is the crater that we mark as B2, located 10 kilometres to the south of Bovanenkovo. On the satellite image you can see that it is one big lake surrounded by more than 20 small craters filled with water.

‘Studying the satellite images we found out that initially there were no craters nor a lake. Some craters appeared, then more. Then, I suppose that the craters filled with water and turned to several lakes, then merged into one large lake, 50 by 100 metres in diameter.

‘This big lake is surrounded by the network of more than 20 ‘baby’ craters now filled with water and I suppose that new ones could appear last summer or even now. We now counting them and making a catalogue. Some of them are very small, no more than 2 metres in diameter.’

‘We have not been at the spot yet,’ he said. ‘Probably some local reindeer herders were there, but so far no scientists.’

He explained: ‘After studying this object I am pretty sure that there was a series of gas emissions over an extended period of time. Sadly, we do not know, when exactly these emissions occur, i.e. mostly in summer, or in winter too. We see only the results of this emissions.’

The object B2 is now attracting special attention from the researchers as they seek to understand and explain the phenomenon. This is only 10km from Bovanenkovo, a major gas field, developed by Gazprom, in the Yamalo-Nenets Autonomous Okrug. Yet older satellite images do not show the existence of a lake, nor any craters, in this location.

Not only the new craters constantly forming on Yamal show that the process of gas emission is ongoing actively.

Professor Bogoyavlensky shows the picture of one of the Yamal lakes, taken by him from the helicopter and points on the whitish haze on its surface.

Yamal lake with traces of gas emissions. Picture: Vasily Bogoyavlensky.

He commented: ‘This haze that you see on the surface shows that gas seeps that go from the bottom of the lake to the surface. We call this process ‘degassing’.

‘We do not know, if there was a crater previously and then turned to lake, or the lake formed during some other process. More important is that the gases from within are actively seeping through this lake.

‘Degassing was revealed on the territory of Yamal Autonomous District about 45 years ago, but now we think that it can give us some clues about the formation of the craters and gas emissions. Anyway, we must research this phenomenon urgently, to prevent possible disasters.’

Professor Bogoyavlensky stressed: ‘For now, we can speak only about the results of our work in the laboratory, using the images from space.

‘No one knows what is happening in these craters at the moment. We plan a new expedition. Also we want to put not less than four seismic stations in Yamal district, so they can fix small earthquakes, that occur when the crater appears.

‘In two cases locals told us that they felt earth tremors. The nearest seismic station was yet too far to register these tremors.

‘I think that at the moment we know enough about the crater B1. There were several expeditions, we took probes and made measurements. I believe that we need to visit the other craters, namely B2, B3 and B4, and then visit the rest three craters, when we will know their exact location. It will give us more information and will bring us closer to understanding the phenomenon.’

He urged: ‘It is important not to scare people, but to understand that it is a very serious problem and we must research this.’

In an article for Drilling and Oil magazine, Professor Bogoyavlensky said the parapet of these craters suggests an underground explosion.

‘The absence of charred rock and traces of significant erosion due to possible water leaks speaks in favour of mighty eruption (pneumatic exhaust) of gas from a shallow underground reservoir, which left no traces on soil which contained a high percentage of ice,’ he wrote.

‘In other words, it was a gas-explosive mechanism that worked there. A concentration of 5-to-16% of methane is explosive. The most explosive concentration is 9.5%.’

Gas probably concentrated underground in a cavity ‘which formed due to the gradual melting of buried ice’. Then ‘gas was replacing ice and water’.

‘Years of experience has shown that gas emissions can cause serious damage to drilling rigs, oil and gas fields and offshore pipelines,’ he said. ‘Yamal craters are inherently similar to pockmarks.

‘We cannot rule out new gas emissions in the Arctic and in some cases they can ignite.’

This was possible in the case of the crater found at Antipayuta, on the Yamal peninsula.

‘The Antipayuta residents told how they saw some flash. Probably the gas ignited when appeared the crater B4, near Taimyr peninsula. This shows us, that such explosion could be rather dangerous and destructive.

‘We need to answer now the basic questions: what areas and under what conditions are the most dangerous? These questions are important for safe operation of the northern cities and infrastructure of oil and gas complexes.’

Crater B3 located in 90 kilometres from Antipayuta village, Yamal district. Picture: local residents.

Crater B4 located near Nosok village, on the north of Krasnoyarsk region, near Taimyr Peninsula. Picture: local residents.

How bad is it?

Since methane is a powerful greenhouse gas, some people are getting nervous. If global warming releases the huge amounts of methane trapped under permafrost, will that create more global warming? Could we be getting into a runaway feedback loop?

The Washington Post has a good article telling us to pay attention, but not panic:

• Chris Mooney, Why you shouldn’t freak out about those mysterious Siberian craters, Chris Mooney, 2 March 2015.

David Archer of the University of Chicago, a famous expert on climate change and the carbon cycle, took a look at thes craters and did some quick calculations. He estimated that “it would take about 20,000,000 such eruptions within a few years to generate the standard Arctic Methane Apocalypse that people have been talking about.”

More importantly, people are measuring the amount of methane in the air. We know how it’s doing. For example, you can make graphs of methane concentration here:

• Earth System Research Laboratory, Global Monitoring Division, Data visualization.

Click on a northern station like Alert, the scary name of a military base and research station in Nunavut—the huge northern province in Canada:

(Alert is on the very top, near Greenland.)

Choose Carbon cycle gases from the menu at right, and click on Time series. You’ll go to another page, and then choose Methane—the default choice is carbon dioxide. Go to the bottom of the page and click Submit and you’ll get a graph like this:

Methane has gone up from about 1750 to 1900 nanomoles per mole from 1985 to 2015. That’s a big increase—but not a sign of incipient disaster.

A larger perspective might help. Apparently from 1750 to 2007 the atmospheric CO2 concentration increased about 40% while the methane concentration has increased about 160%. The amount of additional radiative forcing due to CO2 is about 1.6 watts per square meter, while for methane it’s about 0.5:

Greenhouse gas: natural and anthropogenic sources, Wikipedia.

So, methane is significant, and increasing fast. So far CO2 is the 800-pound gorilla in the living room. But I’m glad Russian scientists are doing things like this:

The latest expedition to Yamal crater was initiated by the Russian Center of Arctic Exploration in early November 2014. The researchers were first in the world to enter this crater. Pictures: Vladimir Pushkarev/Russian Center of Arctic Exploration

Previous posts

For previous posts in this series, see:

Melting Permafrost (Part 1).

Melting Permafrost (Part 2).

Melting Permafrost (Part 3).


Get every new post delivered to your Inbox.

Join 2,973 other followers