## Network Theory I

2 March, 2014

Here’s a video of a talk I gave last Tuesday—part of a series. You can see the slides here:

One reason I’m glad I gave this talk is because afterwards Jamie Vicary pointed out some very interesting consequences of the relations among signal-flow diagrams listed in my talk. It turns out they imply equations familiar from the theory of complementarity in categorical quantum mechanics!

This is the kind of mathematical surprise that makes life worthwhile for me. It seemed utterly shocking at first, but I think I’ve figured out why it happens. Now is not the time to explain… but I’ll have to do it soon, both here and in the paper that Jason Erbele and I are writing about control theory.

• Brendan Fong, A compositional approach to control theory.

## Markov Models of Social Change (Part 1)

24 February, 2014

guest post by Alastair Jamieson-Lane

The world is complex, and making choices in a complex world is sometimes difficult.

As any leader knows, decisions must often be made with incomplete information. To make matters worse, the experts and scientists who are meant to advise on these important matters are also doing so with incomplete information—usually limited to only one or two specialist fields. When decisions need to be made that are dependent on multiple real-world systems, and your various advisors find it difficult to communicate, this can be problematic!

The generally accepted approach is to listen to whichever advisor tells you the things you want to hear.

When such an approach fails (for whatever mysterious and inexplicable reason) it might be prudent to consider such approaches as Bayesian inference, analysis of competing hypotheses or cross-impact balance analysis.

Because these methods require experts to formalize their opinions in an explicit, discipline-neutral manner, we avoid many of the problems mentioned above. Also, if everything goes horribly wrong, you can blame the algorithm, and send the rioting public down to the local university to complain there.

In this blog article I will describe cross-impact balance analysis and a recent extension to this method, explaining its use, as well as some basic mathematical underpinnings. No familiarity with cross-impact balance analysis will be required.

### Wait—who is this guy?

Since this is my first time writing a blog post here, I hear introductions are in order.

Hi. I’m Alastair.

I am currently a Master’s student at the University of British Columbia, studying mathematics. In particular, I’m aiming to use evolutionary game theory to study academic publishing and hiring practices… and from there hopefully move on to studying governments (we’ll see how the PhD goes). I figure that both those systems seem important to solving the problems we’ve built for ourselves, and both may be under increasing pressure in coming years.

But that’s not what I’m here for today! Today I’m here to tell the story of cross-impact balance analysis, a tool I was introduced to at the complex systems summer school in Santa Fe.

### The story

Suppose (for example) that the local oracle has foretold that burning the forests will anger the nature gods

… and that if you do not put restrictions in place, your crops will wither and die.

Well, that doesn’t sound very good.

The merchant’s guild claims that such restrictions will cause all trade to grind to a halt.

Your most trusted generals point out that weakened trade will leave you vulnerable to invasion from all neighboring kingdoms.

The sailors’ guild adds that the wrath of Poseidon might make nautical trade more difficult.

The alchemists propose alternative sources of heat…

… while the druids propose special crops as a way of resisting the wrath of the gods…

… and so on.

Given this complex web of interaction, it might be a good time to consult the philosophers.

### Overview of CIB

This brings us to the question of what CIB (Cross-Impact Balance) analysis is, and how to use it.

At its heart, CIB analysis demands this: first, you must consider what aspects of the world you are interested in studying. This could be environmental or economic status, military expenditure, or the laws governing genetic modification. These we refer to as “descriptors”. For each “descriptor” we must create a list of possible “states”.

For example, if the descriptor we are interested in were “global temperature change” our states might be “+5 degrees”, “+4 degrees” and so on down to “-2 degrees”.

The states of a descriptor are not meant to be all-encompassing, or offer complete detail, and they need not be numerical. For example, the descriptor “Agricultural policy” might have such states as “Permaculture subsidy”, “Genetic engineering”, “Intensive farming” or “No policy”.

For each of these states, we ask our panel of experts whether such a state would increase or decrease the tendency for some other descriptor to be in a particular state.

For example, we might ask: “On a scale from -3 to 3, how much does the agricultural policy of Intensive farming increase the probability that we will see global temperature increases of +2 degrees?”

By combining the opinions of a variety of experts in each field, and weighting based on certainty and expertise, we are able to construct matrices, much like the one below:

The above matrix is a description of my ant farm. The health of my colony is determined by the population, income, and education levels of my ants. For a less ant-focused version of the above, please refer to:

• Elisabeth A. Lloyd and Vanessa J. Schweizer, Objectivity and a comparison of methodological scenario approaches for climate change research, Synthese (2013).

For any possible combination of descriptor states (referred to as a scenario) we can calculate the total impact on all possible descriptors. In the current scenario we have low population, high income and medium education (see highlighted rows).

Because the current scenario has high ant income, this strongly influences us to have low population (+3) and prevents a jump to high population (-3). This, combined with the non-influence from education (zeros), leads to low population being the most favoured state for our population descriptor. Thus we expect no change. We say this is “consistent”.

Education, however, sees a different story. Here we have a strong influence towards high education levels (summing the column gives a total of 13). Thus our current state (medium education) is inconsistent, and we would expect the abundance of ant wealth to lead to an improvement in the ant schooling system.

Classical CIB analysis acts as a way to classify which hypothetical situations are consistent, and which are not.
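In code, the consistency check looks something like this. The numbers and state names below are invented for illustration (the actual ant-farm matrix appears only as a picture above), but the logic follows the description: sum the influences of the current scenario on each candidate state of a descriptor, and call that descriptor consistent if its current state is already the most favoured one.

```python
# Toy cross-impact data, indexed as M[(influencing descriptor, its state)]
# -> {(influenced descriptor, its state): score}. All numbers invented.
M = {
    ("income", "high"): {("population", "low"): 3, ("population", "high"): -3},
    ("education", "medium"): {("population", "low"): 0, ("population", "high"): 0},
}

def impact_on(scenario, descriptor, state, M):
    """Total influence of the current scenario on `descriptor` being in `state`."""
    return sum(M.get((d, s), {}).get((descriptor, state), 0)
               for d, s in scenario.items())

def most_favoured_state(scenario, descriptor, states, M):
    """The state of `descriptor` with the highest total impact score."""
    return max(states, key=lambda s: impact_on(scenario, descriptor, s, M))

scenario = {"income": "high", "education": "medium", "population": "low"}
best = most_favoured_state(scenario, "population", ["low", "high"], M)
# High income favours low population, which is already the current state,
# so the population descriptor is consistent here.
print(best)  # low
```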

Now, it is all well and good to claim that some scenarios are stable, but the real use of such a tool is in predicting (and influencing) the future.

By applying a deterministic rule that determines how inconsistencies are resolved, we can produce a “succession rule”. The most straightforward example is to replace all descriptor states with whichever state is most favoured by the current scenario. In the example above we would switch to “low population, medium income, high education”. A generation later we would switch back to “low population, high income, medium education”, soon finding ourselves trapped in a loop.

All such rules will always lead to either a loop or a “sink”: a self-consistent scenario which is succeeded only by itself.
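Why must this happen? There are only finitely many scenarios, so iterating any deterministic rule must eventually revisit one, and from then on the trajectory repeats. Here is a generic sketch of detecting the loop or sink; the succession rule itself is a made-up toy, not the ant-farm example.

```python
def find_attractor(succeed, start):
    """Iterate a deterministic succession rule from `start` until a scenario
    repeats. Returns the cycle as a list; a length-1 cycle is a sink."""
    seen = {}
    scenario, step = start, 0
    while scenario not in seen:
        seen[scenario] = step
        scenario, step = succeed(scenario), step + 1
    cycle_start = seen[scenario]           # first step at which the cycle begins
    ordered = sorted(seen, key=seen.get)   # scenarios in order of first visit
    return ordered[cycle_start:]

# Toy three-scenario world: A -> B -> C -> B, so {B, C} is a loop.
succ = {"A": "B", "B": "C", "C": "B"}.get
print(find_attractor(succ, "A"))  # ['B', 'C']
```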

So, how can we use this? How will this help us deal with the wrath of the gods (or ant farms)?

Firstly: we can identify loops and consistent scenarios which we believe are most favourable. It’s all well and good imagining some future utopia, but if it is inconsistent with itself and will immediately lead to a slide into less favourable scenarios, then we should not aim for it; instead we should find the most favourable realistic scenario and aim for that one.

Secondly: we can examine all our consistent scenarios, and determine whose “basin of attraction” we find ourselves in: that is, which scenario we are likely to end up in.

Thirdly: suppose we could change our influence matrix slightly. How would we change it to favour the scenarios we most prefer? If you don’t like the rules, change the game, or at the very least find out what we would need to change to have the best effect.

### Concerns and caveats

So… what are the problems we might encounter? What are the drawbacks?

Well, first of all, we note that the real world does not tend to reach any form of eternal static scenario or perfect cycle. The fact that our model does might be regarded as reason for suspicion.

Secondly, although the classical method contains succession analysis, this analysis is not necessarily intended as a completely literal “prediction” of events. It gives a rough idea of the basins of attraction of our cycles and consistent scenarios, but is also somewhat arbitrary. What succession rule is most appropriate? Do all descriptors update simultaneously? Or only the one with the most “pressure”? Are our descriptors given in order of malleability, and only the fastest changing descriptor will change?

Thirdly, in collapsing our description of the world down into a finite number of states we are ignoring many tiny details. Most of these details are not important, but in assuming that our succession rules are deterministic, we imply that these details have no impact whatsoever.

If we instead treat succession as a somewhat random process, the first two of these problems can be solved, and the third somewhat reduced.

### Stochastic succession

In the classical CIB succession analysis, some rule is selected which deterministically decides which scenario follows from the present. Stochastic succession analysis instead tells us the probability that a given scenario will lead to another.

The simplest example of a stochastic succession rule is to simply select a single descriptor at random each time step, and only consider updates that might happen to that descriptor. This we refer to as dice succession. This (in some ways) represents hidden information: two systems that might look identical on the surface from the point of view of our very blockish CIB analysis might be different enough underneath to lead to different outcomes. If we have a shaky agricultural system, but a large amount of up-and-coming research, then which of these two factors becomes important first is down to the luck of the draw. Rather than attempt to model this fine detail, we instead merely accept it and incorporate this uncertainty into our model.
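Here is a minimal sketch of dice succession. The “most favoured state” rule below is invented for illustration, standing in for a real cross-impact matrix.

```python
import random

def dice_step(scenario, most_favoured, rng):
    """One step of 'dice' succession: pick a single descriptor at random and
    move it to its most favoured state, leaving the others untouched."""
    new = dict(scenario)
    d = rng.choice(sorted(scenario))       # sorted() makes the draw reproducible
    new[d] = most_favoured(scenario, d)
    return new

# Invented toy rule: high income pulls population toward 'low'; nothing else moves.
def fav(scenario, d):
    if d == "population" and scenario["income"] == "high":
        return "low"
    return scenario[d]

rng = random.Random(0)                     # fixed seed for reproducibility
s = {"population": "high", "income": "high"}
for _ in range(50):
    s = dice_step(s, fav, rng)
# Population flips to 'low' as soon as the 'population' die is rolled;
# the step at which that happens is down to the luck of the draw.
```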

Even this most simplistic change leads to dramatic effects on our system. Most importantly, almost all cycles vanish from our results, as forks in the road allow us to diverge from the path of the cycle.

We can take stochastic succession further and consider more exotic rules for our transitions, ones that allow any transition to take place, not merely those that are most favored. For example:

$P(x,y) = A e^{I_x(y)/T}$

Here $x$ is our current scenario, $y$ is some possible future scenario, and $I_x(y)$ is the total impact score of $y$ from the perspective of $x$. $A$ is a simple normalizing constant, and $T$ is our system’s temperature. High temperature systems are dominated by random noise, while low temperature systems are dominated by the influences described by our experts.

Impact score is calculated by summing the impact of each state of our current scenario, on each state of our target scenario. For example, for the above, suppose we want to find $I_x(y)$ when $x$ is the given scenario “Low population, High income, medium education” and $y$ is the scenario “Medium population, medium income, High education”. We consider all numbers that are in rows which were states of $x$ and in columns that are states of $y$. This would give:

$I_x(y)= (0+0+0) + (-2 +0 +10) +(6+7+0) = 21$

Here each bracket refers to the sum of a particular column.
More generically we can write the formula as:

$\displaystyle{ I_x(y)= \sum_{i \subset x, \;j \subset y} M_{i,j} }$

Here $M_{i,j}$ refers to an entry in our cross-impact balance matrix, $i$ and $j$ are both states, and $i \subset x$ reads as “$i$ is a state of $x$”.

We refer to this function for computing transition probabilities as the Boltzmann succession law, due to its similarity to the Boltzmann distribution found in physics. We use it merely as an example, and by no means wish to imply that we expect the transitions for our true system to act in a precisely Boltzmann-like manner. Alternative functions can, and should, be experimented with. The Boltzmann succession law is however an effective example, and it has a number of nice properties: $P(x,y)$ is always positive, is unchanged by adding a constant to every element of the cross-impact balance matrix, contains adjustable parameters, and is unbounded above.

The Boltzmann succession rule is what I will refer to as fully stochastic: it allows transitions even against our experts’ judgement (with low probability). This is in contrast to dice succession which picks a direction at random, but still contains scenarios from which our system can not escape.
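Putting the last two formulas together, a sketch of the Boltzmann succession law might look like this. Scenarios are represented as sets of states, and the matrix entries are invented for illustration.

```python
import math

def impact_score(M, x, y):
    """I_x(y): sum the cross-impact entries from every state of x to every
    state of y. M maps (state_of_x, state_of_y) pairs to numbers; missing
    pairs count as 0."""
    return sum(M.get((i, j), 0) for i in x for j in y)

def boltzmann_probs(M, x, scenarios, T=1.0):
    """Boltzmann succession law: P(x, y) proportional to exp(I_x(y) / T)."""
    weights = {y: math.exp(impact_score(M, x, y) / T) for y in scenarios}
    Z = sum(weights.values())          # the normalizing constant A is 1/Z
    return {y: w / Z for y, w in weights.items()}

# Invented two-scenario example: the matrix favours low population under high income.
M = {("inc_hi", "pop_lo"): 3, ("inc_hi", "pop_hi"): -3}
x = frozenset({"inc_hi", "pop_hi"})
scenarios = [frozenset({"inc_hi", "pop_lo"}), frozenset({"inc_hi", "pop_hi"})]
P = boltzmann_probs(M, x, scenarios)
# The favoured scenario gets most of the probability, but every transition,
# even one against the experts' judgement, has probability > 0.
```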

### Effects of stochastic succession

‘Partially stochastic’ processes such as the dice rule have very limited effect on the long term behavior of the model. Aside from removing most cycles, they behave almost exactly like our deterministic succession rules. So, let us instead discuss the more interesting fully stochastic succession rules.

In the fully stochastic system we can ask “after a very long time, what is the probability we will be in scenario $x$?”

By asking this question we can get some idea of the relative importance of all our future scenarios and states.

For example, if the scenario “high population, low education, low income” has a 40% probability in the long term, while most other scenarios have a probability of 0.2%, we can see that this scenario is crucial to the understanding of our system. Often scenarios already identified by deterministic succession analysis are the ones with the greatest long term probability—but by looking at long term probability we also gain information about the relative importance of each scenario.
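One way to answer the long-run question is to treat the succession probabilities as a Markov chain and iterate them until the distribution settles down. A sketch, with an invented two-scenario chain:

```python
def long_run(P, states, iters=1000):
    """Approximate long-run scenario probabilities by power iteration:
    start from a uniform distribution and repeatedly apply the
    transition probabilities P[x][y]."""
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        dist = {y: sum(dist[x] * P[x][y] for x in states) for y in states}
    return dist

# Toy 2-scenario chain (invented numbers): 'good' is sticky,
# 'bad' usually escapes back to 'good'.
P = {"good": {"good": 0.9, "bad": 0.1},
     "bad":  {"good": 0.5, "bad": 0.5}}
dist = long_run(P, ["good", "bad"])   # converges to good: 5/6, bad: 1/6
```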

In addition, we can encounter scenarios which are themselves inconsistent, but form cycles and/or clusters of interconnected scenarios. We can also notice scenarios that, while technically ‘consistent’ under the deterministic rules, are only barely so, and have limited weight due to a limited basin of attraction. We might identify scenarios that seem familiar in the real world, but are apparently highly unlikely in our analysis, indicating either that we should expect change… or perhaps suggesting a missing descriptor or a cross-impact in need of tweaking.

Armed with such a model, we can investigate what we can do to increase the short term and long term likelihood of desirable scenarios, and decrease the likelihood of undesirable scenarios.

As a last note, here are a few freely available resources that may prove useful. For a more formal introduction to CIB, try:

• Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

• Wolfgang Weimer-Jehle, Properties of cross-impact balance analysis.

You can find free software for doing a classical CIB analysis here:

• ZIRIUS, ScenarioWizard.

ZIRIUS is the Research Center for Interdisciplinary Risk and Innovation Studies of the University of Stuttgart.

Here are some examples of CIB in action:

• Gerhard Fuchs, Ulrich Fahl, Andreas Pyka, Udo Staber, Stefan Voegele and Wolfgang Weimer-Jehle, Generating innovation scenarios using the cross-impact methodology, Department of Economics, University of Bremen, Discussion-Papers Series No. 007-2008.

• Ortwin Renn, Alexander Jäger, Jürgen Deuschle and Wolfgang Weimer-Jehle, A normative-functional concept of sustainability and its indicators, International Journal of Global Environmental Issues, 9 (2008), 291–317.

Finally, this page contains a more complete list of articles, both practical and theoretical:

## Network Theory Overview

22 February, 2014

Here’s a video of a talk I gave yesterday, made by Brendan Fong. You can see the slides here—and then click the items in blue, and the pictures, for more information!

The idea: nature and the world of human technology are full of networks! People like to draw diagrams of networks. Mathematical physicists know that in principle these diagrams can be understood using category theory. But why should physicists have all the fun? This is the century of understanding living systems and adapting to life on a finite planet. Math isn’t the main thing we need for this, but it’s got to be part of the solution… so one thing we should do is develop a unified and powerful theory of networks.

We are still far from doing this. In this overview, I briefly described three parts of the jigsaw puzzle, and invited everyone to help fit them together:

• electrical circuits and signal-flow graphs.

• stochastic Petri nets, chemical reaction networks and Feynman diagrams.

• Bayesian networks, information and entropy.

In my talks coming up, I’ll go into more detail on each of these. With luck, you’ll be able to see videos here.

But if you’re near Oxford, you might as well actually attend! You can see dates, times, locations, my slides, and the talks themselves as they show up by going here.

## Relative Entropy (Part 4)

16 February, 2014

In recent posts by Manoj Gopalkrishnan and Marc Harper we’ve seen how not just entropy but relative entropy—the entropy of a probability distribution relative to the equilibrium distribution—is a driving force in chemistry and evolution. Now Tobias Fritz and I have finally finished our paper on this subject:

Very roughly, we proved that any reasonable measure of the information you gain when you update your assumptions about the world based on discovering what a system is really doing must be some constant $c$ times the relative entropy.

• Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

• Relative Entropy (Part 2): a category related to statistical inference, $\mathrm{FinStat},$ and how relative entropy defines a functor on this category.

• Relative Entropy (Part 3): statement of our main theorem, which characterizes relative entropy up to a constant multiple as the only functor $F : \mathrm{FinStat} \to [0,\infty)$ with a few nice properties.

Now that the paper is done, we’re having a nice conversation about it on the n-Category Café. Since I don’t want to splinter the conversation, I won’t enable comments here—please go there and join the fun!

One thing is that our conversation is getting deeper into the category-theoretic aspects. Read the long parenthetical remarks in my post on the n-Café to get up to speed on that aspect.

Another is that by looking at our paper, you can actually see the proof of our result. As I mentioned on the n-Café:

The proof is surprisingly hard. Or maybe we’re just surprisingly bad at proving things. But the interesting thing is this: the proof is swift and effective in the ‘generic’ case—the case where the support of the probability measures involved is the whole set they’re living on, and the constant $c$ is finite.

It takes some more work to handle the case where the probability measures have smaller support.

But the really hard work starts when we handle the case that, in the end, has $c = \infty.$ Then the proof becomes more like analysis than what you normally expect in category theory. We slowly corner the result, blocking off all avenues of escape. Then we close in, grab its neck, and strangle it, crushing its larynx ever tighter, as it loses the will to fight back and finally expires… still twitching.

We haven’t gotten into discussing this much yet, perhaps because the mathematicians on the n-Café are too dainty and civilized. But someone into analysis might be able to find a more efficient proof.

That would make me a bit sad—since why didn’t we find it?—but mainly happy—since this subject deserves to be clean and elegant. We really need a category-theoretic formulation of the second law of thermodynamics that’s suitable for studying complex networks: that’s the long-term goal here.

## Triangular Numbers

12 February, 2014

This post is just for fun. I’ll start with Pascal’s triangle and show you the number e hiding inside it. Using this, we’ll see how the product of all numbers in the nth row of Pascal’s triangle is related to the nth triangular number.

That’s cute, because Pascal’s triangle looks so… triangular!

But then, with a massive amount of help from Greg Egan, we’ll dig a lot deeper, and meet strange things like superfactorials, the magic properties of the number 1/12, and the Glaisher–Kinkelin constant.

First let’s get warmed up.

### Warmup

Pascal’s triangle is a famous and much-studied thing. It was discovered long before Pascal. It looks like this:

We write 1’s down the edges. We get every other number in the triangle by adding the two numbers above it to its left and right. People call these numbers binomial coefficients, since you can also get them like this:

$(x + y)^5 = x^5 + 5 x^4 y + 10 x^3 y^2 + 10 x^2 y^3 + 5 x y^4 + y^5$

We get the 10 in $10 x^3 y^2$ here because there are 10 ways to take 5 things and choose 3 to call x’s and 2 to call y’s. In general we have

$\displaystyle{ (x + y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} }$

where the binomial coefficient $\binom{n}{k}$ is the kth number on the nth row of Pascal’s triangle. Here we count both rows and the numbers in each row starting from zero.

Since we can permute n things in

$n! = 1 \cdot 2 \cdot 3 \cdot \cdots \cdot n$

ways, there are

$\displaystyle{ \frac{n!}{k! (n-k)!} }$

ways to take n things and choose k of them to be x’s and (n-k) of them to be y’s. You see, permuting the x’s or the y’s doesn’t change our choice, so we have to divide by $k!$ and $(n-k)!.$

So, the kth number in the nth row of Pascal’s triangle is:

$\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }$

All this will be painfully familiar to many of you. But I want everyone to have a fair chance at understanding the next section, where we’ll see something nice about Pascal’s triangle and the number

$e \approx 2.718281828459045...$

### The hidden e in Pascal’s triangle

In 2012, a guy named Harlan Brothers found the number e hiding in Pascal’s triangle… in a very simple way!

If we add up all the numbers in the nth row of Pascal’s triangle we get $2^n$. But what if we take the product of all these numbers? Let’s call it $t_n.$ Then here’s what Brothers discovered:

$\displaystyle{ \lim_{n \to \infty} \frac{t_n t_{n+2}}{t^2_{n+1}} = e }$

This may seem mysterious, but in fact it’s rather simple once you see the trick. I’ll use a nice argument given by Greg Egan.

We’ve said $t_n$ is the product of all numbers in the nth row of Pascal’s triangle. Just for fun, let’s divide this by the product of all numbers in the next row:

$\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }$

And since this is so much fun, let’s do it again! Divide this quantity by its next value:

$\displaystyle{ v_n = \frac{u_n}{u_{n+1}} }$

Now look:

$\displaystyle{v_n = \frac{t_n/t_{n+1}}{t_{n+1}/t_{n+2}} = \frac{t_n t_{n+2}}{t_{n+1}^2} }$

So, this is the thing that should approach e.

But why does it approach e? To see this, first take Pascal’s triangle and divide each number by the number to its lower left. We get a second triangle of numbers. Then take each number in this second triangle and divide it by the number to its lower right! We get a third triangle, like this:

If you think a bit, you’ll see:

• In the first triangle, the product of all numbers in the nth row is $t_n.$

• In the second, the product of all numbers in the nth row is $u_n.$

• In the third, the product of all numbers in the nth row is $v_n.$

And look—there’s a cool pattern! In the third triangle, all the numbers in any given row are equal. In the row with n numbers, all those numbers equal

$\displaystyle{ (n+1)/n = 1 + \frac{1}{n} }$

So, the product of all these numbers is

$\displaystyle{ \left(1 + \frac{1}{n}\right)^n }$

But it’s a famous fact that

$\displaystyle{ \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e}$

Some people even use this as a definition of e. So,

$\displaystyle{ \lim_{n \to \infty} v_n = e}$

which is just what we wanted!
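If you want to see the convergence for yourself, here is a quick numerical check. Exact integer arithmetic is needed, since $t_n$ overflows floating point very quickly.

```python
from fractions import Fraction
from math import comb, e, prod

def t(n):
    """Product of all the numbers in row n of Pascal's triangle."""
    return prod(comb(n, k) for k in range(n + 1))

# t_n is astronomically large, so keep the ratio exact and convert at the end.
n = 100
ratio = float(Fraction(t(n) * t(n + 2), t(n + 1) ** 2))
# By Egan's argument this ratio equals (1 + 1/(n+1))^(n+1),
# already within about half a percent of e = 2.71828...
```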

Puzzle 1. Use the formula I mentioned:

$\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }$

to show all the numbers in the same row of the third triangle are equal.

### Triangular numbers

The number of balls in a triangular stack with n balls at the bottom is called the nth triangular number,

$T_n = 1 + 2 + \cdots + n$

If you chop a square of balls in half along a diagonal you get a triangle, so $T_n$ is approximately half the nth square number:

$\displaystyle{ T_n \approx \frac{n^2}{2} }$

But if you chop the square in half this way, the balls on the diagonal get cut in half, so to get the nth triangular number we need to include their other halves:

$T_n = \displaystyle{ \frac{n^2}{2} + \frac{n}{2} = \frac{n(n+1)}{2} }$

These numbers go like

$1, 3, 6, 10, \dots$

and they are one of the simple pleasures of life, like sunshine and good bread. Spotting them in a knight-move zigzag pattern in the multiplication table was one of my first truly mathematical experiences. You can also see them in Pascal’s triangle:

because

$T_n = \displaystyle{ \binom{n+1}{2} }$
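This identity is easy to check by machine for small $n$:

```python
from math import comb

# T_n = n(n+1)/2 equals the binomial coefficient C(n+1, 2) for every n.
checks = all(n * (n + 1) // 2 == comb(n + 1, 2) for n in range(1, 200))
print(checks)  # True
```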

But today it’s time to see how $T_n$ is related to $t_n$, the product of all the numbers in the nth row of Pascal’s triangle!

If we subtract each triangular number from the one before it we get the numbers $-n$, and if we subtract each of these numbers from the one before it we get the number 1. This should remind you of something we’ve seen:

$\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }$

$\displaystyle{ v_n = \frac{u_n}{u_{n+1}} }$

$\displaystyle{ \lim_{n \to \infty} v_n = e }$

Here we’re dividing instead of subtracting. But if we take logarithms, we’ll be subtracting, and we get

$\ln(u_n) = \ln(t_n) - \ln(t_{n+1})$

$\ln(v_n) = \ln(u_n) - \ln(u_{n+1})$

$\displaystyle{ \lim_{n \to \infty} \ln(v_n) = 1 }$

What can we do with this? Well, suppose $\ln(v_n)$ were equal to 1, instead of approaching it. Then $\ln(t_n)$ could be the nth triangular number… and we’d get a cool formula for the product of all numbers in the nth row of Pascal’s triangle!

But since $\ln(v_n)$ is just approximately equal to 1, we should only expect an approximate formula for the product of all numbers in the nth row of Pascal’s triangle:

$\ln(t_n) \approx T_n$

or

$\displaystyle{ t_n \approx e^{T_n} = e^{n(n+1)/2} }$

This was my hope. But how good are these approximations? I left this as a puzzle on Google+, and then Greg Egan stepped in and solved it.

For starters, he graphed the ratio $\ln(t_n)/T_n,$ and got this:

That looks pretty good: it looks like it’s approaching 1. But he also graphed the ratio $t_n/e^{T_n},$ and got this:

Not so good. Taking exponentials amplifies the errors. If we want a good asymptotic formula for $t_n,$ we have to work harder. And this is what Egan did.

### Digging deeper

So far we’ve talked a lot about $t_n,$ the product of all numbers in the nth row in Pascal’s triangle… but we haven’t actually computed it. Let’s try:

$\begin{array}{ccl} t_n &=& \displaystyle{ \binom{n}{0} \cdot \binom{n}{1} \cdot \cdots \cdot \binom{n}{n} } \\ \\ &=& \displaystyle{ \frac{n!}{0! \cdot n!} \cdot \frac{n!}{1! \cdot (n-1)!} \cdot \cdots \cdot \frac{n!}{n! \cdot 0!} } \\ \\ &=& \displaystyle{ \frac{(n!)^{n+1}}{\left(0! \cdot 1! \cdot \cdots \cdot n!\right)^2} } \end{array}$
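This identity is exact, so we can confirm it with integer arithmetic; here the denominator $0! \cdot 1! \cdot \cdots \cdot n!$ is computed as a running product.

```python
from math import comb, factorial, prod

def t(n):
    """Product of all the numbers in row n of Pascal's triangle."""
    return prod(comb(n, k) for k in range(n + 1))

def superfactorial(n):
    """The superfactorial 0! * 1! * ... * n!"""
    return prod(factorial(k) for k in range(n + 1))

# The identity t_n = (n!)^(n+1) / (0! 1! ... n!)^2, checked exactly.
ok = all(t(n) * superfactorial(n) ** 2 == factorial(n) ** (n + 1)
         for n in range(13))
```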

So, we’re seeing the superfactorial

$\displaystyle{ 0! \cdot 1! \cdot \cdots \cdot n! }$

raise its pretty head. This is also called the Barnes G-function, presumably by people who want to make it sound more boring.

Actually that’s not fair: the Barnes G-function generalizes superfactorials to complex numbers, just as Euler’s gamma function generalizes factorials. Unfortunately, Euler made the mistake of defining his gamma function so that

$\Gamma(n) = (n-1)!$

when $n = 1, 2, 3, \cdots.$ Everyone I trust assures me this was a bad idea, not some deep insight… but Barnes doubled down on Euler’s mistake and defined his G-function so that

$G(n) = 0! \cdot 1! \cdot \cdots \cdot (n-2)!$

So, we’ll have to be careful when looking up formulas on Wikipedia: the superfactorial of n is $G(n+2).$ Thus, we have

$\displaystyle{ t_n = \frac{(n!)^{n+1}}{G(n+2)^2} }$

so

$\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }$

Now, there’s a great approximate formula for the logarithm of a factorial. It’s worth its weight in gold… or at least silver, so it’s called Stirling’s formula:

$\displaystyle{ \ln(n!) = n \ln (n) \; - \; n \; + \;\tfrac{1}{2} \ln(2 \pi n) \; + \; \frac{1}{12n} \cdots }$

where the dots mean stuff that goes to zero when divided by the last term, in the limit $n \to \infty.$
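It’s easy to see how good this is numerically: `math.lgamma(n + 1)` gives $\ln(n!)$ essentially exactly, so we can compare it with Stirling’s formula directly.

```python
import math

def stirling(n):
    """ln(n!) ~ n ln(n) - n + (1/2) ln(2 pi n) + 1/(12 n)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n) + 1 / (12 * n)

n = 1000
error = math.lgamma(n + 1) - stirling(n)
# The error is tiny: the first omitted term in the series is -1/(360 n^3).
```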

There’s also an approximate formula for the logarithm of the superfactorial! It’s a bit scary:

$\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + } \\ \\ && \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\ && \tfrac{1}{12} - \ln A \; + \cdots \end{array}$

where the dots mean stuff that actually goes to zero as $n \to \infty.$

What’s A? It’s the Glaisher–Kinkelin constant! If you’re tired of memorizing digits of pi and need a change of pace, you can look up the first 20,000 digits of the Glaisher–Kinkelin constant here, but roughly we have

$A \approx 1.282427129100622636875342...$

Of course most mathematicians don’t care much about the digits; what we really want to know is what this constant means!

Euler, who did some wacky nonrigorous calculations, once argued that

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

Riemann made this rigorous by defining

$\displaystyle{ \zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots }$

which converges when $\mathrm{Re}(s) > 1,$ and then analytically continuing this to other values of $s.$ He found that indeed

$\displaystyle{ \zeta(-1) = -\tfrac{1}{12} }$

This fact has many marvelous ramifications: for example, it’s why bosonic string theory works best in 26 dimensions! It’s also connected to the $\tfrac{1}{12n}$ term in Stirling’s formula.

But anyway, we might wonder what happens if we compute $\zeta(s)$ for $s$ near -1. This is where the Glaisher–Kinkelin constant shows up, because if we try a Taylor series we get

$\displaystyle{ \zeta'(-1) = \tfrac{1}{12} - \ln A }$

To me, this means that $\tfrac{1}{12} - \ln A$ is more fundamental than A itself, and indeed you’ll see it’s this combination that shows up in the approximate formula for superfactorials. So, we can simplify that formula a bit:

$\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + } \\ \\ && \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\ && \zeta'(-1) \; + \cdots \end{array}$

Now, let’s use this together with Stirling’s formula to estimate the logarithm of the product of all numbers in the nth row of Pascal’s triangle. Remember, that’s

$\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }$

So, we get

$\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\ \\ && - \displaystyle{ \Big[ \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\ && \quad \displaystyle{ \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big] } \end{array}$

and exponentiating this gives a good approximation to $t_n.$
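Here is a numerical check of this approximation: $\ln(t_n)$ is computed exactly as a sum of log-binomials, and $\zeta'(-1)$ is taken to be $\tfrac{1}{12} - \ln A$ using the value of the Glaisher–Kinkelin constant quoted above.

```python
import math

def ln_t(n):
    """Exact ln(t_n): sum of ln C(n,k) over row n, via lgamma(k+1) = ln(k!)."""
    lf = [math.lgamma(k + 1) for k in range(n + 1)]
    return sum(lf[n] - lf[k] - lf[n - k] for k in range(n + 1))

def ln_t_approx(n):
    """The approximation from the text, with zeta'(-1) = 1/12 - ln(A)."""
    zeta_prime = 1 / 12 - math.log(1.2824271291006226)
    m = n + 1
    stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n) + 1 / (12 * n)
    return (m * stirling
            - ((math.log(m) - 1.5) * m ** 2 + math.log(2 * math.pi) * m
               - math.log(m) / 6 + 2 * zeta_prime))

n = 500
diff = ln_t(n) - ln_t_approx(n)
# The difference is tiny compared with ln(t_n) itself, which is about T_500.
```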

Here is a graph of $t_n$ divided by this approximation, created by Egan:

As you can see, the ratio goes to 1 quite nicely!

So, we’ve seen some nice relationships between these things:

$1 + 2 + \cdots + n = T_n$

$1 \cdot 2 \cdot \cdots \cdot n = n!$

$\binom{n}{1} \cdot \binom{n}{2} \cdot \cdots \cdot \binom{n}{n} = t_n$

$1! \cdot 2! \cdot \cdots \cdot n! = G(n+2)$

$\frac{1}{0!} +\frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = e$

and Euler’s wacky formula

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

Puzzle 2. Can you take the formula

$\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\ \\ && - \displaystyle{ \Big[ \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\ && \quad \displaystyle{ \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big] } \end{array}$

and massage it until it looks like $n(n+1)/2$ plus ‘correction terms’? How big are these correction terms?

### References

Any errors in the formulas above are my fault. Here are the papers that first squeezed e out of Pascal’s triangle:

• Harlan J. Brothers, Pascal’s triangle: The hidden stor-e, The Mathematical Gazette, March 2012, 145.

• Harlan J. Brothers, Finding e in Pascal’s triangle, Mathematics Magazine 85 (2012), 51.

I learned the idea from here, thanks to Richard Elwes:

• Alexander Bogomolny, e in the Pascal Triangle, Interactive Mathematics Miscellany and Puzzles.
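The limit described in these references is easy to check numerically. Here's a Python sketch, with $t_n$ the product of the entries in row $n$ as above; the ratio $t_{n-1} \, t_{n+1}/t_n^2$ works out to be exactly $(1 + 1/n)^n,$ so it converges to $e$:

```python
import math

def ln_row_product(n):
    """ln t_n: log of the product of the entries in row n of Pascal's triangle."""
    return sum(math.log(math.comb(n, k)) for k in range(n + 1))

# The ratio t_{n-1}·t_{n+1}/t_n² equals (1 + 1/n)^n exactly,
# so it converges to e as n grows.
for n in (10, 100, 1000):
    ratio = math.exp(ln_row_product(n - 1) + ln_row_product(n + 1)
                     - 2 * ln_row_product(n))
    print(n, ratio)   # slowly approaches e ≈ 2.71828...
```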

For how Euler derived his crazy identity

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

and how it’s relevant to string theory, try:

• John Baez, My favorite numbers: 24.

## Network Theory Talks at Oxford

7 February, 2014

I’m giving some talks at Oxford:

### Network Theory

Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:

I. Electrical circuits and signal-flow graphs.

II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.

III. Bayesian networks, information and entropy.

If you’re nearby I hope you can come! All these talks will take place in Lecture Theatre B in the Computer Science Department—see the map below. Here are the times:

• Friday 21 February 2014, 2 pm: Network Theory: overview. See the slides or watch a video.

• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal-flow graphs. See the slides or watch a video.

• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams. See the slides or watch a video.

• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy. See the slides.

The first talk will be part of the OASIS series, meaning the “Oxford Advanced Seminar on Informatic Structures”.

I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.

Ulrike Tillmann has also kindly invited me to give a topology seminar:

### Operads and the Tree of Life

Trees are not just combinatorial structures: they are also biological structures, not only in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardman–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.

I’m not sure exactly where this will take place, but surely somewhere in the Mathematical Institute building:

• Monday 24 February, 3:30 pm, Operads and the Tree of Life. See the slides.

The Computer Science Department is shown in the map here:

The Mathematical Institute is a bit to the west:

## Categories in Control

6 February, 2014

I’m visiting Erlangen from now until the end of May, since my wife got a grant to do research here. I’m trying to get a lot of papers finished. But today I’m giving a talk in the math department of the university here, which with Germanic brevity is called the Friedrich-Alexander-Universität Erlangen-Nürnberg.

You can see my slides here, or maybe even come to my talk:

Categories in control, Thursday 6 February 2014, 16:15–18:00, Mathematics Department of the FAU, in Übungsraum 1.

The title is a pun. It’s about categories in control theory, the branch of engineering that studies dynamical systems with inputs and outputs, and how to optimize their behavior.

Control theorists often describe these systems using signal-flow graphs. Here is a very rough schematic signal-flow graph, describing the all-important concept of a ‘feedback loop’:

Here is a detailed one, describing a specific device called a servo:

The device is shown on top, and the signal-flow graph describing its behavior is at bottom. For details, click on the picture.

Now, if you have a drop of category-theorist’s blood in your veins, you’ll look at this signal-flow graph and think my god, that’s a string diagram for a morphism in a monoidal category!

And you’d be right. But if you want to learn what that means, and why it matters, read my talk slides!

The slides should make sense if you’re a mathematician, but maybe not otherwise. So, here’s the executive summary. The same sort of super-abstract math that handles things like Feynman diagrams:

also handles signal-flow graphs. The details are different in important and fascinating ways, and this is what I’m mainly concerned with. But we now understand how signal-flow graphs fit into the general theory of networks. This means we can proceed to use modern math to study them—and their relation to other kinds of networks, like electrical circuit diagrams:

### More talks

Thanks to the Azimuth Project team, my graduate students and many other folks, the dream of network theory as a step toward ‘green mathematics’ seems to be coming true! There’s a vast amount left to be done, so I’d have trouble convincing a skeptic, but I feel the project has turned a corner. I now feel in my bones that it’s going to work: we’ll eventually develop a language for biology and ecology based in part on category theory.

So, I think it’s a good time to explain all the various aspects of this project that have been cooking away—some quite visibly, but others on secret back burners:

• Jacob Biamonte and I have written a book on Petri nets and chemical reaction networks. You may have seen parts of this on the blog. We started this project at the Centre for Quantum Technologies, but now he’s working at the Institute for Scientific Interchange, in Turin—and collaborating with people there on various aspects of network theory.

• Brendan Fong is working with me on electrical circuits. You may know him for his posts here on chemical reaction networks. He’s now a grad student in computer science at Oxford.

• Jason Erbele, a math grad student at U.C. Riverside, is working with me on control theory. This work is the main topic of my talk—but I also sketch how it ties together with what Brendan is doing. There’s a lot more to say here.

• Tobias Fritz, a postdoc at the Perimeter Institute, is working with me on category-theoretic aspects of information theory. We published a paper on entropy with Tom Leinster, and we’ve got a followup on relative entropy that’s almost done. I should be working on it right this instant! But for now, read the series of posts here on Azimuth: Relative Entropy Part 1, Part 2 and Part 3.

• Brendan Fong has also done some great work on Bayesian networks, using ideas that connect nicely to what Tobias and I are doing.

• Tu Pham and Franciscus Rebro are working on the math that underlies all these projects: bicategories of spans.

The computer science department at Oxford is a great place for category theory and diagrammatic reasoning, thanks to the presence of Samson Abramsky, Bob Coecke and others. I’m going to visit them from February 21 to March 14. It seems like a good time to give a series of talks on this stuff. So, stay tuned! I’ll try to make slides available here.

## Relative Entropy in Evolutionary Dynamics

22 January, 2014

guest post by Marc Harper

In John’s information geometry series, he mentioned some of my work in evolutionary dynamics. Today I’m going to tell you about some exciting extensions!

### The replicator equation

First a little refresher. For a population of $n$ replicating types, such as individuals with different eye colors or a gene with $n$ distinct alleles, the ‘replicator equation’ expresses the main idea of natural selection: the relative rate of growth of each type should be proportional to the difference between the fitness of the type and the mean fitness in the population.

To see why this equation should be true, let $P_i$ be the population of individuals of the $i$th type, which we allow to be any nonnegative real number. We can list all these numbers and get a vector:

$P = (P_1, \dots, P_n)$

The Lotka–Volterra equation is a very general rule for how these numbers can change with time:

$\displaystyle{ \frac{d P_i}{d t} = f_i(P) P_i }$

Each population grows at a rate proportional to itself, where the ‘constant of proportionality’, $f_i(P),$ is not necessarily constant: it can be any real-valued function of $P.$ This function is called the fitness of the $i$th type. Taken all together, these functions $f_i$ are called the fitness landscape.

Let $p_i$ be the fraction of individuals who are of the $i$th type:

$\displaystyle{ p_i = \frac{P_i}{\sum_{j =1}^n P_j } }$

These numbers $p_i$ are between 0 and 1, and they add up to 1. So, we can also think of them as probabilities: $p_i$ is the probability that a randomly chosen individual is of the $i$th type. This is how probability theory, and eventually entropy, gets into the game.

Again, we can bundle these numbers into a vector:

$p = (p_1, \dots, p_n)$

which we call the population distribution. It turns out that the Lotka–Volterra equation implies the replicator equation:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(P) - \langle f(P) \rangle \right) \, p_i }$

where

$\displaystyle{ \langle f(P) \rangle = \sum_{i =1}^n f_i(P) p_i }$

is the mean fitness of all the individuals. You can see the proof in Part 9 of the information geometry series.

By the way: if each fitness $f_i(P)$ only depends on the fraction of individuals of each type, not the total numbers, we can write the replicator equation in a simpler way:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }$

Anyway, the take-home message is this: the replicator equation says the fraction of individuals of any type changes at a rate proportional to the fitness of that type minus the mean fitness.
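To make this concrete, here's a minimal Python sketch of the replicator equation, integrated with forward Euler steps. The landscape, with constant fitnesses for three types, is just a made-up example:

```python
def replicator_step(p, f, dt=0.001):
    """One forward-Euler step of dp_i/dt = (f_i − ⟨f⟩) p_i."""
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return [pi + dt * (fi - mean_f) * pi for fi, pi in zip(f, p)]

# Three types with constant fitnesses (a toy landscape): the fittest type's
# share grows toward 1, and the p_i remain a probability distribution.
f = [1.0, 1.2, 0.8]
p = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20_000):
    p = replicator_step(p, f)
print(p, sum(p))
```

Note that the Euler step preserves $\sum_i p_i = 1$ exactly, since the correction terms sum to zero.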

Now, it has been known since the late 1970s or early 1980s, thanks to the work of Akin, Bomze, Hofbauer, Shahshahani, and others, that the replicator equation has some very interesting properties. For one thing, it often makes ‘relative entropy’ decrease. For another, it’s often an example of ‘gradient flow’. Let’s look at both of these in turn, and then talk about some new generalizations of these facts.

### Relative entropy as a Lyapunov function

I mentioned that we can think of a population distribution as a probability distribution. This lets us take ideas from probability theory and even information theory and apply them to evolutionary dynamics! For example, given two population distributions $p$ and $q,$ the information of $q$ relative to $p$ is

$I(q,p) = \displaystyle{ \sum_i q_i \ln \left(\frac{q_i}{p_i }\right)}$

This measures how much information you gain if you have a hypothesis about some state of affairs given by the probability distribution $p,$ and then someone tells you “no, the best hypothesis is $q$!”

It may seem weird to treat a population distribution as a hypothesis, but this turns out to be a good idea. Evolution can then be seen as a learning process: a process of improving the hypothesis.

We can make this precise by seeing how the relative information changes with the passage of time. Suppose we have two population distributions $q$ and $p.$ Suppose $q$ is fixed, while $p$ evolves in time according to the replicator equation. Then

$\displaystyle{ \frac{d}{d t} I(q,p) = \sum_i f_i(P) (p_i - q_i) }$

For the proof, see Part 11 of the information geometry series.

So, the information of $q$ relative to $p$ will decrease as $p$ evolves according to the replicator equation if

$\displaystyle{ \sum_i f_i(P) (p_i - q_i) } \le 0$

If $q$ makes this true for all $p,$ we say $q$ is an evolutionarily stable state. For some reasons why, see Part 13.

What matters now is that when $q$ is an evolutionarily stable state, $I(q,p)$ says how much information the population has ‘left to learn’—and we’re seeing that this always decreases. Moreover, it turns out that we always have

$I(q,p) \ge 0$

and $I(q,p) = 0$ precisely when $p = q.$
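Here's a small Python experiment illustrating this. The fitness landscape $f_i(p) = -p_i$ is my own toy choice, cooked up so that the uniform distribution $q$ is an evolutionarily stable state, since then $\sum_i f_i(p)(p_i - q_i) = -\left(\sum_i p_i^2 - \tfrac{1}{3}\right) \le 0$:

```python
import math

def relative_information(q, p):
    """I(q,p) = Σ_i q_i ln(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

def replicator_step(p, f, dt=0.001):
    """One forward-Euler step of the replicator equation; f is the list of
    fitness values at the current state."""
    mean_f = sum(fi * pi for fi, pi in zip(f, p))
    return [pi + dt * (fi - mean_f) * pi for fi, pi in zip(f, p)]

# Toy landscape f_i(p) = -p_i: overrepresented types are penalized, so the
# uniform distribution q is an evolutionarily stable state.
q = [1 / 3, 1 / 3, 1 / 3]
p = [0.7, 0.2, 0.1]
values = []
for _ in range(20_000):
    values.append(relative_information(q, p))
    p = replicator_step(p, [-pi for pi in p])

print(values[0], values[-1])   # relative information decreases toward zero
```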

People summarize all this by saying that relative information is a ‘Lyapunov function’. Very roughly, a Lyapunov function is something that decreases with the passage of time, and is zero only at the unique stable state. To be a bit more precise, suppose we have a differential equation like

$\displaystyle{ \frac{d}{d t} x(t) = v(x(t)) }$

where $x(t) \in \mathbb{R}^n$ and $v$ is some smooth vector field on $\mathbb{R}^n.$ Then a smooth function

$V : \mathbb{R}^n \to \mathbb{R}$

is a Lyapunov function if

$V(x) \ge 0$ for all $x$

$V(x) = 0$ iff $x$ is some particular point $x_0$

and

$\displaystyle{ \frac{d}{d t} V(x(t)) \le 0 }$ for every solution of our differential equation.

In this situation, the point $x_0$ is a stable equilibrium for our differential equation: this is Lyapunov’s theorem.

### The replicator equation as a gradient flow equation

The basic idea of Lyapunov’s theorem is that when a ball likes to roll downhill and the landscape has just one bottom point, that point will be the unique stable equilibrium for the ball.

The idea of gradient flow is similar, but different: sometimes things like to roll downhill as efficiently as possible: they move in exactly the best direction to make some quantity smaller! Under certain conditions, the replicator equation is an example of this phenomenon.

Let’s fill in some details. For starters, suppose we have some function

$V : \mathbb{R}^n \to \mathbb{R}$

Think of $V$ as ‘height’. Then the gradient flow equation says how a point $x(t) \in \mathbb{R}^n$ will move if it’s always trying its very best to go downhill:

$\displaystyle{ \frac{d}{d t} x(t) = - \nabla V(x(t)) }$

Here $\nabla$ is the usual gradient in Euclidean space:

$\displaystyle{ \nabla V = \left(\partial_1 V, \dots, \partial_n V \right) }$

where $\partial_i$ is short for the partial derivative with respect to the $i$th coordinate.

The interesting thing is that under certain conditions, the replicator equation is an example of a gradient flow equation… but typically not one where $\nabla$ is the usual gradient in Euclidean space. Instead, it’s the gradient on some other space, the space of all population distributions, which has a non-Euclidean geometry!

The space of all population distributions is a simplex:

$\{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i = 1}^n p_i = 1 \} .$

For example, it’s an equilateral triangle when $n = 3.$ The equilateral triangle looks flat, but if we measure distances another way it becomes round, exactly like a portion of a sphere, and that’s the non-Euclidean geometry we need!

In fact this trick works in any dimension. The idea is to give the simplex a special Riemannian metric, the ‘Fisher information metric’. The usual metric on Euclidean space is

$\delta_{i j} = \left\{\begin{array}{ccl} 1 & \mathrm{ if } & i = j \\ 0 &\mathrm{ if } & i \ne j \end{array} \right.$

This simply says that two standard basis vectors like $(0,1,0,0)$ and $(0,0,1,0)$ have dot product zero if the 1’s are in different places, and one if they’re in the same place. The Fisher information metric is a bit more complicated:

$\displaystyle{ g_{i j} = \frac{\delta_{i j}}{p_i} }$

As before, $g_{i j}$ is a formula for the dot product of the $i$th and $j$th standard basis vectors, but now it depends on where you are in the simplex of population distributions.

We saw how this formula arises from information theory back in Part 7. I won’t repeat the calculation, but the idea is this. Fix a population distribution $p$ and consider the information of another one, say $q,$ relative to this. We get $I(q,p).$ If $q = p$ this is zero:

$\displaystyle{ \left. I(q,p)\right|_{q = p} = 0 }$

and this point is a local minimum for the relative information. So, the first derivative of $I(q,p)$ as we change $q$ must be zero:

$\displaystyle{ \left. \frac{\partial}{\partial q_i} I(q,p) \right|_{q = p} = 0 }$

But the second derivatives are not zero. In fact, since we’re at a local minimum, it should not be surprising that we get a positive definite matrix of second derivatives:

$\displaystyle{ g_{i j} = \left. \frac{\partial^2}{\partial q_i \partial q_j} I(q,p) \right|_{q = p} }$

And, this is the Fisher information metric! So, the Fisher information metric is a way of taking dot products between vectors in the simplex of population distribution that’s based on the concept of relative information.
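You can watch this happen numerically. Here's a Python sketch that differentiates $I(q,p)$ twice using central finite differences, treating the $q_i$ as unconstrained coordinates, and recovers $g_{ij} = \delta_{ij}/p_i$:

```python
import math

def relative_information(q, p):
    """I(q,p) = Σ_i q_i ln(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

def hessian(fun, x, h=1e-4):
    """Central finite-difference Hessian of fun at x, treating the
    coordinates as free (not constrained to the simplex)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return fun(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

p = [0.5, 0.3, 0.2]
H = hessian(lambda q: relative_information(q, p), p)
for row in H:
    print(row)   # ≈ diag(1/p_i): the Fisher information metric
```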

This is not the place to explain Riemannian geometry, but any metric gives a way to measure angles and distances, and thus a way to define the gradient of a function. After all, the gradient of a function should point at right angles to the level sets of that function, and its length should equal the slope of that function:

So, if we change our way of measuring angles and distances, we get a new concept of gradient! The $i$th component of this new gradient vector field turns out to be

$(\nabla_g V)^i = g^{i j} \partial_j V$

where $g^{i j}$ is the inverse of the matrix $g_{i j},$ and we sum over the repeated index $j.$ As a sanity check, make sure you see why this is the usual Euclidean gradient when $g_{i j} = \delta_{i j}.$

Now suppose the fitness landscape is the good old Euclidean gradient of some function. Then it turns out that the replicator equation is a special case of gradient flow on the space of population distributions… but where we use the Fisher information metric to define our concept of gradient!

To get a feel for this, it’s good to start with the Lotka–Volterra equation, which describes how the total number of individuals of each type changes. Suppose the fitness landscape is the Euclidean gradient of some function $V$:

$\displaystyle{ f_i(P) = \frac{\partial V}{\partial P_i} }$

Then the Lotka–Volterra equation becomes this:

$\displaystyle{ \frac{d P_i}{d t} = \frac{\partial V}{\partial P_i} \, P_i }$

This doesn’t look like the gradient flow equation, thanks to that annoying $P_i$ on the right-hand side! It certainly ain’t the gradient flow coming from the function $V$ and the usual Euclidean gradient. However, it is gradient flow coming from $V$ and some other metric on the space

$\{ P \in \mathbb{R}^n : \; P_i \ge 0 \}$

For a proof, and the formula for this other metric, see Section 3.7 in this survey:

• Marc Harper, Information geometry and evolutionary game theory.

Now let’s turn to the replicator equation:

$\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }$

Again, if the fitness landscape is a Euclidean gradient, we can rewrite the replicator equation as a gradient flow equation… but again, not with respect to the Euclidean metric. This time we need to use the Fisher information metric! I sketch a proof in my paper above.

In fact, both these results were first worked out by Shahshahani:

• Siavash Shahshahani, A New Mathematical Framework for the Study of Linkage and Selection, Memoirs of the AMS 17, 1979.

### New directions

All this is just the beginning! The ideas I just explained are unified in information geometry, where distance-like quantities such as the relative entropy and the Fisher information metric are studied. From here it’s a short walk to a very nice version of Fisher’s fundamental theorem of natural selection, which is familiar to researchers both in evolutionary dynamics and in information geometry.

You can see some very nice versions of this story for maximum likelihood estimators and linear programming here:

• Akio Fujiwara and Shun-ichi Amari, Gradient systems in view of information geometry, Physica D: Nonlinear Phenomena 80 (1995), 317–327.

Indeed, this seems to be the first paper discussing the similarities between evolutionary game theory and information geometry.

Dash Fryer (at Pomona College) and I have generalized this story in several interesting ways.

First, there are two famous ways to generalize the usual formula for entropy: Tsallis entropy and Rényi entropy, both of which involve a parameter $q.$ There are Tsallis and Rényi versions of relative entropy and the Fisher information metric as well. Everything I just explained about:

• conditions under which relative entropy is a Lyapunov function for the replicator equation, and

• conditions under which the replicator equation is a special case of gradient flow

generalize to these cases! However, these generalized entropies give modified versions of the replicator equation. When we set $q=1$ we get back the usual story. See

• Marc Harper, Escort evolutionary game theory.

My initial interest in these alternate entropies was mostly mathematical—what is so special about the corresponding geometries?—but now researchers are starting to find populations that evolve according to these kinds of modified population dynamics! For example:

• A. Hernando et al., The workings of the Maximum Entropy Principle in collective human behavior.

There’s an interesting special case worth some attention. Lots of people fret about the relative entropy not being a distance function obeying the axioms that mathematicians like: for example, it doesn’t obey the triangle inequality. Many describe the relative entropy as a distance-like function, and this is often a valid interpretation contextually. On the other hand, the $q=0$ relative entropy is one-half the Euclidean distance squared! In this case the modified version of the replicator equation looks like this:

$\displaystyle{ \frac{d p_i}{d t} = f_i(p) - \frac{1}{n} \sum_{j = 1}^n f_j(p) }$

This equation is called the projection dynamic.

Later, I showed that there is a reasonable definition of relative entropy for a much larger family of geometries that satisfies a similar distance minimization property.

In a different direction, Dash showed that you can change the way that selection acts by using a variety of alternative ‘incentives’, extending the story to some other well-known equations describing evolutionary dynamics. By replacing the terms $x_i f_i(x)$ in the replicator equation with a variety of other functions, called incentives, we can generate many commonly studied models of evolutionary dynamics. For instance if we exponentiate the fitness landscape (to make it always positive), we get what is commonly known as the logit dynamic. This amounts to changing the fitness landscape as follows:

$\displaystyle{ f_i \mapsto \frac{x_i e^{\beta f_i}}{\sum_j{x_j e^{\beta f_j}}} }$

where $\beta$ is known as an inverse temperature in statistical thermodynamics and as an intensity of selection in evolutionary dynamics. There are lots of modified versions of the replicator equation, like the best-reply and projection dynamics, more common in economic applications of evolutionary game theory, and they can all be captured in this family. (There are also other ways to simultaneously capture such families, such as Bill Sandholm’s revision protocols, which were introduced earlier in his exploration of the foundations of game dynamics.)
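As a sketch, here's the exponentiated landscape from the formula above in Python; the particular $x,$ $f$ and $\beta$ values are invented for illustration:

```python
import math

def logit_incentive(x, f, beta):
    """The exponentiated landscape from the formula above:
    f_i ↦ x_i e^{β f_i} / Σ_j x_j e^{β f_j}."""
    w = [xi * math.exp(beta * fi) for xi, fi in zip(x, f)]
    total = sum(w)
    return [wi / total for wi in w]

x = [0.2, 0.5, 0.3]   # population state (made up for illustration)
f = [1.0, 0.5, 1.5]   # fitness values (made up for illustration)

print(logit_incentive(x, f, 0.0))    # β = 0: selection switched off, returns x
print(logit_incentive(x, f, 50.0))   # large β: concentrates on the fittest type
```

The inverse temperature $\beta$ interpolates between no selection at all and winner-take-all selection.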

Dash showed that there is a natural generalization of evolutionarily stable states to ‘incentive stable states’, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. For the logit and projection dynamics, incentive stable states are simply evolutionarily stable states, and this happens frequently, but not always.

The third generalization is to look at different ‘time-scales’—that is, different ways of describing time! We can make up the symbol $\mathbb{T}$ for a general choice of ‘time-scale’. So far I’ve been treating time as a real number, so

$\mathbb{T} = \mathbb{R}$

But we can also treat time as coming in discrete evenly spaced steps, which amounts to treating time as an integer:

$\mathbb{T} = \mathbb{Z}$

More generally, we can make the steps have duration $h,$ where $h$ is any positive real number:

$\mathbb{T} = h\mathbb{Z}$

There is a nice way to simultaneously describe the cases $\mathbb{T} = \mathbb{R}$ and $\mathbb{T} = h\mathbb{Z}$ using the time-scale calculus and time-scale derivatives. For the time-scale $\mathbb{T} = \mathbb{R}$ the time-scale derivative is just the ordinary derivative. For the time-scale $\mathbb{T} = h\mathbb{Z},$ the time-scale derivative is given by the difference quotient from first year calculus:

$\displaystyle{ f^{\Delta}(z) = \frac{f(z+h) - f(z)}{h} }$

and using this as a substitute for the derivative gives difference equations like a discrete-time version of the replicator equation. There are many other choices of time-scale, such as the quantum time-scale given by $\mathbb{T} = q^{\mathbb{Z}},$ in which case the time-scale derivative is called the q-derivative, but that’s a tale for another time. In any case, the fact that the successive relative entropies are decreasing can be simply stated by saying they have negative $\mathbb{T} = h\mathbb{Z}$ time-scale derivative. The continuous case we started with corresponds to $\mathbb{T} = \mathbb{R}.$
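In code, the $\mathbb{T} = h\mathbb{Z}$ time-scale derivative is just the difference quotient, and the ordinary derivative reappears as $h$ shrinks. A tiny Python sketch:

```python
def timescale_derivative(f, z, h):
    """Time-scale derivative on T = hZ: f^Δ(z) = (f(z+h) − f(z)) / h."""
    return (f(z + h) - f(z)) / h

def square(z):
    return z * z

# The ordinary derivative of z² at z = 3 is 6; on T = hZ we get 6 + h.
for h in (1.0, 0.1, 0.001):
    print(h, timescale_derivative(square, 3.0, h))
```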

Remarkably, Dash and I were able to show that you can combine all three of these generalizations into one theorem, and even allow for multiple interacting populations! This produces some really neat population trajectories, such as the following two populations with three types, with fitness functions corresponding to the rock-paper-scissors game. On top we have the replicator equation, which goes along with the Fisher information metric; on the bottom we have the logit dynamic, which goes along with the Euclidean metric on the simplex:

From our theorem, it follows that the relative entropy (ordinary relative entropy on top, the $q = 0$ entropy on bottom) converges to zero along the population trajectories.

The final form of the theorem is loosely as follows. Pick a Riemannian geometry given by a metric $g$ (obeying some mild conditions) and an incentive for each population, as well as a time scale ($\mathbb{R}$ or $h \mathbb{Z}$) for every population. This gives an evolutionary dynamic with a natural generalization of evolutionarily stable states, and a suitable version of the relative entropy. Then, if there is an evolutionarily stable state in the interior of the simplex, the time-scale derivative of the sum of the relative entropies for the populations will be negative, so this sum decreases as the trajectories converge to the stable state!

When there isn’t such a stable state, we still get some interesting population dynamics, like the following:

See this paper for details:

• Marc Harper and Dashiell E. A. Fryer, Stability of evolutionary dynamics on time scales.

Next time we’ll see how to make the main idea work in finite populations, without derivatives or deterministic trajectories!

## Wormholes and Entanglement

20 January, 2014

An apparent contradiction in what most physicists believe about black holes—the ‘firewall problem’—is making some very good physicists reach for some very crazy-sounding ideas to find a way out. In particular, Maldacena and Susskind have come up with the idea that any pair of quantum-entangled particles is actually connected by a wormhole.

Entanglement is a spooky way for far-away particles to be correlated, but you can’t use it to communicate faster than light. It’s been seen in the lab, but it’s only possible thanks to quantum mechanics. The first people to make a fuss over entanglement were Einstein, Podolsky and Rosen, back in 1935.

A wormhole is a spooky way for far-away regions of space to be connected by a kind of ‘tunnel’—but you probably can’t use it to communicate faster than light. Nobody has ever seen one, but they’re theoretically possible thanks to general relativity. The first people to make a fuss over wormholes were Einstein and Rosen, back in 1935.

So, superficially, it makes sense that there should be a connection between wormholes and entanglement. But when you learn enough physics, you’ll see that Maldacena and Susskind’s proposal sounds completely hare-brained.

But when you learn more physics—maybe more than enough?—you might decide there’s some merit to this idea after all. At the Centre for Quantum Technologies last summer, Jamie Vicary and I noticed some interesting connections between wormholes and quantum entanglement. We now have a paper out!

In it, we study quantum gravity in a universe where space is just 2-dimensional, not 3-dimensional like ours. It’s not realistic, but it has one huge advantage: there’s a working theory of what quantum gravity could be like when space is 2-dimensional, so you can calculate stuff!

So, we calculate what happens when a wormhole forms, and we show the ends look like a particle and its antiparticle (this was already known), and we note that this particle-antiparticle pair is entangled. In fact it’s completely entangled: any piece of information you might want to know about one can also be found in the other.

However, in a sense that Jamie and I make precise, this entanglement is ‘fake’. The reason is that the two ends of the wormhole are not independent things. They’re just two views of the same thing… and, technically, it doesn’t count as entanglement when something is ‘entangled with itself’. This fact is crucial to how Maldacena and Susskind want to get around the firewall problem.

For more details, try this:

• Wormholes and entanglement, The n-Category Café.

This has links to other stuff, including our paper, but also some blog articles explaining the firewall problem, the paper by Maldacena and Susskind, and the original Einstein–Podolsky–Rosen and Einstein–Rosen papers (in English).

Since this quantum gravity stuff is more suited to the n-Category Café than here, I won’t enable comments here. If you want to talk, please go there. Sorry!

## Geometry Puzzles

12 January, 2014

Here are four puzzles about areas, in approximate order of increasing difficulty.

### Mysteries of the equilateral triangle

Puzzle: Show the area of the orange circle equals the total area of the two blue regions.

In case you’re wondering, the picture above shows an equilateral triangle with a small circle inscribed in it and a big circle circumscribed around it.

The puzzle is easy if you think about it the right way. I never knew this cool fact until last month, when I read this fun free book:

• Brian McCartin, Mysteries of the equilateral triangle.

I’ve given talks on the numbers 5, 8, and 24. I’ve imagined writing a book that had chapters on all my favorite natural numbers, starting with Chapter 0. But it would be very hard to write Chapter 3, since this number has too many interesting properties! McCartin’s book covers a lot of them.

### Square the lune to the tune of Clair de Lune

Puzzle: Show the crescent has the same area as the triangle.

Greek mathematicians really wanted to square the circle, by which I mean: use straightedge and compass to first draw a circle and then construct a square with the same area.

In 440 BC, Hippocrates of Chios figured out how to square the above crescent-shaped region, which lies between the circle centered at O and the smaller circle centered at D. So, this region is called the Lune of Hippocrates.
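Here's a numerical check of the classical computation, in Python. I'm assuming the standard presentation of the lune (which may differ in labeling from the figure above): a quarter of a circle centered at O, with a semicircle drawn outward on the chord joining the two endpoints of the arc.

```python
import math

# Standard Lune of Hippocrates: a quarter of a unit circle centered at O,
# with A = (1, 0) and B = (0, 1) on the circle, so angle AOB = 90 degrees.
# A semicircle is drawn outward on the chord AB.
R = 1.0
chord = math.sqrt(2) * R                   # |AB|
semi_area = math.pi * (chord / 2)**2 / 2   # area of semicircle on AB

# Circular segment between chord AB and the arc of the big circle:
segment = math.pi * R**2 / 4 - R**2 / 2    # quarter disc minus triangle OAB

lune = semi_area - segment                 # area of the crescent
triangle = R**2 / 2                        # area of triangle OAB

assert math.isclose(lune, triangle)        # both are 1/2
```

Notice that the two appearances of π cancel: the lune's area is exactly 1/2, a 'square' number in the Greek sense.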

I’ve heard it said that this result gave some Greek geometers hope that the circle could be squared. I’m not sure this is true; Hippocrates himself was probably too smart to be fooled. But in any event, it would have been a false hope. Much later, around 1885, Lindemann and Weierstrass proved that squaring the circle was impossible.

Any crescent-shaped region formed by two circular arcs is called a lune. It’s widely believed that there are only 5 squarable lunes. In other words, there are only 5 shapes of lune constructible by straightedge and compass whose area equals that of a square constructible using straightedge and compass. (Obviously these lunes can come in many different sizes.)

Hippocrates discovered three squarable lunes. Two more were discovered by Martin Johan Wallenius in 1766. A proof that these are the only squarable lunes was given by Tchebatorew and Dorodnow, and summarized by the famous topologist Postnikov:

• M. M. Postnikov, The problem of squarable Lunes, translated from the Russian by Abe Shenitzer, American Mathematical Monthly, 107 (Aug.-Sep. 2000), 645–651.

However, there may be a loophole in this proof: Will Jagy claims that these Russians assumed without proof that the ratio of two angles involved in the construction of the lune is rational. With this assumption, finding a squarable lune amounts to finding a rational number $u$ and a constructible number $\sin a$ such that

$\sin ( u a) = \sqrt{u} \sin a$

Puzzle: Why should $u$ be rational? Do you know the true state of the art here?

(My puzzles include hard questions I don’t know the answer to, and this is one.)
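As a concrete instance of the condition above: taking $u = 2$, the equation $\sin(ua) = \sqrt{u}\sin a$ reduces (for $\sin a \ne 0$) to $2\cos a = \sqrt{2}$, so $a = \pi/4$, which recovers Hippocrates' original lune. A quick check in Python:

```python
import math

# For u = 2, the condition sin(u*a) = sqrt(u)*sin(a) reduces (when sin a != 0)
# to 2*cos(a) = sqrt(2), i.e. a = pi/4 -- Hippocrates' original lune.
u = 2
a = math.pi / 4
assert math.isclose(math.sin(u * a), math.sqrt(u) * math.sin(a))

# sin(a) = sqrt(2)/2 is a constructible number, as the condition requires.
assert math.isclose(math.sin(a), math.sqrt(2) / 2)
```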

For a nice discussion of the 5 squarable lunes, with pictures, see this page:

• The five squarable lunes, MathPages.

### Twice in a blue lune

Puzzle: Show the area of this right triangle equals the total area inside the blue lunes. The outside of each lune is a semicircle. The inside of each lune is part of the circle containing the points A, B, and C.

The circle through all 3 corners of a triangle is called its circumcircle. You can construct the circumcircle using a straightedge and compass, if you want.

Again, this is a famous old problem. The two blue lunes here are called the Lunes of Alhazen. The problem was later posed and solved by Leonardo da Vinci!

This puzzle is a lot of fun, so I urge you not to give up—but if you do, you can see da Vinci’s solution here.
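If you want to check the claim numerically before (or after) proving it, here is a sketch in Python. It uses the fact that the hypotenuse is a diameter of the circumcircle (Thales' theorem), together with the Pythagorean relation between the semicircles on the three sides:

```python
import math

# Right triangle with legs a, b and hypotenuse c; the two lunes sit on the
# legs, outside the circumcircle, whose diameter is c by Thales' theorem.
a, b = 3.0, 4.0
c = math.hypot(a, b)

triangle = a * b / 2

def semi(d):
    """Area of a semicircle with diameter d."""
    return math.pi * (d / 2)**2 / 2

# Lunes = semicircles on the legs, minus the part of the circumcircle's
# semicircle (the one on c) that lies outside the triangle:
lunes = semi(a) + semi(b) - (semi(c) - triangle)

assert math.isclose(lunes, triangle)   # both are 6.0 for the 3-4-5 triangle
```

The cancellation works because $a^2 + b^2 = c^2$ makes the two semicircles on the legs exactly match the semicircle on the hypotenuse.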

### Two semicircles is half a circle

Puzzle: show the total area of the two semicircles is half the area of the large circle.

Amazingly, it seems this fact was noticed only in 2011:

• Andrew K. Jobbings, Two semicircles fill half a circle, The Mathematical Gazette 95 (Nov. 2011), 538–540.
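Here is a numerical check of the result, under my own reconstruction of the configuration (an assumption on my part, since the figure isn't reproduced here): the two semicircles sit on parallel chords of the big circle, bulge toward each other, and touch at a point on the perpendicular through the center. A little algebra with $r^2 + d^2 = R^2$ for each chord then shows the touching condition forces $r_1^2 + r_2^2 = R^2$:

```python
import math

# Assumed configuration: semicircles on parallel chords of a circle of
# radius R, bulging toward each other and touching on the axis of symmetry.
# A chord of half-length r at distance d from the center satisfies
# r^2 + d^2 = R^2, and the touching condition gives d1 = r2 and d2 = r1,
# hence r1^2 + r2^2 = R^2.
R = 1.0
for r1 in [0.1, 0.3, 0.5, 0.7, 0.9]:      # slide the touching point around
    r2 = math.sqrt(R**2 - r1**2)          # forced by the touching condition
    semis = math.pi * r1**2 / 2 + math.pi * r2**2 / 2
    assert math.isclose(semis, math.pi * R**2 / 2)   # half the big circle
```

Wherever the touching point sits, the total area stays pinned at half the disc.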

However, Andrew Jobbings is a genius when it comes to ‘recreational mathematics’. For more of his work, check out this page:

• Andrew Jobbings, Arbelos.

I learned about this puzzle from Alexander Bogomolny, who has a wonderful website full of interactive geometry demonstrations:

• Alexander Bogomolny, A property of semicircles.

You’ll get a scary warning asking “Do you want to run this application?” Say yes. You’ll get an applet that lets you slide the point where the semicircles touch: no matter where it is, the semicircles have the same total area! Click “hint” and you’ll get a hint. If you’re still stuck, and too impatient to solve the puzzle yourself, scroll down and see a proof!

However, that proof is long and confusing. With geometry, I like demonstrations where, after some thought, you can simply see that the result is true. For this puzzle, such a demonstration was provided by Greg Egan.

Click here if you give up on this puzzle and want to see Egan’s solution. You may need to ponder it a bit, but when you’re done you should say “yes, this result is clear!”

As a separate hint: the answers to this puzzle and the previous one are similar, in a very nice way!