Network Theory Overview

22 February, 2014

 

Here’s a video of a talk I gave yesterday, made by Brendan Fong. You can see the slides here—and then click the items in blue, and the pictures, for more information!

The idea: nature and the world of human technology are full of networks! People like to draw diagrams of networks. Mathematical physicists know that in principle these diagrams can be understood using category theory. But why should physicists have all the fun? This is the century of understanding living systems and adapting to life on a finite planet. Math isn’t the main thing we need for this, but it’s got to be part of the solution… so one thing we should do is develop a unified and powerful theory of networks.

We are still far from doing this. In this overview, I briefly described three parts of the jigsaw puzzle, and invited everyone to help fit them together:

• electrical circuits and signal-flow graphs.

• stochastic Petri nets, chemical reaction networks and Feynman diagrams.

• Bayesian networks, information and entropy.

In my talks coming up, I’ll go into more detail on each of these. With luck, you’ll be able to see videos here.

But if you’re near Oxford, you might as well actually attend! You can see dates, times, locations, my slides, and the talks themselves as they show up by going here.


Relative Entropy (Part 4)

16 February, 2014

In recent posts by Manoj Gopalkrishnan and Marc Harper we’ve seen how not just entropy but relative entropy—the entropy of a probability distribution relative to the equilibrium distribution—is a driving force in chemistry and evolution. Now Tobias Fritz and I have finally finished our paper on this subject:

A Bayesian characterization of relative entropy.

Very roughly, we proved that any reasonable measure of the information you gain when you to update your assumptions about the world based on discovering what a system is really doing must be some constant c times the relative entropy.

I’ve blogged about this here before:

Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

Relative Entropy (Part 2): a category related to statistical inference, \mathrm{FinStat}, and how relative entropy defines a functor on this category.

Relative Entropy (Part 3): statement of our main theorem, which characterizes relative entropy up to a constant multiple as the only functor F : \mathrm{FinStat} \to [0,\infty) with a few nice properties.

Now that the paper is done, we’re having a nice conversation about it on the n-Category Café. Since I don’t want to splinter the conversation, I won’t enable comments here—please go there and join the fun!

But having blogged about this thrice before, what’s new?

One thing is that our conversation is getting more deeply into the category-theoretic aspects. Read the long parenthetical remarks in my post on the n-Café to get up to speed on that aspect.

Another is that by looking at our paper, you can actually see the proof of our result. As I mention on the n-Café.

The proof is surprisingly hard. Or maybe we’re just surprisingly bad at proving things. But the interesting thing is this: the proof is swift and effective in the ‘generic’ case—the case where the support of the probability measures involved is the whole set they’re living on, and the constant c is finite.

It takes some more work to handle the case where the probability measures have smaller support.

But the really hard work starts when we handle the case that, in the end, has c = \infty. Then the proof becomes more like analysis than what you normally expect in category theory. We slowly corner the result, blocking off all avenues of escape. Then we close in, grab its neck, and strangle it, crushing its larynx ever tighter, as it loses the will to fight back and finally expires… still twitching.

We haven’t gotten into discussing this much yet, perhaps because the mathematicians on the n-Café are too dainty and civilized. But someone into analysis might be able to find a more efficient proof.

That would make me a bit sad—since why didn’t we find it?—but mainly happy—since this subject deserves to be clean and elegant. We really need a category-theoretic formulation of the second law of thermodynamics that’s suitable for studying complex networks: that’s the long-term goal here.


Triangular Numbers

12 February, 2014

This post is just for fun. I’ll start with Pascal’s triangle and show you the number e hiding inside it. Using this, we’ll see how the product of all numbers in the nth row of Pascal’s triangle is related to the nth triangular number.

That’s cute, because Pascal’s triangle looks so… triangular!

But then, with a massive amount of help from Greg Egan, we’ll dig a lot deeper, and meet strange things like superfactorials, the magic properties of the number 1/12, and the Glaisher–Kinkelin constant.

First let’s get warmed up.

Warmup

Pascal’s triangle is a famous and much-studied thing. It was discovered long before Pascal. It looks like this:

We write 1’s down the edges. We get every other number in the triangle by adding the two numbers above it to its left and right. People call these numbers binomial coefficients, since you can also get them like this:

(x + y)^5 = x^5 + 5 x^4 y + 10 x^3 y^2 + 10 x^2 y^3 + 5 x y^4 + y^5

We get the 10 in 10 x^3 y^2 here because there are 10 ways to take 5 things and choose 3 to call x’s and 2 to call y’s. In general we have

\displaystyle{ (x + y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} }

where the binomial coefficient \binom{n}{k} is the kth number on the nth row of Pascal’s triangle. Here count both rows and the numbers in a row starting from zero.

Since we can permute n things in

n! = 1 \cdot 2 \cdot 3 \cdot \cdots \cdot n

ways, there are

\displaystyle{ \frac{n!}{k! (n-k)!} }

ways to take n things and choose k of them to be x’s and (n-k) of them to be y’s. You see, permuting the x’s or the y’s doesn’t change our choice, so we have to divide by k! and (n-k)!.

So, the kth number in the nth row of Pascal’s triangle is:

\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }

All this will be painfully familiar to many of you. But I want everyone to have an fair chance at understanding the next section, where we’ll see something nice about Pascal’s triangle and the number

e \approx 2.718281828459045...

The hidden e in Pascal’s triangle

In 2012, a guy named Harlan Brothers found the number e hiding in Pascal’s triangle… in a very simple way!

If we add up all the numbers in the nth row of Pascal’s triangle we get 2^n. But what if we take the product of all these numbers? Let’s call it t_n. Then here’s what Brothers discovered:

\displaystyle{ \lim_{n \to \infty}  \frac{t_n t_{n+2}}{t^2_{n+1}} = e }

This may seem mysterious, but in fact it’s rather simple once you see the trick. I’ll use a nice argument given by Greg Egan.

We’ve said t_n is the product of all numbers in the nth row of Pascal’s triangle. Just for fun, let’s divide this by the product of all numbers in the next row:

\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }

And since this is so much fun, let’s do it again! Divide this quantity by its next value:

\displaystyle{ v_n = \frac{u_n}{u_{n+1}}   }

Now look:

\displaystyle{v_n = \frac{t_n/t_{n+1}}{t_{n+1}/t_{n+2}} = \frac{t_n t_{n+2}}{t_{n+1}^2}  }

So, this is the thing that should approach e.

But why does it approach e? To see this, first take Pascal’s triangle and divide each number by the number to its lower left. We get a second triangle of numbers. Then take each number in this second triangle and divide it by the number to its lower right! We get a third triangle, like this:

If you think a bit, you’ll see:

• In the first triangle, the product of all numbers in the nth row is t_n.

• In the second, the product of all numbers in the nth row is u_n.

• In the third, the product of all numbers in the nth row is v_n.

And look—there’s a cool pattern! In the third triangle, all the numbers in any given row are equal. In the row with n numbers, all those numbers equal

\displaystyle{ (n+1)/n = 1 + \frac{1}{n} }

So, the product of all these numbers is

\displaystyle{ \left(1 + \frac{1}{n}\right)^n }

But it’s a famous fact that

\displaystyle{ \lim_{n \to \infty}  \left(1 + \frac{1}{n}\right)^n = e}

Some people even use this as a definition of e. So,

\displaystyle{ \lim_{n \to \infty} v_n = e}

which is just what we wanted!

Puzzle 1. Use the formula I mentioned:

\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }

to show all the numbers in the same row of the third triangle are equal.

Triangular numbers

The number of balls in a triangular stack with n balls at the bottom is called the nth triangular number,

T_n = 1 + 2 + \cdots + n

If you chop a square of balls in half along a diagonal you get a triangle, so T_n is approximately half the nth square number:

\displaystyle{ T_n \approx \frac{n^2}{2} }

But if you chop the square in half this way, the balls on the diagonal get cut in half, so to get the nth triangle number we need to include their other halves:

T_n = \displaystyle{ \frac{n^2}{2} + \frac{n}{2} = \frac{n(n+1)}{2} }

These numbers go like

1, 3, 6, 10, \dots

and they are one of the simple pleasures of life, like sunshine and good bread. Spotting them in a knight-move zigzag pattern in the multiplication table was one of my first truly mathematical experiences. You can also see them in Pascal’s triangle:

because

T_n = \displaystyle{ \binom{n+1}{2} }

But today it’s time to see how T_n is related to t_n, the product of all the numbers in the nth row of Pascal’s triangle!

If we subtract each triangular number from the the one before it we get the numbers -n, and if we subtract each of these numbers from the one before it we get the number 1. This should remind you of something we’ve seen:

\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }

\displaystyle{ v_n = \frac{u_n}{u_{n+1}}   }

\displaystyle{ \lim_{n \to \infty} v_n = e }

Here we’re dividing instead of subtracting. But if take logarithms, we’ll be subtracting, and we get

\ln(u_n) = \ln(t_n) - \ln(t_{n+1})

\ln(v_n) = \ln(u_n) - \ln(u_{n+1})

\displaystyle{ \lim_{n \to \infty} \ln(v_n) = 1 }

What can we do with this? Well, suppose \ln(v_n) were equal to 1, instead of approaching it. Then \ln(t_n) could be the nth triangular number… and we’d get a cool formula for the product of all numbers in the nth row of Pascal’s triangle!

But since \ln(v_n) is just approximately equal to 1, we should only expect an approximate formula for the product of all numbers in the nth row of Pascal’s triangle:

\ln(t_n) \approx T_n

or

\displaystyle{ t_n \approx e^{T_n} = e^{n(n+1)/2}  }

This was my hope. But how good are these approximations? I left this as a puzzle on Google+, and then Greg Egan stepped in and solved it.

For starters, he graphed the ratio \ln(t_n)/T_n, and got this:

That looks pretty good: it looks like it’s approaching 1. But he also graphed the ratio t_n/e^{T_n}, and got this:

Not so good. Taking exponentials amplifies the errors. If we want a good asymptotic formula t_n, we have to work harder. And this is what Egan did.

Digging deeper

So far we’ve talked a lot about t_n, the product of all numbers in the nth row in Pascal’s triangle… but we haven’t actually computed it. Let’s try:

\begin{array}{ccl}  t_n &=& \displaystyle{ \binom{n}{0} \cdot \binom{n}{1} \cdot \cdots \cdot \binom{n}{n} }   \\  \\  &=& \displaystyle{ \frac{n!}{0! \cdot n!} \cdot \frac{n!}{1! \cdot (n-1)!} \cdot \cdots \cdot \frac{n!}{n! \cdot 0!} }   \\  \\  &=& \displaystyle{ \frac{(n!)^{n+1}}{\left(0! \cdot 1! \cdot \cdots \cdot n!\right)^2} }  \end{array}

So, we’re seeing the superfactorial

\displaystyle{ 0! \cdot 1! \cdot \cdots \cdot n! }

raise its pretty head. This is also called the Barnes G-function, presumably by people who want to make it sound more boring.

Actually that’s not fair: the Barnes G-function generalizes superfactorials to complex numbers, just as Euler’s gamma function generalizes factorials. Unfortunately, Euler made the mistake of defining his gamma function so that

\Gamma(n) = (n-1)!

when n = 1, 2, 3, \cdots. Everyone I trust assures me this was a bad idea, not some deep insight… but Barnes doubled down on Euler’s mistake and defined his G-function so that

G(n) = 0! \cdot 1! \cdot \cdots \cdot (n-2)!

So, we’ll have to be careful when looking up formulas on Wikipedia: the superfactorial of n is G(n+2). Thus, we have

\displaystyle{ t_n = \frac{(n!)^{n+1}}{G(n+2)^2} }

so

\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }

Now, there’s a great approximate formula for the logarithm of a factorial. It’s worth its weight in gold… or at least silver, so it’s called Stirling’s formula:

\displaystyle{ \ln(n!) = n \ln (n)  \; - \; n \; + \;\tfrac{1}{2} \ln(2 \pi n) \; +  \; \frac{1}{12n} \cdots }

where the dots mean stuff that goes to zero when divided by the last term, in the limit n \to \infty.

There’s also an approximate formula for the logarithm of the superfactorial! It’s a bit scary:

\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + }  \\  \\  &&  \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\  && \tfrac{1}{12} - \ln A \; + \cdots   \end{array}

where the dots mean stuff that actually goes to zero as n \to \infty.

What’s A? It’s the Glaisher–Kinkelin constant! If you’re tired of memorizing digits of pi and need a change of pace, you can look up the first 20,000 digits of the Glaisher–Kinkelin constant here, but roughly we have

A \approx 1.282427129100622636875342...

Of course most mathematicians don’t care much about the digits; what we really want to know is what this constant means!

Euler, who did some wacky nonrigorous calculations, once argued that

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

Riemann made this rigorous by defining

\displaystyle{ \zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots }

which converges when \mathrm{Re}(s) > 1, and then analytically continuing this to other values of s. He found that indeed

\displaystyle{ \zeta(-1) = -\tfrac{1}{12} }

This fact has many marvelous ramifications: for example, it’s why bosonic string theory works best in 24 dimensions! It’s also connected to the \tfrac{1}{12n} term in Stirling’s formula.

But anyway, we might wonder what happens if we compute \zeta(s) for s near -1. This is where the Glaisher–Kinkelin constant shows up, because if we try a Taylor series we get

\displaystyle{ \zeta'(-1) = \tfrac{1}{12} - \ln A }

To me, this means that \tfrac{1}{12} - \ln A is more fundamental than A itself, and indeed you’ll see it’s this combination that shows up in the approximate formula for superfactorials. So, we can simplify that formula a bit:

\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + }  \\  \\  &&  \tfrac{\ln(2\pi)}{2} \, (n+1) \; -  \\ \\ &&  \tfrac{1}{12} \ln(n+1) \; + \\ \\  &&   \zeta'(-1) \; + \cdots   \end{array}

Now, let’s use this together with Stirling’s formula to estimate the logarithm of the product of all numbers in the nth row of Pascal’s triangle. Remember, that’s

\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }

So, we get

\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\  \\  && - \displaystyle{ \Big[  \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\  && \quad \displaystyle{   \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big]  }  \end{array}

and exponentiating this gives a good approximation to t_n.

Here is a graph of t_n divided by this approximation, created by Egan:

As you can see, the ratio goes to 1 quite nicely!

So, we’ve seem some nice relationships between these things:

1 + 2 + \cdots + n = T_n

1 \cdot 2 \cdot \cdots \cdot n = n!

\binom{n}{1} \cdot \binom{n}{2} \cdot \cdots \cdot \binom{n}{n} = t_n

1! \cdot 2! \cdot \cdots \cdot n! = G(n+2)

\frac{1}{0!} +\frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = e

and Euler’s wacky formula

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

Puzzle 2. Can you take the formula

\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\  \\  && - \displaystyle{ \Big[  \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\  && \quad \displaystyle{   \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big]  }  \end{array}

and massage it until it looks like n(n+1)/2 plus ‘correction terms’? How big are these correction terms?

References

Any errors in the formulas above are my fault. Here are the papers that first squeezed e out of Pascal’s triangle:

• Harlan J. Brothers, Pascal’s triangle: The hidden stor-e, The Mathematical Gazette, March 2012, 145.

• Harlan J. Brothers, Finding e in Pascal’s triangle, Mathematics Magazine 85 (2012), 51.

I learned the idea from here, thanks to Richard Elwes:

• Alexander Bogomolny, e in the Pascal Triangle, Interactive Mathematics Miscellany and Puzzles.

For how Euler derived his crazy identity

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

and how it’s relevant to string theory, try:

• John Baez, My favorite numbers: 24.


Network Theory Talks at Oxford

7 February, 2014

I’m giving some talks at Oxford:

Network Theory

Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:

I. Electrical circuits and signal-flow graphs.

II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.

III. Bayesian networks, information and entropy.

If you’re nearby I hope you can come! All these talks will take place in Lecture Theatre B in the Computer Science Department—see the map below. Here are the times:

• Friday 21 February 2014, 2 pm: Network Theory: overview. See the slides or watch a video.

• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal-flow graphs. See the slides or watch a video.

• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams. See the slides or watch a video.

• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy. See the slides or watch a video

The first talk will be part of the OASIS series, meaning the “Oxford Advanced Seminar on Informatic Structures”.

I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.

Ulrike Tillmann has also kindly invited me to give a topology seminar:

Operads and the Tree of Life

Trees are not just combinatorial structures: they are also biological structures, both in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardmann–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.

I’m not sure exactly where this will take place, but surely somewhere in the Mathematical Institute building:

• Monday 24 February, 3:30 pm, Operads and the Tree of Life. See the slides.

The Computer Science Department is shown in the map here:

 

The Mathematical Institute is a bit to the west:


Categories in Control – Erlangen Talk

6 February, 2014

 

I’m visiting Erlangen from now until the end of May, since my wife got a grant to do research here. I’m trying to get a lot of papers finished. But today I’m giving a talk in the math department of the university here, which with Germanic brevity is called the Friedrich-Alexander-Universität Erlangen-Nürnberg.

You can see my slides here, or maybe even come to my talk:

Categories in control, Thursday 6 February 2014, 16:15–18:00, Mathematics Department of the FAU, in Übungsraum 1.

The title is a pun. It’s about categories in control theory, the branch of engineering that studies dynamical systems with inputs and outputs, and how to optimize their behavior.

Control theorists often describe these systems using signal-flow graphs. Here is a very rough schematic signal-flow graph, describing the all-important concept of a ‘feedback loop’:

Here is a detailed one, describing a specific device called a servo:

The device is shown on top, and the signal-flow graph describing its behavior is at bottom. For details, click on the picture.

Now, if you have a drop of category-theorist’s blood in your veins, you’ll look at this signal-flow graph and think my god, that’s a string diagram for a morphism in a monoidal category!

And you’d be right. But if you want to learn what that means, and why it matters, read my talk slides!

The slides should make sense if you’re a mathematician, but maybe not otherwise. So, here’s the executive summary. The same sort of super-abstract math that handles things like Feynman diagrams:

also handles signal-flow graphs. The details are different in important and fascinating ways, and this is what I’m mainly concerned with. But we now understand how signal-flow graphs fit into the general theory of networks. This means we can proceed to use modern math to study them—and their relation to other kinds of networks, like electrical circuit diagrams:

More talks

Thanks to the Azimuth Project team, my graduate students and many other folks, the dream of network theory as a step toward ‘green mathematics’ seems to be coming true! There’s a vast amount left to be done, so I’d have trouble convincing a skeptic, but I feel the project has turned a corner. I now feel in my bones that it’s going to work: we’ll eventually develop a language for biology and ecology based in part on category theory.

So, I think it’s a good time to explain all the various aspects of this project that have been cooking away—some quite visibly, but others on secret back burners:

• Jacob Biamonte and I have written a book on Petri nets and chemical reaction networks. You may have seen parts of this on the blog. We started this project at the Centre for Quantum Technologies, but now he’s working at the Institute for Scientific Interchange, in Turin—and collaborating with people there on various aspects of network theory.

• Brendan Fong is working with me on electrical circuits. You may know him for his posts here on chemical reaction networks. He’s now a grad student in computer science at Oxford.

• Jason Erbele, a math grad student at U.C. Riverside, is working with me on control theory. This work is the main topic of my talk—but I also sketch how it ties together with what Brendan is doing. There’s a lot more to say here.

• Tobias Fritz, a postdoc at the Perimeter Institute, is working with me on category-theoretic aspects of information theory. We published a paper on entropy with Tom Leinster, and we’ve got a followup on relative entropy that’s almost done. I should be working on it right this instant! But for now, read the series of posts here on Azimuth: Relative Entropy Part 1, Part 2 and Part 3.

• Brendan Fong has also done some great work on Bayesian networks, using ideas that connect nicely to what Tobias and I are doing.

• Tu Pham and Franciscus Rebro are working on the math that underlies all these projects: bicategories of spans.

The computer science department at Oxford is a great place for category theory and diagrammatic reasoning, thanks to the presence of Samson Abramsky, Bob Coecke and others. I’m going to visit them from February 21 to March 14. It seems like a good time to give a series of talks on this stuff. So, stay tuned! I’ll try to make slides available here.


Relative Entropy in Evolutionary Dynamics

22 January, 2014

guest post by Marc Harper

In John’s information geometry series, he mentioned some of my work in evolutionary dynamics. Today I’m going to tell you about some exciting extensions!

The replicator equation

First a little refresher. For a population of n replicating types, such as individuals with different eye colors or a gene with n distinct alleles, the ‘replicator equation’ expresses the main idea of natural selection: the relative rate of growth of each type should be proportional to the difference between the fitness of the type and the mean fitness in the population.

To see why this equation should be true, let P_i be the population of individuals of the ith type, which we allow to be any nonnegative real number. We can list all these numbers and get a vector:

P = (P_1, \dots, P_n)

The Lotka–Volterra equation is a very general rule for how these numbers can change with time:

\displaystyle{ \frac{d P_i}{d t} = f_i(P) P_i }

Each population grows at a rate proportional to itself, where the ‘constant of proportionality’, f_i(P), is not necessarily constant: it can be any real-valued function of P. This function is called the fitness of the ith type. Taken all together, these functions f_i are called the fitness landscape.

Let p_i be the fraction of individuals who are of the ith type:

\displaystyle{ p_i = \frac{P_i}{\sum_{i =1}^n P_i } }

These numbers p_i are between 0 and 1, and they add up to 1. So, we can also think of them as probabilities: p_i is the probability that a randomly chosen individual is of the ith type. This is how probability theory, and eventually entropy, gets into the game.

Again, we can bundle these numbers into a vector:

p = (p_1, \dots, p_n)

which we call the population distribution. It turns out that the Lotka–Volterra equation implies the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(P) - \langle f(P) \rangle \right) \, p_i }

where

\displaystyle{ \langle f(P) \rangle = \sum_{i =1}^n  f_i(P)  p_i  }

is the mean fitness of all the individuals. You can see the proof in Part 9 of the information geometry series.

By the way: if each fitness f_i(P) only depends on the fraction of individuals of each type, not the total numbers, we can write the replicator equation in a simpler way:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p) - \langle f(p) \rangle \right) \, p_i }

From now on, when talking about this equation, that’s what I’ll do.

Anyway, the take-home message is this: the replicator equation says the fraction of individuals of any type changes at a rate proportional to fitness of that type minus the mean fitness.

Now, it has been known since the late 1970s or early 1980s, thanks to the work of Akin, Bomze, Hofbauer, Shahshahani, and others, that the replicator equation has some very interesting properties. For one thing, it often makes ‘relative entropy’ decrease. For another, it’s often an example of ‘gradient flow’. Let’s look at both of these in turn, and then talk about some new generalizations of these facts.

Relative entropy as a Lyapunov function

I mentioned that we can think of a population distribution as a probability distribution. This lets us take ideas from probability theory and even information theory and apply them to evolutionary dynamics! For example, given two population distributions p and q, the information of q relative to p is

I(q,p) = \displaystyle{ \sum_i q_i \ln \left(\frac{q_i}{p_i }\right)}

This measures how much information you gain if you have a hypothesis about some state of affairs given by the probability distribution p, and then someone tells you “no, the best hypothesis is q!”

It may seem weird to treat a population distribution as a hypothesis, but this turns out to be a good idea. Evolution can then be seen as a learning process: a process of improving the hypothesis.

We can make this precise by seeing how the relative information changes with the passage of time. Suppose we have two population distributions q and p. Suppose q is fixed, while p evolves in time according to the replicator equation. Then

\displaystyle{  \frac{d}{d t} I(q,p)  =  \sum_i f_i(P) (p_i - q_i) }

For the proof, see Part 11 of the information geometry series.

So, the information of q relative to p will decrease as p evolves according to the replicator equation if

\displaystyle{  \sum_i f_i(P) (p_i - q_i) } \le 0

If q makes this true for all p, we say q is an evolutionarily stable state. For some reasons why, see Part 13.

What matters now is that when q is an evolutionarily stable state, I(q,p) says how much information the population has ‘left to learn’—and we’re seeing that this always decreases. Moreover, it turns out that we always have

I(q,p) \ge 0

and I(q,p) = 0 precisely when p = q.

People summarize all this by saying that relative information is a ‘Lyapunov function’. Very roughly, a Lyapunov function is something that decreases with the passage of time, and is zero only at the unique stable state. To be a bit more precise, suppose we have a differential equation like

\displaystyle{  \frac{d}{d t} x(t) = v(x(t)) }

where x(t) \in \mathbb{R}^n and v is some smooth vector field on \mathbb{R}^n. Then a smooth function

V : \mathbb{R}^n \to \mathbb{R}

is a Lyapunov function if

V(x) \ge 0 for all x

V(x) = 0 iff x is some particular point x_0

and

\displaystyle{ \frac{d}{d t} V(x(t)) \le 0 } for every solution of our differential equation.

In this situation, the point x_0 is a stable equilibrium for our differential equation: this is Lyapunov’s theorem.

The replicator equation as a gradient flow equation

The basic idea of Lyapunov’s theorem is that when a ball likes to roll downhill and the landscape has just one bottom point, that point will be the unique stable equilibrium for the ball.

The idea of gradient flow is similar, but different: sometimes things like to roll downhill as efficiently as possible: they move in the exactly the best direction to make some quantity smaller! Under certain conditions, the replicator equation is an example of this phenomenon.

Let’s fill in some details. For starters, suppose we have some function

F : \mathbb{R}^n \to \mathbb{R}

Think of V as ‘height’. Then the gradient flow equation says how a point x(t) \in \mathbb{R}^n will move if it’s always trying its very best to go downhill:

\displaystyle{ \frac{d}{d t} x(t) = - \nabla V(x(t)) }

Here \nabla is the usual gradient in Euclidean space:

\displaystyle{ \nabla V = \left(\partial_1 V, \dots, \partial_n V \right)  }

where \partial_i is short for the partial derivative with respect to the ith coordinate.

The interesting thing is that under certain conditions, the replicator equation is an example of a gradient flow equation… but typically not one where \nabla is the usual gradient in Euclidean space. Instead, it’s the gradient on some other space, the space of all population distributions, which has a non-Euclidean geometry!

The space of all population distributions is a simplex:

\{ p \in \mathbb{R}^n : \; p_i \ge 0, \; \sum_{i = 1}^n p_i = 1 \} .

For example, it’s an equilateral triangle when n = 3. The equilateral triangle looks flat, but if we measure distances another way it becomes round, exactly like a portion of a sphere, and that’s the non-Euclidean geometry we need!

In fact this trick works in any dimension. The idea is to give the simplex a special Riemannian metric, the ‘Fisher information metric’. The usual metric on Euclidean space is

\delta_{i j} = \left\{\begin{array}{ccl} 1 & \mathrm{ if } & i = j \\                                       0 &\mathrm{ if } & i \ne j \end{array} \right.

This simply says that two standard basis vectors like (0,1,0,0) and (0,0,1,0) have dot product zero if the 1’s are in different places, and one if they’re in the same place. The Fisher information metric is a bit more complicated:

\displaystyle{ g_{i j} = \frac{\delta_{i j}}{p_i} }

As before, g_{i j} is a formula for the dot product of the ith and jth standard basis vectors, but now it depends on where you are in the simplex of population distributions.

We saw how this formula arises from information theory back in Part 7. I won’t repeat the calculation, but the idea is this. Fix a population distribution p and consider the information of another one, say q, relative to this. We get I(q,p). If q = p this is zero:

\displaystyle{ \left. I(q,p)\right|_{q = p} = 0 }

and this point is a local minimum for the relative information. So, the first derivative of I(q,p) as we change q must be zero:

\displaystyle{ \left. \frac{\partial}{\partial q_i} I(q,p) \right|_{q = p} = 0 }

But the second derivatives are not zero. In fact, since we’re at a local minimum, it should not be surprising that we get a positive definite matrix of second derivatives:

\displaystyle{  g_{i j} = \left. \frac{\partial^2}{\partial q_i \partial q_j} I(q,p) \right|_{q = p} = 0 }

And, this is the Fisher information metric! So, the Fisher information metric is a way of taking dot products between vectors in the simplex of population distribution that’s based on the concept of relative information.

This is not the place to explain Riemannian geometry, but any metric gives a way to measure angles and distances, and thus a way to define the gradient of a function. After all, the gradient of a function should point at right angles to the level sets of that function, and its length should equal the slope of that function:

So, if we change our way of measuring angles and distances, we get a new concept of gradient! The ith component of this new gradient vector field turns out to b

(\nabla_g V)^i = g^{i j} \partial_j V

where g^{i j} is the inverse of the matrix g_{i j}, and we sum over the repeated index j. As a sanity check, make sure you see why this is the usual Euclidean gradient when g_{i j} = \delta_{i j}.

Now suppose the fitness landscape is the good old Euclidean gradient of some function. Then it turns out that the replicator equation is a special case of gradient flow on the space of population distributions… but where we use the Fisher information metric to define our concept of gradient!

To get a feel for this, it’s good to start with the Lotka–Volterra equation, which describes how the total number of individuals of each type changes. Suppose the fitness landscape is the Euclidean gradient of some function V:

\displaystyle{ f_i(P) = \frac{\partial V}{\partial P_i} }

Then the Lotka–Volterra equation becomes this:

\displaystyle{ \frac{d P_i}{d t} = \frac{\partial V}{\partial P_i} \, P_i }

This doesn’t look like the gradient flow equation, thanks to that annoying P_i on the right-hand side! It certainly ain’t the gradient flow coming from the function V and the usual Euclidean gradient. However, it is gradient flow coming from V and some other metric on the space

\{ P \in \mathbb{R}^n : \; P_i \ge 0 \}

For a proof, and the formula for this other metric, see Section 3.7 in this survey:

• Marc Harper, Information geometry and evolutionary game theory.

Now let’s turn to the replicator equation:

\displaystyle{ \frac{d p_i}{d t} = \left( f_i(p)  - \langle f(p) \rangle \right) \, p_i }

Again, if the fitness landscape is a Euclidean gradient, we can rewrite the replicator equation as a gradient flow equation… but again, not with respect to the Euclidean metric. This time we need to use the Fisher information metric! I sketch a proof in my paper above.

In fact, both these results were first worked out by Shahshahani:

• Siavash Shahshahani, A New Mathematical Framework for the Study of Linkage and Selection, Memoirs of the AMS 17, 1979.

New directions

All this is just the beginning! The ideas I just explained are unified in information geometry, where distance-like quantities such as the relative entropy and the Fisher information metric are studied. From here it’s a short walk to a very nice version of Fisher’s fundamental theorem of natural selection, which is familiar to researchers both in evolutionary dynamics and in information geometry.

You can see some very nice versions of this story for maximum likelihood estimators and linear programming here:

• Akio Fujiwara and Shun-ichi Amari, Gradient systems in view of information geometry, Physica D: Nonlinear Phenomena 80 (1995), 317–327.

Indeed, this seems to be the first paper discussing the similarities between evolutionary game theory and information geometry.

Dash Fryer (at Pomona College) and I have generalized this story in several interesting ways.

First, there are two famous ways to generalize the usual formula for entropy: Tsallis entropy and Rényi entropy, both of which involve a parameter q. There are Tsallis and Rényi versions of relative entropy and the Fisher information metric as well. Everything I just explained about:

• conditions under which relative entropy is a Lyapunov function for the replicator equation, and

• conditions under which the replicator equation is a special case of gradient flow

generalize to these cases! However, these generalized entropies give modified versions of the replicator equation. When we set q=1 we get back the usual story. See

• Marc Harper, Escort evolutionary game theory.

My initial interest in these alternate entropies was mostly mathematical—what is so special about the corresponding geometries?—but now researchers are starting to find populations that evolve according to these kinds of modified population dynamics! For example:

• A. Hernando et al, The workings of the Maximum Entropy Principle in collective human behavior.

There’s an interesting special case worth some attention. Lots of people fret about the relative entropy not being a distance function obeying the axioms that mathematicians like: for example, it doesn’t obey the triangle inequality. Many describe the relative entropy as a distance-like function, and this is often a valid interpretation contextually. On the other hand, the q=0 relative entropy is one-half the Euclidean distance squared! In this case the modified version of the replicator equation looks like this:

\displaystyle{ \frac{d p_i}{d t} = f_i(p) - \frac{1}{n} \sum_{j = 1}^n f_j(p) }

This equation is called the projection dynamic.

Later, I showed that there is a reasonable definition of relative entropy for a much larger family of geometries that satisfies a similar distance minimization property.

In a different direction, Dash showed that you can change the way that selection acts by using a variety of alternative ‘incentives’, extending the story to some other well-known equations describing evolutionary dynamics. By replacing the terms x_i f_i(x) in the replicator equation with a variety of other functions, called incentives, we can generate many commonly studied models of evolutionary dynamics. For instance if we exponentiate the fitness landscape (to make it always positive), we get what is commonly known as the logit dynamic. This amounts to changing the fitness landscape as follows:

\displaystyle{ f_i \mapsto \frac{x_i e^{\beta f_i}}{\sum_j{x_j e^{\beta f_j}}} }

where \beta is known as an inverse temperature in statistical thermodynamics and as an intensity of selection in evolutionary dynamics. There are lots of modified versions of the replicator equation, like the best-reply and projection dynamics, more common in economic applications of evolutionary game theory, and they can all be captured in this family. (There are also other ways to simultaneously capture such families, such as Bill Sandholm’s revision protocols, which were introduced earlier in his exploration of the foundations of game dynamics.)

Dash showed that there is a natural generalization of evolutionarily stable states to ‘incentive stable states’, and that for incentive stable states, the relative entropy is decreasing to zero when the trajectories get near the equilibrium. For the logit and projection dynamics, incentive stable states are simply evolutionarily stable states, and this happens frequently, but not always.

The third generalization is to look at different ‘time-scales’—that is, different ways of describing time! We can make up the symbol \mathbb{T} for a general choice of ‘time-scale’. So far I’ve been treating time as a real number, so

\mathbb{T} = \mathbb{R}

But we can also treat time as coming in discrete evenly spaced steps, which amounts to treating time as an integer:

\mathbb{T} = \mathbb{Z}

More generally, we can make the steps have duration h, where h is any positive real number:

\mathbb{T} = h\mathbb{Z}

There is a nice way to simultaneously describe the cases \mathbb{T} = \mathbb{R} and \mathbb{T} = h\mathbb{Z} using the time-scale calculus and time-scale derivatives. For the time-scale \mathbb{T} = \mathbb{R} the time-scale derivative is just the ordinary derivative. For the time-scale \mathbb{T} = h\mathbb{Z}, the time-scale derivative is given by the difference quotient from first year calculus:

\displaystyle{ f^{\Delta}(z) = \frac{f(z+h) - f(z)}{h} }

and using this as a substitute for the derivative gives difference equations like a discrete-time version of the replicator equation. There are many other choices of time-scale, such as the quantum time-scale given by \mathbb{T} = q^{\mathbb{Z}}, in which case the time-scale derivative is called the q-derivative, but that’s a tale for another time. In any case, the fact that the successive relative entropies are decreasing can be simply state by saying they have negative \mathbb{T} = h\mathbb{Z} time-scale derivative. The continuous case we started with corresponds to \mathbb{T} = \mathbb{R}.

Remarkably, Dash and I were able to show that you can combine all three of these generalizations into one theorem, and even allow for multiple interacting populations! This produces some really neat population trajectories, such as the following two populations with three types, with fitness functions corresponding to the rock-paper-scissors game. On top we have the replicator equation, which goes along with the Fisher information metric; on the bottom we have the logit dynamic, which goes along with the Euclidean metric on the simplex:

From our theorem, it follows that the relative entropy (ordinary relative entropy on top, the q = 0 entropy on bottom) converges to zero along the population trajectories.

The final form of the theorem is loosely as follows. Pick a Riemannian geometry given by a metric g (obeying some mild conditions) and an incentive for each population, as well as a time scale (\mathbb{R} or h \mathbb{Z}) for every population. This gives an evolutionary dynamic with a natural generalization of evolutionarily stable states, and a suitable version of the relative entropy. Then, if there is an evolutionarily stable state in the interior of the simplex, the time-scale derivative of sum of the relative entropies for each population will decrease as the trajectories converge to the stable state!

When there isn’t such a stable state, we still get some interesting population dynamics, like the following:


See this paper for details:

• Marc Harper and Dashiell E. A. Fryer, Stability of evolutionary dynamics on time scales.

Next time we’ll see how to make the main idea work in finite populations, without derivatives or deterministic trajectories!


Wormholes and Entanglement

20 January, 2014

 

An apparent contradiction in what most physicists believe about black holes—the ‘firewall problem’—is making some very good physicists reach for some very crazy-sounding ideas to find a way out. In particular, Maldacena and Susskind have come up with the idea that any pair of quantum-entangled particles is actually connected by a wormhole.

Entanglement is a spooky way for far-away particles to be correlated, but you can’t use it communicate faster than light. It’s been seen in the lab, but it’s only possible thanks to quantum mechanics. The first people to make a fuss over entanglement were Einstein, Podolsky and Rosen, back in 1935.

A wormhole is a spooky way for far-away regions of space to be connected by a kind of ‘tunnel’—but you probably can’t use it to communicate faster than light. Nobody has ever seen one, but they’re theoretically possible thanks to general relativity. The first people to make a fuss over wormholes were Einstein and Rosen, back in 1935.

So, superficially, it makes sense that there should be a connection between wormholes and entanglement. But when you learn enough physics, you’ll see that Maldacena and Susskind’s proposal sounds completely hare-brained.

But when you learn more physics—maybe more than enough?—you might decide there’s some merit to this idea after all. At the Centre for Quantum Technologies last summer, Jamie Vicary and I noticed some interesting connections between wormholes and quantum entanglement. We now have a paper out!

In it, we study quantum gravity in a universe where space is just 2-dimensional, not 3-dimensional like ours. It’s not realistic, but it has one huge advantage: there’s a working theory of what quantum gravity could be like when space is 2-dimensional, so you can calculate stuff!

So, we calculate what happens when a wormhole forms, and we show the ends look like a particle and its antiparticle (this was already known), and we note that this particle-antiparticle pair is entangled. In fact it’s completely entangled: any piece of information you might want to know about one can also be found in the other.

However, in a sense that Jamie and I make precise, this entanglement is ‘fake’. The reason is that the two ends of the wormhole are not independent things. They’re just two views of the same thing… and, technically, it doesn’t count as entanglement when something is ‘entangled with itself’. This fact is crucial to how Maldacena and Susskind want to get around the firewall problem.

For more details, try this:

Wormholes and entanglement, The n-Category Café.

This has links to other stuff, including our paper, but also some blog articles explaining the firewall problem, the paper by Maldacena and Susskind, and the original Einstein–Podolsky–Rosen and Einstein–Rosen papers (in English).

Since this quantum gravity stuff is more suited to the n-Category Café than here, I won’t enable comments here. If you want to talk, please go there. Sorry!


Follow

Get every new post delivered to your Inbox.

Join 3,095 other followers