Periodic Patterns in Peptide Masses

6 April, 2017

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which since 1974 had been the most famous conjecture in mathematical chemistry. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a frequency of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon-12 atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop reading now if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, whose masses are almost integers when measured in daltons. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers m_1, ... , m_{20}, how many ways can you write a natural number N as a finite ordered sum of these numbers? Call it F(N) and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)

There’s a well-known way to write down a formula for F(N). It obeys a linear recurrence:

F(N) = F(N - m_1) + \cdots + F(N - m_{20})

and we can solve this using the ansatz

F(N) = x^N

Then the recurrence relation will hold if

x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}

for all N. But this is fairly easy to achieve! If m_{20} is the biggest mass, we just need this polynomial equation to hold:

x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1

There will be a bunch of solutions: m_{20} of them, counted with multiplicity. (If there are repeated roots things get a bit more subtle, but let’s not worry about that.) To get the actual formula for F(N) we need to find the right linear combination of functions x^N where x ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root x with the largest magnitude will make x^N grow the fastest.
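
If you like to compute, here is a minimal Python sketch—mine, not from the paper—that builds F(N) straight from the recurrence, using the integer amino acid masses listed at the end of this post, and then finds the roots of the characteristic polynomial numerically:

```python
import numpy as np

# Integer masses (in daltons) of the 20 standard amino acid residues,
# taken from the list at the end of this post.
masses = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115,
          128, 128, 129, 131, 137, 147, 156, 163, 186]

# F(N) = number of ordered sequences of masses summing to N,
# computed directly from the recurrence F(N) = sum_i F(N - m_i).
Nmax = 3000
F = np.zeros(Nmax + 1)
F[0] = 1.0                        # the empty sum
for N in range(1, Nmax + 1):
    F[N] = sum(F[N - m] for m in masses if m <= N)

# Characteristic polynomial x^186 - sum_i x^(186 - m_i): leading
# coefficient 1, the rest 0, -1 or -2 (two masses are repeated).
coeffs = np.zeros(max(masses) + 1)
coeffs[0] = 1.0
for m in masses:
    coeffs[m] -= 1.0
roots = sorted(np.roots(coeffs), key=abs, reverse=True)

print("dominant root:", roots[0])                        # real, a bit bigger than 1
print("period:", 2 * np.pi / abs(np.angle(roots[1])))    # roughly 14 daltons
```

Dividing F(N) by its exponential trend should then show the oscillations directly.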

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses m_1 = 1, m_2 = 2. Then the numbers F(N) are the Fibonacci numbers. I hope you know this: the Nth Fibonacci number is the number of ways to write N as the sum of an ordered list of 1’s and 2’s!

1

1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.

In this example the recurrence gives the polynomial equation

x^2 = x + 1

and the root with largest magnitude is the golden ratio:

\Phi = 1.6180339...

The other root is

1 - \Phi = -0.6180339...

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }
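
If you want to check this formula by computer, here is a quick Python sanity check (my sketch, not part of the original post):

```python
from math import sqrt

Phi = (1 + sqrt(5)) / 2

def F_recurrence(N):
    # F(N) = F(N-1) + F(N-2), with F(0) = 1 (the empty sum) and F(1) = 1
    a, b = 1, 1
    for _ in range(N):
        a, b = b, a + b
    return a

def F_closed(N):
    return round((Phi**(N + 1) - (1 - Phi)**(N + 1)) / sqrt(5))

assert all(F_recurrence(N) == F_closed(N) for N in range(50))
```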

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers m_1 < \cdots < m_k are all the roots of

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

real?

I don’t know the answer. But apparently this kind of polynomial equation always has one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that F(N) is asymptotically proportional to x^N where x is this root.

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of F(N). The red roots have the second largest magnitude. These give the main oscillations in F(N), which have period 14.28.

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers m_1 < \cdots < m_k and draw all the roots of the equations

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

What pattern do we get in the complex plane?
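
One cheap way to explore this is by computer. Here is a Python sketch—my own, with arbitrary choices for the lengths of the lists and the range of the masses—that plots the roots for many random lists:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points = []
for _ in range(2000):
    k = rng.integers(2, 8)                    # length of the list (arbitrary)
    ms = np.sort(rng.choice(np.arange(1, 25), size=int(k), replace=False))
    coeffs = np.zeros(ms[-1] + 1)
    coeffs[0] = 1.0                           # x^{m_k} minus...
    for m in ms:
        coeffs[m] -= 1.0                      # ...x^{m_k - m_i} for each i
    points.extend(np.roots(coeffs))

points = np.array(points)
plt.scatter(points.real, points.imag, s=0.2)
plt.gca().set_aspect('equal')
plt.show()
```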

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The beauty of roots.

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:


But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses m_i are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!


Applied Category Theory

6 April, 2017

The American Mathematical Society is having a meeting here at U. C. Riverside during the weekend of November 4th and 5th, 2017. I’m organizing a session on Applied Category Theory, and I’m looking for people to give talks.

The goal is to start a conversation about applications of category theory, not within pure math or fundamental physics, but to other branches of science and engineering—especially those where the use of category theory is not already well-established! For example, my students and I have been applying category theory to chemistry, electrical engineering, control theory and Markov processes.

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through cyberspace to me! Abstracts are due September 12th, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 4–5 November 2017.

We’ll be having some interesting plenary talks:

• Paul Balmer, UCLA, An invitation to tensor-triangular geometry.

• Pavel Etingof, MIT, Double affine Hecke algebras and their applications.

• Monica Vazirani, U.C. Davis, Combinatorics, categorification, and crystals.


Pi and the Golden Ratio

7 March, 2017

Two of my favorite numbers are pi:

\pi = 3.14159...

and the golden ratio:

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} } = 1.6180339...

They’re related:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Greg Egan and I came up with this formula last weekend. It’s probably not new, and it certainly wouldn’t surprise experts, but it’s still fun coming up with a formula like this. Let me explain how we did it.

History has a fractal texture. It’s not exactly self-similar, but the closer you look at any incident, the more fine-grained detail you see. The simplified stories we learn about the history of math and physics in school are like blurry pictures of the Mandelbrot set. You can see the overall shape, but the really exciting stuff is hidden.

François Viète was a French mathematician who doesn’t show up in those simplified stories. He studied law at Poitiers, graduating in 1559. He began his career as an attorney at a quite high level, with cases involving the widow of King Francis I of France and also Mary, Queen of Scots. But his true interest was always mathematics. A friend said he could think about a single question for up to three days, his elbow on the desk, feeding himself without changing position.

Nonetheless, he was highly successful in law. By 1590 he was working for King Henry IV. The king admired his mathematical talents, and Viète soon confirmed his worth by cracking a Spanish cipher, thus allowing the French to read all the Spanish communications they were able to obtain.

In 1591, François Viète came out with an important book, introducing what is called the new algebra: a symbolic method for dealing with polynomial equations. This deserves to be much better known; it was very familiar to Descartes and others, and it was an important precursor to our modern notation and methods. For example, he emphasized care with the use of variables, and advocated denoting known quantities by consonants and unknown quantities by vowels. (Later people switched to using letters near the beginning of the alphabet for known quantities and letters near the end like x,y,z for unknowns.)

In 1593 he came out with another book, Variorum De Rebus Mathematicis Responsorum, Liber VIII. Among other things, it includes a formula for pi. In modernized notation, it looks like this:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

This is remarkable! First of all, it looks cool. Second, it’s the earliest known example of an infinite product in mathematics. Third, it’s the earliest known formula for the exact value of pi. In fact, it seems to be the earliest formula representing a number as the result of an infinite process rather than of a finite calculation! So, Viète’s formula has been called the beginning of analysis. In his article “The life of pi”, Jonathan Borwein went even further and called Viète’s formula “the dawn of modern mathematics”.

How did Viète come up with his formula? I haven’t read his book, but the idea seems fairly clear. The area of the unit circle is pi. So, you can approximate pi better and better by computing the area of a square inscribed in this circle, and then an octagon, and then a 16-gon, and so on:

If you compute these areas in a clever way, you get this series of numbers:

\begin{array}{ccl} A_4 &=& 2 \\  \\ A_8 &=& 2 \cdot \frac{2}{\sqrt{2}} \\  \\ A_{16} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}}  \\  \\ A_{32} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}  \end{array}

and so on, where A_n is the area of a regular n-gon inscribed in the unit circle. So, it was only a small step for Viète (though an infinite leap for mankind) to conclude that

\displaystyle{ \pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

or, if square roots in a denominator make you uncomfortable:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

The basic idea here would not have surprised Archimedes, who rigorously proved that

223/71 < \pi < 22/7

by approximating the circumference of a circle using a regular 96-gon. Since 96 = 2^5 \times 3, you can draw a regular 96-gon with ruler and compass by taking an equilateral triangle and bisecting its edges to get a hexagon, bisecting the edges of that to get a 12-gon, and so on up to 96. In a more modern way of thinking, you can figure out everything you need to know by starting with the angle \pi/3 and using half-angle formulas five times to work out the sine or cosine of \pi/96. And indeed, before Viète came along, Ludolph van Ceulen had computed pi to 35 digits using a regular polygon with 2^{62} sides! So Viète’s daring new idea was to give an exact formula for pi that involved an infinite process.
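
In fact the whole computation fits in a few lines of Python. Here is a sketch—not Archimedes’ actual procedure, of course—starting from the sine and cosine of \pi/3 and halving five times:

```python
from math import sqrt

c, s = 1 / 2, sqrt(3) / 2        # cos(pi/3), sin(pi/3)
n = 3
for _ in range(5):               # pi/3 -> pi/6 -> ... -> pi/96
    c, s = sqrt((1 + c) / 2), sqrt((1 - c) / 2)   # half-angle formulas
    n *= 2
print(n * s)    # 96*sin(pi/96) = 3.14103..., half the perimeter of the 96-gon
```

This gives a lower bound on pi, since the inscribed 96-gon has a smaller perimeter than the circle.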

Now let’s see in detail how Viète’s formula works. Since there’s no need to start with a square, we might as well start with a regular n-gon inscribed in the circle and repeatedly bisect its sides, getting better and better approximations to pi. If we start with a pentagon, we’ll get a formula for pi that involves the golden ratio!

We have

\displaystyle{ \pi = \lim_{k \to \infty} A_k }

so we can also compute pi by starting with a regular n-gon and repeatedly doubling the number of vertices:

\displaystyle{ \pi = \lim_{k \to \infty} A_{2^k n} }

The key trick is to write A_{2^k n} as a ‘telescoping product’:

A_{2^k n} = A_n \cdot \frac{A_{2n}}{A_n} \cdot  \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots \frac{A_{2^k n}}{A_{2^{k-1} n}}

Thus, taking the limit as k \to \infty we get

\displaystyle{ \pi = A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }

where we start with the area of the n-gon and keep ‘correcting’ it to get the area of the 2n-gon, the 4n-gon, the 8n-gon and so on.

There’s a simple formula for the area of a regular n-gon inscribed in the unit circle. You can chop it into 2n right triangles, each with base \sin(\pi/n) and height \cos(\pi/n), so the total area is n \sin(\pi/n) \cos(\pi/n):

Thus,

A_n = n \sin(\pi/n) \cos(\pi/n) = \displaystyle{\frac{n}{2} \sin(2 \pi / n)}

This lets us understand how the area changes when we double the number of vertices:

\displaystyle{ \frac{A_{n}}{A_{2n}} = \frac{\frac{n}{2} \sin(2 \pi / n)}{n \sin(\pi / n)} = \frac{n \sin( \pi / n) \cos(\pi/n)}{n \sin(\pi / n)} = \cos(\pi/n) }

This is nice and simple, but we really need a recursive formula for this quantity. Let’s define

\displaystyle{ R_n = 2\frac{A_{n}}{A_{2n}} = 2 \cos(\pi/n) }

Why the factor of 2? It simplifies our calculations slightly. We can express R_{2n} in terms of R_n using the half-angle formula for the cosine:

\displaystyle{ R_{2n} = 2 \cos(\pi/2n) = 2\sqrt{\frac{1 + \cos(\pi/n)}{2}} = \sqrt{2 + R_n} }

Now we’re ready for some fun! We have

\begin{array}{ccl} \pi &=& \displaystyle{ A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }  \\ \\ & = &\displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{R_{2n}} \cdot \frac{2}{R_{4n}} \cdots } \end{array}

so using our recursive formula R_{2n} = \sqrt{2 + R_n}, which holds for any n, we get

\pi =  \displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{\sqrt{2 + R_n}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_n}}} \cdots }

I think this deserves to be called the generalized Viète formula. And indeed, if we start with a square, we get

A_4 = \displaystyle{\frac{4}{2} \sin(2 \pi / 4)} = 2

and

R_4 = 2 \cos(\pi/4) = \sqrt{2}

giving Viète’s formula:

\pi = \displaystyle{ 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

as desired!
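
Numerically, too. Here is a minimal Python sketch of the generalized formula (mine, not from the original post), where A and R are the starting values A_n and R_n:

```python
from math import sqrt

def generalized_viete(A, R, terms=30):
    # pi = A_n * (2/R_n) * (2/sqrt(2 + R_n)) * (2/sqrt(2 + sqrt(2 + R_n))) * ...
    product = A
    for _ in range(terms):
        product *= 2 / R
        R = sqrt(2 + R)          # the recursion R_{2n} = sqrt(2 + R_n)
    return product

print(generalized_viete(2, sqrt(2)))   # square: 3.14159265..., pi to machine precision
```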

But what if we start with a pentagon? For this it helps to remember a beautiful but slightly obscure trig fact:

\cos(\pi / 5) = \Phi/2

and a slightly less beautiful one:

\displaystyle{ \sin(2\pi / 5) = \frac{1}{2} \sqrt{2 + \Phi} }

It’s easy to prove these, and I’ll show you how later. For now, note that they imply

A_5 = \displaystyle{\frac{5}{2} \sin(2 \pi / 5)} = \frac{5}{4} \sqrt{2 + \Phi}

and

R_5 = 2 \cos(\pi/5) = \Phi

Thus, the formula

\pi =  \displaystyle{ A_5 \cdot \frac{2}{R_5} \cdot \frac{2}{\sqrt{2 + R_5}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_5}}} \cdots }

gives us

\pi =  \displaystyle{ \frac{5}{4} \sqrt{2 + \Phi} \cdot \frac{2}{\Phi} \cdot \frac{2}{\sqrt{2 + \Phi}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdots }

or, cleaning it up a bit, the formula we want:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Voilà!
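
And numerically, using the generalized_viete sketch from above:

```python
Phi = (sqrt(5) + 1) / 2
print(generalized_viete(5 * sqrt(2 + Phi) / 4, Phi))   # pentagon: 3.14159265...
```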

There’s a lot more to say, but let me just explain the slightly obscure trigonometry facts we needed. To derive these, I find it nice to remember that a regular pentagon, and the pentagram inside it, contain lots of similar triangles:



Using the fact that all these triangles are similar, it’s easy to show that for any one, the ratio of the long side to the short side is \Phi to 1, since

\displaystyle{\Phi = 1 + \frac{1}{\Phi} }

Another important fact is that the pentagram trisects the interior angle of the regular pentagon, breaking the interior angle of 108^\circ = 3\pi/5 into 3 angles of 36^\circ = \pi/5:



Again this is easy and fun to show.

Combining these facts, we can prove that

\displaystyle{ \cos(2\pi/5) = \frac{1}{2\Phi}  }

and

\displaystyle{ \cos(\pi/5) = \frac{\Phi}{2} }

To prove the first equation, chop one of those golden triangles into two right triangles and do things you learned in high school. To prove the second, do the same things to one of the short squat isosceles triangles:

Starting from these equations and using \cos^2 \theta + \sin^2 \theta = 1, we can show

\displaystyle{ \sin(2\pi/5) = \frac{1}{2}\sqrt{2 + \Phi}}

and, just for completeness (we don’t need it here):

\displaystyle{ \sin(\pi/5) = \frac{1}{2}\sqrt{3 - \Phi}}

These require some mildly annoying calculations, where it helps to use the identity

\displaystyle{\frac{1}{\Phi^2} = 2 - \Phi }

Okay, that’s all for now! But if you want more fun, try a couple of puzzles:

Puzzle 1. We’ve gotten formulas for pi starting from a square or a regular pentagon. What formula do you get starting from an equilateral triangle?

Puzzle 2. Using the generalized Viète formula, prove Euler’s formula

\displaystyle{  \frac{\sin x}{x} = \cos\frac{x}{2} \cdot \cos\frac{x}{4} \cdot \cos\frac{x}{8} \cdots }

Conversely, use Euler’s formula to prove the generalized Viète formula.
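
You can at least check Euler’s formula numerically before trying to prove it. A quick Python sketch:

```python
from math import sin, cos

x = 1.234                # any nonzero x
product = 1.0
angle = x
for _ in range(40):      # 40 halvings is plenty for double precision
    angle /= 2
    product *= cos(angle)
print(product, sin(x) / x)   # the two agree
```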

So, one might say that the real point of Viète’s formula, and its generalized version, is not any special property of pi, but Euler’s formula.


Information Geometry (Part 16)

1 February, 2017

This week I’m giving a talk on biology and information:

• John Baez, Biology as information dynamics, talk for Biological Complexity: Can it be Quantified?, a workshop at the Beyond Center, 2 February 2017.

While preparing this talk, I discovered a cool fact. I doubt it’s new, but I haven’t exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of ‘Fisher’s fundamental theorem of natural selection’. I won’t start by explaining that theorem, since my version looks rather different from Fisher’s, and I came up with mine precisely because I had trouble understanding his. I’ll say a bit more about this at the end.

Here’s my version:

The square of the rate at which a population learns information is the variance of its fitness.

This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!

The setup

Let’s start by assuming we have n different kinds of self-replicating entities with populations P_1, \dots, P_n. As usual, these could be all sorts of things:

• molecules of different chemicals
• organisms belonging to different species
• genes of different alleles
• restaurants belonging to different chains
• people with different beliefs
• game-players with different strategies
• etc.

I’ll call them replicators of different species.

Let’s suppose each population P_i is a function of time that grows at a rate equal to this population times its ‘fitness’. I explained the resulting equation back in Part 9, but it’s pretty simple:

\displaystyle{ \frac{d}{d t} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t)   }

Here f_i is a completely arbitrary smooth function of all the populations! We call it the fitness of the ith species.

This equation is important, so we want a short way to write it. I’ll often write f_i(P_1(t), \dots, P_n(t)) simply as f_i, and P_i(t) simply as P_i. With these abbreviations, which any red-blooded physicist would take for granted, our equation becomes simply this:

\displaystyle{ \frac{dP_i}{d t}  = f_i \, P_i   }

Next, let p_i(t) be the probability that a randomly chosen organism is of the ith species:

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }

Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

\displaystyle{ \frac{d}{d t} p_i(t)  = ( f_i - \langle f \rangle ) \, p_i(t) }

Here \langle f \rangle is the average fitness of all the replicators, or mean fitness:

\displaystyle{ \langle f \rangle = \sum_j f_j(P_1(t), \dots, P_n(t)) \, p_j(t)  }

In what follows I’ll abbreviate the replicator equation as follows:

\displaystyle{ \frac{dp_i}{d t}  = ( f_i - \langle f \rangle ) \, p_i }

The result

Okay, now let’s figure out how fast the probability distribution

p(t) = (p_1(t), \dots, p_n(t))

changes with time. For this we need to choose a way to measure the length of the vector

\displaystyle{  \frac{dp}{dt} = (\frac{d}{dt} p_1(t), \dots, \frac{d}{dt} p_n(t)) }

And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.

I’ve talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution p(t) changes with time, its speed

\displaystyle{  \left\| \frac{dp}{dt} \right\|}

as measured using the Fisher information metric can be seen as the rate at which information is learned. I’ll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose v and w are two tangent vectors to the point p in the space of probability distributions. Then the Fisher information metric is given as follows:

\displaystyle{ \langle v, w \rangle = \sum_i \frac{1}{p_i} \, v_i w_i }

Using this we can calculate the speed at which p(t) moves when it obeys the replicator equation. Actually the square of the speed is simpler:

\begin{array}{ccl}  \displaystyle{ \left\| \frac{dp}{dt}  \right\|^2 } &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2 } \\ \\  &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( ( f_i - \langle f \rangle ) \, p_i \right)^2 } \\ \\  &=& \displaystyle{ \sum_i  ( f_i - \langle f \rangle )^2 p_i }   \end{array}

The answer has a nice meaning, too! It’s just the variance of the fitness: that is, the square of its standard deviation.

So, if you’re willing to buy my claim that the speed \|dp/dt\| is the rate at which our population learns new information, then we’ve seen that the square of the rate at which a population learns information is the variance of its fitness!
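
In case you’d like to see this concretely, here is a small numerical check in Python. It’s a sketch: the three-species fitness function is an arbitrary invention of mine, not anything from biology.

```python
import numpy as np

def fitness(P):
    # an arbitrary smooth fitness function of all the populations
    return np.array([1.0, 2.0, 0.5]) - 0.001 * P.sum()

# evolve dP_i/dt = f_i P_i with a crude Euler step
P = np.array([1.0, 2.0, 3.0])
dt = 1e-4
for _ in range(10000):
    P = P + dt * fitness(P) * P

p = P / P.sum()                           # probability distribution
f = fitness(P)
mean_f = (f * p).sum()                    # mean fitness <f>
dp = (f - mean_f) * p                     # replicator equation: dp_i/dt
fisher_speed_sq = (dp ** 2 / p).sum()     # ||dp/dt||^2 in the Fisher metric
variance = ((f - mean_f) ** 2 * p).sum()  # variance of the fitness
print(fisher_speed_sq, variance)          # equal, as the calculation shows
```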

Fisher’s fundamental theorem

Now, how is this related to Fisher’s fundamental theorem of natural selection? First of all, what is Fisher’s fundamental theorem? Here’s what Wikipedia says about it:

It uses some mathematical notation but is not a theorem in the mathematical sense.

It states:

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Or in more modern terminology:

“The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time”.

Largely as a result of Fisher’s feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher’s theorem was indeed correct (and that Fisher’s proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection.

Price’s paper is here:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 36 (1972), 129–140.

I don’t find it very clear, perhaps because I didn’t spend enough time on it. But I think I get the idea.

My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher’s original equation: the variance of the fitness.

But my left-hand side is different: it’s the square of the speed of the corresponding probability distribution, where speed is measured using the ‘Fisher information metric’. This metric was discovered by the same guy, Ronald Fisher, but I don’t think he used it in his work on the fundamental theorem!

Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Josef Hofbauer and Karl Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.

However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher’s fundamental theorem. Moreover, he assumes that the probability distribution p(t) flows along the gradient of a function, and I’m not assuming that. Indeed, my version applies to situations where the probability distribution moves round and round in periodic orbits!

Relative information and the Fisher information metric

The key to generalizing Fisher’s fundamental theorem is thus to focus on the speed at which p(t) moves, rather than the increase in fitness. Why do I call this speed the ‘rate at which the population learns information’? It’s because we’re measuring this speed using the Fisher information metric, which is closely connected to relative information, also known as relative entropy or the Kullback–Leibler divergence.

I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here’s a faster one, which I created while preparing my talk.

The information of a probability distribution q relative to a probability distribution p is

\displaystyle{     I(q,p) = \sum_{i =1}^n q_i \log\left(\frac{q_i}{p_i}\right) }

It says how much information you learn if you start with a hypothesis p saying that the probability of the ith situation was p_i, and then update this to a new hypothesis q.

Now suppose you have a hypothesis that’s changing with time in a smooth way, given by a time-dependent probability p(t). Then a calculation shows that

\displaystyle{ \left.\frac{d}{dt} I(p(t),p(t_0)) \right|_{t = t_0} = 0 }

for all times t_0. (To see why, differentiate under the sum: at t = t_0 the logarithms \log(p_i(t)/p_i(t_0)) vanish, and what remains is \sum_i \frac{dp_i}{dt}(t_0), which is zero because the probabilities always sum to 1.) This seems paradoxical at first. I like to jokingly put it this way:

To first order, you’re never learning anything.

However, as long as the velocity \frac{d}{dt}p(t_0) is nonzero, we have

\displaystyle{ \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} > 0 }

so we can say

To second order, you’re always learning something… unless your opinions are fixed.

This lets us define a ‘rate of learning’—that is, a ‘speed’ at which the probability distribution p(t) moves. And this is precisely the speed given by the Fisher information metric!

In other words:

\displaystyle{ \left\|\frac{dp}{dt}(t_0)\right\|^2 =  \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} }

where the length is given by Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.
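
Here is a quick numerical illustration of that defining formula—a Python sketch, where the smooth path p(t) is an arbitrary choice of mine:

```python
import numpy as np

def p(t):
    w = np.exp(np.array([1.0, 2.0, 3.0]) * t)   # an arbitrary smooth path
    return w / w.sum()                          # ...of probability distributions

t0, h = 0.3, 1e-4
p0 = p(t0)
I = lambda q: (q * np.log(q / p0)).sum()        # relative information I(q, p(t0))

# central second difference; the terms I(p0) and the first derivative vanish
second_derivative = (I(p(t0 + h)) + I(p(t0 - h))) / h**2

dp = (p(t0 + h) - p(t0 - h)) / (2 * h)          # dp/dt at t0
fisher_speed_sq = (dp ** 2 / p0).sum()
print(second_derivative, fisher_speed_sq)       # approximately equal
```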

In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a ‘rate at which information is learned’, we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it’s evolving according to the replicator equation, this speed is also just the variance of the fitness!


Biology as Information Dynamics (Part 1)

31 January, 2017

This is my talk for the workshop Biological Complexity: Can It Be Quantified?

• John Baez, Biology as information dynamics, 2 February 2017.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’—a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clean general formulation of Fisher’s fundamental theorem of natural selection.

For more, read:

• Marc Harper, The replicator equation as an inference dynamic.

• Marc Harper, Information geometry and evolutionary game theory.

• Barry Sinervo and Curt M. Lively, The rock-paper-scissors game and the evolution of alternative male strategies, Nature 380 (1996), 240–243.

• John Baez, Diversity, entropy and thermodynamics.

• John Baez, Information geometry.

The last reference contains proofs of the equations shown in red in my slides. In particular, Part 16 contains a proof of my updated version of Fisher’s fundamental theorem.


Globular

14 December, 2016

One of my goals is to turn category theory, and even higher category theory, into a practical tool for science. For this we need good scientific ideas—but we also need good software.

My friend Jamie Vicary has been developing some of this software, together with Aleks Kissinger and Krzysztof Bar and others. Jamie demonstrated it at the Simons Institute workshop on compositionality. You can watch his demonstration here:

But since Globular runs on a web browser, you can also try it out yourself here:

Globular.

You can see his talk slides:

• Jamie Vicary, Data structures for quasistrict higher categories. (Talk slides here.)

Abstract. Higher category theory is one of the most general approaches to compositionality, with broad and striking applications across computer science, mathematics and physics. We present a new, simple way to define higher categories, in which many important compositional properties emerge as theorems, rather than axioms. Our approach is amenable to computer implementation, and we present a new proof assistant we have developed, with a powerful graphical calculus. In particular, we will outline a substantial new proof we have developed in our setting.

And in December 2015, he wrote an article about this software on the n-Category Café. It’s been improved since then, but it can’t hurt to read what he wrote—so I append it here!

Globular: the basic idea

When you’re trying to prove something in a monoidal category, or a higher category, string diagrams are a really useful technique, especially when you’re trying to get an intuition for what you’re doing. But when it comes to writing up your results, the problems start to mount. For a complex proof, it’s hard to be sure your result is correct—a slip of the pen could lead to a false proof, and an error that’s hard to find. And writing up your results can be a huge time-sink, requiring weeks or months using a graphics package, all just for some nice pictures that tell you little about the correctness of the proof, and become useless if you decide to change your approach. Computers should be able help with all these things, in the way that proof assistants like Coq and Agda are allowing us to work with traditional syntactic proofs in a more sophisticated way.

The purpose of this post is to introduce Globular, a new proof assistant for working with higher-categorical proofs using string diagrams. It’s available at http://globular.science, with documentation on the nLab. It’s web-based, so everything happens right in your browser: build formal proofs, visualize and step through them; keep your proofs private, share them with collaborators, or make them publicly available.

Before we get into the technical details, here’s a screenshot of Globular in action:

Screenshot of Globular

The main part of the screen shows a diagram, which in this case is 2-dimensional. It represents a composite 2-cell in a finitely-presented 2-category, with the blue and red regions representing objects, the lines representing 1-cells, and the vertices representing 2-cells. In fact, this 2d diagram is just an intermediate state of a 3d proof, through which we’re navigating with the ‘Slice’ controls in the top-right. The proof itself has been built up by composing the generators listed in the signature, down the left-hand side of the screen. (If you want to take a look at this proof yourself, you can go straight there; in the top-right, set ‘Project’ to 0, then increment the second ‘Slice’ counter to scroll through the proof.)

Globular has been developed so far in the Quantum Group in the Oxford Computer Science department, by Krzysztof Bar, Katherine Casey, Aleks Kissinger, Jamie Vicary and Caspar Wylie. We haven’t quite got around to it yet, but Globular will be open-source, and we’re really keen for people to get involved and help build the software—there’s a huge amount to do! If you want to help out, get in touch.

Mathematical foundations

Globular is based on the theory of finitely-presented semistrict n-categories; at the moment, it works up to the level of 3-categories, with an extension to 4-categories actively in development. (You can build cells of any dimension, but from 4-cells and up, some structures are missing.)

Definitions of n-category vary in how strict they are; a definition is semistrict when it’s as strict as possible, while still having the property that every weak n-category satisfies it, up to equivalence. Definitions of semistrict n-category are not unique: in dimension 3, Gray categories put all the weak structure in the interchangers, while Simpson snucategories put it all in the unitors. Globular implements the axioms of a Gray category, because this is the most appropriate for the graphical calculus: the interchangers can be seen graphically, as changes in height of the components of the diagram. By the theory of k-tuply monoidal n-categories, this also lets you build proofs in a monoidal category, or a braided monoidal category, or a monoidal 2-category.

The only things that Globular understands are k-cells, for some value of k. So if you want to build an n-category where an equation f=g holds between n-cells, you have to do it by adding (n+1)-cells a:f \to g and b:g \to f. If you then build some composite C(f) involving f, you can apply the cell a to obtain C(g), and we interpret this as the equation C(f) = C(g). In a slogan, this is equality via rewriting. This is consistent with the basic premise of homotopy type theory: treat your proofs as first-order structures, which can in turn themselves be reasoned about.

Globular can also handle invertibility in a nice way. For a cell F:A \to B to be invertible, indicated by ticking a box in the signature, means that there also exists an invertible cell F^{-1}: B \to A, and invertible cells \text{id}_A \to F . F^{-1} and \text{id}_B \to F^{-1} . F. This is a coinductive definition (see Mike Shulman’s nice post on this topic), since we’re defining the notion of invertibility in terms of itself in a higher dimension. This sort of a definition is great for proof assistants to work with, as it allows a lot of structure to be generated from a single compact definition.

How it works

For a lot more details, take a look at the nLab page. Everything that happens in Globular involves an interaction between the signature on the left-hand side, and the diagram in the main part of the screen. The signature stores the ‘library’ of cells you have available, and the diagram is a particular composite of cells that you have constructed.

To construct a new diagram, clear whatever is currently displayed by clicking the ‘Clear’ button on the right, or pressing ‘c’. Then start by clicking the icon of an n-cell in your signature, which will make a diagram consisting just of that cell. Clicking on the icons of other k-generators for 0 < k \leq n will display a list of ways the cell can be attached, and when you choose one of these ways, the attachment will be performed, growing your n-diagram. (If you’re starting with a blank workspace you will only have a single 0-cell available, so you won’t be able to do this yet!) Clicking an (n+1)-cell G displays a list of ways that your n-diagram D can be rewritten, by identifying the source of G as a subdiagram of D. Selecting one of these ways will implement the rewrite, by ‘cutting out’ the chosen subdiagram of D, and replacing it with the target of G.

Another way to modify the diagram is to click directly on it. Clicking near the edge of the diagram performs an attachment, while clicking in the interior of the diagram performs a rewrite. If more than one attachment or rewrite is consistent with your click, a little menu will pop up for you to choose what you want to do. When you move your mouse pointer over the diagram, a little label pops up to show you what your cursor is hovering over, which is helpful when choosing where to click.

You can also click-and-drag on the diagram. This will attach or rewrite with an interchanger, or naturality for an interchanger, or invertibility for an interchanger, depending on where you have clicked and the direction of the drag. Clicking and dragging is designed to work as if you were really ‘touching’ the strings. So if you want to braid one strand over another, click the strand to ‘grab’ it, and ‘pull’ it to one side. If you want to pull a vertex through a braiding, click the vertex to ‘grab’ it, and ‘pull’ it up or down through its adjacent braiding. Of course, Globular will only carry out the command if the move you are attempting to make is actually valid in that location.

Example theorems

Here are four worked examples of nontrivial proofs in Globular:

Frobenius implies associative: http://globular.science/1512.004. In a monoidal category, if multiplication and comultiplication morphisms are unital, counital and Frobenius, then they are associative and coassociative.

Strengthening an equivalence: http://globular.science/1512.007. In a 2-category, an equivalence gives rise to an adjoint equivalence, satisfying the snake equations.

Swallowtail comes for free: http://globular.science/1512.006. In a monoidal 2-category, a weakly-dual pair of objects gives rise to a strongly-dual pair, satisfying the swallowtail equations.

Pentagon and triangle implies \lambda_I = \rho_I: http://globular.science/1512.002. In a monoidal 2-category, if a pseudomonoid object satisfies pentagon and triangle equations, then it satisfies \lambda_I = \rho_I.

We’ll focus on the second example project “Strengthening an equivalence” listed above, and see how it was constructed. This project investigates the fact that every equivalence in a 2-category gives rise to an adjoint equivalence. To start, we therefore need the basic data that exhibits an equivalence in a 2-category: two 0-cells A and B, and an invertible 1-cell F:A \to B, by the weak definition of ‘invertible’ discussed above. This gives us the following signature:

The 2-cells that witness invertibility of F look like cups and caps in the graphical calculus, but they won’t satisfy the snake equations that define an adjoint equivalence. The idea of this proof is to define a new cap, built out of the invertible structure of F, which does satisfy the snake equations with the existing cup.

By starting with a diagram consisting of F alone, pressing ‘i’ to take the identity diagram, and clicking-and-dragging, we build the following 2-diagram, out of the invertible structure associated to F:

This is our candidate for our redefined cup. Its source is the identity on A, and its target is F composed with F^{-1}. It looks a bit like the curved end of a hockey stick.

To store it for later use, we now click the ‘Theorem’ button. Writing D for the 2-diagram we have constructed, this does two things. First, it creates a 2-cell generator that we call “New Cup”, whose source is s(D), and whose target is t(D). This is the redefined cup that we can use in future expressions. Second, it creates an invertible 3-cell generator that we call “New Cup Definition”, with source given by “New Cup”, and with target given by our hockey-stick diagram D. This says what “New Cup” means in terms of our original structure. This adds the following cells to our signature:

Because “New Cup Definition” is a 3-cell, by default we see two little icons: one for its source, and one for its target. See how its source is a little picture of “New Cup”, and its target is a little picture of the hockey stick, just as we defined it.

We’re now ready to prove one of the snake equations. We start by building the snake composite, using “New Cup” for the cup, and the invertible structure of F for the cap:

Now we have to prove that this equals the identity. Since equality is implemented by rewriting, we must construct a 3-diagram whose source is this snake composite, and whose target is the identity on F. To start, we click the ‘Identity’ button to convert our diagram into an identity 3-diagram. The only apparent effect this has is to add a number scroller to the ‘Slice’ area of the controls in the top-right. At the moment we can set this to ‘0’ and ‘1’, representing the source and target of our identity 3-diagram respectively. We set it to ‘1’, because we want to compose things to the target.

We now build up our proof. First, we click on the pink vertex that represents “New Cup”. This will attach our 3-cell “New Cup Definition”, replacing “New Cup” with our hockey-stick picture. By clicking-and-dragging on the diagram, we obtain the following sequence of pictures:

Pictures 3 to 10 were created by attaching interchangers, and pictures 11 to 15 were created by attaching higher structure generated by the invertibility of F. In all cases, this structure was attached just by clicking-and-dragging on the appropriate vertices of the diagram. We’ve turned the snake into the identity, so we’ve finished our proof, which required 14 3-cells. By using the ‘Slice’ control in the top-right, we can navigate through the 15 slices that make up our proof, and review what we just did. Even better, turning the ‘Project’ control to the value ‘1’ tells Globular to project out the lowest dimension. This means that our entire 3-diagram proof can be viewed as a single 2-dimensional diagram, as follows:

This is just like the Morse singularity graphics used by topologists to study the structure of higher-dimensional manifolds. In this picture, the vertices are 3-cells, the lines are 2-cells, and the regions are 1-cells (in fact, every region is the 1-cell F.) By moving your mouse pointer over the different parts of the diagram, you can see what the different components are. Interchangers are represented in this projection by braidings.

Now we can do something cool: we can modify our proof, by clicking-and-dragging on the Morse projection. For example, just to the right of centre, there is a crossing, out of which emerge two long vertical lines that travel up a long way before annihilating with one another. Our proof would be much simpler if these two lines just annihilated with each other right after the interchanger. So, we click the vertex at the top of the lines, and drag it down repeatedly, until it gets to where we want it:

We’ve simplified our proof. By clicking-and-dragging some more, you can change the proof in lots of different ways, although you probably won’t get it much simpler than this. Putting the ‘Project’ control back to ‘0’, and navigating through the stages of the proof with the ‘Slice’ control as we were doing before, we can see that our proof has indeed been modified.

This project has been in development for about 18 months, so it feels great to finally launch. We hope the whole community will get clicking-and-dragging, and let us know how easy it is to use, and what other features would be useful. There are certain to still be bugs, so let us know about them too, and we’ll get right on them.


Modelling Interconnected Systems with Decorated Corelations

9 December, 2016

Here at the Simons Institute workshop on compositionality, my talk on network theory explained how to use ‘decorated cospans’ as a general model of open systems. These were invented by Brendan Fong, and are nicely explained in his thesis:

• Brendan Fong, The Algebra of Open and Interconnected Systems. (Blog article here.)

But he went further: to understand the externally observable behavior of an open system we often want to simplify a decorated cospan and get another sort of structure, which he calls a ‘decorated corelation’.

In this talk, Brendan explained decorated corelations and what they’re good for:

• Brendan Fong, Modelling interconnected systems with decorated corelations. (Talk slides here.)

Abstract. Hypergraph categories are monoidal categories in which every object is equipped with a special commutative Frobenius monoid. Morphisms in a hypergraph category can hence be represented by string diagrams in which strings can branch and split: diagrams that are reminiscent of electrical circuit diagrams. As such they provide a framework for formalising the syntax and semantics of circuit-type diagrammatic languages. In this talk I will introduce decorated corelations as a tool for building hypergraph categories and hypergraph functors, drawing examples from linear algebra and dynamical systems.