The Theory of Devices

20 June, 2017

I’m visiting the University of Genoa and talking to two category theorists: Marco Grandis and Giuseppe Rosolini. Grandis works on algebraic topology and higher categories, while Rosolini works on the categorical semantics of programming languages.

Yesterday, Marco Grandis showed me a fascinating paper by his thesis advisor:

• Gabriele Darbo, Aspetti algebrico-categoriali della teoria dei dispotivi, Symposia Mathematica IV (1970), Istituto Nazionale di Alta Matematica, 303–336.

It’s closely connected to Brendan Fong’s thesis, but also different—and, of course, much older. According to Grandis, Darbo was the first person to work on category theory in Italy. He’s better known for other things, like ‘Darbo’s fixed point theorem’, but this piece of work is elegant, and, it seems to me, strangely ahead of its time.

The paper’s title translates as ‘Algebraic-categorical aspects of the theory of devices’, and its main concept is that of a ‘universe of devices’: a collection of devices of some kind that can be hooked up using wires to form more devices of this kind. Nowadays we might study this concept using operads—but operads didn’t exist in 1970, and Darbo did quite fine without them.

The key is the category \mathrm{FinCorel}, which has finite sets as objects and ‘corelations’ as morphisms. I explained corelations here:

Corelations in network theory, 2 February 2016.

Briefly, a corelation from a finite set X to a finite set Y is a partition of the disjoint union of X and Y. We can get such a partition from a bunch of wires connecting points of X and Y. The idea is that two points lie in the same part of the partition iff they’re connected, directly or indirectly, by a path of wires. So, if we have some wires like this:

they determine a corelation like this:

There’s an obvious way to compose corelations, giving a category \mathrm{FinCorel}.

Gabriele Darbo doesn’t call them ‘corelations’: he calls them ‘trasduttori’. A literal translation might be ‘transducers’. But he’s definitely talking about corelations, and like Fong he thinks they are basic for studying ways to connect systems.

Darbo wants a ‘universe of devices’ to assign to each finite set X a set D(X) of devices having X as their set of ‘terminals’. Given a device in D(X) and a corelation f \colon X \to Y, thought of as a bunch of wires, he wants to be able to attach these wires to the terminals in X and get a new device with Y as its set of terminals. Thus he wants a map D(f): D(X) \to D(Y). If you draw some pictures, you’ll see this should give a functor

D : \mathrm{FinCorel} \to \mathrm{Set}

Moreover, if we have device with a set X of terminals and a device with a set Y of terminals, we should be able to set them side by side and get a device whose set of terminals form the set X + Y, meaning the disjoint union of X and Y. So, Darbo wants to have maps

\delta_{X,Y} : D(X) \times D(Y) \to D(X + Y)

If you draw some more pictures you can convince yourself that \delta should be a lax symmetric monoidal functor… if you’re one of the lucky few who knows what that means. If you’re not, you can look it up in many places, such as Section 1.2 here:

• Brendan Fong, The Algebra of Open and Interconnected Systems, Ph.D. thesis, University of Oxford, 2016. (Blog article here.)

Darbo does not mention lax symmetric monoidal functors, perhaps because such concepts were first introduced by Mac Lane only in 1968. But as far as I can tell, Darbo’s definition is almost equivalent to this:

Definition. A universe of devices is a lax symmetric monoidal functor D \colon \mathrm{FinCorel} \to \mathrm{Set}.

One difference is that Darbo wants there to be exactly one device with no terminals. Thus, he assumes D(\emptyset) is a one-element set, say 1, while the definition above would only demand the existence of a map \delta \colon 1 \to D(\emptyset) obeying a couple of axioms. That gives a particular ‘favorite’ device with no terminals. I believe we get Darbo’s definition from the above one if we further assume \delta is the identity map. This makes sense if we take the attitude that ‘a device is determined by its observable behavior’, but not otherwise. This attitude is called ‘black-boxing’.

Darbo does various things in his paper, but the most exciting to me is his example connected to linear electrical circuits. He defines, for any pair of objects V and I in an abelian category C, a particular universe of devices. He calls this the universe of linear devices having V as the object of potentials and I as the object of currents.

If you don’t like abelian categories, think of C as the category of finite-dimensional real vector spaces, and let V = I = \mathbb{R}. Electric potential and electric current are described by real numbers so this makes sense.

The basic idea will be familiar to Fong fans. In an electrical circuit made of purely conductive wires, when two wires merge into one we add the currents to get the current on the wire going out. When one wire splits into two we duplicate the potential to get the potentials on the wires going out. Working this out further, any corelation f : X \to Y between finite set determines two linear relations, one

f_* : I^X \rightharpoonup I^Y

relating the currents on the wires coming in to the currents on the wires going out, and one

f^* : V^Y \rightharpoonup V^X

relating the potentials on the wires going out to the potentials on the wires coming in. Here I^X is the direct sum of X copies of I, and so on; the funky arrow indicates that we have a linear relation rather than a linear map. Note that f_* goes forward while f^* goes backward; this is mainly just conventional, since you can turn linear relations around, but we’ll see it’s sort of nice.

If we let \mathrm{Rel}(A,B) be the set of linear relations between two objects A, B \in C, we can use the above technology to get a universe of devices where

D(X) = \mathrm{Rel}(V^X, I^X)

In other words, a device of this kind is simply a linear relation between the potentials and currents at its terminals!

How does D get to be a functor D : \mathrm{FinCorel} \to \mathrm{FinSet}? That’s pretty easy. We’ve defined it on objects (that is, finite sets) by the above formula. So, suppose we have a morphism (that is, a corelation) f \colon X \to Y. How do we define D(f) : D(X) \to D(Y)?

To answer this question, we need a function

D(f) : \mathrm{Rel}(V^X, I^X) \to \mathrm{Rel}(V^Y, I^Y)

Luckily, we’ve got linear relations

f_* : I^X \rightharpoonup I^Y

and

f^* : V^Y \rightharpoonup V^X

So, given any linear relation R \in \mathrm{Rel}(V^X, I^X), we just define

D(f)(R) = f_* \circ R \circ f^*

Voilà!

People who have read Fong’s thesis, or my paper with Blake Pollard on reaction networks:

• John Baez and Blake Pollard, A compositional framework for reaction networks.

will find many of Darbo’s ideas eerily similar. In particular, the formula

D(f)(R) = f_* \circ R \circ f^*

appears in Lemma 16 of my paper with Blake, where we are defining a category of open dynamical systems. We prove that D is a lax symmetric monoidal functor, which is just what Darbo proved—though in a different context, since our R is not linear like his, and for a different purpose, since he’s trying to show D is a ‘universe of devices’, while we’re trying to construct the category of open dynamical systems as a ‘decorated cospan category’.

In short: if this work of Darbo had become more widely known, the development of network theory could have been sped up by three decades! But there was less interest in a general theory of networks at the time, lax monoidal functors were brand new, operads unknown… and, sadly, few mathematicians read Italian.

Darbo has other papers, and so do his students. We should read them and learn from them! Here are a few open-access ones:

• Franco Parodi, Costruzione di un universo di dispositivi non lineari su una coppia di gruppi abeliani , Rendiconti del Seminario Matematico della Università di Padova 58 (1977), 45–54.

• Franco Parodi, Categoria degli universi di dispositivi e categoria delle T-algebre, Rendiconti del Seminario Matematico della Università di Padova 62 (1980), 1–15.

• Stefano Testa, Su un universo di dispositivi monotoni, Rendiconti del Seminario Matematico della Università di Padova 65 (1981), 53–57.

At some point I will scan in G. Darbo’s paper and make it available here.


The Geometric McKay Correspondence (Part 1)

19 June, 2017

The ‘geometric McKay correspondence’, actually discovered by Patrick du Val in 1934, is a wonderful relation between the Platonic solids and the ADE Dynkin diagrams. In particular, it sets up a connection between two of my favorite things, the icosahedron:

and the \mathrm{E}_8 Dynkin diagram:

When I recently gave a talk on this topic, I realized I didn’t understand it as well as I’d like. Since then I’ve been making progress with the help of this book:

• Alexander Kirillov Jr., Quiver Representations and Quiver Varieties, AMS, Providence, Rhode Island, 2016.

I now think I glimpse a way forward to a very concrete and vivid understanding of the relation between the icosahedron and E8. It’s really just a matter of taking the ideas in this book and working them out concretely in this case. But it takes some thought, at least for me. I’d like to enlist your help.

The rotational symmetry group of the icosahedron is a subgroup of \mathrm{SO}(3) with 60 elements, so its double cover up in \mathrm{SU}(2) has 120. This double cover is called the binary icosahedral group, but I’ll call it \Gamma for short.

This group \Gamma is the star of the show, the link between the icosahedron and E8. To visualize this group, it’s good to think of \mathrm{SU}(2) as the unit quaternions. This lets us think of the elements of \Gamma as 120 points in the unit sphere in 4 dimensions. They are in fact the vertices of a 4-dimensional regular polytope, which looks like this:



It’s called the 600-cell.

Since \Gamma is a subgroup of \mathrm{SU}(2) it acts on \mathbb{C}^2, and we can form the quotient space

S = \mathbb{C}^2/\Gamma

This is a smooth manifold except at the origin—that is, the point coming from 0 \in \mathbb{C}^2. There’s a singularity at the origin, and this where \mathrm{E}_8 is hiding! The reason is that there’s a smooth manifold \widetilde{S} and a map

\pi : \widetilde{S} \to S

that’s one-to-one and onto except at the origin. It maps 8 spheres to the origin! There’s one of these spheres for each dot here:

Two of these spheres intersect in a point if their dots are connected by an edge; otherwise they’re disjoint.

The challenge is to find a nice concrete description of \widetilde{S}, the map \pi : \widetilde{S} \to S, and these 8 spheres.

But first it’s good to get a mental image of S. Each point in this space is a \Gamma orbit in \mathbb{C}^2, meaning a set like this:

\{g x : \; g \in \Gamma \}

for some x \in \mathbb{C}^2. For x = 0 this set is a single point, and that’s what I’ve been calling the ‘origin’. In all other cases it’s 120 points, the vertices of a 600-cell in \mathbb{C}^2. This 600-cell is centered at the point 0 \in \mathbb{C}^2, but it can be big or small, depending on the magnitude of x.

So, as we take a journey starting at the origin in S, we see a point explode into a 600-cell, which grows and perhaps also rotates as we go. The origin, the singularity in S, is a bit like the Big Bang.

Unfortunately not every 600-cell centered at the origin is of the form I’ve shown:

\{g x : \; g \in \Gamma \}

It’s easiest to see this by thinking of points in 4d space as quaternions rather than elements of \mathbb{C}^2. Then the points g \in \Gamma are unit quaternions forming the vertices of a 600-cell, and multiplying g on the right by x dilates this 600-cell and also rotates it… but we don’t get arbitrary rotations this way. To get an arbitrarily rotated 600-cell we’d have to use both a left and right multiplication, and consider

\{x g y : \; g \in \Gamma \}

for a pair of quaternions x, y.

Luckily, there’s a simpler picture of the space S. It’s the space of all regular icosahedra centered at the origin in 3d space!

To see this, we start by switching to the quaternion description, which says

S = \mathbb{H}/\Gamma

Specifying a point x \in \mathbb{H} amounts to specifying the magnitude \|x\| together with x/\|x\|, which is a unit quaternion, or equivalently an element of \mathrm{SU}(2). So, specifying a point in

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying the magnitude \|x\| together with a point in \mathrm{SU}(2)/\Gamma. But \mathrm{SU}(2) modulo the binary icosahedral group \Gamma is the same as \mathrm{SO(3)} modulo the icosahedral group (the rotational symmetry group of an icosahedron). Furthermore, \mathrm{SO(3)} modulo the icosahedral group is just the space of unit-sized icosahedra centered at the origin of \mathbb{R}^3.

So, specifying a point

\{g x : \; g \in \Gamma \} \in \mathbb{H}/\Gamma

amounts to specifying a nonnegative number \|x\| together with a unit-sized icosahedron centered at the origin of \mathbb{R}^3. But this is the same as specifying an icosahedron of arbitrary size centered at the origin of \mathbb{R}^3. There’s just one subtlety: we allow the size of this icosahedron to be zero, but then the way it’s rotated no longer matters.

So, S is the space of icosahedra centered at the origin, with the ‘icosahedron of zero size’ being a singularity in this space. When we pass to the smooth manifold \widetilde{S}, we replace this singularity with 8 spheres, intersecting in a pattern described by the \mathrm{E}_8 Dynkin diagram.

Points on these spheres are limiting cases of icosahedra centered at the origin. We can approach these points by letting an icosahedron centered at the origin shrink to zero size in a clever way, perhaps spinning about wildly as it does.

I don’t understand this last paragraph nearly as well as I’d like! I’m quite sure it’s true, and I know a lot of relevant information, but I don’t see it. There should be a vivid picture of how this works, not just an abstract argument. Next time I’ll start trying to assemble the material that I think needs to go into building this vivid picture.


The Mathematics of Open Reaction Networks

8 June, 2017

Next week, Blake Pollard and I will talk about our work on reaction networks. We’ll do this at Dynamics, Thermodynamics and Information Processing in Chemical Networks, a workshop at the University of Luxembourg organized by Massimiliano Esposito and Matteo Polettini. We’ll do it on Tuesday, 13 June 2017, from 11:00 to 13:00, in room BSC 3.03 of the Bâtiment des Sciences. If you’re around, please stop by and say hi!

Here are the slides for my talk:

The mathematics of open reaction networks.

Abstract. To describe systems composed of interacting parts, scientists and engineers draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Feynman diagrams and the like. In principle all these different diagrams fit into a common framework: the mathematics of monoidal categories. This has been known for some time. However, the details are more challenging, and ultimately more rewarding, than this basic insight. Here we explain how various applications of reaction networks and Petri nets fit into this framework.

If you see typos or other problems please let me know now!

I hope to blog a bit about the workshop… it promises to be very interesting.


The Dodecahedron, the Icosahedron and E8

16 May, 2017


Here you can see the slides of a talk I’m giving:

The dodecahedron, the icosahedron and E8, Annual General Meeting of the Hong Kong Mathematical Society, Hong Kong University of Science and Technology.

It’ll take place on 10:50 am Saturday May 20th in Lecture Theatre G. You can see the program for the whole meeting here.

The slides are in the form of webpages, and you can see references and some other information tucked away at the bottom of each page.

In preparing this talk I learned more about the geometric McKay correspondence, which is a correspondence between the simply-laced Dynkin diagrams (also known as ADE Dynkin diagrams) and the finite subgroups of \mathrm{SU}(2).

There are different ways to get your hands on this correspondence, but the geometric way is to resolve the singularity in \mathbb{C}^2/\Gamma where \Gamma \subset \mathrm{SU}(2) is such a finite subgroup. The variety \mathbb{C}^2/\Gamma has a singularity at the origin–or more precisely, the point coming from the origin in \mathbb{C}^2. To make singularities go away, we ‘resolve’ them. And when you take the ‘minimal resolution’ of this variety (a concept I explain here), you get a smooth variety S with a map

\pi \colon S \to \mathbb{C}^2/\Gamma

which is one-to-one except at the origin. The points that map to the origin lie on a bunch of Riemann spheres. There’s one of these spheres for each dot in some Dynkin diagram—and two of these spheres intersect iff their two dots are connected by an edge!

In particular, if \Gamma is the double cover of the rotational symmetry group of the dodecahedron, the Dynkin diagram we get this way is E_8:

The basic reason \mathrm{E}_8 is connected to the icosahedron is that the icosahedral group is generated by rotations of orders 2, 3 and 5 while the \mathrm{E}_8 Dynkin diagram has ‘legs’ of length 2, 3, and 5 if you count right:

In general, whenever you have a triple of natural numbers a,b,c obeying

\displaystyle{ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} > 1}

you get a finite subgroup of \mathrm{SU}(2) that contains rotations of orders a,b,c, and a simply-laced Dynkin diagram with legs of length a,b,c. The three most exciting cases are:

(a,b,c) = (2,3,3): the tetrahedron, and E_6,

(a,b,c) = (2,3,4): the octahedron, and E_7,

(a,b,c) = (2,3,5): the icosahedron, and E_8.

But the puzzle is this: why does resolving the singular variety \mathbb{C}^2/\Gamma gives a smooth variety with a bunch of copies of the Riemann sphere \mathbb{C}\mathrm{P}^1 sitting over the singular point at the origin, with these copies intersecting in a pattern given by a Dynkin diagram?

It turns out the best explanation is in here:

• Klaus Lamotke, Regular Solids and Isolated Singularities, Vieweg & Sohn, Braunschweig, 1986.

In a nutshell, you need to start by blowing up \mathbb{C}^2 at the origin, getting a space X containing a copy of \mathbb{C}\mathrm{P}^1 on which \Gamma acts. The space X/\Gamma has further singularities coming from the rotations of orders a, b and c in \Gamma. When you resolve these, you get more copies of \mathbb{C}\mathrm{P}^1, which intersect in the pattern given by a Dynkin diagram with legs of length a,b and c.

I would like to understand this better, and more vividly. I want a really clear understanding of the minimal resolution S. For this I should keep rereading Lamotke’s book, and doing more calculations.

I do, however, have a nice vivid picture of the singular space \mathbb{C}^2/\Gamma. For that, read my talk! I’m hoping this will lead, someday, to an equally appealing picture of its minimal resolution.


Periodic Patterns in Peptide Masses

6 April, 2017

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which since 1974 was the most famous conjecture in mathematical chemistry. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain of made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a frequency of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop looking if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, which have different masses, which are almost integers. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers m_1, ... , m_{20}, how many ways can you write any natural number N as a finite ordered sum of these numbers? Call it F(N) and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)

There’s a well-known way to write down a formula for F(N). It obeys a linear recurrence:

F(N) = F(N - m_1) + \cdots + F(N - m_{20})

and we can solve this using the ansatz

F(N) = x^N

Then the recurrence relation will hold if

x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}

for all N. But this is fairly easy to achieve! If m_{20} is the biggest mass, we just need this polynomial equation to hold:

x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1

There will be a bunch of solutions, about m_{20} of them. (If there are repeated roots things get a bit more subtle, but let’s not worry about.) To get the actual formula for F(N) we need to find the right linear combination of functions x^N where x ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root x with the largest magnitude will make x^N grow the fastest.

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses m_1 = 1, m_2 = 2. Then the numbers F(N) are the Fibonacci numbers. I hope you know this: the Nth Fibonacci number is the number of ways to write N as the sum of an ordered list of 1’s and 2’s!

1

1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.

In this example the recurrence gives the polynomial equation

x^2 = x + 1

and the root with largest magnitude is the golden ratio:

\Phi = 1.6180339...

The other root is

1 - \Phi = -0.6180339...

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers m_1 < \cdots < m_k are all the roots of

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

real?

I don’t know the answer. But apparently this kind of polynomial equation always one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that F(N) is asymptotically proportional to x^N where x is this root.

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of F(N). The red roots have the second largest magnitude. These give the main oscillations in F(N), which have period 14.28.

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers m_1 < \cdots < m_k and draw all the roots of the equations

x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1

What pattern do we get in the complex plane?

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The beauty of roots.

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:


But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses m_i are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!


Applied Category Theory

6 April, 2017

The American Mathematical Society is having a meeting here at U. C. Riverside during the weekend of November 4th and 5th, 2017. I’m organizing a session on Applied Category Theory, and I’m looking for people to give talks.

The goal is to start a conversation about applications of category theory, not within pure math or fundamental physics, but to other branches of science and engineering—especially those where the use of category theory is not already well-established! For example, my students and I have been applying category theory to chemistry, electrical engineering, control theory and Markov processes.

Alas, we have no funds for travel and lodging. If you’re interested in giving a talk, please submit an abstract here:

General information about abstracts, American Mathematical Society.

More precisely, please read the information there and then click on the link on that page to submit an abstract. It should then magically fly through cyberspace to me! Abstracts are due September 12th, but the sooner you submit one, the greater the chance that we’ll have space.

For the program of the whole conference, go here:

Fall Western Sectional Meeting, U. C. Riverside, Riverside, California, 4–5 November 2017.

We’ll be having some interesting plenary talks:

• Paul Balmer, UCLA, An invitation to tensor-triangular geometry.

• Pavel Etingof, MIT, Double affine Hecke algebras and their applications.

• Monica Vazirani, U.C. Davis, Combinatorics, categorification, and crystals.


Pi and the Golden Ratio

7 March, 2017

Two of my favorite numbers are pi:

\pi = 3.14159...

and the golden ratio:

\displaystyle{ \Phi = \frac{\sqrt{5} + 1}{2} } = 1.6180339...

They’re related:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Greg Egan and I came up with this formula last weekend. It’s probably not new, and it certainly wouldn’t surprise experts, but it’s still fun coming up with a formula like this. Let me explain how we did it.

History has a fractal texture. It’s not exactly self-similar, but the closer you look at any incident, the more fine-grained detail you see. The simplified stories we learn about the history of math and physics in school are like blurry pictures of the Mandelbrot set. You can see the overall shape, but the really exciting stuff is hidden.

François Viète is a French mathematician who doesn’t show up in those simplified stories. He studied law at Poitiers, graduating in 1559. He began his career as an attorney at a quite high level, with cases involving the widow of King Francis I of France and also Mary, Queen of Scots. But his true interest was always mathematics. A friend said he could think about a single question for up to three days, his elbow on the desk, feeding himself without changing position.

Nonetheless, he was highly successful in law. By 1590 he was working for King Henry IV. The king admired his mathematical talents, and Viète soon confirmed his worth by cracking a Spanish cipher, thus allowing the French to read all the Spanish communications they were able to obtain.

In 1591, François Viète came out with an important book, introducing what is called the new algebra: a symbolic method for dealing with polynomial equations. This deserves to be much better known; it was very familiar to Descartes and others, and it was an important precursor to our modern notation and methods. For example, he emphasized care with the use of variables, and advocated denoting known quantities by consonants and unknown quantities by vowels. (Later people switched to using letters near the beginning of the alphabet for known quantities and letters near the end like x,y,z for unknowns.)

In 1593 he came out with another book, Variorum De Rebus Mathematicis Responsorum, Liber VIII. Among other things, it includes a formula for pi. In modernized notation, it looks like this:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

This is remarkable! First of all, it looks cool. Second, it’s the earliest known example of an infinite product in mathematics. Third, it’s the earliest known formula for the exact value of pi. In fact, it seems to be the earliest formula representing a number as the result of an infinite process rather than of a finite calculation! So, Viète’s formula has been called the beginning of analysis. In his article “The life of pi”, Jonathan Borwein went even further and called Viète’s formula “the dawn of modern mathematics”.

How did Viète come up with his formula? I haven’t read his book, but the idea seems fairly clear. The area of the unit circle is pi. So, you can approximate pi better and better by computing the area of a square inscribed in this circle, and then an octagon, and then a 16-gon, and so on:

If you compute these areas in a clever way, you get this series of numbers:

\begin{array}{ccl} A_4 &=& 2 \\  \\ A_8 &=& 2 \cdot \frac{2}{\sqrt{2}} \\  \\ A_{16} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}}  \\  \\ A_{32} &=& 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}  \end{array}

and so on, where A_n is the area of a regular n-gon inscribed in the unit circle. So, it was only a small step for Viète (though an infinite leap for mankind) to conclude that

\displaystyle{ \pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

or, if square roots in a denominator make you uncomfortable:

\displaystyle{ \frac2\pi = \frac{\sqrt 2}2 \cdot \frac{\sqrt{2+\sqrt 2}}2 \cdot \frac{\sqrt{2+\sqrt{2+\sqrt 2}}}{2} \cdots}

The basic idea here would not have surprised Archimedes, who rigorously proved that

223/71 < \pi < 22/7

by approximating the circumference of a circle using a regular 96-gon. Since 96 = 2^5 \times 3, you can draw a regular 96-gon with ruler and compass by taking an equilateral triangle and bisecting its edges to get a hexagon, bisecting the edges of that to get a 12-gon, and so on up to 96. In a more modern way of thinking, you can figure out everything you need to know by starting with the angle \pi/3 and using half-angle formulas 4 times to work out the sine or cosine of \pi/96. And indeed, before Viète came along, Ludolph van Ceulen had computed pi to 35 digits using a regular polygon with 2^{62} sides! So Viète’s daring new idea was to give an exact formula for pi that involved an infinite process.

Now let’s see in detail how Viète’s formula works. Since there’s no need to start with a square, we might as well start with a regular n-gon inscribed in the circle and repeatedly bisect its sides, getting better and better approximations to pi. If we start with a pentagon, we’ll get a formula for pi that involves the golden ratio!

We have

\displaystyle{ \pi = \lim_{k \to \infty} A_k }

so we can also compute pi by starting with a regular n-gon and repeatedly doubling the number of vertices:

\displaystyle{ \pi = \lim_{k \to \infty} A_{2^k n} }

The key trick is to write A_{2^k}{n} as a ‘telescoping product’:

A_{2^k n} = A_n \cdot \frac{A_{2n}}{A_n} \cdot  \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}}

Thus, taking the limit as k \to \infty we get

\displaystyle{ \pi = A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }

where we start with the area of the n-gon and keep ‘correcting’ it to get the area of the 2n-gon, the 4n-gon, the 8n-gon and so on.

There’s a simple formula for the area of a regular n-gon inscribed in a circle. You can chop it into 2 n right triangles, each of which has base \sin(\pi/n) and height \cos(\pi/n), and thus area n \sin(\pi/n) \cos(\pi/n):

Thus,

A_n = n \sin(\pi/n) \cos(\pi/n) = \displaystyle{\frac{n}{2} \sin(2 \pi / n)}

This lets us understand how the area changes when we double the number of vertices:

\displaystyle{ \frac{A_{n}}{A_{2n}} = \frac{\frac{n}{2} \sin(2 \pi / n)}{n \sin(\pi / n)} = \frac{n \sin( \pi / n) \cos(\pi/n)}{n \sin(\pi / n)} = \cos(\pi/n) }

This is nice and simple, but we really need a recursive formula for this quantity. Let’s define

\displaystyle{ R_n = 2\frac{A_{n}}{A_{2n}} = 2 \cos(\pi/n) }

Why the factor of 2? It simplifies our calculations slightly. We can express R_{2n} in terms of R_n using the half-angle formula for the cosine:

\displaystyle{ R_{2n} = 2 \cos(\pi/2n) = 2\sqrt{\frac{1 + \cos(\pi/n)}{2}} = \sqrt{2 + R_n} }

Now we’re ready for some fun! We have

\begin{array}{ccl} \pi &=& \displaystyle{ A_n \cdot \frac{A_{2n}}{A_n} \cdot \frac{A_{4n}}{A_{2n}} \cdot \frac{A_{8n}}{A_{4n}} \cdots }  \\ \\ & = &\displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{R_{2n}} \cdot \frac{2}{R_{4n}} \cdots } \end{array}

so using our recursive formula R_{2n} = \sqrt{2 + R_n}, which holds for any n, we get

\pi =  \displaystyle{ A_n \cdot \frac{2}{R_n} \cdot \frac{2}{\sqrt{2 + R_n}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_n}}} \cdots }

I think this deserves to be called the generalized Viète formula. And indeed, if we start with a square, we get

A_4 = \displaystyle{\frac{4}{2} \sin(2 \pi / 4)} = 2

and

R_4 = 2 \cos(\pi/4) = \sqrt{2}

giving Viète’s formula:

\pi = \displaystyle{ 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2 + \sqrt{2}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2}}}} \cdots }

as desired!

But what if we start with a pentagon? For this it helps to remember a beautiful but slightly obscure trig fact:

\cos(\pi / 5) = \Phi/2

and a slightly less beautiful one:

\displaystyle{ \sin(2\pi / 5) = \frac{1}{2} \sqrt{2 + \Phi} }

It’s easy to prove these, and I’ll show you how later. For now, note that they imply

A_5 = \displaystyle{\frac{5}{2} \sin(2 \pi / 5)} = \frac{5}{4} \sqrt{2 + \Phi}

and

R_5 = 2 \cos(\pi/5) = \Phi

Thus, the formula

\pi =  \displaystyle{ A_5 \cdot \frac{2}{R_5} \cdot \frac{2}{\sqrt{2 + R_5}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + R_5}}} \cdots }

gives us

\pi =  \displaystyle{ \frac{5}{4} \sqrt{2 + \Phi} \cdot \frac{2}{\Phi} \cdot \frac{2}{\sqrt{2 + \Phi}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdots }

or, cleaning it up a bit, the formula we want:

\pi = \frac{5}{\Phi} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \Phi}}} \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}  \cdot \frac{2}{\sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \Phi}}}}} \cdots

Voilà!

There’s a lot more to say, but let me just explain the slightly obscure trigonometry facts we needed. To derive these, I find it nice to remember that a regular pentagon, and the pentagram inside it, contain lots of similar triangles:



Using the fact that all these triangles are similar, it’s easy to show that for any one, the ratio of the long side to the short side is \Phi to 1, since

\displaystyle{\Phi = 1 + \frac{1}{\Phi} }

Another important fact is that the pentagram trisects the interior angle of the regular pentagon, breaking the interior angle of 108^\circ = 3\pi/5 into 3 angles of 36^\circ = \pi/5:



Again this is easy and fun to show.

Combining these facts, we can prove that

\displaystyle{ \cos(2\pi/5) = \frac{1}{2\Phi}  }

and

\displaystyle{ \cos(\pi/5) = \frac{\Phi}{2} }

To prove the first equation, chop one of those golden triangles into two right triangles and do things you learned in high school. To prove the second, do the same things to one of the short squat isosceles triangles:

Starting from these equations and using \cos^2 \theta + \sin^2 \theta = 1, we can show

\displaystyle{ \sin(2\pi/5) = \frac{1}{2}\sqrt{2 + \Phi}}

and, just for completeness (we don’t need it here):

\displaystyle{ \sin(\pi/5) = \frac{1}{2}\sqrt{3 - \Phi}}

These require some mildly annoying calculations, where it helps to use the identity

\displaystyle{\frac{1}{\Phi^2} = 2 - \Phi }

Okay, that’s all for now! But if you want more fun, try a couple of puzzles:

Puzzle 1. We’ve gotten formulas for pi starting from a square or a regular pentagon. What formula do you get starting from an equilateral triangle?

Puzzle 2. Using the generalized Viète formula, prove Euler’s formula

\displaystyle{  \frac{\sin x}{x} = \cos\frac{x}{2} \cdot \cos\frac{x}{4} \cdot \cos\frac{x}{8} \cdots }

Conversely, use Euler’s formula to prove the generalized Viète formula.

So, one might say that the real point of Viète’s formula, and its generalized version, is not any special property of pi, but Euler’s formula.