Geometry Puzzles

12 January, 2014

Here are four puzzles about areas, in approximate order of increasing difficulty.

Mysteries of the equilateral triangle


Puzzle: Show the area of the orange circle equals the total area of the two blue regions.

In case you’re wondering, the picture above shows an equilateral triangle with a small circle inscribed in it and a big circle circumscribed around it.

The puzzle is easy if you think about it the right way. I never knew this cool fact until last month, when I read this fun free book:

• Brian McCartin, Mysteries of the equilateral triangle.

I’ve given talks on the numbers 5, 8, and 24. I’ve imagined writing a book that had chapters on all my favorite natural numbers, starting with Chapter 0. But it would be very hard to write Chapter 3, since this number has too many interesting properties! McCartin’s book covers a lot of them.

Square the lune to the tune of Claire de Lune


Puzzle: Show the crescent has the same area as the triangle.

Greek mathematicians really wanted to square the circle, by which I mean: use straightedge and compass to first draw a circle and then construct a square with the same area.

In 440 BC, Hippocrates of Chios figured out how to square the above crescent-shaped region, which lies between the circle centered at O and the smaller circle centered at D. So, this region is called the Lune of Hippocrates.

I’ve heard it said that this result gave some Greek geometers hope that the circle could be squared. I’m not sure this is true; Hippocrates himself was probably too smart to be fooled. But in any event, it would have been a false hope. Much later, in 1882, Lindemann proved that π is transcendental, which implies that squaring the circle is impossible; the more general Lindemann–Weierstrass theorem followed a few years later.

Any crescent-shaped region formed by two circular arcs is called a lune. It’s widely believed that there are only 5 squarable lunes. In other words, there are only 5 shapes of lune constructible by straightedge and compass whose area equals that of a square constructible using straightedge and compass. (Obviously these lunes can come in many different sizes.)

Hippocrates discovered three squarable lunes. Two more were discovered by Martin Johan Wallenius in 1766. A proof that these are the only squarable lunes was given by Tchebatorew and Dorodnow, and summarized by the famous topologist Postnikov:

• M. M. Postnikov, The problem of squarable Lunes, translated from the Russian by Abe Shenitzer, American Mathematical Monthly, 107 (Aug.-Sep. 2000), 645–651.

However, there may be a loophole in this proof: Will Jagy claims that these Russians assumed without proof that the ratio of two angles involved in the construction of the lune is rational. With this assumption, finding a squarable lune amounts to finding a rational number u and a constructible number \sin a such that

\sin ( u a) = \sqrt{u} \sin a
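To get a feel for this equation, here is a tiny numerical check I added (not part of Jagy’s claim): the classic 2:1 Lune of Hippocrates should correspond to u = 2 and a = \pi/4, and indeed these numbers satisfy the equation.

```python
import math

# My own sanity check: the classic 2:1 Lune of Hippocrates should correspond
# to u = 2 and a = pi/4, which ought to satisfy sin(u*a) = sqrt(u)*sin(a).
u, a = 2, math.pi / 4
lhs = math.sin(u * a)
rhs = math.sqrt(u) * math.sin(a)
print(lhs, rhs, abs(lhs - rhs) < 1e-12)   # both are 1, up to rounding
```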

Puzzle: Why should u be rational? Do you know the true state of the art here?

(My puzzles include hard questions I don’t know the answer to, and this is one.)

For a nice discussion of the 5 squarable lunes, with pictures, see this page:

The five squarable lunes, MathPages.

Twice in a blue lune


Puzzle: Show the area of this right triangle equals the total area inside the blue lunes. The outside of each lune is a semicircle. The inside of each lune is part of the circle containing the points A, B, and C.

The circle through all 3 corners of a triangle is called its circumcircle. You can construct the circumcircle using a straightedge and compass, if you want.

Again this is a famous old problem. The two blue lunes here are called the Lunes of Alhazen. This problem was later posed and solved by Leonardo da Vinci!

This puzzle is a lot of fun, so I urge you not to give up—but if you do, you can see da Vinci’s solution here.

The arbelos


Puzzle: show the area of the green shape equals the area of the circle.

The green shape is called an arbelos, which means ‘shoemaker’s knife’ in Greek, since it looks a bit like that. It shows up in Propositions 4-8 of the Book of Lemmas. This book goes back at least to Thābit ibn Qurra, a mathematician, astronomer and physician who lived in Baghdad from 826 to 901 AD. Ibn Qurra said the book was written by Archimedes! But nobody is sure.

So, when you do this puzzle, you may be matching wits with Archimedes. But you’ll certainly be sharing some thoughts with Thābit ibn Qurra.

Math has been called the world’s longest conversation. Perhaps this is an exaggeration: people have been passing on stories for a long time. But Babylonians were computing the square root of two back in 1700 BC, probably using techniques that are still interesting today… so it really is remarkable how old mathematics still makes good sense, unlike the old creation myths.

For more on the arbelos try:

Arbelos, Wikipedia.

and

• Thomas Schoch, Arbelos references, 2013.

For the 15 propositions in the Book of Lemmas, see:

Book of Lemmas, Wikipedia.

Two semicircles is half a circle


Puzzle: show the total area of the two semicircles is half the area of the large circle.

Amazingly, it seems this fact was noticed only in 2011:

• Andrew K. Jobbings, Two semicircles fill half a circle, The Mathematical Gazette 95 (Nov. 2011), 538–540.

However, Andrew Jobbings is a genius when it comes to ‘recreational mathematics’. For more of his work, check out this page:

• Andrew Jobbings, Arbelos.

I learned about this puzzle from Alexander Bogomolny, who has a wonderful website full of Java geometry applets:

• Alexander Bogomolny, A property of semicircles.

You’ll get a scary warning asking “Do you want to run this application?” Say yes. You’ll get an applet that lets you slide the point where the semicircles touch: no matter where it is, the semicircles have the same total area! Click “hint” and you’ll get a hint. If you’re still stuck, and too impatient to solve the puzzle yourself, scroll down and see a proof!

However, that proof is long and confusing. With geometry I like demonstrations where after some thought you can simply see that the result is true. For this puzzle, such a demonstration was provided by Greg Egan.

Click here if you give up on this puzzle and want to see Egan’s solution. You may need to ponder it a bit, but when you’re done you should say “yes, this result is clear!”

As a separate hint: the answers to this puzzle and the previous one are similar, in a very nice way!


Lyapunov Functions for Complex-Balanced Systems

7 January, 2014

guest post by Manoj Gopalkrishnan

A few weeks back, I promised to tell you more about a long-standing open problem in reaction networks, the ‘global attractor conjecture’. I am not going to quite get there today, but we shall take one step in that direction.

Today’s plan is to help you make friends with a very useful function we will call the ‘free energy’ which comes up all the time in the study of chemical reaction networks. We will see that for complex-balanced systems, the free energy function decreases along trajectories of the rate equation. I’m going to explain this statement, and give you most of the proof!

The point of doing all this work is that we will then be able to invoke Lyapunov’s theorem which implies stability of the dynamics. In Greek mythology, Sisyphus was cursed to roll a boulder up a hill only to have it roll down again, so that he had to keep repeating the task for all eternity. When I think of an unstable equilibrium, I imagine a boulder delicately balanced on top of a hill, which will fall off if given the slightest push:

or, more abstractly:

On the other hand, I picture a stable equilibrium as a pebble at the very bottom of a hill. Whichever way a perturbation takes it is up, so it will roll down again to the bottom:

Lyapunov’s theorem guarantees stability provided we can exhibit a nice enough function V that decreases along trajectories. ‘Nice enough’ means that, viewing V as a height function for the hill, the equilibrium configuration should be at the bottom, and every direction from there should be up. If Sisyphus had dug a pit at the top of the hill for the boulder to rest in, Lyapunov’s theorem would have applied, and he could have gone home to rest. The moral of the story is that it pays to learn dynamical systems theory!

Because of the connection to Lyapunov’s theorem, such functions that decrease along trajectories are also called Lyapunov functions. A similar situation is seen in Boltzmann’s H-theorem, and hence such functions are sometimes called H-functions by physicists.

Another reason for me to talk about these ideas now is that I have posted a new article on the arXiv:

• Manoj Gopalkrishnan, On the Lyapunov function for complex-balanced mass-action systems.

The free energy function in chemical reaction networks goes back at least to 1972, to this paper:

• Friedrich Horn and Roy Jackson, General mass action kinetics, Arch. Rational Mech. Analysis 49 (1972), 81–116.

Many of us credit Horn and Jackson’s paper with starting the mathematical study of reaction networks. My paper is an exposition of the main result of Horn and Jackson, with a shorter and simpler proof. The gain comes because Horn and Jackson proved all their results from scratch, whereas I’m using some easy results from graph theory, and the log-sum inequality.

We shall be talking about reaction networks. Remember the idea from the network theory series. We have a set S whose elements are called species, for example

S = \{ \mathrm{H}_2\mathrm{O}, \mathrm{H}^+, \mathrm{OH}^- \}

A complex is a vector of natural numbers saying how many items of each species we have. For example, we could have a complex (2,3,1). But chemists would usually write this as

2 \mathrm{H}_2\mathrm{O} + 3 \mathrm{H}^+ + \mathrm{OH}^-

A reaction network is a set S of species and a set T of transitions or reactions, where each transition \tau \in T goes from some complex m(\tau) to some complex n(\tau). For example, we could have a transition \tau with

m(\tau) = \mathrm{H}_2\mathrm{O}

and

n(\tau) = \mathrm{H}^+ + \mathrm{OH}^-

In this situation chemists usually write

\mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

but we want names like \tau for our transitions, so we might write

\tau : \mathrm{H}_2\mathrm{O} \to \mathrm{H}^+ + \mathrm{OH}^-

or

\mathrm{H}_2\mathrm{O} \stackrel{\tau}{\longrightarrow} \mathrm{H}^+ + \mathrm{OH}^-

As John explained in Part 3 of the network theory series, chemists like to work with a vector of nonnegative real numbers x(t) saying the concentration of each species at time t. If we know a rate constant r(\tau) > 0 for each transition \tau, we can write down an equation saying how these concentrations change with time:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T} r(\tau) (n(\tau) - m(\tau)) x^{m(\tau)} }

This is called the rate equation. It’s really a system of ODEs describing how the concentration of each species changes with time. Here an expression like x^m is shorthand for the monomial {x_1}^{m_1} \cdots {x_k}^{m_k}.
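If you find code easier to parse than formulas, here is a minimal sketch of the right-hand side of the rate equation for the water example above; the rate constants are invented purely for illustration, and the snippet is my own, not something from the paper.

```python
import numpy as np

# Species: index 0 = H2O, 1 = H+, 2 = OH-.
# Each transition tau is (m(tau), n(tau), r(tau)): source complex, target complex, rate.
# The rate constants are invented for illustration only.
transitions = [
    (np.array([1, 0, 0]), np.array([0, 1, 1]), 2.0),   # H2O -> H+ + OH-
    (np.array([0, 1, 1]), np.array([1, 0, 0]), 1.0),   # H+ + OH- -> H2O
]

def rate_equation_rhs(x):
    """dx/dt = sum over transitions of r(tau) * (n(tau) - m(tau)) * x^m(tau)."""
    dxdt = np.zeros_like(x, dtype=float)
    for m, n, r in transitions:
        dxdt += r * (n - m) * np.prod(x ** m)   # x^m is the monomial prod_s x_s^{m_s}
    return dxdt

print(rate_equation_rhs(np.array([1.0, 0.2, 0.3])))
```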

John and Brendan talked about complex balance in Part 9. I’m going to recall this definition, from a slightly different point of view that will be helpful for the result we are trying to prove.

We can draw a reaction network as a graph! The vertices of this graph are all the complexes m(\tau), n(\tau) where \tau \in T. The edges are all the transitions \tau\in T. We think of each edge \tau as directed, going from m(\tau) to n(\tau).

We will call the map f_x that sends each transition \tau to the positive real number f_x(\tau) = r(\tau) x^{m(\tau)} the flow on this graph. The rate equation can be rewritten very simply in terms of this flow as:

\displaystyle{ \frac{d x}{d t} = \sum_{\tau \in T}(n(\tau) - m(\tau)) \, f_x(\tau) }

where the right-hand side is now a linear expression in the flow f_x.

Flows of water, or electric current, obey a version of Kirchhoff’s current law. Such flows are called conservative flows. The following two lemmas from graph theory are immediate for conservative flows:

Lemma 1. If f is a conservative flow then the net flow across every cut is zero.

A cut is a way of chopping the graph in two, like this:


It’s easy to prove Lemma 1 by induction, moving one vertex across the cut at a time.

Lemma 2. If a conservative flow exists then every edge \tau\in T is part of a directed cycle.

Why is Lemma 2 true? Suppose there exists an edge \tau : m \to n that is not part of any directed cycle. We will exhibit a cut with non-zero net flow. By Lemma 1, this will imply that the flow is not conservative.

One side of the cut will consist of all vertices from which m is reachable by a directed path in the reaction network. The other side of the cut contains at least n, since m is not reachable from n, by the assumption that \tau is not part of a directed cycle. There is flow going from left to right of the cut, across the transition \tau. Since there can be no flow coming back, this cut has nonzero net flow, and we’re done.     ▮

Now, back to the rate equation! We can ask if the flow f_x is conservative. That is, we can ask if, for every complex n:

\displaystyle{ \sum_{\tau : m \to n} f_x(\tau) = \sum_{\tau : n \to p} f_x(\tau). }

In words, we are asking if the sum of the flow through all transitions coming in to n equals the sum of the flow through all transitions going out of n. If this condition is satisfied at a vector of concentrations x = \alpha, so that the flow f_\alpha is conservative, then we call \alpha a point of complex balance. If in addition, every component of \alpha is strictly positive, then we say that the system is complex balanced.
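Here is how one could check this condition numerically; this sketch of mine reuses the toy network and made-up rate constants from the previous snippet.

```python
import numpy as np

# Same toy network and made-up rates as before.
transitions = [
    (np.array([1, 0, 0]), np.array([0, 1, 1]), 2.0),   # H2O -> H+ + OH-
    (np.array([0, 1, 1]), np.array([1, 0, 0]), 1.0),   # H+ + OH- -> H2O
]

def is_complex_balanced(alpha, tol=1e-9):
    """Check that, at concentrations alpha, the total flow into each complex
    equals the total flow out of it."""
    inflow, outflow = {}, {}
    for m, n, r in transitions:
        f = r * np.prod(alpha ** m)                       # flow through this transition
        outflow[tuple(m)] = outflow.get(tuple(m), 0.0) + f
        inflow[tuple(n)] = inflow.get(tuple(n), 0.0) + f
    complexes = set(inflow) | set(outflow)
    return all(abs(inflow.get(c, 0.0) - outflow.get(c, 0.0)) < tol for c in complexes)

# For this two-cycle, alpha is complex balanced iff 2*alpha[0] = alpha[1]*alpha[2].
print(is_complex_balanced(np.array([1.0, 2.0, 1.0])))   # True
print(is_complex_balanced(np.array([1.0, 1.0, 1.0])))   # False
```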

Clearly if \alpha is a point of complex balance, it’s an equilibrium solution of the rate equation. In other words, x(t) = \alpha is a solution of the rate equation, where x(t) never changes.

I’m using ‘equilibrium’ the way mathematicians do. But I should warn you that chemists use ‘equilibrium’ to mean something more than merely a solution that doesn’t change with time. They often also mean it’s a point of complex balance, or even more. People actually get into arguments about this at conferences.

Complex balance implies more than mere equilibrium. For starters, if a reaction network is such that every edge belongs to a directed cycle, then one says that the reaction network is weakly reversible. So Lemmas 1 and 2 establish that complex-balanced systems must be weakly reversible!

From here on, we fix a complex-balanced system, with \alpha a strictly positive point of complex balance.

Definition. The free energy function is the function

g_\alpha(x) = \sum_{s\in S} \left( x_s \log x_s - x_s - x_s \log \alpha_s \right)

where the sum is over all species in S.

The whole point of defining the function this way is that it is the unique function, up to an additive constant, whose partial derivative with respect to x_s is \log(x_s/\alpha_s). This is important enough that we write it as a lemma. To state it in a pithy way, it is helpful to introduce vector notation for division and logarithms. If x and y are two vectors, we will understand x/y to mean the vector z such that z_s = x_s/y_s coordinate-wise. Similarly \log x is defined in a coordinate-wise sense as the vector with coordinates (\log x)_s = \log x_s.

Lemma 3. The gradient \nabla g_\alpha(x) of g_\alpha(x) equals \log(x/\alpha).
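Here is a quick numerical check of this lemma that I added, using a crude central finite difference; the point x and the balance point \alpha are just random positive vectors.

```python
import numpy as np

def free_energy(x, alpha):
    """g_alpha(x) = sum_s ( x_s log x_s - x_s - x_s log alpha_s )."""
    return np.sum(x * np.log(x) - x - x * np.log(alpha))

def gradient(x, alpha):
    """The lemma: grad g_alpha(x) = log(x / alpha), coordinate-wise."""
    return np.log(x / alpha)

# Central finite-difference check at a random strictly positive point.
rng = np.random.default_rng(0)
x, alpha, h = rng.uniform(0.5, 2.0, 3), rng.uniform(0.5, 2.0, 3), 1e-6
for s in range(3):
    e = np.zeros(3)
    e[s] = h
    fd = (free_energy(x + e, alpha) - free_energy(x - e, alpha)) / (2 * h)
    print(fd, gradient(x, alpha)[s])   # the two columns agree to high accuracy
```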

We’re ready to state our main theorem!

Theorem. Fix a trajectory x(t) of the rate equation. Then g_\alpha(x(t)) is a decreasing function of time t. Further, it is strictly decreasing unless x(t) is an equilibrium solution of the rate equation.

I find precise mathematical statements reassuring. You can often make up your mind about the truth value from a few examples. Very often, though not always, a few well-chosen examples are all you need to get the general idea for the proof. Such is the case for the above theorem. There are three key examples: the two-cycle, the three-cycle, and the figure-eight.

The two-cycle. The two-cycle is this reaction network:

It has two complexes m and n and two transitions \tau_1 : m\to n and \tau_2 : n\to m, with rates r_1 = r(\tau_1) and r_2 = r(\tau_2) respectively.

Fix a solution x(t) of the rate equation. Then the flow from m to n equals r_1 x^m and the backward flow equals r_2 x^n. The condition for f_\alpha to be a conservative flow requires that f_\alpha = r_1 \alpha^m = r_2 \alpha^n. This is one binomial equation in at least one variable, and clearly has a solution in the positive reals. We have just shown that every two-cycle is complex balanced.

The derivative d g_\alpha(x(t))/d t can now be computed by the chain rule, using Lemma 3. It works out to f_\alpha times

\displaystyle{ \left((x/\alpha)^m - (x/\alpha)^n\right) \, \log\frac{(x/\alpha)^n}{(x/\alpha)^m} }

This is never positive, and it’s zero if and only if

(x/\alpha)^m = (x/\alpha)^n

Why is this? Simply because the logarithm of something greater than 1 is positive, while the log of something less than 1 is negative, so that the sign of (x/\alpha)^m - (x/\alpha)^n is always opposite the sign of \log \frac{(x/\alpha)^n}{(x/\alpha)^m}. We have verified our theorem for this example.

(Note that (x/\alpha)^m = (x/\alpha)^n occurs when x = \alpha, but also at other points: in this example, there is a whole hypersurface consisting of points of complex balance.)

In fact, this simple calculation achieves much more.

Definition. A reaction network is reversible if for every transition \tau  : m \to n there is a transition \tau' : n \to m going back, called the reverse of \tau. Suppose we have a reversible reaction network and a vector of concentrations \alpha such that the flow along each edge equals that along the edge going back:

f_\alpha(\tau) = f_\alpha(\tau')

whenever \tau' is the reverse \tau. Then we say the reaction network is detailed balanced, and \alpha is a point of detailed balance.

For a detailed-balanced system, the time derivative of g_\alpha is a sum over the contributions of pairs consisting of an edge and its reverse. Hence, the two-cycle calculation shows that the theorem holds for all detailed balanced systems!

This linearity trick is going to prove very valuable. It will allow us to treat the general case of complex balanced systems one cycle at a time. The proof for a single cycle is essentially contained in the example of a three-cycle, which we treat next:

The three-cycle. The three-cycle is this reaction network:

We assume that the system is complex balanced, so that

f_\alpha(m_1\to m_2) = f_\alpha(m_2\to m_3) = f_\alpha(m_3\to m_1)

Let us call this nonnegative number f_\alpha. A small calculation employing the chain rule shows that d g_\alpha(x(t))/d t equals f_\alpha times

\displaystyle{ (x/\alpha)^{m_1}\, \log\frac{(x/\alpha)^{m_2}}{(x/\alpha)^{m_1}} \; + }

\displaystyle{ (x/\alpha)^{m_2} \, \log\frac{(x/\alpha)^{m_3}}{(x/\alpha)^{m_2}} \; + }

\displaystyle{  (x/\alpha)^{m_3}\, \log\frac{(x/\alpha)^{m_1}}{(x/\alpha)^{m_3}} }

We need to think about the sign of this quantity:

Lemma 4. Let a,b,c be positive numbers. Then a \log(b/a) + b\log(c/b) + c\log(a/c) is less than or equal to zero, with equality precisely when a=b=c.

The proof is a direct application of the log sum inequality. In fact, this holds not just for three numbers, but for any finite list of numbers. Indeed, that is precisely how one obtains the proof for cycles of arbitrary length. Even the two-cycle proof is a special case! If you are wondering how the log sum inequality is proved, it is an application of Jensen’s inequality, that workhorse of convex analysis.
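Here is a small random test of this cyclic inequality that I added; it handles cycles of any length, so it covers the two-cycle and three-cycle calculations above as special cases.

```python
import numpy as np

def cyclic_term(a):
    """sum_i a_i * log(a_{i+1} / a_i), with indices taken mod len(a)."""
    a = np.asarray(a, dtype=float)
    return float(np.sum(a * np.log(np.roll(a, -1) / a)))

rng = np.random.default_rng(1)
worst = max(cyclic_term(rng.uniform(0.1, 10.0, rng.integers(2, 8)))
            for _ in range(10000))
print(worst <= 1e-12)                               # never positive
print(abs(cyclic_term([3.7, 3.7, 3.7])) < 1e-12)    # zero when all entries are equal
```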

The three-cycle calculation extends to a proof for the theorem so long as there is no directed edge that is shared between two directed cycles. When there are such edges, we need to argue that the flows f_\alpha and f_x can be split between the cycles sharing that edge in a consistent manner, so that the cycles can be analyzed independently. We will need the following simple lemma about conservative flows from graph theory. We will apply this lemma to the flow f_\alpha.

Lemma 5. Let f be a conservative flow on a graph G. Then there exist directed cycles C_1, C_2,\dots, C_k in G, and nonnegative real ‘flows’ f_1,f_2,\dots,f_k \in [0,\infty) such that for each directed edge e in G, the flow f(e) equals the sum of f_i over i such that the cycle C_i contains the edge e.

Intuitively, this lemma says that conservative flows come from constant flows on the directed cycles of the graph. How does one show this lemma? I’m sure there are several proofs, and I hope some of you can share some of the really neat ones with me. The one I employed was algorithmic. The idea is to pick a cycle, any cycle, and subtract the maximum constant flow that this cycle allows, and repeat. This is most easily understood by looking at the example of the figure-eight:

The figure-eight. This reaction network consists of two three-cycles sharing an edge:

Here’s the proof for Lemma 5. Let f be a conservative flow on this graph. We want to exhibit cycles and flows on this graph according to Lemma 5. We arbitrarily pick any cycle in the graph. For example, in the figure-eight, suppose we pick the cycle u_0\to u_1\to u_2\to u_0. We pick an edge in this cycle on which the flow is minimum. In this case, f(u_0\to u_1) = f(u_2\to u_0) is the minimum. We define a remainder flow by subtracting from f this constant flow which was restricted to one cycle. So the remainder flow is the same as f on edges that don’t belong to the picked cycle. For edges that belong to the cycle, the remainder flow is f minus the minimum of f on this cycle. We observe that this remainder flow satisfies the conditions of Lemma 5 on a graph with strictly fewer edges. Continuing in this way, since the lemma is trivially true for the empty graph, we are done by infinite descent.
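The algorithm in this proof is easy to turn into code. Here is a rough sketch I wrote (it is not from the paper): it assumes the flow is given as a dictionary from directed edges to nonnegative numbers, and it peels off one cycle at a time, exactly as described above.

```python
from collections import defaultdict

def find_cycle(flow):
    """Return some directed cycle among the edges with positive flow, as a list of edges."""
    adj = defaultdict(list)
    for (u, v), f in flow.items():
        if f > 1e-12:
            adj[u].append(v)
    for start in list(adj):                 # walk forward until a vertex repeats
        path, seen = [start], {start}
        while adj.get(path[-1]):
            nxt = adj[path[-1]][0]
            if nxt in seen:
                i = path.index(nxt)
                cycle_vertices = path[i:] + [nxt]
                return list(zip(cycle_vertices, cycle_vertices[1:]))
            path.append(nxt)
            seen.add(nxt)
    return None

def decompose(flow):
    """Peel off cycles greedily, each carrying the minimum flow along it.
    Assumes the input flow (a dict from directed edges to numbers) is conservative."""
    flow = dict(flow)
    pieces = []
    while True:
        cycle = find_cycle(flow)
        if cycle is None:
            return pieces
        c = min(flow[e] for e in cycle)
        pieces.append((cycle, c))
        for e in cycle:
            flow[e] -= c

# The figure-eight: cycles u0->u1->u2->u0 and u3->u1->u2->u3 sharing the edge u1->u2.
flow = {(0, 1): 1.0, (1, 2): 3.0, (2, 0): 1.0, (2, 3): 2.0, (3, 1): 2.0}
for cycle, c in decompose(flow):
    print(cycle, c)
```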

Now that we know how to split the flow f_\alpha across cycles, we can figure out how to split the rates across the different cycles. This will tell us how to split the flow f_x across cycles. Again, this is best illustrated by an example.

The figure-eight. Again, this reaction network looks like

Suppose as in Lemma 5, we obtain the cycles

C_1 = u_0\to u_1\to u_2\to u_0 with constant flow f_\alpha^1

and

C_2 = u_3\to u_1\to u_2\to u_3 with constant flow f_\alpha^2

such that

f_\alpha^1 + f_\alpha^2 = f_\alpha(u_1\to u_2)

Here’s the picture:

Then we obtain rates r^1(u_1\to u_2) and r^2(u_1\to u_2) by solving the equations

f^1_\alpha = r^1(u_1\to u_2) \alpha^{u_1}

f^2_\alpha = r^2(u_1\to u_2) \alpha^{u_1}

Using these rates, we can define non-constant flows f^1_x on C_1 and f^2_x on C_2 by the usual formulas:

f^1_x(u_1\to u_2) = r^1(u_1\to u_2) x^{u_1}

and similarly for f^2_x. In particular, this gives us

f^1_x(u_1\to u_2)/f^1_\alpha = (x/\alpha)^{u_1}

and similarly for f^2_x.

Using this, we obtain the proof of the Theorem! The time derivative of g_\alpha along a trajectory has a contribution from each cycle C as in Lemma 5, where each cycle is treated as a separate system with the new rates r^C, and the new flows f^C_\alpha and f^C_x. So, we’ve reduced the problem to the case of a cycle, which we’ve already done.

Let’s review what happened. The time derivative of the function g_\alpha has a very nice form, which is linear in the flow f_x. The reaction network can be broken up into cycles. The conservative flow f_\alpha for a complex balanced system can be split into conservative flows on cycles by Lemma 5. This informs us how to split the non-conservative flow f_x across cycles. By linearity of the time derivative, we can separately treat the case for every cycle. For each cycle, we get an expression to which the log sum inequality applies, giving us the final result that g_\alpha decreases along trajectories of the rate equation.
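To see the theorem in action, here is a small numerical experiment I added, not taken from the paper: integrate the rate equation for the two-cycle \mathrm{H}_2\mathrm{O} \leftrightarrow \mathrm{H}^+ + \mathrm{OH}^- with made-up rates, using a crude Euler scheme, and watch g_\alpha decrease along the trajectory.

```python
import numpy as np

# Two-cycle H2O <-> H+ + OH- with made-up rates r1 = 2, r2 = 1.
m1, n1, r1 = np.array([1, 0, 0]), np.array([0, 1, 1]), 2.0
m2, n2, r2 = np.array([0, 1, 1]), np.array([1, 0, 0]), 1.0
alpha = np.array([1.0, 2.0, 1.0])            # complex balanced: r1 * 1 = r2 * 2 * 1

def rhs(x):
    return (r1 * (n1 - m1) * np.prod(x ** m1) +
            r2 * (n2 - m2) * np.prod(x ** m2))

def g(x):
    return np.sum(x * np.log(x) - x - x * np.log(alpha))

x = np.array([0.3, 0.5, 1.7])                # arbitrary positive initial concentrations
dt, values = 1e-3, []
for _ in range(20000):                        # crude explicit Euler integration
    values.append(g(x))
    x = x + dt * rhs(x)

print(np.all(np.diff(values) <= 1e-12))       # True: g_alpha never increases
print(values[0], values[-1])                  # and it strictly drops overall
```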

Now that we have a Lyapunov function, we will put it to use to obtain some nice theorems about the dynamics, and finally state the global attractor conjecture. All that and more, in the next blog post!


2014 on Azimuth

31 December, 2013

 

Happy New Year! We’ve got some fun guest posts lined up for next year, including:

Marc Harper, Relative entropy in evolutionary dynamics.

Marc Harper uses ideas from information theory in his work on bioinformatics and evolutionary game theory. This article explains some of his new work. And as a warmup, it explains how relative entropy can serve as a Lyapunov function in evolution!

This includes answering the question:

“What is a Lyapunov function, and why should I care?”

The brief answer, in case you’re eager to know, is this. A Lyapunov function is something that always increases—or always decreases—as time goes on. Examples include entropy and free energy. So, a Lyapunov function can be a way of making the 2nd law of thermodynamics mathematically precise! It’s also a way to show things are approaching equilibrium.

The overall goal here is applying entropy and information theory to better understand the behavior of biological and ecological systems. And in April 2015, Marc Harper and I are helping run a workshop on this topic! We’re doing this with John Harte, an ecologist who uses maximum entropy methods to predict the distribution, abundance and energy usage of species. It should be really interesting!

But back to blog articles:

Manoj Gopalkrishnan, Lyapunov functions for complex-balanced systems.

Manoj Gopalkrishnan is a mathematician at the Tata Institute of Fundamental Research in Mumbai who works on problems coming from chemistry and biology. This post will explain his recent paper on a Lyapunov function for chemical reactions. This function is closely related to free energy, a concept from thermodynamics. So again, one of the overall goals is to apply entropy to better understand living systems.

Since some evolutionary games are isomorphic to chemical reaction networks, this post should be connected to Marc’s. But there’s some mental work left to make the connection—for me, at least. It should be really cool when it all fits together!

Alastair Jamieson-Lane, Stochastic cross impact balance analysis.

Alastair Jamieson-Lane is a mathematician in the master’s program at the University of British Columbia. Very roughly, this post is about a method for determining which economic scenarios are more likely. The likely scenarios get fed into things like the IPCC climate models, so this is important.

This blog article has an interesting origin. Vanessa Schweizer has a bachelor’s degree in physics, a master’s in environmental studies, and a PhD in engineering and public policy. She now works at the University of Waterloo on long-term decision-making problems.

A while back, I met Vanessa at a workshop called What Is Climate Change and What To Do About It?, at the Balsillie School of International Affairs, which is in Waterloo. She described her work with Alastair Jamieson-Lane and the physicist Matteo Smerlak on stochastic cross impact balance analysis. It sounded really interesting, something I’d like to work on. So I solicited some blog articles from them. I hope this is just the first!

So: Happy New Year, and good reading!

Also: we’re always looking for good guest posts here on Azimuth, and we have a system for helping you write them. So, if you know something interesting about environmental or energy issues, ecology, biology or chemistry, consider giving it a try!

If you read some posts here, especially guest posts, you’ll get an idea of what we’re looking for. David Tanzer, a software developer in New York who is very active in the Azimuth Project these days, made an organized list of Azimuth blog posts here:

Azimuth Blog Overview.

You can see the guest posts listed by author. This overview is also great for catching up on old posts!


Logic, Probability and Reflection

26 December, 2013

 

Last week I attended the Machine Intelligence Research Institute’s sixth Workshop on Logic, Probability, and Reflection. This one was in Berkeley, where the institute has its headquarters.

You may know this institute under their previous name: the Singularity Institute. It seems to be the brainchild of Eliezer Yudkowsky, a well-known advocate of ‘friendly artificial intelligence’, whom I interviewed in week311, week312 and week313 of This Week’s Finds. He takes an approach to artificial intelligence that’s heavily influenced by mathematical logic, and I got invited to the workshop because I blogged about a paper he wrote with Mihaly Barasz, Paul Christiano and Marcello Herreshoff on probability theory and logic.

I only have the energy to lay the groundwork for a good explanation of what happened in the workshop. So, after you read my post, please read this:

• Benja Fallenstein, Results from MIRI’s December workshop, Less Wrong, 28 December 2013.

The workshop had two main themes, so let me tell you what they were.

Scientific induction in mathematics

The first theme is related to that paper I just mentioned. How should a rational agent assign probabilities to statements in mathematics? Of course an omniscient being could assign

probability 1 to every mathematical statement that’s provable,

probability 0 to every statement whose negation is provable,

and

to every statement that is neither provable nor disprovable.

But a real-world rational agent will never have time to check all proofs, so there will always be lots of statements it’s not sure about. Actual mathematicians always have conjectures, like the Twin Prime Conjecture, that we consider plausible even though nobody has proved them. And whenever we do research, we’re constantly estimating how likely it is for statements to be true, and changing our estimates as new evidence comes in. In other words, we use scientific induction in mathematics.

How could we automate this? Most of us don’t consciously assign numerical probabilities to mathematical statements. But maybe an AI mathematician should. If so, what rules should it follow?

It’s natural to try a version of Solomonoff induction, where our probability estimate, before any evidence comes in, favors statements that are simple. However, this runs up against problems. If you’re interested in learning more about this, try:

• Jeremy Hahn, Scientific induction in probabilistic mathematics.

It’s a summary of ideas people came up with during the workshop. I would like to explain them sometime, but for now I should move on.

The Löbian obstacle

The second main theme was the ‘Löbian obstacle’. Löb’s theorem is the flip side of Gödel’s first incompleteness theorem, less famous but just as shocking. It seems to put limitations on how much a perfectly rational being can trust itself.

Since it’s the day after Christmas, let’s ease our way into these deep waters with the Santa Claus paradox, also known as Curry’s paradox.

If you have a child who is worried that Santa Claus might not exist, you can reassure them using this sentence:

If this sentence is true, Santa Claus exists.

Call it P, for short.

Assume, for the sake of argument, that P is true. Then what it says is true: “If P is true, Santa Claus exists.” And we’re assuming P is true. So, Santa Claus exists.

So, we’ve proved that if P is true, Santa Claus exists.

But that’s just what P says!

So, P is true.

So, Santa Claus exists!

There must be something wrong about this argument, even if Santa Claus does exist, because if it were valid you could you use it to prove anything at all. The self-reference is obviously suspicious. The sentence in question is a variant of the Liar Paradox:

This sentence is false.

since we can rewrite the Liar Paradox as

If this sentence is true, 0 = 1.

and then replace “0=1” by any false statement you like.

However, Gödel figured out a way to squeeze solid insights from these dubious self-referential sentences. He did this by creating a statement in the language of arithmetic, referring to nothing but numbers, which nonetheless manages to effectively say

This sentence is unprovable.

If it were provable, you’d get a contradiction! So, either arithmetic is inconsistent or this sentence is unprovable. But if it’s unprovable, it’s true. So, there are true but unprovable statements in arithmetic… unless arithmetic is inconsistent! This discovery shook the world of mathematics.

Here I’m being quite sloppy, just to get the idea across.

For one thing, when I’m saying ‘provable’, I mean provable given some specific axioms for arithmetic, like the Peano axioms. If we change our axioms, different statements will be provable.

For another, the concept of ‘true’ statements in arithmetic is often shunned by logicians. That may sound shocking, but there are many reasons for this: for example, Tarski showed that the truth of statements about arithmetic is undefinable in arithmetic. ‘Provability’ is much easier to deal with.

So, a better way of thinking about Gödel’s result is that he constructed a statement that is neither provable nor disprovable from Peano’s axioms of arithmetic, unless those axioms are inconsistent (in which case we can prove everything, but it’s all worthless).

Furthermore, this result applies not just to Peano’s axioms but to any stronger set of axioms, as long as you can write a computer program to list those axioms.

In 1952, the logician Leon Henkin flipped Gödel’s idea around and asked about a sentence in the language of arithmetic that says:

This sentence is provable.

He asked: is this provable or not? The answer is much less obvious than for Gödel’s sentence. Play around with it and see what I mean.

But in 1954, Martin Hugo Löb showed that Henkin’s sentence is provable!

And Henkin noticed something amazing: Löb’s proof shows much more.

At this point it pays to become a bit more precise. Let us write \mathrm{PA} \vdash P to mean the statement P is provable from the Peano axioms of arithmetic. Gödel figured out how to encode statements in arithmetic as numbers, so let’s write \# P for the Gödel number of any statement P. And Gödel figured out how to write a statement in arithmetic, say

\mathrm{Provable}(n)

which says that the statement with Gödel number n is provable using the Peano axioms.

Using this terminology, what Henkin originally did was find a number n such that the sentence

\mathrm{Provable}(n)

has Gödel number n. So, this sentence says

This sentence is provable from the Peano axioms of arithmetic.

What Löb did was show

\mathrm{PA} \vdash \mathrm{Provable}(n)

In other words, he showed that Henkin’s sentence really is provable from the Peano axioms!

What Henkin then did is prove that for any sentence P in the language of arithmetic, if

\mathrm{PA} \vdash \mathrm{Provable}(\# P) \implies P

then

\mathrm{PA} \vdash P

In other words, suppose we can prove that the provability of P implies P. Then we can prove P!

At first this merely sounds nightmarishly complicated. But if you think about it long enough, you’ll see it’s downright terrifying! For example, suppose P is some famous open question in arithmetic, like the Twin Prime Conjecture. You might hope to prove

The provability of the Twin Prime Conjecture implies the Twin Prime Conjecture.

Indeed, that seems like a perfectly reasonable thing to want. But it turns out that proving this is as hard as proving the Twin Prime Conjecture! Why? Because if we can prove the boldface sentence above, Löb and Henkin’s work instantly gives us a proof of Twin Prime Conjecture!
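In case you’re wondering how Löb proved his theorem, here is the standard proof sketch; I’m adding it for completeness, and it’s the usual textbook argument rather than anything from the workshop. It relies on three ‘derivability conditions’, which are provable facts about \mathrm{Provable}: first, if \mathrm{PA} \vdash A then \mathrm{PA} \vdash \mathrm{Provable}(\# A); second, \mathrm{PA} \vdash \mathrm{Provable}(\#(A \implies B)) \implies (\mathrm{Provable}(\# A) \implies \mathrm{Provable}(\# B)); and third, \mathrm{PA} \vdash \mathrm{Provable}(\# A) \implies \mathrm{Provable}(\# \mathrm{Provable}(\# A)).

Now suppose \mathrm{PA} \vdash \mathrm{Provable}(\# P) \implies P. By Gödel’s diagonal trick there is a sentence L with

\mathrm{PA} \vdash L \iff (\mathrm{Provable}(\# L) \implies P)

Applying the first condition and then the second condition (twice) to the forward direction of this equivalence, and using the third condition, we get

\mathrm{PA} \vdash \mathrm{Provable}(\# L) \implies \mathrm{Provable}(\# P)

Combining this with our assumption gives \mathrm{PA} \vdash \mathrm{Provable}(\# L) \implies P. But by the backward direction of the equivalence, that means \mathrm{PA} \vdash L, hence \mathrm{PA} \vdash \mathrm{Provable}(\# L) by the first condition, and therefore \mathrm{PA} \vdash P.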

What does all this have to do with artificial intelligence?

Well, what I just said is true not only for Peano arithmetic, but any set of axioms including Peano arithmetic that a computer program can list. Suppose your highly logical AI mathematician has some such set of axioms, say \mathrm{AI}. You might want it to trust itself. In other words, you might want

\mathrm{AI} \vdash \mathrm{Provable}(\# P) \implies P

for every sentence P. This says, roughly, that the AI trusts its own proofs: it can prove that anything it can prove is true.

But then Löb’s theorem would kick in and give

\mathrm{AI} \vdash P

for every sentence P. And this would be disastrous: our AI would be inconsistent, because it could prove everything!

This is just the beginning of the problems. It gets more involved when we consider AI’s that spawn new AI’s and want to trust them. For more see:

• Eliezer Yudkowsky and Marcello Herreshoff, Tiling agents for self-modifying AI, and the Löbian obstacle.

At the workshop, various people made progress on this issue, which is recorded in these summaries:

• Eliezer Yudkowsky, The procrastination paradox.

Abstract. A theorem by Marcello Herreshoff, Benja Fallenstein, and Stuart Armstrong shows that if there exists an infinite series of theories T_i extending \mathrm{PA} where each T_i proves the soundness of T_{i+1}, then all the T_i must have only nonstandard models. We call this the Procrastination Theorem for reasons which will become apparent.

• Benja Fallenstein, An infinitely descending sequence of sound theories each proving the next consistent.

Here Fallenstein constructs a different sequence of theories T_i extending Peano arithmetic such that each T_i proves the consistency of T_{i+1}, and all the theories are sound for \Pi_1 sentences—that is, sentences with only one \forall quantifier outside the rest of the stuff.

The following summaries would take more work to explain:

• Nate Soares, Fallenstein’s monster.

• Nisan Stiennon, Recursively-defined logical theories are well-defined.

• Benja Fallenstein, The 5-and-10 problem and the tiling agents formalism.

• Benja Fallenstein, Decreasing mathematical strength in one formalization of parametric polymorphism.

Again: before reading any of these summaries, I urge you to read Benja Fallenstein’s post, which will help you understand them!


Relative Entropy (Part 3)

25 December, 2013

Holidays are great. There’s nothing I need to do! Everybody is celebrating! So, I can finally get some real work done.

In the last couple of days I’ve finished a paper with Jamie Vicary on wormholes and entanglement… subject to his approval and corrections. More on that later. And now I’ve returned to working on a paper with Tobias Fritz where we give a Bayesian characterization of the concept of ‘relative entropy’. This summer I wrote two blog articles about this paper:

Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

Relative Entropy (Part 2): a category related to statistical inference, \mathrm{FinStat}, and how relative entropy defines a functor on this category.

But then Tobias Fritz noticed a big problem. Our characterization of relative entropy was inspired by this paper:

• D. Petz, Characterization of the relative entropy of states of matrix algebras, Acta Math. Hungar. 59 (1992), 449–455.

Here Petz sought to characterize relative entropy both in the ‘classical’ case we are concerned with and in the more general ‘quantum’ setting. Our original goal was merely to express his results in a more category-theoretic framework! Unfortunately Petz’s proof contained a significant flaw. Tobias noticed this and spent a lot of time fixing it, with no help from me.

Our paper is now self-contained, and considerably longer. My job now is to polish it up and make it pretty. What follows is the introduction, which should explain the basic ideas.

A Bayesian characterization of relative entropy

This paper gives a new characterization of the concept of relative entropy, also known as ‘relative information’, ‘information gain’ or ‘Kullback-Leibler divergence’. Whenever we have two probability distributions p and q on the same finite set X, we can define the entropy of q relative to p:

S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right)

Here we set

q_x \ln\left( \frac{q_x}{p_x} \right)

equal to \infty when p_x = 0, unless q_x is also zero, in which case we set it equal to 0. Relative entropy thus takes values in [0,\infty].
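Here is the definition in code, the way I would write it (a sketch of mine, not from the paper), with the two conventions about zero probabilities spelled out:

```python
import numpy as np

def relative_entropy(q, p):
    """S(q, p) = sum_x q_x ln(q_x / p_x), with the conventions
    0 * ln(0 / p_x) = 0, and q_x * ln(q_x / 0) = +inf when q_x > 0."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    if np.any((p == 0) & (q > 0)):
        return np.inf
    mask = q > 0                           # terms with q_x = 0 contribute 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # about 0.511
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # 0.0
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # inf
```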

Intuitively speaking, S(q,p) is the expected amount of information gained when we discover the probability distribution is really q, when we had thought it was p. We should think of p as a ‘prior’ and q as a ‘posterior’. When we take p to be the uniform distribution on X, relative entropy reduces to the ordinary Shannon entropy, up to an additive constant. The advantage of relative entropy is that it makes the role of the prior explicit.

Since Bayesian probability theory emphasizes the role of the prior, relative entropy naturally lends itself to a Bayesian interpretation: it measures how much information we gain given a certain prior. Our goal here is to make this precise in a mathematical characterization of relative entropy. We do this using a category \mathrm{FinStat} where:

• an object (X,q) consists of a finite set X and a probability distribution x \mapsto q_x on that set;

• a morphism (f,s) : (X,q) \to (Y,r) consists of a measure-preserving function f from X to Y, together with a probability distribution x \mapsto s_{xy} on X for each element y \in Y, with the property that s_{xy} = 0 unless f(x) = y.

We can think of an object of \mathrm{FinStat} as a system with some finite set of states together with a probability distribution on its states. A morphism

(f,s) : (X,q) \to (Y,r)

then consists of two parts. First, there is a deterministic measurement process f : X \to Y mapping states of the system being measured, X, to states of the measurement apparatus, Y. The condition that f be measure-preserving says that, after the measurement, the probability that the apparatus be in any state y \in Y is the sum of the probabilities of all states of X leading to that outcome:

\displaystyle{  r_y = \sum_{x \in f^{-1}(y)} q_x }

Second, there is a hypothesis s: an assumption about the probability s_{xy} that the system being measured is in the state x \in X given any measurement outcome y \in Y.

Suppose we have any morphism

(f,s) : (X,q) \to (Y,r)

in \mathrm{FinStat}. From this we obtain two probability distributions on the states of the system being measured. First, we have the probability distribution p given by

\displaystyle{   p_x = \sum_{y \in Y} s_{xy} r_y } \qquad \qquad (1)

This is our prior, given our hypothesis and the probability distribution of measurement results. Second we have the ‘true’ probability distribution q, which would be the posterior if we updated our prior using complete direct knowledge of the system being measured.

It follows that any morphism in \mathrm{FinStat} has a relative entropy S(q,p) associated to it. This is the expected amount of information we gain when we update our prior p to the posterior q.

In fact, this way of assigning relative entropies to morphisms defines a functor

F_0 : \mathrm{FinStat} \to [0,\infty]

where we use [0,\infty] to denote the category with one object, the numbers 0 \le x \le \infty as morphisms, and addition as composition. More precisely, if

(f,s) : (X,q) \to (Y,r)

is any morphism in \mathrm{FinStat}, we define

F_0(f,s) = S(q,p)

where the prior p is defined as in equation (1).

The fact that F_0 is a functor is nontrivial and rather interesting. It says that given any composable pair of measurement processes:

(X,q) \stackrel{(f,s)}{\longrightarrow} (Y,r) \stackrel{(g,t)}{\longrightarrow} (Z,u)

the relative entropy of their composite is the sum of the relative entropies of the two parts:

F_0((g,t) \circ (f,s)) = F_0(g,t) + F_0(f,s) .
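Here is a quick numerical illustration of this composition rule that I added; it assumes the hypothesis parts compose like stochastic matrices, so the composite hypothesis has entries \sum_y s_{xy} t_{yz}, and it uses a tiny example where Z has a single element.

```python
import numpy as np

# X = {0,1,2,3}, Y = {0,1}, Z = {0};  f sends {0,1} -> 0 and {2,3} -> 1;  g sends all of Y to 0.
rng = np.random.default_rng(42)
f = np.array([0, 0, 1, 1])

q = rng.dirichlet(np.ones(4))                        # 'true' distribution on X
r = np.array([q[f == y].sum() for y in range(2)])    # its pushforward along f
u = np.array([1.0])                                  # pushforward along g

# Hypotheses: s_{xy} is supported on the fiber over y; t_{yz} is a distribution on Y for each z.
s = np.zeros((4, 2))
for y in range(2):
    s[f == y, y] = rng.dirichlet(np.ones(2))
t = rng.dirichlet(np.ones(2)).reshape(2, 1)

def S(q, p):
    return float(np.sum(q * np.log(q / p)))

F_fs = S(q, s @ r)          # relative entropy assigned to (f, s): prior is s applied to r
F_gt = S(r, t @ u)          # relative entropy assigned to (g, t)
F_comp = S(q, (s @ t) @ u)  # relative entropy assigned to the composite
print(F_fs + F_gt, F_comp)  # the two numbers agree, up to rounding
```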

We prove that F_0 is a functor. However, we go much further: we characterize relative entropy by saying that up to a constant multiple, F_0 is the unique functor from \mathrm{FinStat} to [0,\infty] obeying three reasonable conditions.

The first condition is that F_0 vanishes on morphisms (f,s) : (X,q) \to (Y,r) where the hypothesis s is optimal. By this, we mean that Equation (1) gives a prior p equal to the ‘true’ probability distribution q on the states of the system being measured.

The second condition is that F_0 is lower semicontinuous. The set P(X) of probability distributions on a finite set X naturally has the topology of an (n-1)-simplex when X has n elements. The set [0,\infty] has an obvious topology where it’s homeomorphic to a closed interval. However, with these topologies, the relative entropy does not define a continuous function

\begin{array}{rcl}         S : P(X) \times P(X) &\to& [0,\infty]  \\                                            (q,p) &\mapsto & S(q,p) .  \end{array}

The problem is that

\displaystyle{ S(q,p) = \sum_{x\in X} q_x \ln\left( \frac{q_x}{p_x} \right) }

and we define q_x \ln(q_x/p_x) to be \infty when p_x = 0 and q_x > 0 but 0 when p_x = q_x = 0. So, it turns out that S is only lower semicontinuous, meaning that if p^i , q^i are sequences of probability distributions on X with p^i \to p and q^i \to q then

S(q,p) \le \liminf_{i \to \infty} S(q^i, p^i)

We give the set of morphisms in \mathrm{FinStat} its most obvious topology, and show that with this topology, F_0 maps morphisms to morphisms in a lower semicontinuous way.

The third condition is that F_0 is convex linear. We describe how to take convex linear combinations of morphisms in \mathrm{FinStat}, and then the functor F_0 is convex linear in the sense that it maps any convex linear combination of morphisms in \mathrm{FinStat} to the corresponding convex linear combination of numbers in [0,\infty]. Intuitively, this means that if we take a coin with probability P of landing heads up, and flip it to decide whether to perform one measurement process or another, the expected information gained is P times the expected information gain of the first process plus 1-P times the expected information gain of the second process.

Here, then, is our main theorem:

Theorem. Any lower semicontinuous, convex-linear functor

F : \mathrm{FinStat} \to [0,\infty]

that vanishes on every morphism with an optimal hypothesis must equal some constant times the relative entropy. In other words, there exists some constant c \in [0,\infty] such that

F(f,s) = c F_0(f,s)

for any morphism (f,s) : (X,q) \to (Y,r) in \mathrm{FinStat}.

Remarks

If you’re a maniacally thorough reader of this blog, with a photographic memory, you’ll recall that our theorem now says ‘lower semicontinuous’, where in Part 2 of this series I’d originally said ‘continuous’.

I’ve fixed that blog article now… but it was Tobias who noticed this mistake. In the process of fixing our proof to address this issue, he eventually noticed that the proof of Petz’s theorem, which we’d been planning to use in our work, was also flawed.

Now I just need to finish polishing the rest of the paper!


Lebesgue’s Universal Covering Problem (Part 1)

8 December, 2013

I try to focus on serious problems in this blog, mostly environmental issues and the attempt to develop ‘green mathematics’. But I seem unable to resist talking about random fun stuff now and then. For example, the Lebesgue universal covering problem. It’s not important, it’s just strange… but for some reason I feel a desire to talk about it.

For starters, let’s say the diameter of a region in the plane is the maximum distance between two points in this region. You all know what a circle of diameter 1 looks like. But an equilateral triangle with edges of length 1 also has diameter 1:


After all, two points in this triangle are farthest apart when they’re at two corners. And you’ll notice, if you think about it, that this triangle doesn’t fit inside a circle of diameter 1:

In 1914, the famous mathematician Henri Lebesgue sent a letter to a pal named Pál. And in this letter he asked: what’s the smallest possible region that contains every set in the plane with diameter 1?

He was actually more precise. The phrase “smallest possible” is vague, but Lebesgue wanted the set with the least possible area. The phrase “contains” is also vague, at least as I used it. I didn’t mean that our region should literally contain every set with diameter 1. Only the whole plane does that! I meant that we can rotate or translate any set with diameter 1 until it fits in our region. So a more precise statement is:

What is the smallest possible area of a region S in the plane such that every set of diameter 1 in the plane can be rotated and translated to fit inside S?

You see why math gets a reputation of being dry: sometimes when you make a simple question precise it gets longer.

Even this second version of the question is a bit wrong, since it’s presuming there exists a region with smallest possible area that does the job. Maybe you can do it with regions whose area gets smaller and smaller, closer and closer to some number, but you can’t do it with a region whose area equals that number! Also, the word ‘region’ is a bit vague. So if I were talking to a nitpicky mathematician, I might pose the puzzle this way:

What is the greatest lower bound of the measures of closed sets S \subseteq \mathbb{R}^2 such that every set T \subseteq \mathbb{R}^2 of diameter 1 can be rotated and translated to fit inside S?

Anyway, the reason this puzzle is famous is not that it gets dry when you state it precisely. It’s that it’s hard to solve!

It’s called Lebesgue’s universal covering problem. A region in the plane is called a universal covering if it does the job: any set in the plane of diameter 1 can fit inside it, in the sense I made precise.

Pál set out to find universal coverings, and in 1920 he wrote a paper on his results. He found a very nice one: a regular hexagon circumscribed around a circle of diameter 1. Do you see why you can fit the equilateral triangle with side length 1 inside this?

This hexagon has area

\sqrt{3}/2 \approx 0.86602540

But in the same paper, Pál showed you could safely cut off two corners of this hexagon and get a smaller universal covering! This one has area

2 - 2/\sqrt{3} \approx 0.84529946

He guessed this solution was optimal.

However, in 1936, a guy named Sprague sliced two tiny pieces off Pál’s proposed solution and found a universal covering with area

\sim 0.84413770

Here’s the idea:

The big hexagon is Pál's original solution. He then inscribed a regular 12-sided polygon inside this, and showed that you can remove two of the corners, say B_1B_2B and F_1F_2F, and get a smaller universal covering. But Sprague noticed that near D you can also remove the part outside the circle with radius 1 centered at B_1, as well as the part outside the circle with radius 1 centered at F_2.

Sprague conjectured that this is the best you can do.

But in 1975, Hansen showed you could slice off very tiny corners off Sprague’s solution, each of which reduces the area by just 6 \cdot 10^{-18}.

I think this microscopic advance, more than anything else, convinced people that Lebesgue’s universal covering problem was fiendishly difficult. Victor Klee, in a parody of the usual optimistic prophecies people like to make, wrote that:

… progress on this question, which has been painfully slow in the past, may be even more painfully slow in the future.

Indeed, my whole interest in this problem is rather morbid. I don’t know any reason that it’s important. I don’t see it as connected to lots of other beautiful math. It just seems astoundingly hard compared to what you might initially think. I admire people who work on it in the same way I admire people who decide to ski across the Antarctic. It’s fun to read about—from the comfort of my living room, sitting by the fire—but I can’t imagine actually doing it!

In 1980, a guy named Duff did a lot better. All the universal coverings so far were convex subsets of the plane. But he considered nonconvex universal coverings and found one with area:

\sim 0.84413570

This changed the game a lot. If we only consider convex universal coverings, it’s possible to prove there’s one with the least possible area—though we don’t know what it is. But if we allow nonconvex ones, we don’t even know this. There could be solutions whose area gets smaller and smaller, approaching some nonzero value, but never reaching it.

So at this point, Lebesgue’s puzzle split in two: one for universal coverings, and one for convex universal coverings. I’ll only talk about the second one, since I don’t know of further progress on the first.

Remember how Hansen improved Sprague’s universal covering by chopping off two tiny pieces? In 1992 he went further. He again sliced two corners off Sprague’s solution. One, the same shape as before, reduced the area by just 6 \cdot 10^{-18}. But the other was vastly larger: it reduced the area by a whopping 4 \cdot 10^{-11}.

I believe this is the smallest convex universal covering known so far.

But there’s more to say. In 2005, Peter Brass and Mehrbod Sharifi came up with a nice lower bound. They showed any convex universal covering must have an area of at least

0.832

They got this result by a rigorous computer-aided search for the smallest possible area of a convex set containing a circle, equilateral triangle and pentagon of diameter 1.

If you only want your convex set to contain a circle and equilateral triangle of diameter 1, you can get away with this:

This gives a lower bound of

0.825

But the pentagon doesn’t fit in this set! Here is a convex shape that also contains a pentagon of diameter 1, as found by Brass and Sharifi:

You’ll notice the triangle no longer has the same center as the circle!

To find this result, it was enough to keep the circle fixed, translate the triangle, and translate and rotate the pentagon. So, they searched through a 5-dimensional space, repeatedly computing the area of the convex hull of these three shapes to see how small they could make it. They considered 53,118,162 cases. Of these, 655,602 required further care—roughly speaking, they had to move the triangle and pentagon around slightly to see if they could make the area even smaller. They proved that they could not make it smaller than 0.825.
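To get a feel for the kind of computation involved, here is a toy version I wrote; it is definitely not their search, and not their optimal configuration. It just computes the area of the convex hull of the three shapes for one arbitrary placement, using scipy.

```python
import numpy as np
from scipy.spatial import ConvexHull

def circle(radius, n=720):
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return radius * np.column_stack([np.cos(t), np.sin(t)])

def regular_polygon(k, circumradius):
    t = 2 * np.pi * np.arange(k) / k
    return circumradius * np.column_stack([np.cos(t), np.sin(t)])

# Shapes of diameter 1: a circle of radius 1/2, an equilateral triangle of side 1,
# and a regular pentagon whose diagonal (its diameter) is 1.
circ = circle(0.5)
tri = regular_polygon(3, 1 / np.sqrt(3))
pent = regular_polygon(5, 1 / (2 * np.sin(2 * np.pi / 5)))

# One arbitrary placement: everything centered at the origin.  Brass and Sharifi
# instead minimize this hull area over translations of the triangle and rigid
# motions of the pentagon, which is what yields their bound of 0.832.
points = np.vstack([circ, tri, pent])
print(ConvexHull(points).volume)   # in 2D, ConvexHull.volume is the area
```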

They say they could have gotten a better result if they’d also included a fourth shape, but this was computationally infeasible, since it would take them from a 5-dimensional search space to an 8-dimensional one. It’s possible that one could tackle this using a distributed computing project where a lot of people contribute computer time, like they do in the search for huge prime numbers.

If you hear of more progress on this issue, please let me know! I hope that sometime—perhaps tomorrow, perhaps decades hence—someone will report good news.

References

Pál’s paper is here:

• Julius Pál, Ueber ein elementares Variationsproblem, Danske Mat.-Fys. Meddelelser III 2 (1920).

Sprague’s paper is here:

• R. Sprague, Über ein elementares Variationsproblem, Matematiska Tidsskrift Ser. B (1936), 96–99.

Hansen’s record-holding convex universal cover is here:

• H. Hansen, Small universal covers for sets of unit diameter, Geometriae Dedicata 42 (1992), 205–213.

It’s quite readable. This is where I got the picture of Pál’s solution and Sprague’s improvement.

The paper on the current best lower bound for convex universal coverings is also quite nice:

• Peter Brass and Mehrbod Sharifi, A lower bound for Lebesgue’s universal cover problem, International Journal of Computational Geometry & Applications 15 (2005), 537–544.

The picture of the equilateral triangle in the circle comes from an earlier paper, which got a lower bound of

0.8271

by considering the circle and regular 3^n-gons of diameter 1 for all n:

• Gy. Elekes, Generalized breadths, circular Cantor sets, and the least area UCC, Discrete & Computational Geometry 12 (1994), 439–449.

I have not yet managed to get ahold of Duff’s paper on the record-holding nonconvex universal covering:

• G. F. D. Duff, A smaller universal cover for sets of unit diameter, C. R. Math. Acad. Sci. 2 (1980), 37–42.

One interesting thing is that in 1963, Eggleston found a universal covering that’s minimal in the following sense: no proper subset of it is a universal covering. However, it doesn’t have the least possible area!

• H. G. Eggleston, Minimal universal covers in \mathrm{E}^n, Israel J. Math. 1 (1963), 149–155.

Later, Kovalev showed that any ‘minimal’ universal covering in this sense is a star domain:

• M. D. Kovalev, A minimal Lebesgue covering exists (Russian), Mat. Zametki 40 (1986), 401–406, 430.

This means there’s a point x_0 inside the set such that for any other point x in the set, the line segment connecting x_0 and x is in the set. Any convex set is a star domain, but not conversely:




Rolling Hypocycloids

3 December, 2013

The hypocycloid with n cusps is the curve traced out by a point on a circle rolling inside a circle whose radius is n times larger.

The hypocycloid with 2 cusps is sort of strange:

It’s just a line segment! It’s called the Tusi couple.

The hypocycloid with 3 cusps is called the deltoid, because it’s shaped like the Greek letter Δ:

The hypocycloid with 4 cusps is called the astroid, because it reminds people of a star:
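In case you want to draw these curves yourself, here is the parametrization in code; this is my own sketch, and it uses the same ‘standard position’ that shows up in Egan’s formulas later in this post.

```python
import numpy as np

def hypocycloid(n, t):
    """Hypocycloid with n cusps, traced by a point on a unit circle rolling
    inside a circle of radius n: z(t) = (n-1) e^{it} + e^{-i(n-1)t}."""
    return (n - 1) * np.exp(1j * t) + np.exp(-1j * (n - 1) * t)

t = np.linspace(0, 2 * np.pi, 7)
print(hypocycloid(2, t))   # n = 2: real up to rounding, tracing the segment [-2, 2]
print(hypocycloid(3, 0))   # n = 3: a cusp of the deltoid, at z = 3
```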

I got interested in hypocycloids while writing some articles here:

Rolling circles and balls.

My goal was to explain a paper John Huerta and I wrote about the truly amazing things that happen when you roll a ball on a ball 3 times as big. But I got into some interesting digressions.

While pondering hypocycloids in the ensuing discussion, the mathematician, physicist, programmer and science fiction author Greg Egan created this intriguing movie:

It’s a deltoid rolling inside an astroid. It fits in a perfectly snug way, with all three corners touching the astroid at all times!

Why does it work? It’s related to some things that physicists like: SU(3) and SU(4).

SU(n) is the group of n × n unitary matrices with determinant 1. Physicists like Pauli and Heisenberg got interested in SU(2) when they realized it describes the rotational symmetries of an electron. You need it to understand the spin of an electron. Later, Gell-Mann got interested in SU(3) because he thought it described the symmetries of quarks. He won a Nobel prize for predicting a new particle based on this theory, which was then discovered.

We now know that SU(3) does describe symmetries of quarks, but not in the way Gell-Mann thought. It turns out that quarks come in 3 ‘colors’—not real colors, but jokingly called red, green and blue. Similarly, electrons come in 2 different spin states, called up and down. Matrices in SU(3) can change the color of a quark, just as matrices in SU(2) can switch an electron’s spin from up to down, or some mix of up and down.

SU(4) would be important in physics if quarks came in 4 colors. In fact there’s a theory saying they do, with electrons and neutrinos being examples of quarks in their 4th color state! It’s called the Pati–Salam theory, and you can see an explanation here:

• John Baez and John Huerta, The algebra of grand unified theories.

It’s a lot of fun, because it unifies leptons (particles like electrons and neutrinos) and quarks. There’s even a chance that it’s true. But it’s not very popular these days, because it has some problems. It predicts that protons decay, which we haven’t seen happen yet.

Anyway, the math of SU(3) and SU(4) is perfectly well understood regardless of their applications to physics. And here’s the cool part:

If you take a matrix in SU(3) and add up its diagonal entries, you can get any number in the complex plane that lies inside a deltoid. If you take a matrix in SU(4) and add up its diagonal entries, you can get any number in the complex plane that lies inside an astroid. And using how SU(3) fits inside SU(4), you can show that a deltoid moves snugly inside an astroid!
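Here is a little numerical experiment I added to illustrate the SU(3) claim; it assumes scipy and matplotlib are available, and the tiny enlargement factor is only there to absorb the polygonal approximation of the deltoid.

```python
import numpy as np
from matplotlib.path import Path
from scipy.stats import unitary_group

# Traces of random SU(3) matrices should all land inside the deltoid 2 e^{it} + e^{-2it}.
Us = unitary_group.rvs(3, size=2000, random_state=0)     # Haar-random U(3) matrices
Us = Us / np.linalg.det(Us)[:, None, None] ** (1 / 3)    # rescale so each determinant is 1
traces = np.trace(Us, axis1=1, axis2=2)

t = np.linspace(0, 2 * np.pi, 4000)
boundary = 2 * np.exp(1j * t) + np.exp(-2j * t)
# Enlarge the polygonal approximation of the deltoid very slightly, so points
# extremely close to the true boundary are not misclassified.
deltoid = Path(1.001 * np.column_stack([boundary.real, boundary.imag]))

inside = deltoid.contains_points(np.column_stack([traces.real, traces.imag]))
print(inside.mean())   # prints 1.0
```

For the SU(4) statement, replace 3 by 4 and use the astroid boundary 3 \exp(i t) + \exp(-3 i t) instead.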

The deltoid looks like it’s rolling, hence the title of this article. But Egan pointed out that it’s not truly rolling in the sense of mechanics—it slides a bit as it rolls.

For the details, see this article on my other blog:

Deltoid rolling on an astroid.

But the really cool part—the new thing I want to show you today—is that this pattern continues!

For example, an astroid moves snugly inside a 5-pointed shape, thanks to how SU(4) sits inside SU(5). Here’s a movie of that, again made by Greg Egan:

In general, a hypocycloid with n cusps moves snugly inside a hypocycloid with n + 1 cusps. But this implies that you can have a hypocycloid with 2 cusps moving inside one with 3 cusps moving inside one with 4 cusps, etcetera! I’ve been wanting to see this for a while, but yesterday Egan created some movies showing this:

Depending on the details, you can get different patterns:

Egan explains:

In the first movie, every n-cusped hypocycloid moves inside the enclosing (n+1)-cusped hypocycloid at the same angular velocity and in the same direction, relative to the enclosing one … which is itself rotating, except for the very largest one. So the cumulative rotational velocity is highest for the line, lower for the deltoid, lower still for the astroid, in equal steps until you hit zero for the outermost hypocycloid.

In the second movie, the magnitude of the relative angular velocity is always the same between each hypocycloid and the one that encloses it, but the direction alternates as you go up each level.

Here’s one where he went up to a hypocycloid with 10 cusps:

Note how sometimes all the cusps meet!

Yonatan Zunger is the chief architect of Google+, but before that he used to work on string theory. Reading about this stuff on Google+, he had a number of interesting ideas, like this:

I wonder, can a deltoid roll in a 5-hypocycloid? I haven’t worked through the math of just how the SU(n) → SU(n+1) embedding guarantees a fit here.

(And also, are there corresponding pictures we could draw to illustrate the embeddings of other Lie groups? This could be a lovely way to illustrate a wide range of relationships if it could be done more generally.)

Egan figured out that yes, a deltoid can move nicely in a hypocycloid with 5 cusps:

He writes:

To make this, I took the border of a deltoid in “standard position” (as per the trace theorem) to be:

\mathrm{deltoid}(t) = 2 \exp(i t) + \exp(-2 i t)

Suppose I apply the linear function:

f(s, z) = \exp(-2 i s / 5) z + 2 \exp(3 i s / 5)

to the whole deltoid. Then at the point s=t on the deltoid, we have:

f(s, \mathrm{deltoid}(s)) = 4 \exp(3 i s / 5) + \exp(-4 (3 i s / 5))

which is a point on a 5-cusped hypocycloid in “standard position”. Of course with this choice of parameters you need to take s from 0 through to 10 \pi/3, not 2 \pi, to complete a cycle.
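Here is a quick numerical check of Egan’s identity, which I added:

```python
import numpy as np

s = np.linspace(0, 10 * np.pi / 3, 5000)
deltoid = 2 * np.exp(1j * s) + np.exp(-2j * s)
moved = np.exp(-2j * s / 5) * deltoid + 2 * np.exp(3j * s / 5)   # f(s, deltoid(s))
hypo5 = 4 * np.exp(3j * s / 5) + np.exp(-4 * (3j * s / 5))       # 5-cusped hypocycloid
print(np.max(np.abs(moved - hypo5)))   # around 1e-15: the two agree to machine precision
```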

Using the same trick, you can get a hypocycloid with n cusps to move inside one with m cusps whenever n ≤ m. As for the more general question of how different Lie groups give different ‘generalized hypocycloids’, and how fitting one Lie group in another lets you roll one generalized hypocycloid in another, the current state of the art is here:

• N. Kaiser, Mean eigenvalues for simple, simply connected, compact Lie groups.

But there is still more left to do!

On a somewhat related note, check out this fractal obtained by rolling a circle inside a circle 4 times as big that’s rolling in a circle 4 times as big… and so on: astroidae ad infinitum!

