## Symmetry and the Fourth Dimension (Part 10)

4 June, 2013

Some people say it’s impossible to visualize 4-dimensional things. But lots of people I know can do it.

How do they do it?

Well, how do you visualize 3-dimensional things? A computer screen is just 2d, but we can draw a 3d object on it by picking some diagonal direction—southeast in the picture below—to stand for ‘forwards, towards our eyes’. Similarly, we can draw a 4d object by picking another diagonal direction—northeast in this picture—to stand for ‘forwards in the 4th dimension’.

Here we are using this trick to draw 0d, 1d, 2d, 3d and 4d cubes. The first dimension, often called the x direction, points along the red arrow. The second, called the y direction, points along the green arrow. The third, the z direction, points diagonally along the blue arrow. And the fourth, sometimes called the w direction, points diagonally along the yellow arrow.
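This trick is easy to carry out in code. Here's a minimal Python sketch (the particular diagonal directions, and their 0.45 scaling, are arbitrary choices — any 'cheat' works) that builds the 4-cube's 16 vertices and 32 edges and projects each vertex onto the plane:

```python
import itertools

# 2d directions standing for the four axes:
# x -> right, y -> up, z -> a southeast diagonal, w -> a northeast diagonal.
DIRS = [(1.0, 0.0), (0.0, 1.0), (0.45, -0.45), (0.45, 0.45)]

def project(vertex):
    """Map a 4d vertex (a tuple of 0s and 1s) to a point in the plane."""
    px = sum(c * d[0] for c, d in zip(vertex, DIRS))
    py = sum(c * d[1] for c, d in zip(vertex, DIRS))
    return (px, py)

# The 4-cube has 2^4 = 16 vertices, one for each bit string of length 4...
vertices = list(itertools.product([0, 1], repeat=4))

# ...and an edge between any two vertices differing in exactly one coordinate.
edges = [(u, v) for u, v in itertools.combinations(vertices, 2)
         if sum(a != b for a, b in zip(u, v)) == 1]

print(len(vertices), len(edges))      # 16 vertices, 32 edges
points = {v: project(v) for v in vertices}
```

Feeding `points` and `edges` to any 2d plotting routine reproduces the kind of picture shown here.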

There’s nothing sacred about these names or these particular directions; we can implement this idea in lots of different ways. It’s ‘cheating’, but that’s okay. A vague, misty image can be a lot better than no image at all.

Of course, we need to think about math to keep straight which lines in our picture point in the w direction. But that’s okay too. A mixture of equations and visualization lets mathematicians and physicists make faster progress in understanding the 4th dimension than if they used only equations.

Physicists need to understand the 4th dimension because we live in 4d spacetime. Some mathematicians study much higher-dimensional spaces, even infinite-dimensional ones, and here visualization becomes more subtle, and perhaps more limited. But many mathematicians working on 4d topology take visualization very seriously. If you’ve ever seen the elaborate, detailed gestures they make when describing 4-dimensional constructions, you’ll know what I mean.

Visualizing shapes in 4 dimensions takes practice, but it’s lots of fun! As this series of posts continues, I want to give you some practice while talking about nice 4-dimensional shapes: the 4d relatives of the Platonic and Archimedean solids. In the series so far, we’ve spent a lot of time warming up by studying the 3d Platonic and Archimedean solids and developing a technique for classifying them, called Coxeter diagrams. All that will pay off soon!

Here’s a great series of videos that explains higher dimensions:

• Jos Leys, Étienne Ghys and Aurélien Alvarez, Dimensions.

The only problem is that it’s tough to navigate them. Click on your favorite language and you’ll see part 1 of the series. After you start playing it you’ll see an arrow at the lower right of the video that lets you jump to the next one. This is good if, like me, you’re impatient for the 4th dimension! That starts in part 3.

There’s a guide to all nine parts here:

• Jos Leys, Étienne Ghys and Aurélien Alvarez, Tour.

but you can’t get to the videos from there! They need a bit of help from a good website designer.

The picture above is a shot of the glorious 120-cell… one of the six Platonic solids in 4 dimensions. But more on that later! We’ll start with a simpler one: the 4-cube.

## Symmetry and the Fourth Dimension (Part 9)

2 June, 2013

Last time in this series we completed the first phase of our quest: we got lots of polyhedra from Coxeter diagrams in a systematic way. But before we sail off into new seas, let’s review what we’ve done.

To spice things up, I’ll explain everything in a different way than before. If you don’t see how this new way relates to the old one, please ask! There’s a lot to say about this stuff that I’m not saying, so there are plenty of gaps left to fill.

And if you’re wondering what that thing is up there, read on.

### Coxeter theory

Harold Scott MacDonald Coxeter, the ‘king of geometry’, developed a beautiful theory relating groups to highly symmetrical polyhedra and their higher-dimensional generalizations, called ‘polytopes’.

His book Regular Polytopes is required reading for anyone who wants to learn about these things. There’s a copy sitting by my bed, and there should be one by yours, too. It took him 24 years to write.

In his honor, finite groups generated by reflections in n-dimensional space are called Coxeter groups. They’re described by diagrams with n dots, called Coxeter diagrams. We’ve been looking at the 3d case, where there happen to be three such diagrams, with mysterious-sounding names:

A3      o—3—o—3—o
B3      o—3—o—4—o
H3      o—3—o—5—o

I want to show you how to use these diagrams. So let’s do an example, namely A3:

o—3—o—3—o

This diagram is telling us to draw a bunch of triangles on a sphere, like this:

This shape is called a Coxeter complex. As you can see, it contains lots of great circles. For each one we get a symmetry: the reflection across that circle, as if a mirror cut our Coxeter complex in half through that circle. So, the symmetries of the Coxeter complex form a finite group, generated by reflections. This is a Coxeter group!

But how do we get the Coxeter complex from the diagram? The dots in the diagram are secretly called V, E, and F:

V—3—E—3—F

Among other things, these letters are names for the edges of our favorite triangle in the Coxeter complex:

It doesn’t matter which is our favorite triangle, so I just picked one. Each edge of this triangle gives a symmetry: the reflection that reflects the Coxeter complex across that edge! We call these symmetries V, E and F.

The 3 on the line going between V and E in this diagram:

V—3—E—3—F

is a quick way of saying that

$(VE)^3 = 1$

In other words, doing the reflections V, E, V, E, V, E gets us back where we started.

To see this, let’s call our favorite triangle 1. Reflecting this triangle across the great circle containing the edge V, we get a new triangle, which we’ll call V. Reflecting that across the great circle containing the edge E, we get a new triangle, which we’ll call VE. And so on:

By the time we get to $VEVEVE = (VE)^3$, we’re back where we started! That’s a total of 6 reflections, so the V and E edges of each triangle must meet at a 60° angle.

Similarly, the 3 on the line going between E and F says that

$(EF)^3 = 1$

so doing the reflections E, F, E, F, E, F also gets us back where we started:

On the other hand, there’s no line connecting the dots V and F in our Coxeter diagram:

V—3—E—3—F

But this is an abbreviation for a line with a 2 on it! You see, lines with 2 on them occur so often that Coxeter decided to save time by not drawing such lines. So, we have

$(VF)^2 = 1$

and doing the reflections V, F, V, F also gets us back where we started:

That’s a total of 4 reflections, so the V and F edges of each triangle must meet at a 90° angle.

So, I’ve sketched how the Coxeter group and the Coxeter complex arise from the Coxeter diagram. To be a bit more precise, the Coxeter group A3 has generators V, E, F obeying relations

$(VE)^3 = (EF)^3 = (VF)^2 = 1$

and also

$V^2 = E^2 = F^2 = 1$

since V, E, and F are reflections. If we draw a sphere with one great circle serving as the ‘mirror’ for each reflection in the Coxeter group, we get the Coxeter complex.

What makes Coxeter complexes special, compared to other ways of tiling a sphere with triangles? One thing is that they have exactly as many symmetries as triangles. If you pick any triangle and call it your favorite, there’s exactly one symmetry—that is, rotation and/or reflection—sending this triangle to any other.

So, the Coxeter complex is actually a picture of its own symmetry group!

The particular Coxeter complex we’ve been looking at has 24 triangles. So, its symmetry group has 24 elements. This group is called the tetrahedral finite reflection group, or A3 for short. There’s a lot to say about it, but not now! There’s another more urgent question.
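If you know a little programming, you can verify this count concretely. The group A3 is the full permutation group of the tetrahedron's 4 vertices, and V, E, F can be realized as the three adjacent transpositions. Here's a short Python sketch that checks the Coxeter relations and generates all 24 elements:

```python
# V, E, F as adjacent transpositions of the tetrahedron's vertices {0,1,2,3};
# a permutation p is a tuple with p[i] = image of i.
V = (1, 0, 2, 3)   # swap vertices 0 and 1
E = (0, 2, 1, 3)   # swap vertices 1 and 2
F = (0, 1, 3, 2)   # swap vertices 2 and 3
identity = (0, 1, 2, 3)

def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(4))

def power(p, n):
    r = identity
    for _ in range(n):
        r = compose(r, p)
    return r

# The Coxeter relations:
assert power(V, 2) == power(E, 2) == power(F, 2) == identity
assert power(compose(V, E), 3) == identity
assert power(compose(E, F), 3) == identity
assert power(compose(V, F), 2) == identity

# Generate the whole group by closing {V, E, F} under composition:
group, frontier = {identity}, [identity]
while frontier:
    p = frontier.pop()
    for g in (V, E, F):
        q = compose(g, p)
        if q not in group:
            group.add(q)
            frontier.append(q)

print(len(group))   # 24, one element per triangle of the Coxeter complex
```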

### Polyhedra

How do we get polyhedra from our Coxeter complex?

The easiest one works like this. We take the Coxeter complex and create a polyhedron with a corner in the middle of each triangle, connecting two corners with an edge whenever their triangles touch:

The polyhedron looks better if we view it from a slightly different angle, and let a skilled artist like Tom Ruen do the drawing:

This polyhedron is called the Poincaré dual of the Coxeter complex.

There’s a notation for this particular polyhedron:

•—3—•—3—•

All the dots are black, because this is the fanciest, most interesting polyhedron that comes from our Coxeter complex. We get other polyhedra by blackening just some of the dots.

Earlier in this series, I described how to build these other polyhedra using the concept of ‘flag’. But there’s another way, using the Coxeter complex, which I’ll sketch here.

Suppose we blacken just some of the dots in our Coxeter diagram. Then our favorite triangle belongs to a bunch of triangles, all related by reflections corresponding to the dots we left white. And indeed, all the triangles can be grouped into bunches that all look the same… and there’s a polyhedron that has a corner in the middle of each bunch of triangles.

For example, suppose we leave the E dot white:

•—3—o—3—•

Then our favorite triangle belongs to a bunch of triangles—in this case, just a pair!—that are related by reflections along the E edge of our favorite triangle. If we group our triangles into pairs like this, we get a polyhedron with a corner in the middle of each pair:

Again, it looks better if we let Tom Ruen draw it:

I should do more examples, but I think I’ll wrap up by describing the procedure in more highbrow language. Skip the next paragraph if you don’t know group theory, and move on to the complete list of examples!

If we call the Coxeter group G, blackening all the dots of the Coxeter diagram gives a polyhedron with one corner for each element of G. But if we don’t blacken all of them, the reflections corresponding to the white dots generate a subgroup B of G. Then we get a polyhedron with one corner for each element of G/B. Last time I said each way of blackening some dots describes some sort of ‘flag’. In these terms, B is the subgroup fixing your favorite flag of this sort, and G/B is the set of all flags of this sort. Each of those flags corresponds to what I’m calling a ‘bunch of triangles’ in my current story.
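Here's a small Python sketch of that highbrow description, computing $|G/B|$ for a few markings of the A3 diagram. It uses a standard concrete model: since A3 is the permutation group of the tetrahedron's 4 vertices, V, E, F can be realized as adjacent transpositions.

```python
# The A3 generators as adjacent transpositions of {0,1,2,3};
# a permutation p is a tuple with p[i] = image of i.
V, E, F = (1, 0, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2)
identity = (0, 1, 2, 3)

def compose(p, q):
    """First apply q, then p."""
    return tuple(p[q[i]] for i in range(4))

def generate(gens):
    """Close a set of permutations under composition."""
    group, frontier = {identity}, [identity]
    while frontier:
        p = frontier.pop()
        for g in gens:
            q = compose(g, p)
            if q not in group:
                group.add(q)
                frontier.append(q)
    return group

G = generate([V, E, F])        # the whole Coxeter group: 24 elements

def corners(white_gens):
    """Number of corners of the polyhedron = |G/B|,
    where B is generated by the reflections at the white dots."""
    B = generate(white_gens)
    return len(G) // len(B)

print(corners([E]))      # dot E white,  •—3—o—3—• : 12 corners (cuboctahedron)
print(corners([E, F]))   # E, F white,   •—3—o—3—o : 4 corners (tetrahedron)
print(corners([]))       # all black,    •—3—•—3—• : 24 corners (truncated octahedron)
```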

But now let’s see what we get from all this! We get three families of polyhedra, and these are almost all the ‘Archimedean solids’ in 3 dimensions.

### The A3 family:     o—3—o—3—o

This family of polyhedra can all be gotten from the tetrahedron by chopping off vertices, edges or faces. They’re associated to the Coxeter complex we’ve just been looking at:

It’s built from triangles whose interior angles are $\pi/3, \pi/3$ and $\pi/2$. These numbers come from taking π and dividing it by the numbers on the edges of the Coxeter diagram: 3, 3, and the invisible edge labelled 2.

As mentioned, this Coxeter complex has 24 triangles, and its symmetry group, with 24 elements, is called the tetrahedral finite reflection group, or A3.

Here are all the polyhedra in this family. The list has some repeats, because this Coxeter diagram is its own mirror image!

tetrahedron               •—3—o—3—o
truncated tetrahedron     •—3—•—3—o
octahedron                o—3—•—3—o
truncated tetrahedron     o—3—•—3—•
tetrahedron               o—3—o—3—•
cuboctahedron             •—3—o—3—•
truncated octahedron      •—3—•—3—•

### The B3 family:     o—3—o—4—o

This family of polyhedra can all be gotten by taking the cube or the octahedron and chopping off vertices, edges or faces. They come from the Coxeter complex whose triangles have interior angles $\pi/3, \pi/4$ and $\pi/2$:

This Coxeter complex has 48 triangles, and its symmetry group is a Coxeter group with 48 elements, called the octahedral finite reflection group, or B3.

Here are all the polyhedra in this family. We’ve seen some of these already in the A3 family, and there’s a reason for that: the group B3 happens to contain A3 as a subgroup! It’s twice as big.

cube                      •—4—o—3—o
truncated cube            •—4—•—3—o
cuboctahedron             o—4—•—3—o
truncated octahedron      o—4—•—3—•
octahedron                o—4—o—3—•
rhombicuboctahedron       •—4—o—3—•
truncated cuboctahedron   •—4—•—3—•

### The H3 family:     o—3—o—5—o

This family of polyhedra can all be gotten by taking the dodecahedron or icosahedron and chopping off vertices, edges and faces. They come from the Coxeter complex built from triangles whose interior angles are $\pi/3, \pi/5$ and $\pi/2$:

This Coxeter complex has 120 triangles, and its symmetry group is a Coxeter group with 120 elements, called the icosahedral finite reflection group, or H3.

Here are all the polyhedra coming from this Coxeter complex. These are my favorites, because they look the most fancy, and I have a fondness for the quirky charm of 5-fold symmetry:

dodecahedron                •—5—o—3—o
truncated dodecahedron      •—5—•—3—o
icosidodecahedron           o—5—•—3—o
truncated icosahedron       o—5—•—3—•
icosahedron                 o—5—o—3—•
rhombicosidodecahedron      •—5—o—3—•
truncated icosidodecahedron •—5—•—3—•

### Afterword

The picture at the start of this post was taken by my friend Allen Knutson:

It’s the H3 Coxeter complex, and for some reason it’s in the music library at Cornell University, where Allen teaches math.

The picture of H. S. M. Coxeter is from the cover of a book about him:

• Siobhan Roberts, King of Infinite Space, Walker & Company, 2006.

As usual, the pretty pictures of solids with brass balls at the vertices were made by Tom Ruen using Robert Webb’s Stella software. Tom Ruen also drew the Coxeter complexes; I’m to blame for the crude extra stuff added on.

You can see the previous episodes of this series here:

Part 1: Platonic solids and Coxeter complexes.

Part 2: Coxeter groups.

Part 3: Coxeter diagrams.

Part 4: duals of Platonic solids.

Part 5: Interpolating between a Platonic solid and its dual, and how to describe this using Coxeter diagrams. Example: the cube/octahedron family.

Part 6: Interpolating between a Platonic solid and its dual. Example: the dodecahedron/icosahedron family.

Part 7: Interpolating between a Platonic solid and its dual. Example: the tetrahedron family.

Part 8: The missing solids, coming from Coxeter diagrams with both ends blackened.

## 42

25 May, 2013

In The Hitchhiker’s Guide to the Galaxy by Douglas Adams, the number 42 is the “Answer to the Ultimate Question of Life, the Universe, and Everything”. But he didn’t say what the question was!

Since today is Towel Day, let me reveal that now.

If you try to get several regular polygons to meet snugly at a point in the plane, what’s the most sides any of the polygons can have? The answer is 42.

The picture shows an equilateral triangle, a regular heptagon and a regular 42-gon meeting snugly at a point. If you do the math, you’ll see the reason this works is that

$\displaystyle{ \frac{1}{3} + \frac{1}{7} + \frac{1}{42} = \frac{1}{2} }$

There are actually 10 solutions of

$\displaystyle{ \frac{1}{p} + \frac{1}{q} + \frac{1}{r} = \frac{1}{2} }$

with $p \le q \le r,$ and each of them gives a way for three regular polygons to snugly meet at a point. But this particular solution features the biggest number possible!
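If you'd rather not find the 10 solutions by hand, a brute-force search confirms this. Here's a short Python sketch (the cutoff of 100 on $q$ is a generous bound; the condition $p \le q \le r$ prevents duplicates):

```python
from fractions import Fraction

target = Fraction(1, 2)
solutions = []
for p in range(3, 7):            # 1/p < 1/2 forces p >= 3; 3/p >= 1/2 forces p <= 6
    for q in range(p, 101):      # generous cutoff; the r >= q test prunes the rest
        rest = target - Fraction(1, p) - Fraction(1, q)
        if rest <= 0:
            continue
        r = 1 / rest             # exact rational arithmetic
        if r.denominator == 1 and r.numerator >= q:
            solutions.append((p, q, r.numerator))

print(len(solutions))                  # 10
print(max(s[2] for s in solutions))    # 42, from the solution (3, 7, 42)
```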

But why is this so important? Well, it turns out that if you look for natural numbers $a, b, c$ that make

$\displaystyle{ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} }$

as close to 1 as possible, while still less than 1, the very best you can do is $1/2 + 1/3 + 1/7.$ It comes within $1/42$ of equalling 1, since

$\displaystyle{ \frac{1}{3} + \frac{1}{7} + \frac{1}{42} = \frac{1}{2} }$

And why is this important? Well, suppose you’re trying to make a doughnut with at least two holes that has the maximum number of symmetries. More precisely, suppose you’re trying to make a Riemann surface with genus $g \ge 2$ that has the maximum number of symmetries. Then you need to find a highly symmetrical tiling of the hyperbolic plane by triangles whose interior angles are $\pi/a, \pi/b$ and $\pi/c ,$ and you need

$\displaystyle{ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} < 1 }$

for these triangles to fit on the hyperbolic plane.

For example, if you take $a = 2, b = 3, c = 7$ you get this tiling:

A clever trick then lets you curl up the hyperbolic plane and get a Riemann surface with at most

$\displaystyle{ \frac{2(g-1)}{1 - \frac{1}{a} - \frac{1}{b} - \frac{1}{c}} }$

symmetries.

So, to get as many symmetries as possible, you want to make $1 - \frac{1}{a} - \frac{1}{b} - \frac{1}{c}$ as small as possible! And thanks to what I said, the best you can do is

$\displaystyle{ 1 - \frac{1}{2} - \frac{1}{3} - \frac{1}{7} = \frac{1}{42} }$

So, your surface can have at most

$84(g-1)$

symmetries. This is called Hurwitz’s automorphism theorem. The number 84 looks mysterious when you first see it — but it’s there because it’s twice 42.
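Both of these numerical facts are easy to verify by brute force. Here's a short Python sketch (the search cutoff of 50 is arbitrary but ample for this problem):

```python
from fractions import Fraction

# Find a <= b <= c making 1/a + 1/b + 1/c as close to 1 as possible
# while staying below it.
best, best_triple = Fraction(0), None
for a in range(2, 50):
    for b in range(a, 50):
        for c in range(b, 50):
            s = Fraction(1, a) + Fraction(1, b) + Fraction(1, c)
            if s < 1 and s > best:
                best, best_triple = s, (a, b, c)

print(best_triple, best)          # (2, 3, 7) 41/42

def hurwitz_bound(g, a, b, c):
    """The bound 2(g-1) / (1 - 1/a - 1/b - 1/c) from the formula above."""
    return 2 * (g - 1) / (1 - Fraction(1, a) - Fraction(1, b) - Fraction(1, c))

print(hurwitz_bound(3, 2, 3, 7))  # 168 = 84(3-1): Klein's quartic curve
```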

In particular, the famous mathematician Felix Klein studied the most symmetrical doughnut with 3 holes. It’s a really amazing thing, called Klein’s quartic curve:

It has

$84 \times 2 = 168$

symmetries. That number also looks mysterious when you first see it. Of course it’s the number of hours in a week, but the real reason it’s there is because it’s four times 42.

If you carefully count the triangles in the picture above, you’ll get 56. However, these triangles are equilateral, or at least they would be if we could embed Klein’s quartic curve in 3d space without distorting it. If we drew all the smaller triangles whose interior angles are $\pi/2, \pi/3$ and $\pi/7,$ each equilateral triangle would get subdivided into 6 smaller triangles, and there would be a total of $6 \times 56 = 336$ triangles. But of course

$336 = 8 \times 42$

Half of these smaller triangles would be ‘left-handed’ and half would be ‘right-handed’, and there’d be a symmetry sending a chosen triangle to any other triangle of the same handedness, for a total of

$168 = 4 \times 42$

symmetries (that is, conformal transformations, not counting reflections).

But why is this stuff the answer to the ultimate question of life, the universe, and everything? I’m not sure, but I have a crazy theory. Maybe all matter and forces are made of tiny little strings! As they move around, they trace out Riemann surfaces in spacetime. And when these surfaces are as symmetrical as possible, reaching the limit set by Hurwitz’s automorphism theorem, the size of their symmetry group is a multiple of 42, thanks to the math I just described.

### Puzzles

Puzzle 1. Consider solutions of

$\displaystyle{ \frac{1}{p} + \frac{1}{q} + \frac{1}{r} = \frac{1}{2} }$

with positive integers $p \le q \le r,$ and show that the largest possible value of $r$ is $42.$

Puzzle 2. Consider solutions of

$\displaystyle{ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} < 1}$

with positive integers $a, b, c,$ and show that the largest possible value of $\frac{1}{a} + \frac{1}{b} + \frac{1}{c}$ is $1 - \frac{1}{42}.$

### Acknowledgments and references

For more details, see my page on Klein’s quartic curve, and especially the section on incredibly symmetrical surfaces. Sums of reciprocals of natural numbers are called ‘Egyptian fractions’, and they have deep connections to geometry; for more on this see my article on Archimedean tilings and Egyptian fractions.

The picture of a triangle, heptagon and 42-gon (also known as a tetracontakaidigon) was made by Tyler, and you can see all 17 ways to get 3 regular polygons to meet snugly at a vertex on Wikipedia. Of these, only 11 can occur in a uniform tiling of the plane. The triangle, heptagon and 42-gon do not tile the plane, but you can see some charming attempts to do something with them on Kevin Jardine’s website Imperfect Congruence:

The picture of Klein’s quartic curve was made by Greg Egan, and you should also check out his page on Klein’s quartic curve.

“Good Morning,” said Deep Thought at last.
“Er…good morning, O Deep Thought” said Loonquawl nervously, “do you have…er, that is…”
“An Answer for you?” interrupted Deep Thought majestically. “Yes, I have.”
The two men shivered with expectancy. Their waiting had not been in vain.
“There really is one?” breathed Phouchg.
“There really is one,” confirmed Deep Thought.
“To Everything? To the great Question of Life, the Universe and everything?”
“Yes.”
Both of the men had been trained for this moment, their lives had been a preparation for it, they had been selected at birth as those who would witness the answer, but even so they found themselves gasping and squirming like excited children.
“And you’re ready to give it to us?” urged Loonquawl.
“I am.”
“Now?”
“Now,” said Deep Thought.
They both licked their dry lips.
“Though I don’t think,” added Deep Thought, “that you’re going to like it.”
“Doesn’t matter!” said Phouchg. “We must know it! Now!”
“Now?” inquired Deep Thought.
“Yes! Now…”
“All right,” said the computer, and settled into silence again. The two men fidgeted. The tension was unbearable.
“You’re really not going to like it,” observed Deep Thought.
“Tell us!”
“All right,” said Deep Thought. “The Answer to the Great Question…”
“Yes…!”
“Of Life, the Universe and Everything…” said Deep Thought.
“Yes…!”
“Is…” said Deep Thought, and paused.
“Yes…!”
“Is…”
“Yes…!!!…?”
“Forty-two,” said Deep Thought, with infinite majesty and calm.

## Category Theory for Scientists

23 May, 2013

At last—a textbook on category theory for scientists! And it’s free!

• David Spivak, Category Theory for Scientists.

It’s based on a course the author taught:

This course is an attempt to extol the virtues of a new branch of mathematics, called category theory, which was invented for powerful communication of ideas between different fields and subfields within mathematics. By powerful communication of ideas I actually mean something precise. Different branches of mathematics can be formalized into categories. These categories can then be connected together by functors. And the sense in which these functors provide powerful communication of ideas is that facts and theorems proven in one category can be transferred through a connecting functor to yield proofs of an analogous theorem in another category. A functor is like a conductor of mathematical truth.

I believe that the language and toolset of category theory can be useful throughout science. We build scientific understanding by developing models, and category theory is the study of basic conceptual building blocks and how they cleanly fit together to make such models. Certain structures and conceptual frameworks show up again and again in our understanding of reality. No one would dispute that vector spaces are ubiquitous. But so are hierarchies, symmetries, actions of agents on objects, data models, global behavior emerging as the aggregate of local behavior, self-similarity, and the effect of methodological context.

Some ideas are so common that our use of them goes virtually undetected, such as set-theoretic intersections. For example, when we speak of a material that is both lightweight and ductile, we are intersecting two sets. But what is the use of even mentioning this set-theoretic fact? The answer is that when we formalize our ideas, our understanding is almost always clarified. Our ability to communicate with others is enhanced, and the possibility for developing new insights expands. And if we are ever to get to the point that we can input our ideas into computers, we will need to be able to formalize these ideas first.

It is my hope that this course will offer scientists a new vocabulary in which to think and communicate, and a new pipeline to the vast array of theorems that exist and are considered immensely powerful within mathematics. These theorems have not made their way out into the world of science, but they are directly applicable there. Hierarchies are partial orders, symmetries are group elements, data models are categories, agent actions are monoid actions, local-to-global principles are sheaves, self-similarity is modeled by operads, context can be modeled by monads.

He asks readers from different subjects for help in finding new ways to apply category theory to those subjects. And that’s the right attitude to take when reading this book. I’ve found categories immensely valuable in my work. But it took effort to learn category theory and see how it can apply to different subjects. People are just starting to figure out these things, so don’t expect instant solutions to the problems in your own favorite field.

But Spivak does the best job I’ve seen so far at explaining category theory as a general-purpose tool for thinking clearly. Since I’m busy using category theory to clarify the relationships between fields like chemistry, population biology, electrical engineering and control theory, this subject is very much on my mind.

## Graph Laplacians

19 May, 2013

There’s been some new progress on graph Laplacians!

As a mathematical physicist, I’ve always been in love with the Laplacian:

$\displaystyle{ \nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} }$

It shows up in many of the most fundamental equations of physics: the wave equation, the heat equation, Schrödinger’s equation… and Poisson’s equation:

$\nabla^2 \phi = \rho$

which says how the density of matter, $\rho,$ affects the gravitational potential $\phi.$

As I’ve grown interested in network theory, I’ve gotten more and more interested in ‘graph Laplacians’. These are discretized versions of the Laplacian, where we replace 3-dimensional space by a ‘graph’, meaning something like this:

You can get a lot of interesting information about a graph from its Laplacian. You can also set up discretized versions of all the famous equations I mentioned.

The new progress is a simple algorithm for very efficiently solving Poisson’s equation for graph Laplacians:

• Jonathan A. Kelner, Lorenzo Orecchia, Aaron Sidford, Zeyuan Allen Zhu, A simple, combinatorial algorithm for solving SDD systems in nearly-linear time.

Here’s a very clear explanation of the general idea, conveying some sense of why it’s so important, without any nasty equations:

• Larry Hardesty, Short algorithm, long-range consequences, MIT News, 1 March 2013.

It begins:

In the last decade, theoretical computer science has seen remarkable progress on the problem of solving graph Laplacians — the esoteric name for a calculation with hordes of familiar applications in scheduling, image processing, online product recommendation, network analysis, and scientific computing, to name just a few. Only in 2004 did researchers first propose an algorithm that solved graph Laplacians in “nearly linear time,” meaning that the algorithm’s running time didn’t increase exponentially with the size of the problem.

At this year’s ACM Symposium on the Theory of Computing, MIT researchers will present a new algorithm for solving graph Laplacians that is not only faster than its predecessors, but also drastically simpler.

This animation shows two different “spanning trees” for a simple graph, a grid like those used in much scientific computing. The speedups promised by a new MIT algorithm require “low-stretch” spanning trees (green), in which the paths between neighboring nodes don’t become excessively long (red).

I can’t beat this article at its own game… except to clarify that ‘solving graph Laplacians’ means solving Poisson’s equation with a graph Laplacian replacing the usual Laplacian.

So, let me just supplement this article with the nasty equations saying what a graph Laplacian actually is. Start with a graph. More precisely, start with a simple graph. Such a graph has a set of vertices $V$ and a set of edges $E \subseteq V \times V,$ such that

$(x,y) \in E \implies (y,x) \in E$

which says the edges are undirected, and

$(x,x) \notin E$

which says there are no loops.

The graph Laplacian is an operator $H$ that takes a function on the vertices of our graph,

$\phi : V \to \mathbb{R}$

and gives a new such function $H\phi,$ as follows:

$\displaystyle{ (H \phi)(x) = \sum_{y \,\, \textrm{such that} \, \,(x,y) \in E} \!\!\!\!\!\!\!\!\!\!\! (\phi(y) - \phi(x)) }$

The version of Poisson’s equation for this graph Laplacian is thus

$H \phi = \rho$

But I should warn you: this operator $H$ has eigenvalues that are less than or equal to zero, like the usual Laplacian $\nabla^2.$ People often insert a minus sign to make the eigenvalues ≥ 0.
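Here's a minimal numpy sketch of this definition (the 4-cycle graph is an arbitrary example), which also illustrates the sign warning about the eigenvalues:

```python
import numpy as np

def graph_laplacian(n, edges):
    """The graph Laplacian H on vertices 0..n-1:
    (H phi)(x) = sum over neighbours y of (phi(y) - phi(x)),
    i.e. H = A - D, with A the adjacency matrix and D the degree matrix."""
    A = np.zeros((n, n))
    for x, y in edges:
        A[x, y] = A[y, x] = 1.0      # undirected edges, no loops
    D = np.diag(A.sum(axis=1))
    return A - D

# A 4-cycle: 0 - 1 - 2 - 3 - 0.
H = graph_laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])

# All eigenvalues are <= 0: here they are 0, -2, -2, -4.
print(np.linalg.eigvalsh(H))

# Constant functions lie in the kernel, so H applied to (1,1,1,1) is zero.
print(H @ np.ones(4))
```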

There is a huge amount to say about graph Laplacians! If you want, you can learn more here:

• Michael William Newman, The Laplacian Spectrum of Graphs, Masters Thesis, Department of Mathematics, University of Manitoba, 2000.

I’ve been learning about some of their applications here:

• Ernesto Estrada, The Structure of Complex Networks: Theory and Applications, Oxford University Press, Oxford, 2011.

I hope sometime to summarize a bit of this material and push the math forward a bit. So, it was nice to see new progress on efficient algorithms for doing computations with graph Laplacians!

## Quantum Techniques for Studying Equilibrium in Reaction Networks

16 May, 2013

The summer before last, I invited Brendan Fong to Singapore to work with me on my new ‘network theory’ project. He quickly came up with a nice new proof of a result about mathematical chemistry. We blogged about it, and I added it to my book, but then he became a grad student at Oxford and got distracted by other kinds of networks—namely, Bayesian networks.

So, we’ve just now finally written up this result as a self-contained paper:

• John Baez and Brendan Fong, Quantum techniques for studying equilibrium in reaction networks.

Check it out and let us know if you spot mistakes or stuff that’s not clear!

The idea, in brief, is to use math from quantum field theory to give a somewhat new proof of the Anderson–Craciun–Kurtz theorem.

This remarkable result says that in many cases, we can start with an equilibrium solution of the ‘rate equation’, which describes the behavior of chemical reactions in a deterministic way in the limit of large numbers of molecules, and get an equilibrium solution of the ‘master equation’, which describes chemical reactions probabilistically for any number of molecules.

The trick, in our approach, is to start with a chemical reaction network, which is something like this:

and use it to write down a Hamiltonian describing the time evolution of the probability that you have various numbers of each kind of molecule: A, B, C, D, E, … Using ideas from quantum mechanics, we can write this Hamiltonian in terms of annihilation and creation operators—even though our problem involves probability theory, not quantum mechanics! Then we can write down the equilibrium solution as a ‘coherent state’. In quantum mechanics, that’s a quantum state that approximates a classical one as well as possible.

All this is part of a larger plan to take tricks from quantum mechanics and apply them to ‘stochastic mechanics’, simply by working with real numbers representing probabilities instead of complex numbers representing amplitudes!
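To get a concrete feel for this, here's a minimal numerical sketch. It uses the simplest possible reaction network—a birth-death process ∅ ⇌ A, my toy example, not one from the paper—and checks that the Poisson distribution whose mean is the rate equation's equilibrium is indeed an equilibrium of the master equation:

```python
import math
import numpy as np

# Reactions:  ∅ -> A  at rate alpha,  A -> ∅  at rate beta per molecule.
alpha, beta = 2.0, 1.0

# Rate equation: dc/dt = alpha - beta*c, so the equilibrium concentration is:
c = alpha / beta

# Master equation dP/dt = H P on the truncated state space n = 0..N-1,
# where P[n] is the probability of having n molecules.
N = 80
H = np.zeros((N, N))
for n in range(N - 1):
    H[n + 1, n] += alpha             # birth: n -> n+1
    H[n, n] -= alpha
for n in range(1, N):
    H[n - 1, n] += beta * n          # death: n -> n-1
    H[n, n] -= beta * n

# The Poisson distribution with mean c -- a 'coherent state':
P = np.array([math.exp(-c) * c**n / math.factorial(n) for n in range(N)])

# It is an equilibrium of the master equation: H P = 0 up to rounding error.
print(np.abs(H @ P).max())
```

For more complicated reaction networks the equilibrium is a product of Poisson distributions, one for each species, which is what the Anderson–Craciun–Kurtz theorem guarantees under its hypotheses.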

I should add that Brendan’s work on Bayesian networks is also very cool, and I plan to talk about it here and even work it into the grand network theory project I have in mind. But this may take quite a long time, so for now you should read his paper:

• Brendan Fong, Causal theories: a categorical perspective on Bayesian networks.

## The Foundations of Applied Mathematics

1 May, 2013

Suppose we take “applied mathematics” in an extremely broad sense that includes math developed for use in electrical engineering, population biology, epidemiology, chemistry, and many other fields. Suppose we look for mathematical structures that repeatedly appear in these diverse contexts — especially structures that aren’t familiar to pure mathematicians. What do we find? The answers may give us some clues about how to improve the foundations of mathematics!

This is what I’m talking about at the Category-Theoretic Foundations of Mathematics Workshop at U.C. Irvine this weekend.

You can see my talk slides here. You can click on any picture or anything written in blue in these slides to get more information — for example, references.

## Petri Net Programming (Part 3)

19 April, 2013

guest post by David Tanzer

### The role of differential equations

Last time we looked at stochastic Petri nets, which use a random event model for the reactions. Individual entities are represented by tokens that flow through the network. When the token counts get large, we observed that they can be approximated by continuous quantities, which opens the door to the application of continuous mathematics to the analysis of network dynamics.

A key result of this approach is the “rate equation,” which gives a law of motion for the expected sizes of the populations. Equilibrium can then be obtained by solving for zero motion. The rate equations are applied in chemistry, where they give the rates of change of the concentrations of the various species in a reaction.

But before discussing the rate equation, here I will talk about the mathematical form of this law of motion, which consists of differential equations. This form is naturally associated with deterministic systems involving continuous magnitudes. This includes the equations of motion for the sine wave:

and the graceful ellipses that are traced out by the orbits of the planets around the sun:

This post provides some mathematical context to programmers who have not worked on scientific applications. My goal is to get as many of you on board as possible, before setting sail with Petri net programming.

### Three approaches to equations: theoretical, formula-based, and computational

Let’s first consider the major approaches to equations in general. We’ll illustrate with a Diophantine equation

$x^9 + y^9 + z^9 = 2$

where $x, y$ and $z$ are integer variables.

In the theoretical approach (aka “qualitative analysis”), we start with the meaning of the equation and then proceed to reason about its solutions. Here are some simple consequences of this equation: the variables $x, y, z$ can’t all be zero, can’t all be positive, can’t all be negative, can’t all be even, and can’t all be odd.

In the formula-based approach, we seek formulas to describe the solutions. Here is an example of a formula (which does not solve our equation):

$\{(x,y,z) \;|\; x = n^3, \; y = 2n - 4, \; z = 4 n, \; 1 \leq n \leq 5 \}$

Such formulas are nice to have, but the pursuit of them is diabolically difficult. In fact, for Diophantine equations, even the question of whether an arbitrarily chosen equation has any solutions whatsoever has been proven to be algorithmically undecidable.

Finally, in the computational approach, we seek algorithms to enumerate or numerically approximate the solutions to the equations.
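For instance, here is a minimal brute-force sketch of the computational approach for our Diophantine equation, searching a small box of integer triples (the bound of 5 is arbitrary; a search like this can find solutions, but it can never prove that none exist outside the box):

```python
# Brute-force sketch of the computational approach: enumerate integer
# triples in a small box and test them against x^9 + y^9 + z^9 = 2.
bound = 5
solutions = [(x, y, z)
             for x in range(-bound, bound + 1)
             for y in range(-bound, bound + 1)
             for z in range(-bound, bound + 1)
             if x**9 + y**9 + z**9 == 2]
print(solutions)   # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Note that the search does find solutions here, in agreement with the theoretical observations above: in each solution the variables are neither all zero, all positive, all negative, all even, nor all odd.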

### The three approaches to differential equations

Let’s apply the preceding classification to differential equations.

#### Theoretical approach

A differential equation is one that constrains the rates at which the variables are changing. This can include constraints on the rates at which the rates are changing (second-order equations), etc. The equation is ordinary if there is a single independent variable, such as time, otherwise it is partial.

Consider the equation stating that a variable increases at a rate equal to its current value. The bigger it gets, the faster it increases. Given a starting value, this determines a process — the solution to the equation — which here is exponential growth.

Let $X(t)$ be the value at time $t,$ and let’s initialize it to 1 at time 0. So we have:

$X(0) = 1$

$X'(t) = X(t)$

These are first-order equations, because the derivative is applied at most once to any variable. They are linear equations, because the terms on each side of the equations are linear combinations of either individual variables or derivatives (in this case all of the coefficients are 1). Note also that a system of differential equations may in general have zero, one, or multiple solutions. This example belongs to a class of equations which are proven to have a unique solution for each initial condition.

You could imagine more complex systems of equations, involving multiple dependent variables, all still depending on time. That includes the rate equations for a Petri net, which have one dependent variable for each of the population sizes. The ideas for such systems are an extension of the ideas for a single-variable system. Then, a state of the system is a vector of values, with one component for each of the dependent variables. For first-order systems, such as the rate equations, where the derivatives appear on the left-hand sides, the equations determine, for each possible state of the system, a “direction” and rate of change for the state of the system.

Now here is a simple illustration of what I called the theoretical approach. Can $X(t)$ ever become negative? No, because it starts out positive at time 0, and in order to later become negative, it must be decreasing at a time $t_1$ when it is still positive. That is to say, $X(t_1) > 0$, and $X'(t_1) < 0$. But that contradicts the assumption $X'(t) = X(t)$. The general lesson here is that we don’t need a solution formula in order to make such inferences.

For the rate equations, the theoretical approach leads to substantial theorems about the existence and structure of equilibrium solutions.

#### Formula-based approach

It is natural to look for concise formulas to solve our equations, but the results of this overall quest are largely negative. The exponential differential equation cannot be solved by any formula that involves a finite combination of simple operations. So the solution function must be treated as a new primitive, and given a name, say $\exp(t)$. But even when we extend our language to include this new symbol, there are many differential equations that remain beyond the reach of finite formulas. So an endless collection of primitive functions is called for. (As is standard practice, we always include $\exp(t),$ and its complex extensions the trigonometric functions, as primitives in our toolbox.)

But the hard mathematical reality does not end here, because even when solution formulas do exist, finding them may call for an ace detective. Only for certain classes of differential equations, such as the linear ones, do we have systematic solution methods.

The picture changes, however, if we let the formulas contain an infinite number of operations. Then the arithmetic operators give a far-reaching base for defining new functions. In fact, as you can verify, the power series

$X(t) = 1 + t + t^2/2! + t^3/3! + ...$

which we view as an “infinite polynomial” over the time parameter t, exactly satisfies our equations for exponential motion, $X(0) = 1$ and $X'(t) = X(t).$ This power series therefore defines $\exp(t).$ By the way, applying it to the input 1 produces a definition for the transcendental number $e$:

$e = X(1) = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + ... \approx 2.71828$
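As a quick check, one can sum the first few terms of this series in code (`exp_series` is just an ad hoc helper name, not a standard function):

```python
from math import factorial, e

# Sum the first `terms` terms of the power series for exp(t):
#   X(t) = 1 + t + t^2/2! + t^3/3! + ...
def exp_series(t, terms):
    return sum(t**n / factorial(n) for n in range(terms))

# Partial sums at t = 1 converge rapidly to the transcendental number e:
for terms in (5, 10, 15):
    print(terms, exp_series(1.0, terms))
print("e =", e)
```

The factorials in the denominators make the convergence very fast: fifteen terms already agree with $e$ to about ten decimal places.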

#### Computational approach

Let’s leave aside our troubles with formulas, and consider the computational approach. For broad classes of differential equations, there are approximation algorithms that can be successfully applied.

For starters, any power series that satisfies a differential equation gives a simple approximation method. If the series is known to converge over some range of inputs, then one can approximate the solution’s value there by stopping the computation after a finite number of terms.

But the standard methods work directly with the equations, provided that they can be put into the right form. The simplest one is called Euler’s method. It works over a sampling grid of points separated by some small number $\epsilon$. Let’s take the case where we have a first-order equation in explicit form, which means that $X'(t) = f(X(t))$ for some function $f.$

We begin with the initial value $X(0)$. Applying $f,$ we get $X'(0) = f(X(0)).$ Then for the interval from 0 to $\epsilon$, we use a linear approximation for $X(t)$, by assuming that the derivative remains constant at $X'(0).$ That gives $X(\epsilon) = X(0) + \epsilon \cdot X'(0).$ Next, $X'(\epsilon) = f(X(\epsilon)),$ and $X(2 \epsilon) = X(\epsilon) + \epsilon \cdot X'(\epsilon),$ etc. Formally,

$X(0) = \textrm{initial}$

$X((n+1) \epsilon) = X(n \epsilon) + \epsilon f(X(n \epsilon))$

Applying this to our exponential equation, where $f(X(t)) = X(t),$ we get:

$X(0) = 1$

$X((n+1) \epsilon) = X(n \epsilon) + \epsilon X(n \epsilon) = X(n \epsilon) (1 + \epsilon)$

Hence:

$X(n \epsilon) = (1 + \epsilon) ^ n$

So the approximation method gives discrete exponential growth, which converges to the continuous exponential in the limit as the mesh size goes to zero.
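Here is a minimal sketch of Euler’s method in code, applied to our exponential equation (the function name is my own, for illustration):

```python
from math import e

# Euler's method for an explicit first-order equation X'(t) = f(X(t)).
def euler(f, x0, t_end, eps):
    x = x0
    steps = int(round(t_end / eps))
    for _ in range(steps):
        x = x + eps * f(x)   # X((n+1)ε) = X(nε) + ε·f(X(nε))
    return x

# For f(x) = x with X(0) = 1, n steps of size ε give (1 + ε)^n,
# which approaches e = exp(1) at t = 1 as ε shrinks:
for eps in (0.1, 0.01, 0.001):
    print(eps, euler(lambda x: x, 1.0, 1.0, eps), e)
```

Shrinking the mesh size visibly shrinks the error, illustrating the convergence just described.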

Note, the case we just considered has more generality than might appear at first, because (1) the ideas here are easily extended to systems of explicit first order equations, and (2) higher-order equations that are “explicit” in an extended sense—meaning that the highest-order derivative is expressed as a function of time, of the variable, and of the lower-order derivatives—can be converted into an equivalent system of explicit first-order equations.
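For example, the sine wave satisfies the second-order equation $X''(t) = -X(t)$ with $X(0) = 0$ and $X'(0) = 1.$ Introducing the velocity $v = X'$ converts it to an explicit first-order system, which the same method handles. A small sketch (again with an ad hoc function name):

```python
from math import sin, pi

# Convert X'' = -X into the explicit first-order system
#   x' = v,   v' = -x
# and step it with Euler's method.
def euler_system(x, v, t_end, eps):
    steps = int(round(t_end / eps))
    for _ in range(steps):
        x, v = x + eps * v, v - eps * x   # update both components at once
    return x

t = pi / 2
print(euler_system(0.0, 1.0, t, 1e-3), sin(t))   # both close to 1
```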

### The challenging world of differential equations

So, is our cup half-empty or half-full? We have no toolbox of primitive formulas for building the solutions to all differential equations by finite compositions. And even for those which can be solved by formulas, there is no general method for finding the solutions. That is how the cookie crumbles. But on the positive side, there is an array of theoretical tools for analyzing and solving important classes of differential equations, and numerical methods can be applied in many cases.

The study of differential equations leads to some challenging problems, such as the Navier-Stokes equations, which describe the flow of fluids.

These are partial differential equations involving flow velocity, pressure, density and external forces (such as gravity), all of which vary over space and time. There are non-linear (multiplicative) interactions between these variables and their spatial and temporal derivatives, which leads to complexity in the solutions.

At high flow rates, this complexity can produce chaotic solutions, which involve complex behavior at a wide range of resolution scales. This is turbulence. Here is an insightful portrait of turbulence, by Leonardo da Vinci, whose studies in turbulence date back to the 15th Century.

Turbulence, which has been described by Richard Feynman as the most important unsolved problem of classical physics, also presents a mathematical puzzle. The general existence of solutions to the Navier-Stokes equations remains unsettled. This is one of the “Millennium Prize Problems”, for which a one million dollar prize is offered: in three dimensions, given smooth initial values for the velocity field, does there exist a solution that is smooth and everywhere defined? There are also complications with grid-based numerical methods, which will fail to produce globally accurate results if the solutions contain details at a smaller scale than the grid mesh. So the ubiquitous phenomenon of turbulence, which is so basic to the movements of the atmosphere and the seas, remains an open case.

But fortunately we have enough traction with differential equations to proceed directly with the rate equations for Petri nets. There we will find illuminating equations, which are the subject of both major theorems and open problems. They are non-linear and intractable by formula-based methods, yet, as we will see, they are well handled by numerical methods.

## Probability Theory and the Undefinability of Truth

31 March, 2013

In 1936 Tarski proved a fundamental theorem of logic: the undefinability of truth. Roughly speaking, this says there’s no consistent way to extend arithmetic so that it talks about ‘truth’ for statements about arithmetic. Why not? Because if we could, we could cook up a statement that says “I am not true.” This would lead to a contradiction, the Liar Paradox: if this sentence is true then it’s not, and if it’s not then it is.

This is why the concept of ‘truth’ plays a limited role in most modern work on logic… surprising as that might seem to novices!

However, suppose we relax a bit and allow probability theory into our study of arithmetic. Could there be a consistent way to say, within arithmetic, that a statement about arithmetic has a certain probability of being true?

We can’t let ourselves say a statement has a 100% probability of being true, or a 0% probability of being true, or we’ll get in trouble with the undefinability of truth. But suppose we only let ourselves say that a statement has some probability greater than $a$ and less than $b$, where $0 < a < b < 1.$ Is that okay?

Yes it is, according to this draft of a paper:

• Paul Christiano, Eliezer Yudkowsky, Marcello Herreshoff and Mihaly Barasz, Definability of “Truth” in Probabilistic Logic (early draft), 28 March 2013.

But there’s a catch, or two. First, there are many self-consistent ways to assess the probability of truth of arithmetic statements. This suggests that the probability is somewhat ‘subjective’. But that’s fine if you think probabilities are inherently subjective—for example, if you’re a subjective Bayesian.

A bit more problematic is this: their proof that there exists a self-consistent way to assess probabilities is not constructive. In other words, you can’t use it to actually get your hands on a consistent assessment.

Fans of game theory will be amused to hear why: the proof uses Kakutani’s fixed point theorem! This is the result that John Nash used to prove games have equilibrium solutions, where nobody can improve their expected payoff by changing their strategy. And this result is not constructive.

In game theory, we use Kakutani’s fixed point theorem by letting each player update their strategy, improving it based on everyone else’s, and showing this process has a fixed point. In probabilistic logic, the process is instead that the thinker reflects on what they know, and updates their assessment of probabilities.

### The statement

I have not yet carefully checked the proof of Barasz, Christiano, Herreshoff and Yudkowsky’s result. Some details have changed in the draft since I last checked, so it’s probably premature to become very nitpicky. But just to encourage technical discussions of this subject, let me try stating the result a bit more precisely. If you don’t know Tarski’s theorem, go here:

Tarski’s undefinability theorem, Wikipedia.

I’ll assume you know that and are ready for the new stuff!

The context of this work is first-order logic. So, consider any language $L$ in first-order logic that lets us talk about natural numbers and also rational numbers. Let $L'$ be the language $L$ with an additional function symbol $\mathbb{P}$ thrown in. We require that $\mathbb{P}(n)$ be a rational number whenever $n$ is a natural number. We want $\mathbb{P}(n)$ to stand for the probability of the truth of the sentence whose Gödel number is $n.$ This gives a system that can reflect on the probability that what it’s saying is true.

So, suppose $T$ is some theory in the language $L'.$ How can we say that the probability function $\mathbb{P}$ has ‘reasonable opinions’ about truth, assuming that the axioms of $T$ are true?

The authors have a nice way of answering this. First they consider any function $P$ assigning a probability to each sentence of $L'.$ They say that $P$ is coherent if there is a probability measure on the set of models of $L'$ such that $P(\phi)$ is the measure of the set of models in which $\phi$ is satisfied. They show that $P$ is coherent iff these three conditions hold:

1) $P(\phi) = P(\phi \wedge \psi) + P(\phi \wedge \lnot \psi)$ for all sentences $\phi, \psi.$

2) $P(\phi) = 1$ for each tautology.

3) $P(\phi) = 0$ for each contradiction.

(By the way, it seems to me that 1) and 2) imply $P(\phi) + P(\lnot \phi) = 1$ and thus 3). So either they’re giving a slightly redundant list of conditions because they feel in the mood for it, or they didn’t notice this list was redundant, or it’s not and I’m confused. It’s good to always say a list of conditions is redundant if you know it is. You may be trying to help your readers a bit, and it may seem obvious to you, but if you don’t come out and admit the redundancy, you’ll make some of your readers doubt their sanity.)

(Also by the way, they don’t say how they’re making the set of all models into a measurable space. But I bet they’re using the σ-algebra where all subsets are measurable, and I think there’s no problem with the fact that this set is very large: a proper class, I guess! If you believe in the axiom of universes, you can just restrict attention to ‘small’ models… and your probability measure will be supported on a countable set of models, since an uncountable sum of positive numbers always diverges, so the largeness of the set of these models is largely irrelevant.)
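To get a feel for the coherence conditions, here is a toy sketch in which the ‘models’ are just truth assignments to two atomic sentences, with an arbitrary probability measure on them—a drastic simplification of the paper’s first-order setting, but enough to see conditions 1)–3) at work:

```python
from itertools import product

# Toy sketch: "models" are truth assignments to two atomic sentences,
# with an arbitrary probability measure on them.  P(φ) is the total
# weight of the models satisfying φ.
models = list(product([False, True], repeat=2))
weight = dict(zip(models, (0.1, 0.2, 0.3, 0.4)))

def P(phi):
    return sum(w for m, w in weight.items() if phi(m))

A = lambda m: m[0]
B = lambda m: m[1]

# Condition 1):  P(φ) = P(φ ∧ ψ) + P(φ ∧ ¬ψ)
print(P(A), P(lambda m: A(m) and B(m)) + P(lambda m: A(m) and not B(m)))

# Conditions 2) and 3): tautologies get probability 1, contradictions 0
print(P(lambda m: A(m) or not A(m)), P(lambda m: A(m) and not A(m)))
```

Any probability function defined this way—as a measure over models—automatically satisfies all three conditions; the theorem quoted above says the converse also holds.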

So, let’s demand that $P$ be coherent. And let’s demand that $P(\phi) = 1$ whenever the sentence $\phi$ is one of the axioms of $T.$

At this point, we’ve got this thing $P$ that assigns a probability to each sentence in our language. We’ve also got this thing $\mathbb{P}$ in our language, such that $\mathbb{P}(n)$ is trying to be the probability of the truth of the sentence whose Gödel number is $n.$ But so far these two things aren’t connected.

To connect them, they demand a reflection principle: for any sentence $\phi$ and any rational numbers $0 < a < b < 1,$

$a < P(\phi) < b \implies P(a < \mathbb{P}(\ulcorner \phi \urcorner) < b) = 1$

Here $\ulcorner \phi \urcorner$ is the Gödel number of the sentence $\phi.$ So, this principle says that if a sentence has some approximate probability of being true, the thinker—as described by $\mathbb{P}$—knows this. They can’t know precise probabilities, or we’ll get in trouble. Also, making the reflection principle into an if and only if statement:

$a < P(\phi) < b \iff P(a < \mathbb{P}(\ulcorner \phi \urcorner) < b) = 1$

is too strong. It leads to a contradiction, very much as in Tarski’s original theorem on the undefinability of truth! However, in the latest draft of the paper, the authors seem to have added a weak version of the converse to their formulation of the reflection principle.

Anyway, the main theorem they’re claiming is this:

Theorem (Barasz, Christiano, Herreshoff and Yudkowsky). There exists a function $P$ assigning a probability to each sentence of $L',$ such that

1) $P$ is coherent,

2) $P(\phi) = 1$ whenever the sentence $\phi$ is one of the axioms of $T,$

and

3) the reflection principle holds. 

## Game Theory (Part 20)

11 March, 2013

Last time we tackled von Neumann’s minimax theorem:

Theorem. For every zero-sum 2-player normal form game,

$\displaystyle{\min_{q'} \max_{p'} \; p' \cdot A q' = \max_{p'} \min_{q'} \; p' \cdot A q'}$

where $p'$ ranges over player A’s mixed strategies and $q'$ ranges over player B’s mixed strategies.

We reduced the proof to two geometrical lemmas. Now let’s prove those… and finish up the course!

But first, let me chat a bit about this theorem. Von Neumann first proved it in 1928. He later wrote:

As far as I can see, there could be no theory of games … without that theorem … I thought there was nothing worth publishing until the Minimax Theorem was proved.

Von Neumann gave several proofs of this result:

• Tinne Hoff Kjeldsen, John von Neumann’s conception of the minimax theorem: a journey through different mathematical contexts, Arch. Hist. Exact Sci. 56 (2001), 39–68.

In 1937 he gave a proof which became quite famous, based on an important result in topology: Brouwer’s fixed point theorem. This says that if you have a ball

$B = \{ x \in \mathbb{R}^n : \|x\| \le 1 \}$

and a continuous function

$f: B \to B$

then this function has a fixed point, meaning a point $x \in B$ with

$f(x) = x$

You’ll often see Brouwer’s fixed point theorem in a first course on algebraic topology, though John Milnor came up with a proof using just multivariable calculus and a bit more.

After von Neumann proved his minimax theorem using Brouwer’s fixed point theorem, the mathematician Shizuo Kakutani proved another fixed-point theorem in 1941, which let him get the minimax theorem in a different way. This is now called the Kakutani fixed-point theorem.

In 1949, John Nash generalized von Neumann’s result to nonzero-sum games with any number of players: they all have Nash equilibria if we let ourselves use mixed strategies! His proof is just one page long, and it won him the Nobel prize!

Nash’s proof used the Kakutani fixed-point theorem. There is also a proof of Nash’s theorem using Brouwer’s fixed-point theorem; see here for the 2-player case and here for the n-player case.

Apparently when Nash explained his result to von Neumann, the latter said:

That’s trivial, you know. That’s just a fixed point theorem.

Maybe von Neumann was a bit jealous?

I don’t know a proof of Nash’s theorem that doesn’t use a fixed-point theorem. But von Neumann’s original minimax theorem seems to be easier. The proof I showed you last time comes from Andrew Colman’s book Game Theory and its Applications in the Social and Biological Sciences. In it, he writes:

In common with many people, I first encountered game theory in non-mathematical books, and I soon became intrigued by the minimax theorem but frustrated by the way the books tiptoed around it without proving it. It seems reasonable to suppose that I am not the only person who has encountered this problem, but I have not found any source to which mathematically unsophisticated readers can turn for a proper understanding of the theorem, so I have attempted in the pages that follow to provide a simple, self-contained proof with each step spelt out as clearly as possible both in symbols and words.

There are other proofs that avoid fixed-point theorems: for example, there’s one in Ken Binmore’s book Playing for Real. But this one uses transfinite induction, which seems a bit scary and distracting! So far, Colman’s proof seems simplest, but I’ll keep trying to do better.
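Before turning to the lemmas, here is a quick numerical sanity check of the minimax theorem—purely an illustration, not a proof—using the $2 \times 3$ payoff matrix that appears below. Since $p' \cdot A q'$ is linear in each argument, the inner optimizations are attained at vertices of the simplex, so each side of the theorem reduces to an optimization over a single simplex, which we approximate here on a grid:

```python
import numpy as np

# Numerical sanity check of the minimax theorem for a 2x3 payoff matrix.
# For fixed q', the max of p'·Aq' over mixed strategies p' is attained
# at a vertex, so it equals max_i (Aq')_i; likewise the inner minimum
# for fixed p' equals min_j (p'A)_j.  We grid-search the outer problems.
A = np.array([[ 2.0, 10.0, 4.0],
              [-2.0,  1.0, 6.0]])

grid = np.linspace(0.0, 1.0, 201)

# min over B's mixed strategies q' of max_i (Aq')_i
min_max = min(max(A @ np.array([q1, q2, 1.0 - q1 - q2]))
              for q1 in grid for q2 in grid if q1 + q2 <= 1.0)

# max over A's mixed strategies p' of min_j (p'A)_j
max_min = max(min(np.array([p, 1.0 - p]) @ A) for p in grid)

print(min_max, max_min)   # equal, as the theorem demands
```

Both sides come out equal to 2 for this matrix, exactly as the theorem requires.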

### The lemmas

Now let’s prove the two lemmas from last time. A lemma is an unglamorous result which we use to prove a theorem we’re interested in. The mathematician Paul Taylor has written:

Lemmas do the work in mathematics: theorems, like management, just take the credit.

Let’s remember what we were doing. We had a zero-sum 2-player normal-form game with an $m \times n$ payoff matrix $A$. The entry $A_{ij}$ of this matrix says A’s payoff when player A makes choice $i$ and player B makes choice $j$. We defined this set:

$C = \{ A q' : \; q' \textrm{ is a mixed strategy for B} \} \subseteq \mathbb{R}^m$

For example, if

$\displaystyle{ A = \left( \begin{array}{rrr} 2 & 10 & 4 \\-2 & 1 & 6 \end{array} \right) }$

then $C$ looks like this:

We assumed that

$\displaystyle{ \min_{q'} \max_{p'} \; p' \cdot A q' > 0}$

This means that for each mixed strategy $q'$ there exists $p'$ with

$\displaystyle{ p' \cdot A q' > 0}$

and this implies that at least one of the numbers $(Aq')_i$ must be positive. So, if we define a set $N$ by

$\displaystyle{ N = \{(x_1, \dots, x_m) : x_i \le 0 \textrm{ for all } i\} \subseteq \mathbb{R}^m }$

then $Aq'$ can’t be in this set:

$\displaystyle{ Aq' \notin N }$

In other words, the set $C \cap N$ is empty.

Here’s what $C$ and $N$ look like in our example:

Next, we choose a point in $N$ and a point in $C$:

• let $r$ be a point in $N$ that’s as close as possible to $C,$

and

• let $s$ be a point in $C$ that’s as close as possible to $r,$

These points $r$ and $s$ need to be different, since $C \cap N$ is empty. Here’s what these points and the vector $s - r$ look like in our example:

To finish the job, we need to prove two lemmas:

Lemma 1. $r \cdot (s-r) = 0,$ $s_i - r_i \ge 0$ for all $i,$ and $s_i - r_i > 0$ for at least one $i.$

Proof. Suppose $r'$ is any point in $N$ whose coordinates are all the same as those of $r,$ except perhaps one, namely the $i$th coordinate for one particular choice of $i.$ By the way we’ve defined $s$ and $r$, this point $r'$ can’t be closer to $s$ than $r$ is:

$\| r' - s \| \ge \| r - s \|$

This means that

$\displaystyle{ \sum_{j = 1}^m (r_j' - s_j)^2 \ge \sum_{j = 1}^m (r_j - s_j)^2 }$

But since $r_j' = r_j$ except when $j = i,$ this implies

$(r_i' - s_i)^2 \ge (r_i - s_i)^2$

Now, if $s_i \le 0$ we can take $r'_i = s_i.$ In this case we get

$0 \ge (r_i - s_i)^2$

so $r_i = s_i.$ On the other hand, if $s_i > 0$ we can take $r'_i = 0$ and get

$s_i^2 \ge (r_i - s_i)^2$

which simplifies to

$2 r_i s_i \ge r_i^2$

But $r_i \le 0$ and $s_i > 0,$ so this can only be true if $r_i = 0.$

In short, we know that either

$r_i = s_i$

or

$s_i > 0$ and $r_i = 0.$

So, either way we get

$(s_i - r_i) r_i = 0$

Since $i$ was arbitrary, this implies

$\displaystyle{ (s - r) \cdot r = \sum_{i = 1}^m (s_i - r_i) r_i = 0 }$

which is the first thing we wanted to show. Also, either way we get

$s_i - r_i \ge 0$

which is the second thing we wanted. Finally, $s_i - r_i \ge 0$ but we know $s \ne r,$ so

$s_i - r_i > 0$

for at least one choice of $i.$ And this is the third thing we wanted!   █

Lemma 2. If $Aq'$ is any point in $C$, then

$(s-r) \cdot Aq' \ge 0$

Proof. Let’s write

$Aq' = a$

for short. For any number $t$ between $0$ and $1$, the point

$ta + (1-t)s$

is on the line segment connecting the points $a$ and $s.$ Since both of these points are in $C,$ so is the point $ta + (1-t)s,$ because the set $C$ is convex. So, by the way we’ve defined $s$ and $r$, this point can’t be closer to $r$ than $s$ is:

$\| r - (ta + (1-t)s) \| \ge \| r - s \|$

This means that

$\displaystyle{ (r - ta - (1-t)s) \cdot (r - ta - (1-t)s) \ge (r - s) \cdot (r - s) }$

With some algebra, this gives

$\displaystyle{ 2 (a - s)\cdot (s - r) \ge -t (a - s) \cdot (a - s) }$

Since we can make $t$ as small as we want, this implies that

$\displaystyle{ (a - s)\cdot (s - r) \ge 0 }$

or

$\displaystyle{ a \cdot (s - r) \ge s \cdot (s - r)}$

or

$\displaystyle{ a \cdot (s - r) \ge (s - r) \cdot (s - r) + r \cdot (s - r)}$

By Lemma 1 we have $r \cdot (s - r) = 0,$ and the dot product of any vector with itself is nonnegative, so it follows that

$\displaystyle{ a \cdot (s - r) \ge 0}$

And this is what we wanted to show!   █

### Conclusion

Proving lemmas is hard work, and unglamorous. But if you remember the big picture, you’ll see how great this stuff is.

We started with a very general concept of two-person game. Then we introduced probability theory and the concept of ‘mixed strategy’. Then we realized that the expected payoff of each player could be computed using a dot product! This brings geometry into the subject. Using geometry, we’ve seen that every zero-sum game has at least one ‘Nash equilibrium’, where neither player is motivated to change what they do—at least if they’re rational agents.

And this is how math works: by taking a simple concept and thinking about it very hard, over a long time, we can figure out things that are not at all obvious.

For game theory, the story goes much further than we went in this course. For starters, we should look at nonzero-sum games, and games with more than two players. John Nash showed these more general games still have Nash equilibria!

Then we should think about how to actually find these equilibria. Merely knowing that they exist is not good enough! For zero-sum games, finding the equilibria uses a subject called linear programming. This is a way to maximize a linear function given a bunch of linear constraints. It’s used all over the place—in planning, routing, scheduling, and so on.

Game theory is used a lot by economists, for example in studying competition between firms, and in setting up antitrust regulations. For that, try this book:

• Lynne Pepall, Dan Richards and George Norman, Industrial Organization: Contemporary Theory and Empirical Applications, Blackwell, 2008.

For these applications, we need to think about how people actually play games and make economic decisions. We aren’t always rational agents! So, psychologists, sociologists and economists do experiments to study what people actually do. The book above has a lot of case studies, and you can learn more here:

• Andrew Colman, Game Theory and its Applications in the Social and Biological Sciences, Routledge, London, 1982.

As this book title hints, we should also think about how game theory enters into biology. Evolution can be seen as a game where the winning genes reproduce and the losers don’t. But it’s not all about competition: there’s a lot of cooperation involved. Life is not a zero-sum game! Here’s a good introduction to some of the math:

• William H. Sandholm, Evolutionary game theory, 12 November 2007.

For more on the biology, get ahold of this classic text:

• John Maynard Smith, Evolution and the Theory of Games, Cambridge University Press, 1982.

And so on. We’ve just scratched the surface!