Game Theory (Part 2)

13 January, 2013

Last time we classified games in a few ways. This time we’ll start by looking at a very simple class of games: simultaneous noncooperative two-player games.

Simultaneous games

Remember that in a simultaneous game, each player makes their moves without knowing anything about the other player’s moves. Thanks to this, we can condense each player’s moves into a single move. For example, in a card game, if one player lays down a king and then an ace, we can mathematically treat this as a single move, called “lay down a king and then an ace”. So, we’ll say each player makes just one move—and they make it without knowing the other player’s move.

In class we’ll play these games like this. I will decide on my move and write it down on a piece of paper. You’ll make your move by clicking either A, B, C, D, or E on your clicker.

Then I’ll reveal my piece of paper! At that point, we’ll each know what both of us did… but neither of us can change our move.

So, we each make our move without knowing each other’s move.

Two-player games

Since lots of you will be clicking your clicker at once, you could say there are more than two players in this game. But your payoff—the number of points you win or lose—will depend only on what you did and what I did. So, we can treat this game as a bunch of independent two-player games—and that’s what we’ll do.

Noncooperative games

Remember, we use words in funny ways in mathematics! An ‘imaginary’ number is not imaginary in the usual sense; a ‘partial’ differential equation isn’t just part of a differential equation, and so on. In game theory we use the word ‘noncooperative’ in a funny way. We say a game is noncooperative if the players aren’t able to form binding commitments. This means that when we play our games, you and I can’t talk before the game and promise to do certain things.

There will, however, be games where both of us win if we make the right choice, and both of us lose if we don’t! In games like this, if we can figure out how to cooperate without communicating ahead of time and making promises, that’s allowed!

Chicken

Now let’s actually look at an example: the game of chicken. In this game we drive toward each other at high speed along a one-lane road in the desert. The one who swerves off the road at the last minute gets called a chicken, and the other driver gets called a hero. If we both swerve off the road at the last minute, we’re both called chickens. But if neither of us does, our cars crash and we both die!

Sounds fun, eh?

In real life we could each wait as long as possible and see if the other driver starts to swerve. This makes chicken into a sequential rather than simultaneous game! You could also promise that you wouldn’t swerve. This makes chicken into a cooperative game!

Indeed there are all sorts of variations and complications in real life. You can see some in the famous movie Rebel Without a Cause, starring James Dean. Take a look at what happens:

This movie version actually involves driving toward a cliff and jumping out at the last possible moment.

But mathematics, as usual, is about finding problems that are simple enough to state precisely. So in our simple mathematical version of chicken, we’ll say each player has just two choices:

1: stay on the road.
2: swerve off the road at the last second.

Also, we’ll express our payoffs in terms of numbers. A negative payoff is bad, a positive one is good:

• If either player swerves off the road they get called a chicken, which is bad, so let’s say they get -1 points.

• If one player stays on the road and the other swerves off the road, the one who stays on the road gets called a hero, so let’s say they get 1 point.

• If both players stay on the road they both die, so let’s say they both get -10 points.

We can summarize all this in a little table:

          1           2
1     (-10,-10)     (1,-1)
2      (-1,1)      (-1,-1)

Let’s say the players are you and me. Your choices 1 and 2 are shown in black: you get to pick which row of the table we use. My choices 1 and 2 are in red: I get to pick which column of the table we use.

There are four possible ways we can play the game. For each of the four possibilities we get a pair of numbers. The first number, in black, is your payoff. The second, in red, is my payoff.

For example, suppose you choose 1 and I choose 2. Then you’re a hero and I’m a chicken. So, your payoff is 1 and mine is -1. That’s why we get the pair of numbers (1,-1) in the 1st row and 2nd column of this table.
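If you like, we can record this table in a bit of Python. This is purely an illustration (the variable name is my own): the dictionary maps a pair of choices to a pair of payoffs.

```python
# Chicken in normal form: keys are (your choice, my choice),
# values are (your payoff, my payoff).
# Choice 1 = stay on the road, choice 2 = swerve.
chicken = {
    (1, 1): (-10, -10),  # we both stay: crash!
    (1, 2): (1, -1),     # you stay, I swerve: you're the hero
    (2, 1): (-1, 1),     # you swerve, I stay: I'm the hero
    (2, 2): (-1, -1),    # we both swerve: two chickens
}

print(chicken[(1, 2)])  # (1, -1): you're a hero, I'm a chicken
```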

Now let’s play this game a bit! Later we’ll study it in different ways.

Rock-paper-scissors

Here’s another famous game: rock-paper-scissors.

Each player can choose either rock, paper or scissors. Paper beats rock, scissors beats paper, and rock beats scissors. In these cases let’s say the winner gets a payoff of 1, while the loser gets a payoff of -1. If both players make the same choice, it’s a tie, so let’s say both players get a payoff of 0.

Here’s a table that describes this game:

               rock      paper     scissors
rock          (0,0)     (-1,1)      (1,-1)
paper         (1,-1)     (0,0)      (-1,1)
scissors      (-1,1)    (1,-1)       (0,0)

Your choices and payoffs are in black, while mine are in red.

For example, if you choose rock and I choose paper, we can look up what happens, and it’s (-1,1). That means your payoff is -1 while mine is 1. So I win!

To make this table look more mathematical, we can make up numbers for our choices:

1: rock
2: paper
3: scissors

Then the table looks like this:

        1         2         3
1     (0,0)    (-1,1)    (1,-1)
2     (1,-1)    (0,0)    (-1,1)
3     (-1,1)   (1,-1)     (0,0)
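Here’s a small sketch of this table in Python, using the numbering above (the variable names are just for illustration). Since whatever you win I lose, my payoffs are simply the negatives of yours:

```python
# Your payoffs in rock-paper-scissors: rows are your choice,
# columns are mine, with 1 = rock, 2 = paper, 3 = scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

# My payoffs are the negatives of yours.
B = [[-a for a in row] for row in A]

# You pick rock (row 1), I pick paper (column 2): I win.
print(A[0][1], B[0][1])  # -1 1
```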

Let’s play this game a bit, and then discuss it!

Normal form

In the games we’re studying now, each player can make various choices. In game theory these choices are often called pure strategies. We’ll see why later on in this course.

In our examples so far, each player has the same set of pure strategies. But this is not required! You could have some set of pure strategies and I could have some other set.

For now let’s only think about games where we both have a finite set of pure strategies. For example, you could have 4 pure strategies and I could have 2. Then we could have a game like this:

        1         2
1     (0,0)    (-1,1)
2     (2,-1)    (0,0)
3     (-2,1)   (1,-1)
4     (0,1)    (2,0)

This way of describing a game using a table of pairs of numbers is called normal form, and you can read about it here:

Normal-form game, Wikipedia.

There are other ways to describe the same information. For example, instead of writing

        1         2
1     (0,0)    (-1,1)
2     (2,-1)    (0,0)
3     (-2,1)   (1,-1)
4     (0,1)    (2,0)

we can write everything in black:

        1         2
1     (0,0)    (-1,1)
2     (2,-1)    (0,0)
3     (-2,1)   (1,-1)
4     (0,1)    (2,0)

All the information is still there! It’s just a bit harder to see. The colors are just to make it easier on you.

Mathematicians like matrices, which are rectangular boxes of numbers. So, it’s good to use these to describe normal-form games. To do this we take our table and chop it into two. We write one matrix for your payoffs:

A = \left( \begin{array}{rr} 0 & -1 \\ 2 & 0 \\ -2 & 1 \\ 0 & 2 \end{array} \right)

and one for mine:

B = \left( \begin{array}{rr} 0 & 1 \\ -1 & 0 \\ 1 & -1 \\ 1 & 0 \end{array} \right)

The number in the ith row and jth column of the matrix A is called A_{i j}, and similarly for B. For example, if you pick choice 3 in this game and I pick choice 2, your payoff is

A_{32} = 1

and my payoff is

B_{32} = -1
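In Python (again just as an illustration), the same lookup might look like this, remembering that lists are indexed from 0 while our strategies are numbered from 1:

```python
# The two payoff matrices for the 4 x 2 game above.
A = [[0, -1],
     [2, 0],
     [-2, 1],
     [0, 2]]

B = [[0, 1],
     [-1, 0],
     [1, -1],
     [1, 0]]

def payoffs(i, j):
    """Payoffs (to player A, to player B) when player A picks
    strategy i and player B picks strategy j, numbered from 1."""
    return A[i - 1][j - 1], B[i - 1][j - 1]

print(payoffs(3, 2))  # (1, -1): A_32 = 1 and B_32 = -1
```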

Definition

Let’s summarize everything we’ve learned today! Remember, an m \times n matrix has m rows and n columns. So, we can say:

Definition. A 2-player normal-form game consists of two m \times n matrices of real numbers, A and B.

This definition is very terse and abstract. That’s what mathematicians like! But we have to unfold it a bit to understand it.

Let’s call you ‘player A’ and me ‘player B’. Then the idea here is that player A can choose among pure strategies i = 1,2,\dots , m while player B can choose among pure strategies j = 1,2,\dots, n. Suppose player A makes choice i and player B makes choice j. Then the payoff to player A is A_{i j}, and the payoff to player B is B_{i j}.


A Bet Concerning Neutrinos (Part 5)

7 January, 2013

It’s a little-known spinoff of Heisenberg’s uncertainty principle. When you accurately measure the velocity of neutrinos, they can turn into ham!

I observed this myself. It came in the mail along with some sausages, bacon, and peach and blueberry syrup. They’re from Heather Vandagriff. Thanks, Heather!

These are the first of my winnings on some bets concerning the famous OPERA experiment that seemed to detect neutrinos going faster than light. I bet that this experiment would be shown wrong. Heather bet me some Tennessee ham against some nice cloth from Singapore.

The OPERA team announced that they’d detected faster-than-light neutrinos back in September 2011. But later, they discovered two flaws in their experimental setup.

First, a fiber optic cable wasn’t screwed in right. This made a signal from a global positioning system take about 70 nanoseconds longer than it should have to reach the so-called ‘master clock’:

Since the clock got its signal late, the neutrinos seemed to show up early.

On top of this, the clock was poorly calibrated! This had a roughly opposite effect: it tended to make the neutrinos seem to show up late… but only some of the time. However, this effect was not big enough, on average, to cancel the other mistake.

The OPERA team fixed these problems and repeated the experiment in May 2012. The neutrinos came in slower than light:

• OPERA, Measurement of the neutrino velocity with the OPERA detector in the CNGS beam, 12 July 2012.

Three other experiments using the same neutrino source—Borexino, ICARUS, and LVD—got the same result! For a more detailed post-mortem, with lots of references, see:

Faster-than-light neutrino anomaly, Wikipedia.

My wife Lisa has a saying from her days in the computer business: when in doubt, check the cables.


Game Theory (Part 1)

6 January, 2013

I’m teaching an undergraduate course on game theory and I thought I’d try writing my course notes on this blog. I invite students (and everyone else in the universe) to ask questions, correct my mistakes, add useful extra information and references, and so on.

However, I should warn the geniuses who usually read this blog that these notes will not be filled with deep insights: it’s just an introductory course and it’s the first time I’ve taught it.

I should also warn the students in my class that these notes are not a substitute for taking notes in class! I’ll tell you a lot of other stuff in class!

Let’s get started.

Examples of games

Mathematicians use words differently from normal people. When you hear the word ‘game’ you probably think about examples like:

• board games like checkers, chess, go, Scrabble, Monopoly, Risk and so on.

• card games like solitaire, poker, bridge, blackjack, and so on.

• sports involving athletic skill like football, basketball, baseball, hockey, tennis, squash, golf, croquet, horseshoes and so on.

• video games and computer games—I’m too much of an old fogey to even bother trying to list some currently popular ones.

• game shows on TV—ditto, though Jeopardy is still on the air.

• role-playing games—ditto, though some people still play Dungeons and Dragons.

• ‘war games’ used by armies to prepare for wars.

Game theory is relevant to all these games, but more generally to any situation where one or more players interact, each one trying to achieve their own goal. The players can be people but they don’t need to be: they can also be animals or even other organisms! So, game theory is also used to study

• biology

• economics

• politics

• sociology

• psychology

For example, here is a ‘game’ I often play with my students. Some students want to get a passing grade while coming to class as little as possible. I want to make them come to every class. But, I want to spend as little time on this task as possible.

What should I do? I don’t want to take attendance every day, because that takes a lot of time in a big class. So instead, I give quizzes on random days and make it hard for students to pass unless they take most of these quizzes. I only need to give a few quizzes, but the students need to come to every class, since they don’t know when a quiz will happen.

How do the students respond? Lots of them come to class every time. But some try to ‘bend the rules’ of the game, mainly trying to get my sympathy. They invent clever excuses for why they missed the quizzes: dying grandmothers, etc. They try to persuade me to ‘drop the lowest quiz score’. They try to convince me that it’s unreasonable to make quizzes count for so much of the grade. After all, it’s easy to get a bad score on a quiz, when there’s not much time to answer a question about something you just learned recently.

How do I respond? Only the last idea moves me. So, I set things up so missing a quiz gives you a much worse score than taking the quiz and getting it completely wrong. In fact, just to dramatize this, I give students a negative score if they miss a quiz.

How do the students respond? Some of them argue that it’s somehow evil to give people negative scores. But at this point I bare my fangs, smile, and nod, and the game ends.

It’s important to emphasize that other students have different goals: some want to come to class every time and learn as much as possible! In this case my goals and the students’ goals don’t conflict very much. As we’ll see, in mathematical game theory there are plenty of games where the players cooperate as well as compete… or even just cooperate.

Mathematical game theory tends to focus on games where the most important aspect of the game is choosing among different strategies. Games where the most important aspect is physical ability are harder to analyze using mathematics. So are games where it’s possible to ‘bend the rules’ in a huge number of different ways.

Game theory works best when we can:

• list the set of choices each player can make,

• clearly describe what happens when each player makes a given choice, and

• clearly describe the ‘payoff’ or ‘winnings’ for each player, which will of course depend on the choices all the players make.

Classifying games

We can classify games in many different ways. For example:

The number of players. There are single-player games (like solitaire), two-player games (like chess or ‘rock, paper, scissors’), and multi-player games (like poker or Monopoly).

Simultaneous versus sequential games. There are games where all players make their decisions simultaneously, or if they do not move simultaneously, the later players are unaware of the earlier players’ actions, making them effectively simultaneous. These are called simultaneous games. There are also games where some players make decisions after knowing something about what other players have decided. These are called sequential games.

The games people play for fun are very often sequential, but a surprisingly large part of the game theory we’ll discuss in class focuses on simultaneous games. ‘Rock, paper, scissors’ is an example of a simultaneous game, but we’ll see many more.

Zero-sum versus nonzero-sum games. A zero-sum game is one where the total payoff to all the players is zero. Thus, any player benefits only at the expense of others.

An example of a zero-sum game is poker, because each player wins exactly the total amount their opponents lose (ignoring the possibility of the house’s cut). Chess, or any other two-player game with one winner and one loser, can also be seen as a zero-sum game: just say the winner wins $1 and the loser loses $1.

In nonzero-sum games, the total payoff to all players is not necessarily zero. An example is ‘chicken’, the game where two people drive their cars at each other, and both cars crash if neither pulls off the road. When this happens, both players lose. There are also games where both players can win.
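This distinction is easy to check mechanically once a game is written in normal form: just test whether the two payoffs in every cell add up to zero. Here’s a sketch (the function name is my own, not standard terminology):

```python
def is_zero_sum(A, B):
    """A normal-form game is zero-sum when the two payoffs in
    every cell of the table add up to zero."""
    return all(a + b == 0
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

# Rock-paper-scissors is zero-sum: my payoff is minus yours.
rps_A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps_B = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

# Chicken is not: when we both stay on the road, we both lose 10.
chicken_A = [[-10, 1], [-1, -1]]
chicken_B = [[-10, -1], [1, -1]]

print(is_zero_sum(rps_A, rps_B))          # True
print(is_zero_sum(chicken_A, chicken_B))  # False
```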

In two-person zero-sum games, the players have no reason to cooperate, because whatever one wins, the other loses. In two-person nonzero-sum games, cooperation can be important.

Symmetric and non-symmetric games. In a symmetric game the same rules apply to each player. More precisely, each player has the same set of strategies to choose from, and the payoffs to each player are symmetrical when we interchange which player chooses which strategy.

In a non-symmetric game, this is not the case. For example, we can imagine a non-symmetric version of poker where my hand always contains at least two aces, while no other player’s does. This game is ‘unfair’, so people don’t play it for fun. But games in everyday life, like the teacher-student game I mentioned, are often non-symmetric, and not always fun.
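In normal form, symmetry also has a neat matrix test: both players must have the same set of strategies, and swapping who plays which strategy must swap the payoffs. Concretely, both matrices must be square and B must be the transpose of A. A sketch (the helper name is my own):

```python
def is_symmetric(A, B):
    """A normal-form game is symmetric when both payoff matrices are
    square (same strategies for both players) and B is the transpose
    of A (swapping roles swaps payoffs)."""
    n = len(A)
    if len(B) != n or any(len(row) != n for row in A + B):
        return False
    return all(B[i][j] == A[j][i] for i in range(n) for j in range(n))

# Chicken is symmetric: it doesn't matter who drives which car.
print(is_symmetric([[-10, 1], [-1, -1]],
                   [[-10, -1], [1, -1]]))  # True
```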

Cooperative and non-cooperative games. A game is cooperative if the players are able to form binding commitments. That is, some players can promise each other that they will choose certain strategies, and these promises must be kept. In noncooperative games there is no way to make sure promises are kept.

Our legal system has the concept of a ‘contract’, which is a way people can make binding commitments.

Warning

There’s a lot more to say about these ways of classifying games. There are also other ways of classifying games that we haven’t discussed here. But you can already begin to classify games you know. Give it a try! You’ll run into some interesting puzzles.

For example, chess is a two-person zero-sum sequential non-cooperative game.

Is it symmetric? The strategies available to the first player, white, are different from those available to the second player, black. This is true of almost any sequential game where the players take turns moving. So, we can say it’s not symmetric.

Or, we can imagine that the first move of chess is flipping a coin to see which player goes first! Then the game becomes symmetric, because each player has an equal chance of becoming white or black.

Puzzle

I may put some puzzles on this blog, which are different from the homework for the course. You can answer them on the blog! If you’re a student in the course and you give a good answer, I’ll give you some extra credit.

Puzzle. What’s a ‘game of perfect information’ and what’s a ‘game of complete information’? What’s the difference?


Rolling Circles and Balls (Part 5)

2 January, 2013

Last time I promised to show you how the problem of a little ball rolling on a big stationary ball can be described using an 8-dimensional number system called the split octonions… if the big ball has a radius that’s 3 times the radius of the little one!

So, let’s get started.

First, I must admit that I lied.

Lying is an important pedagogical technique. The teacher simplifies the situation, so the student doesn’t get distracted by technicalities. Then later—and this is crucial!—the teacher admits that certain statements weren’t really true, and corrects them. It always makes me uncomfortable to do this. But it works better than dumping all the technical details on the students right away. In classes, I sometimes deal with my discomfort by telling the students: “Okay, now I’m going to lie a bit…”

What was my lie? Instead of an ordinary ball rolling on another ordinary ball, we need a ‘spinorial’ ball rolling on a ‘projective’ ball.

Let me explain that.

A spinorial ball

In physics, a spinor is a kind of particle that you need to turn around twice before it comes back to the way it was. Examples include electrons and protons.

If you give one of these particles a full 360° turn, which you can do using a magnetic field, it changes in a very subtle way. You can only detect this change using clever tricks. For example, take a polarized beam of electrons and send it through a barrier with two slits cut out. Each electron goes through both slits, because it’s a wave as well as a particle. Next, put a magnetic field next to one slit that’s precisely strong enough to rotate the electron by 360° if it goes through that slit. Then, make the beams recombine, and see how likely it is for electrons to be found at different locations. You’ll get different results than if you turn off the magnetic field that rotates the electron!

However, if you rotate a spinor by 720°—that is, two full turns—it comes back to exactly the way it was.

This may seem very odd, but when you understand the math of spinors you see it all makes sense. It’s a great example of how you have to follow the math where it leads you. If something is mathematically allowed, nature may take advantage of that possibility, regardless of whether it seems odd to you.

So, I hope you can imagine a ‘spinorial’ ball, which changes subtly when you turn it 360° around any axis, but comes back to its original orientation when you turn it around 720°. If you can’t, I’ll present the math more rigorously later on. That may or may not help.

A projective ball

What’s a ‘projective’ ball? It’s a ball whose surface is not a sphere, but a projective plane. A projective plane is a sphere that’s been modified so that diametrically opposite points count as the same point. The north pole is the same as the south pole, and so on!

In geography, the point diametrically opposite to some point on the Earth’s surface is called its antipodes, so let’s use that term. There’s a website that lets you find the antipodes of any place on Earth. Unfortunately the antipodes of most famous places are under water! But the antipodes of Madrid is in New Zealand, near Wellington:

When we roll a little ball on a big ‘projective’ ball and the little ball reaches the antipodes of where it started, it counts as being back to its original location.

If you find this hard to visualize, imagine rolling two indistinguishable little balls on the big ball, that are always diametrically opposite each other. When one little ball rolls to the antipodes of where it started, the other one has taken its place, and the situation looks just like when you started!

A spinorial ball on a projective ball

Now let’s combine these ideas. Imagine a little spinorial ball rolling on a big projective ball. You need to turn the spinorial ball around twice to make it come back to its original orientation. But you only need to roll it halfway around the projective ball for it to come back to its original location.

These effects compensate for each other to some extent. The first makes it twice as hard to get back to where you started. The second makes it twice as easy!

But something really great happens when the big ball is 3 times as big as the little one. And that’s what I want you to understand.

For starters, consider an ordinary ball rolling on another ordinary ball that’s the same size. How many times does the rolling ball turn as it makes a round trip around the stationary one? If you watch this you can see the answer:


Follow the line drawn on the little ball. It turns around not once, but twice!

Next, consider one ball rolling on another whose radius is 2 times as big. How many times does the rolling ball turn as it makes a round trip?

It turns around 3 times.

And this pattern continues! I don’t have animations proving it, so either take my word for it, read our paper, or show it yourself.

In particular, a ball rolling on a ball whose radius is 3 times as big will turn 4 times as it makes a round trip.

So, by the time the little ball rolls halfway around the big one, it will have turned around twice!
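If you want to play with this pattern, here’s a tiny sketch in Python. It only encodes the claimed formula—rolling contributes one turn per unit of radius ratio, and going once around the big ball contributes one more—it doesn’t prove it:

```python
def turns_per_round_trip(ratio):
    """How many times a ball turns, relative to a fixed observer,
    while rolling once around a stationary ball whose radius is
    `ratio` times its own: `ratio` turns from rolling, plus one
    from going around."""
    return ratio + 1

print(turns_per_round_trip(1))      # 2: same-size balls
print(turns_per_round_trip(2))      # 3
print(turns_per_round_trip(3) / 2)  # 2.0: two turns in half a round trip
```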

But now suppose it’s a spinorial ball rolling on a projective ball. This is perfect. Now when the little ball goes halfway around the big ball, it returns to its original location! And turning the little ball around twice gets it back to its original orientation!

So, there is something very neat about a spinorial ball rolling on a projective ball whose radius is exactly 3 times as big. And this is just the start. Now the split octonions get involved!

The rolling ball geometry

The key is to ponder a curious sort of geometry, which I’ll call the rolling ball geometry. This has ‘points’ and ‘lines’ which are defined in a funny way.

A point is any way a little spinorial ball can touch a projective ball that is 3 times as big. The lines are certain sets of points. A line consists of all the points we reach as the little ball rolls along some great circle on the big one, without slipping or twisting.

Of course these aren’t ‘points’ and ‘lines’ in the usual sense. But ever since the late 1800s, when mathematicians got excited about projective geometry—which is the geometry of the projective plane—we’ve enjoyed studying all sorts of strange variations on Euclidean geometry, with weirdly defined ‘points’ and ‘lines’. The rolling ball geometry fits very nicely into this tradition.

But the amazing thing is that we can describe points and lines of the rolling ball geometry in a completely different way, using the split octonions.

Split octonions

How does it work? As I said last time, the split octonions are an 8-dimensional number system. We build them as follows. We start with the ordinary real numbers. Then we throw in 3 square roots of -1, called i, j, and k, obeying

ij = -ji = k
jk = -kj = i
ki = -ik = j

At this point we have a famous 4-dimensional number system called the quaternions. Quaternions are numbers like

a + bi + cj + dk

where a,b,c,d are real numbers and i, j, k are the square roots of -1 we just created.

To build the octonions, we would now throw in another square root of -1. But to build the split octonions, we instead throw in a square root of +1. Let’s call it \ell. The hard part is saying what rules it obeys when we start multiplying it with other numbers in our system.

For starters, we get three more numbers \ell i, \ell j, \ell k. We decree these to be square roots of +1. But what happens when we multiply these with other things? For example, what is \ell i times j, and so on?

Since I don’t want to bore you, I’ll just get this over with quickly by showing you the multiplication table:

This says that \ell i (read down) times j (read across) is -\ell k, and so on.

Of course, this table is completely indigestible. I could never remember it, and you shouldn’t try. This is not the good way to explain how to multiply split octonions! It’s the lazy way. To really work with the split octonions you need a more conceptual approach, which John Huerta and I explain in our paper. But this is just a quick tour… so, on with the tour!

A split octonion is any number like

a + bi + cj + dk + e \ell + f \ell i + g \ell j + h \ell k

where a,b,c,d,e,f,g,h are real numbers. Since it takes 8 real numbers to specify a split octonion, we say they’re an 8-dimensional number system. But to describe the rolling ball geometry, we only need the imaginary split octonions, which are numbers like

x = bi + cj + dk + e \ell + f \ell i + g \ell j + h \ell k

The imaginary split octonions are 7-dimensional. 3 dimensions come from square roots of -1, while 4 come from square roots of 1.

We can use them to make up a far-out variant of special relativity: a universe with 3 time dimensions and 4 space dimensions! To do this, define the length of an imaginary split octonion x to be the number \|x\| with

\|x\|^2 = -b^2 - c^2 - d^2 + e^2 + f^2 + g^2 + h^2

This is a mutant version of the Pythagorean formula. The length \|x\| is real, in fact positive, for split octonions that point in the space directions. But it’s imaginary for those that point in the time directions!

This should not sound weird if you know special relativity. In special relativity we have spacelike vectors, whose length squared is positive, and timelike ones, whose length squared is negative.

If you don’t know special relativity—well, now you see how revolutionary Einstein’s ideas really are.

We also have vectors whose length squared is zero! These are called null. They’re also called lightlike, because light rays point along null vectors. In other words: light moves just as far in the space directions as it does in the time direction, so it’s poised at the brink between being spacelike and timelike.
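Here’s a small sketch that encodes this mutant Pythagorean formula and the spacelike/timelike/null classification (the function names and tuple convention are my own):

```python
def length_squared(x):
    """Length squared of an imaginary split octonion
    x = b i + c j + d k + e l + f li + g lj + h lk,
    given as the tuple (b, c, d, e, f, g, h). The three square roots
    of -1 count negatively, the four square roots of +1 positively."""
    b, c, d, e, f, g, h = x
    return -b*b - c*c - d*d + e*e + f*f + g*g + h*h

def classify(x):
    s = length_squared(x)
    if s > 0:
        return 'spacelike'
    if s < 0:
        return 'timelike'
    return 'null'

print(classify((0, 0, 0, 1, 0, 0, 0)))  # spacelike: points along l
print(classify((1, 0, 0, 0, 0, 0, 0)))  # timelike: points along i
print(classify((1, 0, 0, 1, 0, 0, 0)))  # null: i + l is lightlike
```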

The punchline

I’m sure you’re wondering where all this is going. Luckily, we’re there. We can describe the rolling ball geometry using the imaginary split octonions! Let me state it and then chat about it:

Theorem. There is a one-to-one correspondence between points in the rolling ball geometry and light rays through the point 0 in the imaginary split octonions. Under this correspondence, lines in the rolling ball geometry correspond to planes containing the point 0 in the imaginary split octonions with the property that whenever x and y lie in this plane, then xy = 0.

Even if you don’t get this, you can see it’s describing the rolling ball geometry in terms of stuff about the split octonions. An immediate consequence is that any symmetry of the split octonions is a symmetry of the rolling ball geometry.

The symmetries of the split octonions form a group called ‘the split form of G2’. With more work, we can show the converse: any symmetry of the rolling ball geometry is a symmetry of the split octonions. So, the symmetry group of the rolling ball geometry is precisely the split form of G2.

So what?

Well, G2 is an ‘exceptional group’—one of five groups that were discovered only when mathematicians like Killing and Cartan systematically started trying to classify groups in the late 1800s. The exceptional groups didn’t fit in the lists of groups mathematicians already knew.

If, as Tim Gowers has argued, some math is invented while some is discovered, the exceptional groups were discovered. Finding them was like going to the bottom of the ocean and finding weird creatures you never expected. These groups were—and are—hard to understand! They have dry, technical-sounding names: E6, E7, E8, F4, and G2. They’re important in string theory—but again, just because the structure of mathematics forces it, not because people wanted it.

The exceptional groups can all be described using the octonions, or split octonions. But the octonions are also rather hard to understand. The rolling ball geometry, on the other hand, is rather simple and easy to visualize. So, it’s a way of bringing some exotic mathematics a bit closer to ordinary life.

Well, okay—in ordinary life you’ve probably never thought about a spinorial ball rolling on a projective ball. But still: spinors and projective planes are far less exotic than split octonions and exceptional Lie groups. Any mathematician worth their salt knows about spinors and projective planes. They’re things that make plenty of sense.

I think now is a good time for most of you nonmathematicians to stop reading. I’ll leave off with a New Year’s puzzle:

Puzzle: Relative to the fixed stars, how many times does the Earth turn around its axis in a year?

Bye! It was nice seeing you!

The gory details

Still here? Cool. I want to wrap up by presenting the theorem in a more precise way, and then telling the detailed history of the rolling ball problem.

How can we specify a point in the rolling ball geometry? We need to say the location where the little ball touches the big ball, and we need to describe the ‘orientation’ of the little ball: that is, how it’s been rotated.

The point where the little ball touches the big one is just any point on the surface of the big ball. If the big ball were an ordinary ball this would be a point on the sphere, S^2. But since it’s a projective ball, we need a point on the projective plane, \mathbb{R}\mathrm{P}^2.

To describe the orientation of the little ball we need to say how it’s been rotated from some standard orientation. If the little ball were an ordinary ball we’d need to give an element of the rotation group, \mathrm{SO}(3). But since it’s a spinorial ball we need an element of the double cover of the rotation group, namely the special unitary group \mathrm{SU}(2).

So, the space of points in the rolling ball geometry is

X = \mathbb{R}\mathrm{P}^2 \times \mathrm{SU}(2)

This makes it easy to see how the imaginary split octonions get into the game. For starters, \mathrm{SU}(2) is isomorphic to the group of unit quaternions. We can define the absolute value of a quaternion in a way that copies the usual formula for complex numbers:

\displaystyle{ |a + bi + cj + dk| = \sqrt{a^2 + b^2 + c^2 + d^2} }

The great thing about the quaternions is that if we multiply two of them, their absolute values multiply. In other words, if p and q are two quaternions,

|pq| = |p| |q|

This implies that the quaternions with absolute value 1—the unit quaternions—are closed under multiplication. In fact, they form a group. And in fact, this group is just SU(2) in mild disguise!
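Both facts are easy to check numerically. Here is a minimal Python sketch; the tuple encoding (a, b, c, d) for a + bi + cj + dk and the function names are my own, chosen just for illustration:

```python
import math
import random

def qmult(p, q):
    """Hamilton product of quaternions stored as tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qabs(p):
    """Absolute value sqrt(a^2 + b^2 + c^2 + d^2)."""
    return math.sqrt(sum(x*x for x in p))

random.seed(0)
for _ in range(100):
    p = tuple(random.uniform(-2, 2) for _ in range(4))
    q = tuple(random.uniform(-2, 2) for _ in range(4))
    # absolute values multiply: |pq| = |p| |q|
    assert abs(qabs(qmult(p, q)) - qabs(p) * qabs(q)) < 1e-9
```

In particular, if |p| = |q| = 1 then |pq| = 1, so the unit quaternions really are closed under multiplication.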

The unit quaternions form a sphere. Not an ordinary ‘2-sphere’ of the sort we’ve been talking about so far, but a ‘3-sphere’ in the 4-dimensional space of quaternions. We call that S^3.

So, the space of points in the rolling ball is isomorphic to a projective plane times a 3-sphere:

X \cong \mathbb{R}\mathrm{P}^2 \times S^3

But since the projective plane \mathbb{R}\mathrm{P}^2 is the sphere S^2 with antipodal points counted as the same point, our space of points is

\displaystyle{ X \cong \frac{S^2 \times S^3}{(p,q) \sim (-p,q)}}

Here dividing or ‘modding out’ by that stuff on the bottom says that we count any point (p,q) in S^2 \times S^3 as the same as (-p,q).
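Modding out sounds abstract, but computationally it just means choosing a normal form for each pair of identified points. Here is one concrete way to do it, with p a 3-tuple in S^2 and q a 4-tuple in S^3; the function name and encoding are mine:

```python
def canonical(p, q):
    """One representative of the pair {(p, q), (-p, q)}: flip the sign
    of p when its first nonzero coordinate is negative; q is untouched."""
    for t in p:
        if t != 0:
            if t < 0:
                p = tuple(-x for x in p)
            break
    return p, q

# (p, q) and (-p, q) name the same point of the quotient:
assert canonical((-1, 0, 0), (0, 1, 0, 0)) == canonical((1, 0, 0), (0, 1, 0, 0))
```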

The cool part is that while S^3 is the unit quaternions, we can think of S^2 as the unit imaginary quaternions, where an imaginary quaternion is a number like

bi + cj + dk

So, we’re describing a point in the rolling ball geometry using a unit quaternion and a unit imaginary quaternion. That’s cool. But we can improve this description a bit using a nonobvious fact:

\displaystyle{ X \cong \frac{S^2 \times S^3}{(p,q) \sim (-p,-q)}}

There’s an extra minus sign here! Now we’re counting any point (p,q) in S^2 \times S^3 as the same as (-p,-q). In Proposition 2 in our paper we give an explicit formula for this isomorphism, which is important.

But never mind. Here’s the point of this improvement: we can now describe a point in the rolling ball geometry as a light ray through the origin in the imaginary split octonions! After all, any split octonion is of the form

a + bi + cj + dk + e \ell + f \ell i + g \ell j + h \ell k =  p + \ell q

where p and q are arbitrary quaternions. So, any imaginary split octonion is of the form

bi + cj + dk + e \ell + f \ell i + g \ell j + h \ell k =  p + \ell q

where q is a quaternion and p is an imaginary quaternion. This imaginary split octonion is lightlike if

-b^2 - c^2 - d^2 + e^2 + f^2 + g^2 + h^2 = 0

But this just says

|p|^2 = |q|^2

Any light ray through the origin in the imaginary split octonions consists of all real multiples of some lightlike imaginary split octonion. So, we can describe the light ray using a pair (p,q) where p is an imaginary quaternion and q is a quaternion with the same absolute value as p. We can rescale so that both have absolute value 1… but (p,q) and its negative (-p,-q) still determine the same light ray.

So, the space of light rays through the origin in the imaginary split octonions is

\displaystyle{\frac{S^2 \times S^3}{(p,q) \sim (-p,-q)}}

But this is the space of points in the rolling ball geometry!

Yay!

So far nothing relies on knowing how to multiply imaginary split octonions: we only need the formula for their length, which is much simpler. It’s the lines in the rolling ball geometry that require multiplication. In our paper, we show in Theorem 5 that lines correspond to 2-dimensional subspaces of the imaginary split octonions with the property that whenever x, y lie in the subspace, then x y = 0. In particular this implies that x^2 = 0, which turns out to say that x is lightlike. So, these 2d subspaces consist of lightlike elements. But the property that x y = 0 whenever x, y lie in the subspace is actually stronger! And this is how the full strength of the split octonions gets used.
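If you want to play with the multiplication concretely, the split octonions can be built from pairs of quaternions by the Cayley–Dickson construction. Here is a hedged sketch: the encoding of p + ℓq as a pair of 4-tuples and all function names are mine, and I use the sign convention in which the quadratic form is |p|^2 - |q|^2, the negative of the form written above (the lightlike elements are the same either way):

```python
import random

def qmult(p, q):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(p):
    a, b, c, d = p
    return (a, -b, -c, -d)

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def omult(x, y):
    """Split-octonion product on pairs (a, b) of quaternions, meaning a + l b:
    the Cayley-Dickson formula (a,b)(c,d) = (ac + conj(d) b, da + b conj(c))."""
    a, b = x
    c, d = y
    return (qadd(qmult(a, c), qmult(qconj(d), b)),
            qadd(qmult(d, a), qmult(b, qconj(c))))

def oform(x):
    """Quadratic form |a|^2 - |b|^2 (minus the form written in the text)."""
    a, b = x
    return sum(t*t for t in a) - sum(t*t for t in b)

random.seed(0)
rq = lambda: tuple(random.uniform(-1, 1) for _ in range(4))
for _ in range(100):
    x, y = (rq(), rq()), (rq(), rq())
    # the form is multiplicative: N(xy) = N(x) N(y)
    assert abs(oform(omult(x, y)) - oform(x) * oform(y)) < 1e-9

# a lightlike imaginary split octonion squares to zero:
x = ((0, 1, 0, 0), (1, 0, 0, 0))   # p = i, q = 1, so |p| = |q|
zero = omult(x, x)
assert all(t == 0 for part in zero for t in part)
```

The last check illustrates the claim above: for an imaginary element, x^2 = 0 forces x to be lightlike, and here the converse computation works out too.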

One last comment. What if we hadn’t used a spinorial ball rolling on a projective ball? What if we had used an ordinary ball rolling on another ordinary ball? Then our space of points would be S^2 \times \mathrm{SO}(3). This is a lot like the space X we’ve been looking at. The only difference is a slight change in where we put the minus sign:

\displaystyle{ S^2 \times \mathrm{SO}(3) \cong \frac{S^2 \times S^3}{(p,q) \sim (p,-q)}}

But this space is different from X. We could go ahead and define lines and look for symmetries of this geometry, but we wouldn’t get G2. We’d get a much smaller group. We’d also get a smaller symmetry group if we worked with X but the big ball were anything other than 3 times the size of the small one. For proofs, see:

• Gil Bor and Richard Montgomery, G2 and the “rolling distribution”.

The history

I will resist telling you how to use geometric quantization to get back from the rolling ball geometry to the split octonions. I will also resist telling you about ‘null triples’, which give specially nice bases for the split octonions. This is where John Huerta really pulled out all the stops and used his octonionic expertise to its full extent. For these things, you’ll just have to read our paper.

Instead, I want to tell you about the history of this problem. This part is mainly for math history buffs, so I’ll freely fling around jargon that I’d been suppressing up to now. This part is also where I give credit to all the great mathematicians who figured out most of the stuff I just explained! I won’t include references, except for papers that are free online. You can find them all in our paper.

On May 23, 1887, Wilhelm Killing wrote a letter to Friedrich Engel saying that he had found a 14-dimensional simple Lie algebra. This is now called \mathfrak{g}_2, because it’s the Lie algebra corresponding to the group G2. By October he had completed classifying the simple Lie algebras, and in the next three years he published this work in a series of papers.

Besides the already known classical simple Lie algebras, he claimed to have found six ‘exceptional’ ones. In fact he only gave a rigorous construction of the smallest, \mathfrak{g}_2. Later, in his famous 1894 thesis, Élie Cartan constructed all of them and noticed that two of them were isomorphic, so that there are really only five.

But already in 1893, Cartan had published a note describing an open set in \mathbb{C}^5 equipped with a 2-dimensional ‘distribution’—a smoothly varying field of 2d spaces of tangent vectors—for which the Lie algebra \mathfrak{g}_2 appears as the infinitesimal symmetries. In the same year, and actually in the same journal, Engel noticed the same thing.

In fact, this 2-dimensional distribution is closely related to the rolling ball problem. The point is that the space of configurations of the rolling ball is 5-dimensional, with a 2-dimensional distribution that describes motions of the ball where it rolls without slipping or twisting.

Both Cartan and Engel returned to this theme in later work. In particular, Engel discovered in 1900 that a generic antisymmetric trilinear form on \mathbb{C}^7 is preserved by a group isomorphic to the complex form of G2. Furthermore, starting from this 3-form he constructed a nondegenerate symmetric bilinear form on \mathbb{C}^7. This implies that the complex form of G2 is contained in a group isomorphic to \mathrm{SO}(7,\mathbb{C}). He also noticed that the vectors x \in \mathbb{C}^7 that are null—meaning x \cdot x = 0, where we write the bilinear form as a dot product—define a 5-dimensional projective variety on which G2 acts.

In fact, this variety is the complexification of the configuration space of a rolling fermionic ball on a projective plane! Furthermore, the space \mathbb{C}^7 is best seen as the complexification of the space of imaginary octonions. Like the space of imaginary quaternions (better known as \mathbb{R}^3), the 7-dimensional space of imaginary octonions comes with a dot product and cross product. Engel’s bilinear form on \mathbb{C}^7 arises from complexifying the dot product. His antisymmetric trilinear form arises from the dot product together with the cross product via the formula

x \cdot (y \times z).

However, all this was seen only later! It was only in 1908 that Cartan mentioned that the automorphism group of the octonions is a 14-dimensional simple Lie group. Six years later he stated something he probably had known for some time: this group is the compact real form of G2.

As I already mentioned, the octonions had been discovered long before: in fact the day after Christmas in 1843, by Hamilton’s friend John Graves. Two months before that, Hamilton had sent Graves a letter describing his dramatic discovery of the quaternions. This encouraged Graves to seek an even larger normed division algebra, and thus the octonions were born. Hamilton offered to publicize Graves’ work, but put it off or forgot until the young Arthur Cayley rediscovered the octonions in 1845. That this obscure algebra lay at the heart of all the exceptional Lie algebras and groups became clear only slowly. Cartan’s realization of its relation to \mathfrak{g}_2, and his later work on a symmetry called ‘triality’, was the first step.

In 1910, Cartan wrote a paper that studied 2-dimensional distributions in 5 dimensions. Generically such a distribution is not integrable: in other words, the Lie bracket of two vector fields lying in this distribution does not again lie in this distribution. However, it lies in a 3-dimensional distribution. Lie brackets of vector fields lying in this 3-dimensional distribution then generically give arbitrary tangent vectors to the 5-dimensional manifold. Such a distribution is called a (2,3,5) distribution. Cartan worked out a complete system of local geometric invariants for these distributions. He showed that if all these invariants vanish, the infinitesimal symmetries of a (2,3,5) distribution in a neighborhood of a point form the Lie algebra \mathfrak{g}_2.

Again this is relevant to the rolling ball. The space of configurations of a ball rolling on a surface is 5-dimensional, and it comes equipped with a (2,3,5) distribution. The 2-dimensional distribution describes motions of the ball where it rolls without twisting or slipping. The 3-dimensional distribution describes motions where it can roll and twist, but not slip. Cartan did not discuss rolling balls, but he did consider a closely related example: curves of constant curvature 2 or 1/2 in the unit 3-sphere.

Beginning in the 1950s, François Bruhat and Jacques Tits developed a very general approach to incidence geometry, eventually called the theory of ‘buildings’, which among other things gives a systematic approach to geometries having simple Lie groups as symmetries. In the case of G2, because the Dynkin diagram of this group has two dots, the relevant geometry has two types of figure: points and lines. Moreover because the Coxeter group associated to this Dynkin diagram is the symmetry group of a hexagon, a generic pair of points a and d fits into a configuration like this, called an ‘apartment’:

There is no line containing a pair of points here except when a line is actually shown, and more generally there are no ‘shortcuts’ beyond what is shown. For example, we go from a to b by following just one line, but it takes two to get from a to c, and three to get from a to d.

Betty Salzberg wrote a nice introduction to these ideas in 1982. Among other things, she noted that the points and lines in the incidence geometry of the split real form of G2 correspond to 1- and 2-dimensional null subalgebras of the imaginary split octonions. This was shown by Tits in 1955.

In 1993, Bryant and Hsu gave a detailed treatment of curves in manifolds equipped with 2-dimensional distributions, greatly extending the work of Cartan:

• Robert Bryant and Lucas Hsu, Rigidity of integral curves of rank 2 distributions.

They showed how the space of configurations of one surface rolling on another fits into this framework. However, Igor Zelenko may have been the first to explicitly mention a ball rolling on another ball in this context, and to note that something special happens when the ratio of their radii is 3 or 1/3. In a 2005 paper, he considered an invariant of (2,3,5) distributions. He calculated it for the distribution arising from a ball rolling on a larger ball and showed it equals zero in these two cases.

(In our paper, John Huerta and I assume that the rolling ball is smaller than the fixed one, simply to eliminate one of these cases and focus on the case where the fixed ball is 3 times as big.)

In 2006, Bor and Montgomery’s paper put many of the pieces together. They studied the (2,3,5) distribution on S^2 \times \mathrm{SO}(3) coming from a ball of radius 1 rolling on a ball of radius R, and proved a theorem which they credit to Robert Bryant. First, passing to the double cover, they showed the corresponding distribution on S^2 \times \mathrm{SU}(2) has a symmetry group whose identity component contains the split real form of G2 when R = 3 or 1/3. Second, they showed this action does not descend to the original rolling ball configuration space S^2 \times \mathrm{SO}(3). Third, they showed that for any other value of R except R = 1, the symmetry group is isomorphic to \mathrm{SU}(2) \times \mathrm{SU}(2)/\pm(1,1). They also wrote:

Despite all our efforts, the ‘3’ of the ratio 1:3 remains mysterious. In this article it simply arises out of the structure constants for G2 and appears in the construction of the embedding of \mathfrak{so}(3) \times \mathfrak{so}(3) into \mathfrak{g}_2. Algebraically speaking, this ‘3’ traces back to the 3 edges in \mathfrak{g}_2's Dynkin diagram and the consequent relative positions of the long and short roots in the root diagram for \mathfrak{g}_2 which the Dynkin diagram is encoding.

Open problem. Find a geometric or dynamical interpretation for the ‘3’ of the 3:1 ratio.

As you can see from what I’ve said, John Huerta and I have offered a solution to this puzzle: the ‘3’ comes from the fact that a ball rolling once around a ball 3 times as big turns around exactly 4 times—just what you need to get a relationship to a spinorial ball rolling on a projective plane, and thus the lightcone in the split octonions! The most precise statement of this explanation comes in Theorem 3 of our paper.
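The arithmetic behind that ‘4’ is the spherical cousin of the coin rotation paradox: rolling without slipping once around a ball R times as big gives R turns from the contact path, plus one extra turn from the trip around the big ball. As a one-line sketch (the function name is mine):

```python
def turns_per_trip(R):
    """Rotations made by a unit ball rolling without slipping once around the
    outside of a ball of radius R, along a great circle: the contact path has
    length 2*pi*R, giving R turns, plus 1 extra turn for circling the big ball."""
    return R + 1

assert turns_per_trip(3) == 4   # a ball rolling around a ball 3 times as big turns 4 times
assert turns_per_trip(1) == 2   # the classic coin rotation paradox
```

An even number of turns is exactly what a spinorial ball needs to return to its original state, which is why the ratio 3:1 is special here.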

While Bor and Montgomery’s paper goes into considerable detail about the connection with split octonions, most of their work uses the now standard technology of semisimple Lie algebras: roots, weights and the like. In 2006, Sagerschnig described the incidence geometry of \mathrm{G}_2 using the split octonions, and in 2008, Agrachev wrote a paper entitled ‘Rolling balls and octonions’. He emphasizes that the double cover S^2 \times \mathrm{SU}(2) can be identified with the double cover of \mathrm{PC}, the projectivization of the set of null vectors in the imaginary split octonions. He then shows that given a point \langle x \rangle \in \mathrm{PC}, the set of points \langle y \rangle connected to \langle x \rangle by a single roll is the annihilator

\{ y \in \mathbb{I} : y x = 0 \}

where \mathbb{I} is the space of imaginary split octonions.

This sketch of the history is incomplete in many ways. As usual, history resembles a fractal: the more closely you study it, the more details you see! If you want to dig deeper, I strongly recommend these:

• Ilka Agricola, Old and new on the exceptional group G2.

• Robert Bryant, Élie Cartan and geometric duality.

This paper is also very helpful and fun:

• Aroldo Kaplan, Quaternions and octonions in mechanics.

It emphasizes the role that quaternions play in describing rotations, and the way an imaginary split octonion is built from an imaginary quaternion and a quaternion. And don’t forget this:

• Andrei Agrachev, Rolling balls and octonions.

Have fun!


Rolling Circles and Balls (Part 4)

2 January, 2013

So far in this series we’ve been looking at what happens when we roll circles on circles:

• In Part 1 we rolled a circle on a circle that’s the same size.


• In Part 2 we rolled a circle on a circle that’s twice as big.

• In Part 3 we rolled a circle inside a circle that was 2, 3, or 4 times as big.


In every case, we got lots of exciting math and pretty pictures. But all this pales in comparison to the marvels that occur when we roll a ball on another ball!

You’d never guess it, but the really amazing stuff happens when you roll a ball on another ball that’s exactly 3 times as big. In that case, the geometry of what’s going on turns out to be related to special relativity in a weird universe with 3 time dimensions and 4 space dimensions! Even more amazingly, it’s related to a strange number system called the split octonions.

The ordinary octonions are already strange enough. They’re an 8-dimensional number system where you can add, subtract, multiply and divide. They were invented in 1843 after the famous mathematician Hamilton invented a rather similar 4-dimensional number system called the quaternions. He told his college pal John Graves about it, since Graves was the one who got Hamilton interested in this stuff in the first place… though Graves had gone on to become a lawyer, not a mathematician. The day after Christmas that year, Graves sent Hamilton a letter saying he’d found an 8-dimensional number system with almost all the same properties! The one big missing property was the associative law for multiplication, namely:

(ab)c = a(bc)

The quaternions obey this, but the octonions don’t. For this and other reasons, they languished in obscurity for many years. But they eventually turned out to be the key to understanding some otherwise inexplicable symmetry groups called ‘exceptional groups’. Later still, they turned out to be important in string theory!

I’ve been fascinated by this stuff for a long time, in part because it starts out seeming crazy and impossible to understand… but eventually it makes sense. So, it’s a great example of how you can dramatically change your perspective by thinking for a long time. Also, it suggests that there could be patterns built into the structure of math, highly nonobvious patterns, which turn out to explain a lot about the universe.

About a decade ago I wrote a paper summarizing everything I’d learned so far:

The octonions.

But I knew there was much more to understand. I wanted to work on this subject with a student. But I never dared until I met John Huerta, who, rather oddly, wanted to get a Ph.D. in math but work on physics. That’s generally not a good idea. But it’s exactly what I had wanted to do as a grad student, so I felt a certain sympathy for him.


And he seemed good at thinking about how algebra and particle physics fit together. So, I decided we should start by writing a paper on ‘grand unified theories’—theories of all the forces except gravity:

The algebra of grand unified theories.

The arbitrary-looking collection of elementary particles we observe in nature turns out to contain secret patterns—patterns that jump into sharp focus using some modern algebra! Why do quarks have weird fractional charges like 2/3 and -1/3? Why does each generation of particles contain two quarks and two leptons? I can’t say we really know the answer to such questions, but the math of grand unified theories makes these strange facts seem natural and inevitable.

The math turns out to involve rotations in 10 dimensions, and ‘spinors’: things that only come around back to the way they started after you turn them around twice. This turned out to be a great preparation for our later work.

As we wrote this article, I realized that John Huerta had a gift for mathematical prose. In fact, we recently won a prize for this paper! In two weeks we’ll meet at the big annual American Mathematical Society conference and pick it up.

John Huerta wound up becoming an expert on the octonions, and writing his thesis about how they make superstring theory possible in 10-dimensional spacetime:

Division algebras, supersymmetry and higher gauge theory.

The wonderful fact is that string theory works well in 10 dimensions because the octonions are 8-dimensional! Suppose that at each moment in time, a string is like a closed loop. Then as time passes, it traces out a 2-dimensional sheet in spacetime, called a worldsheet:

In this picture, 'up' means 'forwards in time'. Unfortunately this picture is just 3-dimensional: the real story happens in 10 dimensions! Don't bother trying to visualize 10 dimensions, just count: in 10-dimensional spacetime there are 10 – 2 = 8 extra dimensions besides those of the string’s worldsheet. These are the directions in which the string can vibrate. Since the octonions are 8-dimensional, we can describe the string’s vibrations using octonions! The algebraic magic of this number system then lets us cook up a beautiful equation describing these vibrations: an equation that has ‘supersymmetry’.

For a full explanation, read John Huerta’s thesis. But for an easy overview, read this paper we published in Scientific American:

The strangest numbers in string theory.

This got included in a collection called The Best Writing on Mathematics 2012, further confirming my opinion that collaborating with John Huerta was a good idea.

Anyway: string theory sounds fancy, but for many years I’d been tantalized by the relationship between the octonions and a much more prosaic physics problem: a ball rolling on another ball. I had a lot of clues saying there should be a nice relationship… though only if we work with a mutant version of the octonions called the ‘split’ octonions.

You probably know how we get the complex numbers by taking the ordinary real numbers and throwing in a square root of -1. But there’s also another number system, far less popular but still interesting, called the split complex numbers. Here we throw in a square root of 1 instead. Of course 1 already has two square roots, namely 1 and -1. But that doesn’t stop us from throwing in another!

This ‘split’ game, which is a lot more profound than it sounds at first, also works for the quaternions and octonions. We get the octonions by starting with the real numbers and throwing in seven square roots of -1, for a total of 8 dimensions. For the split octonions, we start with the real numbers and throw in three square roots of -1 and four square roots of 1. The split octonions are surprisingly similar to the octonions. There are tricks to go back and forth between the two, so you should think of them as two forms of the same underlying thing.
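For the split complex numbers this game is easy to make concrete. Writing a + bj with j² = 1, multiplication works just like complex multiplication except for one sign, and the quantity a² − b² plays the role that the squared absolute value plays for complex numbers. A tiny sketch, with my own pair encoding and function names:

```python
def scmult(x, y):
    """Product of split-complex numbers stored as (a, b) = a + b j, where j*j = +1."""
    a, b = x
    c, d = y
    return (a*c + b*d, a*d + b*c)

def scform(x):
    """The quadratic form a^2 - b^2, which multiplies the way |z|^2 does for complex numbers."""
    a, b = x
    return a*a - b*b

j = (0, 1)
assert scmult(j, j) == (1, 0)        # j is a square root of 1...
assert j != (1, 0) and j != (-1, 0)  # ...but j is neither 1 nor -1

x, y = (2, 3), (5, -1)
assert scform(scmult(x, y)) == scform(x) * scform(y)
```

Unlike |z|² for complex numbers, this form can be zero or negative for nonzero elements, which is exactly the ‘lightlike’ behavior that shows up again for the split octonions.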

Anyway: I really liked the idea of finding the split octonions lurking in a concrete physics problem like a ball rolling on another ball. I hoped maybe this could shed some new light on what the octonions are really all about.

James Dolan and I tried hard to get it to work. We made a lot of progress, but then we got stuck, because we didn’t realize it only works when one ball is 3 times as big as the other! That was just too crazy for us to guess.

In fact, some mathematicians had known about this for a long time. Things would have gone a lot faster if I’d read more papers early on. By the time we caught up with the experts, I’d left for Singapore, and John Huerta, still back in Riverside, was the one talking with James Dolan about this stuff. They figured out a lot more.

Then Huerta got his Ph.D. and took a job in Australia, which is as close to Singapore as it is to almost anything. I got a grant from the Foundational Questions Institute to bring John to Singapore and figure out more stuff about the octonions and physics… and we wound up writing a paper about the rolling ball problem:

G2 and the rolling ball.

Whoops! I haven’t introduced G2 yet. It’s one of those ‘exceptional groups’ I mentioned: the smallest one, in fact. Like the octonions themselves, this group comes in a few different but closely related ‘forms’. The most famous form is the symmetry group of the octonions. But in our paper, we’re more interested in the ‘split’ form, which is the symmetry group of the split octonions. The reason is that this group is also the symmetry group of a ball rolling without slipping or twisting on another ball that’s exactly 3 times as big!

The fact that the same group shows up as the symmetries of these two different things is a huge clue that they’re deeply related. The challenge is to understand the relationship.

There are two parts to this challenge. One is to describe the rolling ball problem in terms of split octonions. The other is to reverse the story, and somehow get the split octonions to emerge naturally from the study of a rolling ball!

In our paper we tackled both parts. Describing the rolling ball problem using split octonions had already been done by other mathematicians, for example here:

• Robert Bryant and Lucas Hsu, Rigidity of integral curves of rank 2 distributions.

• Gil Bor and Richard Montgomery, G2 and the “rolling distribution”.

• Andrei Agrachev, Rolling balls and octonions.

• Aroldo Kaplan, Quaternions and octonions in mechanics.

We do however give a simpler explanation of why this description only works when one ball is 3 times as big as the other.

The other part, getting the split octonions to show up starting from the rolling ball problem, seems to be new. We show that in a certain sense, quantizing the rolling ball gives the split octonions! Very roughly, split octonions can be seen as quantum states of the rolling ball.

At this point I’ve gone almost as far as I can without laying on some heavy math. In theory I could show you pretty animations of a little ball rolling on a big one, and use these to illustrate the special thing that happens when the big one is 3 times as big. In theory I might be able to explain the whole story without many equations or much math jargon. That would be lots of fun…

… for you. But it would be a huge amount of work for me. So at this point, to make my job easier, I want to turn up the math level a notch or two. And this is a good point for both of us to take a little break.

In the next and final post in this series, I’ll sketch how the problem of a little ball rolling on a big stationary ball can be described using split octonions… and why the symmetries of this problem give a group that’s the split form of G2 if the big ball has a radius that’s 3 times the radius of the little one!

I will not quantize the rolling ball problem—for that, you’ll need to read our paper.


Our Galactic Environment

27 December, 2012

While I’m focused on the Earth these days, I can’t help looking up and thinking about outer space now and then.

So, let me tell you about the Kuiper Belt, the heliosphere, the Local Bubble—and what may happen when our Solar System hits the next big cloud! Could it affect the climate on Earth?

New Horizons

We’re going on a big adventure!

New Horizons has already taken great photos of volcanoes on Jupiter’s moon Io. It’s already closer to Pluto than we’ve ever been. And on 14 July 2015 it will fly by Pluto and its moons Charon, Hydra, and Nix!

But that’s just the start: then it will go to see some KBOs!

The Kuiper Belt stretches from the orbit of Neptune to almost twice as far from the Sun. It’s a bit like the asteroid belt, but much bigger: 20 times as wide and 20 – 200 times as massive. But while most asteroids are made of rock and metal, most Kuiper Belt Objects or ‘KBOs’ are composed largely of frozen methane, ammonia and water.

The Earth’s orbit has a radius of one astronomical unit, or AU. The Kuiper Belt goes from 30 AU to 50 AU out. For comparison, the heliosphere, the region dominated by the energetic fast-flowing solar wind, fizzles out around 120 AU. That’s where Voyager 1 is now.

New Horizons will fly through the Kuiper Belt from 2016 to 2020… and, according to plan, its mission will end in 2026. How far out will it be then? I don’t know! Of course it will keep going…

For more see:

• JPL, New Horizons: NASA’s Pluto-Kuiper Belt Mission.

The heliosphere

Here’s a young star zipping through the Orion Nebula. It’s called LL Orionis, and this picture was taken by the Hubble Telescope in February 1995:

The star is moving through the interstellar gas at supersonic speeds. So, when this gas hits the fast wind of particles shooting out from the star, it creates a bow shock half a light-year across. It’s a bit like when a boat moves through the water faster than the speed of water waves.

There’s also a bow shock where the solar wind hits the Earth’s magnetic field. It’s about 17 kilometers thick, and located about 90,000 kilometers from Earth:

For a long time scientists thought there was a bow shock where nearby interstellar gas hit the Sun’s solar wind. But this was called into question this year when a satellite called the Interstellar Boundary Explorer (IBEX) discovered the Solar System is moving slower relative to this gas than we thought!

IBEX isn’t actually going to the edge of the heliosphere—it’s in Earth orbit, looking out. But Voyager 1 seems close to hitting the heliopause, where the Sun’s solar wind comes to a stop. And it’s seeing strange things!

The Interstellar Boundary Explorer

The Sun shoots out a hot wind of ions moving at 300 to 800 kilometers per second. They form a kind of bubble in space: the heliosphere. These charged particles slow down and stop when they hit the hydrogen and helium atoms in interstellar space. But those atoms can penetrate the heliosphere, at least when they’re neutral—and a near-earth satellite called IBEX, the Interstellar Boundary Explorer, has been watching them! And here’s what IBEX has seen:

In December 2008, IBEX first started detecting energetic neutral atoms penetrating the heliosphere. By October 2009 it had collected enough data to see the ‘IBEX ribbon’: an unexpected arc-shaped region of the sky with many more energetic neutral atoms than expected. You can see it here!

The color shows how many hundreds of energetic neutral atoms are hitting the heliosphere per second per square centimeter per keV. A keV, or kilo-electron-volt, is a unit of energy. Different atoms are moving with different energies, so it makes sense to count them this way.

You can see how the Voyager spacecraft are close to leaving the heliosphere. You can also see how the interstellar magnetic field lines avoid this bubble. Ever since the IBEX ribbon was detected, the IBEX team has been trying to figure out what causes it. They think it’s related to the interstellar magnetic field. The ribbon has been moving and changing intensity quite a bit in the couple of years they’ve been watching it!

Recently, IBEX announced that our Solar System has no bow shock—a big surprise. Previously, scientists thought the heliosphere created a bow-shaped shock wave in the interstellar gas as it moved along, like that star in the Orion Nebula we just looked at.

The Local Bubble

Get to know the neighborhood!

I love the names of these nearby stars! Some I knew: Vega, Altair, Fomalhaut, Alpha Centauri, Sirius, Procyon, Denebola, Pollux, Castor, Mizar, Aldebaran, Algol. But many I didn’t: Rasalhague, Skat, Gacrux, Pherkad, Thuban, Phact, Alphard, Wazn, and Algieba! How come none of the science fiction I’ve read uses these great names? Or maybe I just forgot.

The Local Bubble is a bubble of hot interstellar gas 300 light years across, probably blasted out by the supernova called Geminga near the bottom of this picture.

Geminga

Here’s the sky viewed in gamma rays. A lot come from a blazar 7 billion light years away that erupted in 2005: a supermassive black hole at the center of a galaxy, firing particles in a jet that happens to be aimed straight at us. Some come from nearby pulsars: rapidly rotating neutron stars formed by the collapse of stars that went supernova. The one I want you to think about is Geminga.

Geminga is just 800 light years away from us, and it exploded only 300,000 years ago! That may seem far away and long ago to you, but not to me. The first Neanderthals go back around 350,000 years… and they would have seen this supernova in the daytime, it was so close.

But here’s the reason I want you to think about Geminga. It seems to have blasted out the bubble of hot low-density gas our Solar System finds itself in: the Local Bubble. Astronomers have even detected micrometer-sized interstellar meteor particles coming from its direction!

We may think of interstellar space as all the same—empty and boring—but that’s far from true. The density of interstellar space varies immensely from place to place! The Local Bubble has just 0.05 atoms per cubic centimeter, but the average in our galaxy is about 20 times that, and we’re heading toward some giant clouds that are 2000 to 20,000 times as dense. The fun will start when we hit those… but more on that later.

Nearby clouds

While we live in the Local Bubble, several thousand years ago we entered a small cloud of cooler, denser gas: the Local Fluff. We’ll leave it within at most 4,000 years. But that’s just the beginning! As we pass the Scorpius-Centaurus Association, we’ll hit bigger, colder and denser clouds—and they’ll squash the heliosphere.

When will this happen? People seem very unsure. I’ve seen different sources saying we entered the Local Fluff sometime between 44,000 and 150,000 years ago, and that we’ll stay within it for between 4,000 and 20,000 years.

We’ll then return to the hotter, less dense gas of the Local Bubble until we hit the next cloud. That may take at least 50,000 years. Two candidates for the first cloud we’ll hit are the G Cloud and the Apex Cloud. The Apex Cloud is just 15 light years away:

• Priscilla C. Frisch, Local interstellar matter: the Apex Cloud.

When we hit a big cloud, it will squash the heliosphere. Right now, remember, this is roughly 120 AU in radius. But before we entered the Local Fluff, it was much bigger. And when we hit thicker clouds, it may shrink down to just 1 or 2 AU!

The heliosphere protects us from galactic cosmic rays. So, when we hit the next cloud, more of these cosmic rays will reach the Earth. Nobody knows for sure what the effects will be… but life on Earth has survived previous incidents like this, and other problems will hit us much sooner, so don’t stay awake at night worrying about it!

Indeed, ice core samples from the Antarctic show spikes in the concentration of the radioactive isotope beryllium-10 in two separate events, one about 60,000 years ago and another about 33,000 years ago. These might have been caused by a sudden increase in cosmic rays. But nobody is really sure.

People have studied the possibility that cosmic rays could influence the Earth’s weather, for example by seeding clouds:

• K. Scherer, H. Fichtner et al, Interstellar-terrestrial relations: variable cosmic environments, the dynamic heliosphere, and their imprints on terrestrial archives and climate, Space Science Reviews 127 (2006), 327–465.

• Benjamin A. Laken, Enric Pallé, Jaša Čalogović and Eimear M. Dunne, A cosmic ray-climate link and cloud observations, J. Space Weather Space Clim. 2 (2012), A18.

Despite the title of the second paper, its conclusion is that “it is clear that there is no robust evidence of a widespread link between the cosmic ray flux and clouds.” That’s clouds on Earth, not clouds of interstellar gas! The first paper is much more optimistic about the existence of such a link, but it doesn’t provide a ‘smoking gun’.

And—in case you’re wondering—variations in cosmic rays this century don’t line up with global warming:

The top curves are the Earth’s temperature as estimated by GISTEMP (the brown curve), and the carbon dioxide concentration in the Earth’s atmosphere as measured by Charles David Keeling (in green). The bottom ones are galactic cosmic rays as measured by CLIMAX (the gray dots), the sunspot cycle as measured by the Solar Influences Data Analysis Center (in red), and total solar irradiance as estimated by Judith Lean (in blue).

But be careful: the galactic cosmic ray curve has been flipped upside down, since when solar activity is high, then fewer galactic cosmic rays make it to Earth! You can see that here:

I’m sorry these graphs aren’t neatly lined up, but you can see that peaks in the sunspot cycle happened near 1980, 1989 and 2002, which is when we had minima in the galactic cosmic rays.

For more on the neighborhood of the Solar System and what to expect as we pass through various interstellar clouds, try this great article:

• Priscilla Frisch, The galactic environment of the Sun, American Scientist 88 (January-February 2000).

I have lots of scientific heroes: whenever I study something, I find impressive people have already been there. This week my hero is Priscilla Frisch. She edited a book called Solar Journey: The Significance of Our Galactic Environment for the Heliosphere and Earth. The book isn’t free, but this chapter is:

• Priscilla C. Frisch and Jonathan D. Slavin, Short-term variations in the galactic environment of the Sun.

For more on what the heliosphere might do when we hit the next big cloud, see:

• Hans-R. Mueller, Priscilla C. Frisch, Vladimir Florinski and Gary P. Zank, Heliospheric response to different possible interstellar environments.

The Aquila Rift

Just for fun, let’s conclude by leaving our immediate neighborhood and going a bit further out. Here’s a picture of the Aquila Rift, taken by Adam Block of the Mt. Lemmon SkyCenter at the University of Arizona:

The Aquila Rift is a region of molecular clouds about 600 light years away in the direction of the star Altair. Hundreds of stars are being formed in these clouds.

A molecular cloud is a region in space where the interstellar gas gets so dense that hydrogen forms molecules, instead of lone atoms. While the Local Fluff near us has about 0.3 atoms per cubic centimeter, and the Local Bubble is much less dense, a molecular cloud can easily have 100 or 1000 atoms per cubic centimeter. Molecular clouds often contain filaments, sheets, and clumps of submicrometer-sized dust particles, coated with frozen carbon monoxide and nitrogen. That’s the dark stuff here!

I don’t know what will happen to the Earth when our Solar System hits a really dense molecular cloud. It might have already happened once. But it probably won’t happen again for a long time.


Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of the network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.
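To make this concrete, here is a minimal sketch of the water net in code. This is my own illustration, not the simulator from the previous article; markings and transition inputs/outputs are multisets, represented with `Counter`:

```python
from collections import Counter

# A transition is a pair of multisets: input species and output species.
transitions = {
    "combine": (Counter({"H": 2, "O": 1}), Counter({"H2O": 1})),
    "split":   (Counter({"H2O": 1}), Counter({"H": 2, "O": 1})),
}

def fire(marking, name):
    """Fire a transition: remove one token per input occurrence,
    deposit one token per output occurrence.
    Returns the new marking, or None if the transition is not enabled."""
    inputs, outputs = transitions[name]
    if any(marking[s] < n for s, n in inputs.items()):
        return None  # not enough input tokens
    return marking - inputs + outputs

m = Counter({"H": 4, "O": 2})
m = fire(m, "combine")  # now {'H': 2, 'O': 1, 'H2O': 1}
```

Note that `Counter` subtraction silently drops species whose count reaches zero, which is harmless here since a missing key reads as zero.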

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’, about 6.022 · 10²³ entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH− + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH−. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3−

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3−. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO3²−. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms are sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a lower pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.
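As a cross-check for the exercise, the same networks can also be written down as data. Here is one way to do it (the transition names and ASCII species names like `CO3--` are my own conventions, not from the article); each double arrow becomes a pair of opposite transitions:

```python
# Each transition maps a name to (inputs, outputs), as lists of species.
water_net = {
    "protonate":   (["H2O", "H2O"], ["OH-", "H3O+"]),
    "deprotonate": (["OH-", "H3O+"], ["H2O", "H2O"]),
}

carbonic_net = {
    "form_acid":   (["CO2", "H2O"], ["H2CO3"]),
    "split_acid":  (["H2CO3"], ["CO2", "H2O"]),
    "ionize_1":    (["H2CO3"], ["H+", "HCO3-"]),
    "recombine_1": (["H+", "HCO3-"], ["H2CO3"]),
    "ionize_2":    (["HCO3-"], ["H+", "CO3--"]),
    "recombine_2": (["H+", "CO3--"], ["HCO3-"]),
}

def species(net):
    """Collect the set of species appearing in a net's transitions."""
    return {s for ins, outs in net.values() for s in ins + outs}
```

The species nodes of the Petri net diagram are exactly `species(net)`, and each `(inputs, outputs)` pair is one transition box with its wires.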

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster than they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (A multiset is a set in which duplicates are allowed.) Recall that the state of the net is defined by a marking function that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, which gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. The distribution of the firing intervals for a transition will be a function of its expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.
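To make the definition concrete, here is a minimal sketch of one event of the standard Gillespie stochastic simulation algorithm, which realizes the definition above with mass-action kinetics and exponentially distributed firing intervals. All names here are mine, and this is an illustration rather than the series’ final simulator:

```python
import random
from collections import Counter

def gillespie_step(marking, transitions, rng=random):
    """One firing event of a stochastic Petri net.

    `transitions` maps a name to (inputs, outputs, rate_constant), where
    inputs and outputs are Counters of species.  The kinetic model is mass
    action on population counts, using a falling factorial when a species
    appears more than once among the inputs.
    Returns (waiting_time, new_marking), or (None, marking) if no
    transition is enabled."""
    rates = {}
    for name, (ins, outs, k) in transitions.items():
        r = k
        for s, n in ins.items():
            for i in range(n):
                r *= max(marking[s] - i, 0)
        rates[name] = r
    total = sum(rates.values())
    if total == 0:
        return None, marking          # nothing can fire
    dt = rng.expovariate(total)       # exponentially distributed interval
    name = rng.choices(list(rates), weights=list(rates.values()))[0]
    ins, outs, _ = transitions[name]
    return dt, marking - ins + outs

net = {
    "combine": (Counter({"H": 2, "O": 1}), Counter({"H2O": 1}), 1.0),
    "split":   (Counter({"H2O": 1}), Counter({"H": 2, "O": 1}), 0.5),
}
dt, m = gillespie_step(Counter({"H": 2, "O": 1}), net)
```

Iterating `gillespie_step` and accumulating the waiting times produces one sample path of the random process.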

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to the study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—are part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.
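In code, the mass-action rate is just the rate constant times one concentration factor per input occurrence. This small helper (the function and its names are mine, for illustration) covers both examples:

```python
def mass_action_rate(k, inputs, conc):
    """Expected firing rate under mass action kinetics: the rate constant
    times the product of input concentrations, one factor per occurrence
    of each input species.  `inputs` is a list, e.g. ["A", "A"] for A + A -> D."""
    rate = k
    for s in inputs:
        rate *= conc[s]
    return rate

# A + C -> D: rate proportional to [A][C]
r1 = mass_action_rate(2.0, ["A", "C"], {"A": 0.5, "C": 3.0})  # 2.0 * 0.5 * 3.0
# A + A -> D: rate proportional to [A]^2
r2 = mass_action_rate(1.0, ["A", "A"], {"A": 0.5})            # 1.0 * 0.5 * 0.5
```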

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.
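As a toy numerical check (the values here are arbitrary, my own): for a two-input transition, writing each concentration as a count divided by the volume shows the factors of 1/V being absorbed into the rate constant:

```python
V = 10.0            # total system volume (arbitrary units)
u = 2.0             # rate constant in concentration form
nA, nB = 30, 50     # population counts
A, B = nA / V, nB / V   # concentrations

# mass-action rate written with concentrations: u [A][B]
rate_conc = u * A * B
# the same rate written with counts: the 1/V factors move into the constant
rate_count = (u / V**2) * nA * nB

assert abs(rate_conc - rate_count) < 1e-12
```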

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie paper, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law” which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]

So:

\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(D) be the population counts for A and D, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }
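The closed-form solution above is easy to check numerically. Here is a sketch (function name mine) that solves the quadratic and verifies the equilibrium relation u[A]² = v[D] and the conservation law [N] = [A] + 2[D]:

```python
from math import sqrt

def diatomic_equilibrium(N, u=1.0, v=1.0):
    """Equilibrium concentrations for A + A <-> D with rate constants u, v,
    given [N] = [A] + 2[D], the concentration of units of A.
    Solves 2u[A]^2 + v[A] - v[N] = 0, taking the positive root."""
    A = (-v + sqrt(v * v + 8 * u * v * N)) / (4 * u)
    D = (N - A) / 2
    return A, D

A, D = diatomic_equilibrium(1.0)   # u = v = 1, [N] = 1 gives A = 0.5, D = 0.25
assert abs(A * A - D) < 1e-12      # u [A]^2 = v [D] holds at equilibrium
```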

Conclusion

We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

