The Large-Number Limit for Reaction Networks (Part 2)

6 July, 2013

I’ve been talking a lot about ‘stochastic mechanics’, which is like quantum mechanics but with probabilities replacing amplitudes. In Part 1 of this mini-series I started telling you about the ‘large-number limit’ in stochastic mechanics. It turns out this is mathematically analogous to the ‘classical limit’ of quantum mechanics, where Planck’s constant \hbar goes to zero.

There’s a lot more I need to say about this, and lots more I need to figure out. But here’s one rather easy thing.

In quantum mechanics, ‘coherent states’ are a special class of quantum states that are very easy to calculate with. In a certain precise sense they are the best quantum approximations to classical states. This makes them good tools for studying the classical limit of quantum mechanics. As \hbar \to 0, they reduce to classical states where, for example, a particle has a definite position and momentum.

We can borrow this strategy to study the large-number limit of stochastic mechanics. We’ve run into coherent states before in our discussions here. Now let’s see how they work in the large-number limit!

Coherent states

For starters, let’s recall what coherent states are. We’ve got k different kinds of particles, and we call each kind a species. We describe the probability that we have some number of particles of each kind using a ‘stochastic state’. For starters, this is a formal power series in variables z_1, \dots, z_k. We write it as

\displaystyle{\Psi = \sum_{\ell \in \mathbb{N}^k} \psi_\ell z^\ell }

where z^\ell is an abbreviation for

z_1^{\ell_1} \cdots z_k^{\ell_k}

But for \Psi to be a stochastic state the numbers \psi_\ell need to be probabilities, so we require that

\psi_\ell \ge 0

and

\displaystyle{ \sum_{\ell \in \mathbb{N}^k} \psi_\ell = 1}

Sums of coefficients like this show up so often that it’s good to have an abbreviation for them:

\displaystyle{ \langle \Psi \rangle =  \sum_{\ell \in \mathbb{N}^k} \psi_\ell}

Now, a coherent state is a stochastic state where the numbers of particles of each species are independent random variables, and the number of the ith species is distributed according to a Poisson distribution.

Since we can pick the means of these Poisson distributions to be whatever we want, we get a coherent state \Psi_c for each list of numbers c \in [0,\infty)^k:

\displaystyle{ \Psi_c = \frac{e^{c \cdot z}}{e^c} }

Here I’m using another abbreviation:

e^{c} = e^{c_1 + \cdots + c_k}

If you calculate a bit, you’ll see

\displaystyle{  \Psi_c = e^{-(c_1 + \cdots + c_k)} \, \sum_{n \in \mathbb{N}^k} \frac{c_1^{n_1} \cdots c_k^{n_k}} {n_1! \, \cdots \, n_k! } \, z_1^{n_1} \cdots z_k^{n_k} }

Thus, the probability of having n_i things of the ith species is equal to

\displaystyle{  e^{-c_i} \, \frac{c_i^{n_i}}{n_i!} }

This is precisely the definition of a Poisson distribution with mean equal to c_i.
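If you like checking things on a computer, here's a small Python sketch (using sympy, with a made-up value of the mean and a made-up choice of n) that extracts a coefficient of e^{c z}/e^c and compares it to the Poisson formula:

```python
from math import exp, factorial
import sympy as sp

z = sp.symbols('z')
c = 1.7        # mean of the Poisson distribution -- just a test value
n = 5          # which coefficient to extract

# coefficient of z^n in the coherent state e^{c z} / e^c
Psi = sp.exp(c * z) * sp.exp(-c)
coeff = sp.series(Psi, z, 0, n + 2).removeO().coeff(z, n)

print(float(coeff))                    # probability of having n things
print(exp(-c) * c**n / factorial(n))   # the Poisson formula -- the same number
```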

What are the main properties of coherent states? For starters, they are indeed states:

\langle \Psi_c \rangle = 1

More interestingly, they are eigenvectors of the annihilation operators

a_i = \displaystyle{ \frac{\partial}{\partial z_i} }

since when you differentiate an exponential you get back an exponential:

\begin{array}{ccl} a_i \Psi_c &=&  \displaystyle{ \frac{\partial}{\partial z_i} \frac{e^{c \cdot z}}{e^c} } \\ \\   &=& c_i \Psi_c \end{array}

We can use this fact to check that in this coherent state, the mean number of particles of the ith species really is c_i. For this, we introduce the number operator

N_i = a_i^\dagger a_i

where a_i^\dagger is the creation operator:

(a_i^\dagger \Psi)(z) = z_i \Psi(z)

The number operator has the property that

\langle N_i \Psi \rangle

is the mean number of particles of the ith species. If we calculate this for our coherent state \Psi_c, we get

\begin{array}{ccl} \langle a_i^\dagger a_i \Psi_c \rangle &=& c_i \langle a_i^\dagger \Psi_c \rangle \\  \\ &=& c_i \langle \Psi_c \rangle \\ \\ &=& c_i \end{array}

Here in the second step we used the general rule

\langle a_i^\dagger \Phi \rangle = \langle \Phi \rangle

which is easy to check.
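Here's a quick sympy sketch of that calculation in one variable, applying a = ∂/∂z and a^\dagger = z \cdot to a truncated coherent state; the truncation order and the value of c are arbitrary choices for the test:

```python
import sympy as sp

z, c = sp.symbols('z c', positive=True)
TRUNC = 20   # truncation order of the power series -- arbitrary choice

# truncated coherent state  Psi_c = e^{-c} * sum_n c^n z^n / n!
Psi = sp.exp(-c) * sum(c**n * z**n / sp.factorial(n) for n in range(TRUNC))

a    = lambda f: sp.diff(f, z)   # annihilation operator a = d/dz
adag = lambda f: z * f           # creation operator a† = multiplication by z

def total(f):
    """<f>: the sum of all coefficients, i.e. f evaluated at z = 1."""
    return f.subs(z, 1)

# mean number of particles <a† a Psi_c> should be (very nearly) c
mean_N = total(adag(a(Psi)))
print(mean_N.subs(c, 3).evalf())   # ≈ 3.0, up to truncation error
```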

Rescaling

Now let’s see how coherent states work in the large-numbers limit. For this, let’s use the rescaled annihilation, creation and number operators from Part 1. They look like this:

A_i = \hbar \, a_i

C_i = a_i^\dagger

\widetilde{N}_i = C_i A_i

Since

\widetilde{N}_i = \hbar N_i

the point is that the rescaled number operator counts particles not one at a time, but in bunches of size 1/\hbar. For example, if \hbar is the reciprocal of Avogadro’s number, we are counting particles in ‘moles’. So, \hbar \to 0 corresponds to a large-number limit.

To flesh out this idea some more, let’s define rescaled coherent states:

\widetilde{\Psi}_c = \Psi_{c/\hbar}

These are eigenvectors of the rescaled annihilation operators:

\begin{array}{ccl} A_i \widetilde{\Psi}_c &=& \hbar a_i \Psi_{c/\hbar}  \\  \\  &=& c_i \Psi_{c/\hbar} \\ \\  &=& c_i \widetilde{\Psi}_c  \end{array}

This in turn means that

\begin{array}{ccl} \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle &=& \langle C_i A_i \widetilde{\Psi}_c \rangle \\  \\  &=& c_i \langle  C_i \widetilde{\Psi}_c \rangle \\  \\ &=& c_i \langle \widetilde{\Psi}_c \rangle \\ \\ &=& c_i \end{array}

Here we used the general rule

\langle C_i \Phi \rangle = \langle \Phi \rangle

which holds because the ‘rescaled’ creation operator C_i is really just the usual creation operator, which obeys this rule.

What’s the point of all this fiddling around? Simply this. The equation

\langle \widetilde{N}_i \widetilde{\Psi}_c \rangle = c_i

says the expected number of particles of the ith species in the state \widetilde{\Psi}_c is c_i, if we count these particles not one at a time, but in bunches of size 1/\hbar.

A simple test

As a simple test of this idea, let’s check that as \hbar \to 0, the standard deviation of the number of particles in the rescaled coherent state \widetilde{\Psi}_c goes to zero… where we count particles using the rescaled number operator.

The variance of the rescaled number operator is, by definition,

\langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle -   \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle^2

and the standard deviation is the square root of the variance.

We already know the mean of the rescaled number operator:

\langle \widetilde{N}_i \widetilde{\Psi}_c \rangle = c_i

So, the main thing we need to calculate is the mean of its square:

\langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle

For this we will use the commutation relation derived last time:

[A_i , C_i] = \hbar

This implies

\begin{array}{ccl} \widetilde{N}_i^2 &=& C_i A_i C_i A_i \\  \\  &=&  C_i (C_i A_i + \hbar) A_i \\ \\  &=&  C_i^2 A_i^2 + \hbar C_i A_i \end{array}

so

\begin{array}{ccl} \langle \widetilde{N}_i^2\widetilde{\Psi}_c \rangle &=& \langle (C_i^2 A_i^2 + \hbar C_i A_i) \Psi_c \rangle \\   \\  &=&  c_i^2 + \hbar c_i  \end{array}

where we used our friends

A_i \Psi_c = c_i \Psi_c

and

\langle C_i \Phi \rangle = \langle \Phi \rangle

So, the variance of the rescaled number of particles is

\begin{array}{ccl} \langle \widetilde{N}_i^2 \widetilde{\Psi}_c \rangle  -   \langle \widetilde{N}_i \widetilde{\Psi}_c \rangle^2  &=& c_i^2 + \hbar c_i - c_i^2 \\  \\  &=& \hbar c_i \end{array}

and the standard deviation is

(\hbar c_i)^{1/2}

Good, it goes to zero as \hbar \to 0! And the square root is just what you’d expect if you’ve thought about stuff like random walks or the central limit theorem.
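Here's a quick numerical check of that formula. It assumes (as we saw above) that in the rescaled coherent state the particle count is Poisson-distributed with mean c/\hbar and then multiplied by \hbar; the values of c and \hbar are arbitrary test choices:

```python
import numpy as np

rng = np.random.default_rng(0)
c = 2.0
for hbar in (1.0, 0.1, 0.01):
    # rescaled particle count: hbar times a Poisson random variable with mean c/hbar
    samples = hbar * rng.poisson(c / hbar, size=200_000)
    print(hbar, samples.std(), np.sqrt(hbar * c))   # the last two columns agree
```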

A puzzle

I feel sure that in any coherent state, not only the variance but also all the higher moments of the rescaled number operators go to zero as \hbar \to 0. Can you prove this?

Here I mean the moments after the mean has been subtracted. The pth moment is then

\langle (\widetilde{N}_i - c_i)^p \; \widetilde{\Psi}_c \rangle

I want this to go to zero as \hbar \to 0.

Here’s a clue that should help. First, there’s a textbook formula for the higher moments of Poisson distributions without the mean subtracted. If I understand it correctly, it gives this:

\displaystyle{ \langle N_i^m \; \Psi_c \rangle = \sum_{j = 1}^m {c_i}^j \; \left\{ \begin{array}{c} m \\ j \end{array} \right\} }

Here

\displaystyle{ \left\{ \begin{array}{c} m \\ j \end{array} \right\} }

is the number of ways to partition an m-element set into j nonempty subsets. This is called Stirling’s number of the second kind. This suggests that there’s some fascinating combinatorics involving coherent states. That’s exactly the kind of thing I enjoy, so I would like to understand this formula someday… but not today! I just want something to go to zero!

If I rescale the above formula, I seem to get

\begin{array}{ccl} \langle \widetilde{N}_i^m \; \widetilde{\Psi}_c \rangle &=& \hbar^m \langle N_i^m \Psi_{c/\hbar} \rangle \\ \\ &=& \hbar^m \; \displaystyle{ \sum_{j = 1}^m \left(\frac{c_i}{\hbar}\right)^j \left\{ \begin{array}{c} m \\ j \end{array} \right\} } \end{array}

We could plug this formula into

\langle (\widetilde{N}_i - c_i)^p \; \widetilde{\Psi}_c \rangle =  \displaystyle{ \sum_{m = 0}^p \, \binom{p}{m} \; \langle \widetilde{N}_i^m \;  \widetilde{\Psi}_c \rangle \, (-c_i)^{p - m} }

and then try to show the result goes to zero as \hbar \to 0. But I don’t have the energy to do that… not right now, anyway!

Maybe you do. Or maybe you can think of a better approach to solving this problem. The answer must be well-known, since the large-number limit of a Poisson distribution is a very important thing.
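In the meantime, here's a numerical sanity check: computing the central moments directly from the Poisson distribution (so this assumes the identification of the rescaled coherent state with a Poisson distribution of mean c/\hbar, and uses scipy just for the probability mass function), and watching them shrink as \hbar \to 0:

```python
import numpy as np
from scipy.stats import poisson

def central_moment(p, c, hbar):
    """<(Ñ_i - c_i)^p> in the rescaled coherent state, summed over the Poisson pmf."""
    lam = c / hbar
    n = np.arange(int(lam + 20 * np.sqrt(lam) + 50))   # enough terms for the sum
    return hbar**p * np.sum(poisson.pmf(n, lam) * (n - lam)**p)

for hbar in (1.0, 0.1, 0.01, 0.001):
    print(hbar, [round(central_moment(p, 2.0, hbar), 8) for p in (2, 3, 4, 5)])
```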


Symmetry and the Fourth Dimension (Part 13)

5 July, 2013

 

Now let’s start thinking about 4d Platonic solids. We’ve seen the 4-cube… what else is there? Well, in 3d we can take a cube and build an octahedron as shown here. The same trick works in any dimension. In n dimensions, we get something called the n-dimensional cross-polytope, which has one corner at the center of each (n-1)-dimensional ‘face’ of the n-cube.

Puzzle 1. What’s a 2d cross-polytope?

It’s worth noting the relationship between cubes and cross-polytopes is symmetrical. In other words, we can also build an n-cube by putting one corner at the center of each face of the n-dimensional cross-polytope! For example:

But now let’s think about the 4-dimensional case. Since the 4-cube has 8 facets (each being a cube), the 4d cross-polytope must have 8 corners. And since the 4-cube has 16 corners, the 4d cross-polytope must have 16 facets. This is why it’s also called the 16-cell.

It also has other names. Amusingly, the Simple English Wikipedia says:

In four dimensional geometry, a 16-cell is a regular convex polychoron, or polytope existing in four dimensions. It is also known as the hexadecachoron. It is one of the six regular convex polychora first described by the Swiss mathematician Ludwig Schläfli in the mid-19th century. Conway calls it an orthoplex for ‘orthant complex’, as well as the entire class of cross-polytopes.

Simple English, eh? That would really demoralize me if I were a non-native speaker.

The 4d cross-polytope

But let’s sidestep the fancy words and think about what the 4d cross-polytope looks like. To draw a cross-polytope in n dimensions, we can draw the n coordinate axes and draw a dot one inch from the origin along each axis in each direction. Then connect each dot to every other one except the opposite one on the same axis. Then erase the coordinate axes.

In 3 dimensions you get this:

It may not look like much, but it’s a perspective picture of the vertices and edges of an octahedron, or 3d cross-polytope.

Puzzle 2. How many line segments going between red dots are in this picture? These are the edges of the 3d cross-polytope.

Puzzle 3. How many triangles with red corners can you see in this picture? These are the triangular faces of the 3d cross-polytope.

Now let’s do the same sort of thing in 4 dimensions! For this we can start with 4 axes in the plane, each at a 45° angle from the next. We can then draw a dot one inch from the origin along each axis in each direction… and connect each dot to every other one except the opposite one on the same axis. We get this:


If we then erase the axes, we get this:

This is a perspective picture of a 4d cross-polytope!

Puzzle 4. How many line segments going between red dots are in this picture? These are the edges of the 4d cross-polytope.

Puzzle 5. How many triangles with red corners can you see in this picture? These are the triangular 2-dimensional faces of the 4d cross-polytope.
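If you’d rather not count by hand, or want to check your answers to Puzzles 2 and 4, here’s a little Python sketch that builds the vertices of the n-dimensional cross-polytope and counts the edges, following the recipe above:

```python
from itertools import combinations
import numpy as np

def cross_polytope_counts(n):
    """Vertices are plus or minus the unit vectors; every pair is joined by an edge
    except opposite points on the same axis."""
    vertices = [s * np.eye(n)[i] for i in range(n) for s in (+1, -1)]
    edges = [(v, w) for v, w in combinations(vertices, 2) if np.any(v + w)]
    return len(vertices), len(edges)

for n in (2, 3, 4, 5):
    print(n, cross_polytope_counts(n))   # (number of vertices, number of edges)
```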

Let’s say that a 4d polytope has:

• 0-dimensional vertices,

• 1-dimensional edges,

• 2-dimensional faces, and

• 3-dimensional facets.

In general the facets of an n-dimensional thing are its (n-1)-dimensional parts, while the parts of every dimension below n are often called faces. But in 4d we have enough words to be completely unambiguous, so let’s use the words as above. And in 3d, let’s use face in its traditional sense, to mean a 2d face.

So, as long as I talk only about 3d and 4d geometry, you can be sure that when I say face I mean a 2-dimensional face. When I say facet, I’ll mean a 3-dimensional face.

Puzzle 6. What shape are the facets of the 4d cross-polytope?

4-cube versus 4d cross-polytope

 


On top you see the 4-cube. At right, the 4d cross-polytope. Both are projected down to the plane in the same way.

So, the 4d cross-polytope has

2 × 4 = 8

vertices: one centered at each 3d cubical face of the 4-cube. To see how this works, mentally move the cross-polytope up and put it on top of the 4-cube.

On the other hand, the 4d cross-polytope has

2^4 = 16

facets: one for each corner of the 4-cube.

And this is a general pattern. The n-dimensional cross-polytope has one vertex in the middle of each facet of the n-cube, and vice versa. For this reason we say they are Poincaré dual to each other, or simply dual. The n-cube has

2^n

vertices and

2 × n

facets, but for the n-dimensional cross-polytope it’s the other way around.

Figure credits and more

The pictures of the octahedron in a cube and the cube in an octahedron are from Frederick J. Goodman, who has written a book about this stuff called Algebra: Abstract and Concrete.

The other images are on Wikimedia Commons, and all have been released into the public domain except this one:

which was made by Markus Krötzsch.

For more on cross-polytopes, see this:

Cross-polytope, Wikipedia.


Relative Entropy (Part 2)

2 July, 2013

In the first part of this mini-series, I described how various ideas important in probability theory arise naturally when you start doing linear algebra using only the nonnegative real numbers.

But after writing it, I got an email from a rather famous physicist saying he got “lost at line two”. So, you’ll be happy to hear that the first part is not a prerequisite for the remaining parts! I wrote it just to intimidate that guy.

Tobias Fritz and I have proved a theorem characterizing the concept of relative entropy, which is also known as ‘relative information’, ‘information gain’ or—most terrifying and least helpful of all—‘Kullback-Leibler divergence’. In this second part I’ll introduce two key players in this theorem. The first, \mathrm{FinStat}, is a category where:

• an object consists of a system with finitely many states, and a probability distribution on those states

and

• a morphism consists of a deterministic ‘measurement process’ mapping states of one system to states of another, together with a ‘hypothesis’ that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.

The second, \mathrm{FP}, is a subcategory of \mathrm{FinStat}. It has all the same objects, but only morphisms where the hypothesis is ‘optimal’. This means that if the observer measures the system many times, and uses the probability distribution of their observations together with their hypothesis to guess the probability distribution of states of the system, they get the correct answer (in the limit of many measurements).

In this part all I will really do is explain precisely what \mathrm{FinStat} and \mathrm{FP} are. But to whet your appetite, let me explain how we can use them to give a new characterization of relative entropy!

Suppose we have any morphism in \mathrm{FinStat}. In other words: suppose we have a deterministic measurement process, together with a hypothesis that lets the observer guess a probability distribution of states of the system being measured, based on what they observe.

Then we have two probability distributions on the states of the system being measured! First, the ‘true’ probability distribution. Second, the probability that the observer will guess based on their observations.

Whenever we have two probability distributions on the same set, we can compute the entropy of the first relative to the second. This describes how surprised you’ll be if you discover the probability distribution is really the first, when you thought it was the second.

So: any morphism in \mathrm{FinStat} will have a relative entropy. It will describe how surprised the observer will be when they discover the true probability distribution, given what they had guessed.

But this amount of surprise will be zero if their hypothesis was ‘optimal’ in the sense I described. So, the relative entropy will vanish on morphisms in \mathrm{FP}.

Our theorem says this fact almost characterizes the concept of relative entropy! More precisely, it says that any convex-linear lower semicontinuous functor

F : \mathrm{FinStat} \to [0,\infty]

that vanishes on the subcategory \mathrm{FP} must equal some constant times the relative entropy.

Don’t be scared! This should not make sense to you yet, since I haven’t said how I’m thinking of [0,+\infty] as a category, nor what a ‘convex-linear lower semicontinuous functor’ is, nor how relative entropy gives one. I will explain all that later. I just want you to get a vague idea of where I’m going.

Now let me explain the categories \mathrm{FinStat} and \mathrm{FP}. We need to warm up a bit first.

FinStoch

A stochastic map f : X \leadsto Y is different from an ordinary function, because instead of assigning a unique element of Y to each element of X, it assigns a probability distribution on Y to each element of X. So you should imagine it as being like a function ‘with random noise added’, so that f(x) is not a specific element of Y, but instead has a probability of taking on different values. This is why I’m using a weird wiggly arrow to denote a stochastic map.

More formally:

Definition. Given finite sets X and Y, a stochastic map f : X \leadsto Y assigns a real number f_{yx} to each pair x \in X, y \in Y, such that fixing any element x, the numbers f_{yx} form a probability distribution on Y. We call f_{yx} the probability of y given x.

In more detail:

f_{yx} \ge 0 for all x \in X, y \in Y.

and

\displaystyle{ \sum_{y \in Y} f_{yx} = 1} for all x \in X.

Note that we can think of f : X \leadsto Y as a Y \times X-shaped matrix of numbers. A matrix obeying the two properties above is called stochastic. This viewpoint is nice because it reduces the problem of composing stochastic maps to matrix multiplication. It’s easy to check that multiplying two stochastic matrices gives a stochastic matrix. So, composing stochastic maps gives a stochastic map.
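Here’s that viewpoint in a few lines of numpy; the particular matrices are made up for the example, with each column a probability distribution:

```python
import numpy as np

# stochastic maps f : X ⇝ Y and g : Y ⇝ Z, as matrices whose columns sum to 1
f = np.array([[0.5, 0.1],
              [0.5, 0.9]])
g = np.array([[0.2, 1.0],
              [0.8, 0.0]])

h = g @ f                 # composing stochastic maps = multiplying stochastic matrices
print(h)
print(h.sum(axis=0))      # each column still sums to 1, so g∘f is again stochastic
```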

We thus get a category:

Definition. Let \mathrm{FinStoch} be the category of finite sets and stochastic maps between them.

In case you’re wondering why I’m restricting attention to finite sets, it’s merely because I want to keep things simple. I don’t want to worry about whether sums or integrals converge.

FinProb

Now take your favorite 1-element set and call it 1. A function p : 1 \to X is just a point of X. But a stochastic map p : 1 \leadsto X is something more interesting: it’s a probability distribution on X.

Why? Because it gives a probability distribution on X for each element of 1, but that set has just one element.

Last time I introduced the rather long-winded phrase finite probability measure space to mean a finite set with a probability distribution on it. But now we’ve seen a very quick way to describe such a thing within \mathrm{FinStoch}:

And this gives a quick way to think about a measure-preserving function between finite probability measure spaces! It’s just a commutative triangle like this:

Note that the horizontal arrow f:  X \to Y is not wiggly. The straight arrow means it’s an honest function, not a stochastic map. But a function is a special case of a stochastic map! So it makes sense to compose a straight arrow with a wiggly arrow—and the result is, in general, a wiggly arrow. So, it makes sense to demand that this triangle commutes, and this says that the function f: X \to Y is measure-preserving.

Let me work through the details, in case they’re not clear.

First: how is a function a special case of a stochastic map? Here’s how. If we start with a function f: X \to Y, we get a matrix of numbers

f_{yx} = \delta_{y,f(x)}

where \delta is the Kronecker delta. So, each element x \in X gives a probability distribution that’s zero except at f(x).

Given this, we can work out what this commuting triangle really says:

If we use p_x to stand for the probability distribution that p: 1 \leadsto X puts on X, and similarly for q_y, the commuting triangle says

\displaystyle{ q_y = \sum_{x \in X} \delta_{y,f(x)} p_x}

or in other words:

\displaystyle{ q_y = \sum_{x \in X : f(x) = y} p_x }

or if you like:

\displaystyle{ q_y = \sum_{x \in f^{-1}(y)} p_x }

In this situation people say q is p pushed forward along f, and they say f is a measure-preserving function.
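In code, pushing a probability distribution forward along a function is a one-liner; here’s a tiny made-up example:

```python
X = ['x1', 'x2', 'x3']
Y = ['y1', 'y2']
f = {'x1': 'y1', 'x2': 'y1', 'x3': 'y2'}    # a function f : X -> Y
p = {'x1': 0.2, 'x2': 0.3, 'x3': 0.5}       # probability distribution on X

# q_y = sum of p_x over all x with f(x) = y
q = {y: sum(p[x] for x in X if f[x] == y) for y in Y}
print(q)   # {'y1': 0.5, 'y2': 0.5}
```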

So, we’ve used \mathrm{FinStoch} to describe another important category:

Definition. Let \mathrm{FinProb} be the category of finite probability measure spaces and measure-preserving functions between them.

I can’t resist mentioning another variation:

A commuting triangle like this is a measure-preserving stochastic map. In other words, p gives a probability measure on X, q gives a probability measure on Y, and f: X \leadsto Y is a stochastic map with

\displaystyle{ q_y = \sum_{x \in X} f_{yx} p_x }

FinStat

The category we really need for relative entropy is a bit more subtle. An object is a finite probability measure space:

but a morphism looks like this:

The whole diagram doesn’t commute, but the two equations I wrote down hold. The first equation says that f: X \to Y is a measure-preserving function. In other words, this triangle, which we’ve seen before, commutes:

The second equation says that f \circ s is the identity, or in math jargon, s is a section for f.

But what does that really mean?

The idea is that X is the set of ‘states’ of some system, while Y is a set of possible ‘observations’ you might make. The function f is a ‘measurement process’. You ‘measure’ the system using f, and if the system is in the state x you get the observation f(x). The probability distribution p says the probability that the system is in any given state, while q says the probability that you get any given observation when you do your measurement.

Note: we are assuming for now that there’s no random noise in the observation process! That’s why f is a function instead of a stochastic map.

But what about s? That’s the fun part: s describes your ‘hypothesis’ about the system’s state given a particular measurement! If you measure the system and get a result y \in Y, you guess it’s in the state x with probability s_{xy}.

And we don’t want this hypothesis to be really dumb: that’s what

f \circ s = 1_Y

says. You see, this equation says that

\sum_{x \in X} \delta_{y', f(x)} s_{xy} = \delta_{y' y}

or in other words:

\sum_{x \in f^{-1}(y')} s_{xy} = \delta_{y' y}

If you think about it, this implies s_{xy} = 0 unless f(x) = y.

So, if you make an observation y, you will guess the system is in state x with probability zero unless f(x) = y. In short, you won’t make a really dumb guess about the system’s state.
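Here’s a toy morphism in \mathrm{FinStat}, written out with numpy matrices; all the specific numbers are invented for the example, and the hypothesis here is allowed but deliberately not the optimal one:

```python
import numpy as np

# measurement process f : X -> Y with f(x1) = f(x2) = y1, f(x3) = y2,
# written as a 0/1 stochastic matrix
f = np.array([[1, 1, 0],
              [0, 0, 1]])
p = np.array([0.2, 0.3, 0.5])    # 'true' distribution on the states X
q = f @ p                        # distribution on the observations Y

# a hypothesis s : Y ⇝ X, nonzero only where f(x) = y
s = np.array([[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 1.0]])

print(q)        # [0.5 0.5]
print(f @ s)    # the identity matrix, so f∘s = 1_Y
print(s @ q)    # [0.25 0.25 0.5] -- not equal to p, so this hypothesis isn't optimal
```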

Here’s how we compose morphisms:

We get a measure-preserving function g \circ f : X \to Z and a stochastic map going back, s \circ t : Z \leadsto X. You can check that these obey the required equations:

g \circ f \circ p = r

g \circ f \circ s \circ t = 1_Z

So, we get a category:

Definition. Let \mathrm{FinStat} be the category where an object is a finite probability measure space:

a morphism is a diagram obeying these equations:

and composition is defined as above.

FP

As we’ve just seen, a morphism in \mathrm{FinStat} consists of a ‘measurement process’ f and a ‘hypothesis’ s:

But sometimes we’re lucky and our hypothesis is optimal, in the sense that

s \circ q = p

Conceptually, this says that if you take the probability distribution q on our observations and use it to guess a probability distribution for the system’s state using our hypothesis s, you get the correct answer: p.

Mathematically, it says that this diagram commutes:

In other words, s is a measure-preserving stochastic map.

There’s a subcategory of \mathrm{FinStat} with all the same objects, but only these ‘optimal’ morphisms. It’s important, but the name we have for it is not very exciting:

Definition. Let \mathrm{FP} be the subcategory of \mathrm{FinStat} where an object is a finite probability measure space

and a morphism is a diagram obeying these equations:

Why do we call this category \mathrm{FP}? Because it’s a close relative of \mathrm{FinProb}, where a morphism, you’ll remember, looks like this:

The point is that for a morphism in \mathrm{FP}, the conditions on s are so strong that they completely determine it unless there are observations that happen with probability zero—that is, unless there are y \in Y with q_y = 0. To see this, note that

s \circ q = p

actually says

\sum_{y \in Y} s_{xy} q_y = p_x

for any choice of x \in X. But we’ve already seen s_{xy} = 0 unless f(x) = y, so the sum has just one term, and the equation says

s_{x,f(x)} q_{f(x)} = p_x

We can solve this for s_{x,f(x)}, so s is completely determined… unless q_{f(x)} = 0.

This covers the case when y = f(x). We also can’t figure out s_{x,y} if y isn’t in the image of f.

So, to be utterly precise, s is determined by p, q and f unless there’s an element y \in Y that has q_y = 0. Except for this special case, a morphism in \mathrm{FP} is just a morphism in \mathrm{FinProb}. But in this special case, a morphism in \mathrm{FP} has a little extra information: an arbitrary probability distribution on the inverse image of each point y with this property.

In short, \mathrm{FP} is the same as \mathrm{FinProb} except that our observer’s ‘optimal hypothesis’ must provide a guess about the state of the system given an observation, even in cases of observations that occur with probability zero.
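Continuing the toy example from above (same made-up f and p), here’s a sketch of how the optimal hypothesis is pinned down by s_{x,f(x)} = p_x / q_{f(x)}, together with a check that s \circ q = p and that the relative entropy of p relative to s \circ q, computed with the usual formula \sum_x p_x \ln(p_x / (s\circ q)_x) that I haven’t formally introduced yet, is zero:

```python
import numpy as np

f = np.array([[1, 1, 0],
              [0, 0, 1]])
p = np.array([0.2, 0.3, 0.5])
q = f @ p                    # no observation has probability zero here

# optimal hypothesis: s[x, y] = p_x / q_y when f(x) = y, and 0 otherwise
s = (f * p).T / q            # (f * p) has entries delta_{y, f(x)} * p_x

print(s)
print(s @ q)                 # [0.2 0.3 0.5] -- we recover p, so the hypothesis is optimal

# relative entropy of p relative to s∘q (in nats); zero exactly when the guess equals p
guess = s @ q
print(np.sum(p * np.log(p / guess)))   # 0.0
```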

I’m going into these nitpicky details for two reasons. First, we’ll need \mathrm{FP} for our characterization of relative entropy. But second, Tom Leinster already ran into this category in his work on entropy and category theory! He discussed it here:

• Tom Leinster, An operadic introduction to entropy.

Despite the common theme of entropy, he arrived at it from a very different starting-point.

Conclusion

So, I hope that next time I can show you something like this:

and you’ll say “Oh, that’s a probability distribution on the states of some system!” Intuitively, you should think of the wiggly arrow p as picking out a ‘random element’ of the set X.

I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, sending a probability distribution on the states of the measured system to a probability distribution on observations!”

I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, together with a hypothesis about the system’s state, given what is observed!”

And I hope I can show you this:

and you’ll say “Oh, that’s a deterministic measurement process, together with an optimal hypothesis about the system’s state, given what is observed!”

I don’t count on it… but I can hope.

Postscript

And speaking of unrealistic hopes, if I were really optimistic I would hope you noticed that \mathrm{FinStoch} and \mathrm{FinProb}, which underlie the more fancy categories I’ve discussed today, were themselves constructed starting from linear algebra over the nonnegative numbers [0,\infty) in Part 1. That ‘foundational’ work is not really needed for what we’re doing now. However, I like the fact that we’re ultimately getting the concept of relative entropy starting from very little: just linear algebra, using only nonnegative numbers!


The Large-Number Limit for Reaction Networks (Part 1)

1 July, 2013

Waiting for the other shoe to drop.

This is a figure of speech that means ‘waiting for the inevitable consequence of what’s come so far’. Do you know where it comes from? You have to imagine yourself in an apartment on the floor below someone who is taking off their shoes. When you hear one, you know the next is coming.

There’s even an old comedy routine about this:

A guest who checked into an inn one night was warned to be quiet because the guest in the room next to his was a light sleeper. As he undressed for bed, he dropped one shoe, which, sure enough, awakened the other guest. He managed to get the other shoe off in silence, and got into bed. An hour later, he heard a pounding on the wall and a shout: “When are you going to drop the other shoe?”

When we were working on math together, James Dolan liked to say “the other shoe has dropped” whenever an inevitable consequence of some previous realization became clear. There’s also the mostly British phrase the penny has dropped. You say this when someone finally realizes the situation they’re in.

But sometimes one realization comes after another, in a long sequence. Then it feels like it’s raining shoes!

I guess that’s a rather strained metaphor. Perhaps falling like dominoes is better for these long chains of realizations.

This is how I’ve felt in my recent research on the interplay between quantum mechanics, stochastic mechanics, statistical mechanics and extremal principles like the principle of least action. The basics of these subjects should be completely figured out by now, but they aren’t—and a lot of what’s known, nobody bothered to tell most of us.

So, I was surprised to rediscover that the Maxwell relations in thermodynamics are formally identical to Hamilton’s equations in classical mechanics… though in retrospect it’s obvious. Thermodynamics obeys the principle of maximum entropy, while classical mechanics obeys the principle of least action. Wherever there’s an extremal principle, symplectic geometry, and equations like Hamilton’s equations, are sure to follow.

I was surprised to discover (or maybe rediscover, I’m not sure yet) that just as statistical mechanics is governed by the principle of maximum entropy, quantum mechanics is governed by a principle of maximum ‘quantropy’. The analogy between statistical mechanics and quantum mechanics has been known at least since Feynman and Schwinger. But this basic aspect was never explained to me!

I was also surprised to rediscover that simply by replacing amplitudes by probabilities in the formalism of quantum field theory, we get a nice formalism for studying stochastic many-body systems. This formalism happens to perfectly match the ‘stochastic Petri nets’ and ‘reaction networks’ already used in subjects from population biology to epidemiology to chemistry. But now we can systematically borrow tools from quantum field theory! All the tricks that particle physicists like—annihilation and creation operators, coherent states and so on—can be applied to problems like the battle between the AIDS virus and human white blood cells.

And, perhaps because I’m a bit slow on the uptake, I was surprised when yet another shoe came crashing to the floor the other day.

Because quantum field theory has, at least formally, a nice limit where Planck’s constant goes to zero, the same is true for stochastic Petri nets and reaction networks!

In quantum field theory, we call this the ‘classical limit’. For example, if you have a really huge number of photons all in the same state, quantum effects sometimes become negligible, and we can describe them using the classical equations describing electromagnetism: the classical Maxwell equations. In stochastic situations, it makes more sense to call this limit the ‘large-number limit’: the main point is that there are lots of particles in each state.

In quantum mechanics, different observables don’t commute, so the so-called commutator matters a lot:

[A,B] = AB - BA

These commutators tend to be proportional to Planck’s constant. So in the limit where Planck’s constant \hbar goes to zero, observables commute… but commutators continue to have a ghostly existence, in the form of the Poisson bracket:

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

Poisson brackets are a key part of symplectic geometry—the geometry of classical mechanics. So, this sort of geometry naturally shows up in the study of stochastic Petri nets!

Let me sketch how it works. I’ll start with a section reviewing stuff you should already know if you’ve been following the network theory series.

The stochastic Fock space

Suppose we have some finite set S. We call its elements species, since we think of them as different kinds of things—e.g., kinds of chemicals, or kinds of organisms.

To describe the probability of having any number of things of each kind, we need the stochastic Fock space. This is the space of real formal power series in a bunch of variables, one for each element of S. It won’t hurt to simply say

S = \{1, \dots, k \}

Then the stochastic Fock space is

\mathbb{R}[[z_1, \dots, z_k ]]

this being math jargon for the space of formal power series with real coefficients in some variables z_1, \dots, z_k, one for each element of S.

We write

n = (n_1, \dots, n_k) \in \mathbb{N}^S

and use this abbreviation:

z^n = z_1^{n_1} \cdots z_k^{n_k}

We use z^n to describe a state where we have n_1 things of the first species, n_2 of the second species, and so on.

More generally, a stochastic state is an element \Psi of the stochastic Fock space with

\displaystyle{ \Psi = \sum_{n \in \mathbb{N}^k} \psi_n \, z^n }

where

\psi_n \ge 0

and

\displaystyle{ \sum_{n  \in \mathbb{N}^k} \psi_n = 1 }

We use \Psi to describe a state where \psi_n is the probability of having n_1 things of the first species, n_2 of the second species, and so on.

The stochastic Fock space has some important operators on it: the annihilation operators given by

\displaystyle{ a_i \Psi = \frac{\partial}{\partial z_i} \Psi }

and the creation operators given by

\displaystyle{ a_i^\dagger \Psi = z_i \Psi }

From these we can define the number operators:

N_i = a_i^\dagger a_i

Part of the point is that

N_i z^n = n_i z^n

This says the stochastic state z^n is an eigenstate of all the number operators, with eigenvalues saying how many things there are of each species.

The annihilation, creation, and number operators obey some famous commutation relations, which are easy to check for yourself:

[a_i, a_j] = 0

[a_i^\dagger, a_j^\dagger] = 0

[a_i, a_j^\dagger] = \delta_{i j}

[N_i, N_j ] = 0

[N_i , a_j^\dagger] = \delta_{i j} a_j^\dagger

[N_i , a_j] = - \delta_{i j} a_j

The last two have easy interpretations. The first of these two implies

N_i a_i^\dagger \Psi = a_i^\dagger (N_i + 1) \Psi

This says that if we start in some state \Psi, create a thing of type i, and then count the things of that type, we get one more than if we counted the number of things before creating one. Similarly,

N_i a_i \Psi = a_i (N_i - 1) \Psi

says that if we annihilate a thing of type i and then count the things of that type, we get one less than if we counted the number of things before annihilating one.
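If you’d like a computer to do some of the checking, here’s a sympy sketch in one variable, applying the operators to an arbitrary formal state \Psi(z):

```python
import sympy as sp

z = sp.symbols('z')
Psi = sp.Function('Psi')(z)       # an arbitrary state, as a formal function of z

a    = lambda f: sp.diff(f, z)    # annihilation operator
adag = lambda f: z * f            # creation operator
N    = lambda f: adag(a(f))       # number operator

# [a, a†] Psi = Psi
print(sp.simplify(a(adag(Psi)) - adag(a(Psi)) - Psi))            # 0

# [N, a†] Psi = a† Psi
print(sp.simplify(N(adag(Psi)) - adag(N(Psi)) - adag(Psi)))      # 0

# [N, a] Psi = -a Psi
print(sp.simplify(N(a(Psi)) - a(N(Psi)) + a(Psi)))               # 0
```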

Introducing Planck’s constant

Now let’s introduce an extra parameter into this setup. To indicate the connection to quantum physics, I’ll call it \hbar, which is the usual symbol for Planck’s constant. However, I want to emphasize that we’re not doing quantum physics here! We’ll see that the limit where \hbar \to 0 is very interesting, but it will correspond to a limit where there are many things of each kind.

We’ll start by defining

A_i = \hbar \, a_i

and

C_i = a_i^\dagger

Here A stands for ‘annihilate’ and C stands for ‘create’. Think of A as a rescaled annihilation operator. Using this we can define a rescaled number operator:

\widetilde{N}_i = C_i A_i

So, we have

\widetilde{N}_i = \hbar N_i

and this explains the meaning of the parameter \hbar. The idea is that instead of counting things one at time, we count them in bunches of size 1/\hbar.

For example, suppose \hbar = 1/12. Then we’re counting things in dozens! If we have a state \Psi with

N_i \Psi = 36 \Psi

then there are 36 things of the ith kind. But this implies

\widetilde{N}_i \Psi = 3 \Psi

so there are 3 dozen things of the ith kind.

Chemists don’t count in dozens; they count things in big bunches called moles. A mole is approximately the number of carbon atoms in 12 grams of carbon: Avogadro’s number, 6.02 × 10^23. When you count things by moles, you’re taking \hbar to be 1.66 × 10^-24, the reciprocal of Avogadro’s number.

So, while in quantum mechanics Planck’s constant is ‘the quantum of action’, a unit of action, here it’s ‘the quantum of quantity’: the amount that corresponds to one thing.

We can easily work out the commutation relations of our new rescaled operators:

[A_i, A_j] = 0

[C_i, C_j] = 0

[A_i, C_j] = \hbar \, \delta_{i j}

[\widetilde{N}_i, \widetilde{N}_j ] = 0

[\widetilde{N}_i , C_j] = \hbar \,  \delta_{i j} C_j

[\widetilde{N}_i , A_j] = - \hbar \, \delta_{i j} A_j

These are just what you see in quantum mechanics! The commutators are all proportional to \hbar.

Again, we can understand what these relations mean if we think a bit. For example, the commutation relation for \widetilde{N}_i and C_i says

\widetilde{N}_i C_i \Psi = C_i (\widetilde{N}_i + \hbar) \Psi

This says that if we start in some state \Psi, create a thing of type i, and then count the things of that type, we get \hbar more than if we counted the number of things before creating one. This is because we are counting things not one at a time, but in bunches of size 1/\hbar.

You may be wondering why I defined the rescaled annihilation operator to be \hbar times the original annihilation operator:

A_i = \hbar \, a_i

but left the creation operator unchanged:

C_i = a_i^\dagger

I’m wondering that too! I’m not sure I’m doing things the best way yet. I’ve also tried another more symmetrical scheme, taking A_i = \sqrt{\hbar} \, a_i and C_i = \sqrt{\hbar} \, a_i^\dagger. This gives the same commutation relations, but certain other formulas become more unpleasant. I’ll explain that some other day.

Next, we can take the limit as \hbar \to 0 and define Poisson brackets of operators by

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

To make this rigorous it’s best to proceed algebraically. For this we treat \hbar as a formal variable rather than a specific number. So, our number system becomes \mathbb{R}[\hbar], the algebra of polynomials in \hbar. We define the Weyl algebra to be the algebra over \mathbb{R}[\hbar] generated by elements A_i and C_i obeying

[A_i, A_j] = 0

[C_i, C_j] = 0

[A_i, C_j] = \hbar \, \delta_{i j}

We can set \hbar = 0 in this formalism; then the Weyl algebra reduces to the algebra of polynomials in the variables A_i and C_i. This algebra is commutative! But we can define a Poisson bracket on this algebra by

\displaystyle{ \{A,B\} = \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A,B] }

It takes a bit of work to explain to algebraists exactly what’s going on in this formula, because it involves an interplay between the algebra of polynomials in A_i and C_i, which is commutative, and the Weyl algebra, which is not. I’ll be glad to explain the details if you want. But if you’re a physicist, you can just follow your nose and figure out what the formula gives. For example:

\begin{array}{ccl}   \{A_i, C_j\} &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} [A_i, C_j] } \\  \\  &=& \displaystyle{ \lim_{\hbar \to 0} \; \frac{1}{\hbar} \, \hbar \, \delta_{i j} }  \\  \\  &=& \delta_{i j} \end{array}

Similarly, we have:

\{ A_i, A_j \} = 0

\{ C_i, C_j \} = 0

\{ A_i, C_j \} = \delta_{i j}

\{ \widetilde{N}_i, \widetilde{N}_j \}  = 0

\{ \widetilde{N}_i , C_j \} = \delta_{i j} C_j

\{ \widetilde{N}_i , A_j \} = - \delta_{i j} A_j

I should probably use different symbols for A_i, C_i and \widetilde{N}_i after we’ve set \hbar = 0, since they’re really different now, but I don’t have the patience to make up more names for things!

Now, we can think of A_i and C_i as coordinate functions on a 2k-dimensional vector space, and all the polynomials in A_i and C_i as functions on this space. This space is what physicists would call a ‘phase space’: they use this kind of space to describe the position and momentum of a particle, though here we are using it in a different way. Mathematicians would call it a ‘symplectic vector space’, because it’s equipped with a special structure, called a symplectic structure, that lets us define Poisson brackets of smooth functions on this space. We won’t need to get into that now, but it’s important—and it makes me happy to see it here.
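Here’s a small sympy sketch of the formal computation: keeping \hbar as a symbol, computing the commutator of the rescaled operators on an arbitrary state, and then dividing by \hbar. This is just the physicist’s version of the calculation, not the careful algebraic story with the Weyl algebra over \mathbb{R}[\hbar].

```python
import sympy as sp

z, hbar = sp.symbols('z hbar', positive=True)
Psi = sp.Function('Psi')(z)

A = lambda f: hbar * sp.diff(f, z)    # rescaled annihilation operator A = ħ d/dz
C = lambda f: z * f                   # creation operator C = z·

comm = sp.simplify(A(C(Psi)) - C(A(Psi)))
print(comm)                              # hbar*Psi(z), i.e. [A, C] Psi = ħ Psi
print(sp.simplify(comm / (hbar * Psi)))  # 1, so {A, C} = lim [A, C]/ħ = 1
```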

More

There’s a lot more to do, but not today. My main goal is to understand, in a really elegant way, how the master equation for a stochastic Petri net reduces to the rate equation in the large-number limit. What we’ve done so far is start thinking of this as a \hbar \to 0 limit. This should let us borrow ideas about classical limits in quantum mechanics, and apply them to stochastic mechanics.

Stay tuned!


Symmetry and the Fourth Dimension (Part 12)

29 June, 2013

In 3d we’re used to rotations around an axis, but we need to break that habit when we start thinking about other dimensions. More fundamental is rotation in a plane.

What do I mean by ‘rotation in a plane’?

This is clearest in 2d space, which is nothing but a plane. Put your finger on a piece of paper and spin the paper around. That’s what I’m talking about!

Or look at this:

This picture lies in a plane. It’s rotating, but not around an axis in that plane.

Of course we can imagine an axis at right angles to that plane, and say the picture is rotating around that axis… but that requires introducing an extra dimension, so it’s artificial. It’s unnecessary. Worst of all, it’s misleading if you’re trying to think about how rotations work in different dimensions.

Here’s the general fact that works in any dimension of space. You can always take any rotation and break it into separate rotations in different planes that are at right angles to each other.

For example, consider 3d space. You can pick any 2d plane and get the points in that plane to rotate. But now there’s a 1d line at right angles to that plane that doesn’t move. Note the numbers here:

3 = 2+1

Next consider 4d space. Now you can pick any 2d plane and get the points in that plane to rotate. This leaves another 2d plane at right angles to the rotating one. Note the numbers:

4 = 2+2

This movie shows a 4-cube rotating in a single plane in 4d space:


We aren’t drawing the plane of rotation, just the cube. And the true 4d picture has been squashed down to a plane using perspective! So, it’s a bit hard to understand. But the rotation is taking place in the plane that contains the ‘left-right’ axis and the ‘in-out’ axis. The ‘up-down’ axis and the ‘front-back’ axis are left unmoved.

But we can also do something fancier! We can break 4d space into a rotating 2d plane and, at right angles to that, another rotating plane. This is what’s happening here:


The two rotations just happen to be going on at the same speed—otherwise it would look a lot more complicated.

The story continues in 5 dimensions. Now the equation that matters is this:

5 = 2+2+1

That means we can pick any 2d plane, get that to rotate, then pick another 2d plane at right angles to the first one, get that to rotate… and we’re left with a 1d line of points that don’t rotate.

So, rotations are very different depending on whether the dimension of space is odd or even! In odd-dimensional space, any rotation must leave some line unmoved. In even-dimensional space, that’s not true. This turns out to have huge consequences in math and physics. Even and odd dimensions work differently.
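You can see this in numpy if you like. Here I build a rotation of 3d space (one rotating plane plus a fixed axis) and a ‘double rotation’ of 4d space (two rotating planes at right angles, with made-up angles), and look for fixed directions via the eigenvalue 1:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# 3d: rotate one plane, leave the remaining axis alone  (3 = 2 + 1)
R3 = np.eye(3)
R3[:2, :2] = rot(0.7)

# 4d: rotate two planes at right angles to each other   (4 = 2 + 2)
R4 = np.zeros((4, 4))
R4[:2, :2] = rot(0.7)
R4[2:, 2:] = rot(1.3)

# an eigenvalue equal to 1 means a direction left unmoved
print(np.round(np.linalg.eigvals(R3), 3))   # includes 1: the fixed axis
print(np.round(np.linalg.eigvals(R4), 3))   # no eigenvalue 1: nothing is left fixed
```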

For more on rotations in 4d space, see:

Rotations in 4-dimensional Euclidean space, Wikipedia.

Also check out this:

Plane of rotation, Wikipedia.

I’ve shown you movies of rotating 4-cubes where the planes of rotation are neatly lined up with the axes of the 4-cube. For a detailed study of more tricky ways that a 4-cube can rotate, read this:

• Greg Egan, Hypercube: mathematical details.

Then you’ll understand pictures like these:



Image credits

The rotating 4-cubes were created by Jason Hise using Maya and Macromedia Fireworks. He put them into the public domain, and they reside on Wikicommons here and here.

The rotating yin-yang symbol was created by Nevit Dilmen.

The other pictures were created by Greg Egan.


Quantitative Reasoning at Yale-NUS College

27 June, 2013

What mathematics should any well-educated person know? It’s rather rare that people have a chance not just to think about this question, but do something about it. But it’s happening now.

There’s a new college called Yale-NUS College starting up this fall in Singapore, jointly run by Yale College and the National University of Singapore. The buildings aren’t finished yet: the above picture shows how a bit of it should look when they are. Faculty are busily setting up the courses and indeed the whole administrative structure of the university, and I’ve had the privilege of watching some of this and even helping out a bit.

It’s interesting because you usually meet an institution when it’s already formed—and you encounter and learn about only those aspects that matter to you. But in this case, the whole institution is being created, and every aspect discussed. And this is especially interesting because Yale-NUS College is designed to be a ‘liberal arts college for Asia for the 21st century’.

As far as I can tell, there are no liberal arts colleges in Asia. Creating a good one requires rethinking the generally Eurocentric attitudes toward history, philosophy, literature, classics and so on that are built into the traditional idea of the liberal arts. Plus, the whole idea of a liberal arts education needs to be rethought for the 21st century. What should a well-educated person know, and be able to do? Luckily, the faculty of Yale-NUS College are taking a fresh look at this question, and coming up with some new answers.

I’m really excited about the Quantitative Reasoning course that all students will take in the second semester of their first year. It will cover topics like this:

• innumeracy, use of numbers in the media.
• visualizing quantitative data.
• cognitive biases, operationalization.
• qualitative heuristics, cognitive biases, formal logic and mathematical proof.
• formal logic, mathematical proofs.
• probability, conditional probability (Bayes’ rule), gambling and odds.
• decision trees, expected utility, optimal decisions and prospect theory.
• sampling, uncertainty.
• quantifying uncertainty, hypothesis testing, p-values and their limitations.
• statistical power and significance levels, evaluating evidence.
• correlation and causation, regression analysis.

The idea is not to go into vast detail and not to bombard the students with sophisticated mathematical methods, but to help students:

• learn how to criticize and question claims in an informed way;

• learn to think clearly, to understand logical and intuitive reasoning, and to consider appropriate standards of proof in different contexts;

• develop a facility and comfort with a variety of representations of quantitative data, and practical experience in gathering data;

• understand the sources of bias and error in seemingly objective numerical data;

• become familiar with the basic concepts of probability and statistics, with particular emphasis on recognizing when these techniques provide reliable results and when they threaten to mislead us.

They’ll do some easy calculations using R, a programming language optimized for statistics.

Most exciting of all to me is how the course will be taught. There will be about 9 teachers. It will be ‘team-based learning’, where students are divided into (carefully chosen) groups of six. A typical class will start with a multiple choice question designed to test the students’ understanding of the material they’ve just studied. Then the team will discuss their answers, while professors walk around and help out; then they’ll take the quiz again; then one professor will talk about that topic.

This idea is called ‘peer instruction’. Some studies have shown this approach works better than the traditional lecture style. I’ve never seen it in action, though my friend Christopher Lee uses it now in his bioinformatics class, and he says it’s great. You can read about its use in physics here:

• Eric Mazur, Physics Education.

I’ll be interested to see it in action starting in August, and later I hope to teach part-time at Yale-NUS College and see how it works for myself!

At the very least, it’s exciting to see people try new things.


Symmetry and the Fourth Dimension (Part 11)

21 June, 2013

This is my favorite 2d view of a 4-dimensional cube, because it’s the most symmetrical. The 4 dimensions are depicted as going north, east, northeast and northwest.

Puzzle: count the cubes in this picture.

The cubes are a bit distorted, of course: they have to be because we’re squashing 4 dimensions down to 2! But they’re not very distorted.

Here’s the same picture rotated a bit, drawn by Tilman Piesk:

If we like, we can draw a 2d perspective picture of a 4-cube that’s so vivid it really looks 3-dimensional. There are lots of ways to do this. One clever way is to make the fourth dimension point inwards, towards the center:

This was drawn using Robert Webb’s Stella software, perhaps by Webb himself. I like the steel rods and brass balls.

On Wikicommons, someone who goes by the name “Maninthemasterplan” took another approach:

Cross your eyes so the two images overlap. If you get them to lie perfectly on top of each other, they’ll lock in place and you’ll see a 4-cube in 3d. This is called a stereogram.

Personally I don’t find this very helpful for understanding the 4-cube. Crossing my eyes and looking at this 3d image puts me into an altered state of mind which is fun but not good for doing mathematics!

We can also take a 3d perspective picture and set it rotating to make a movie, as Jason Hise did here:

But this is probably worth a discussion all its own! So, next time I’ll talk more about rotations in 4d.

Hinton and the tesseract

The 4-dimensional cube has many names. I’ll call it the 4-cube since this is short and generalizes to higher dimensions. But it’s also called a 4-dimensional hypercube or simply a hypercube—as well as an 8-cell or regular octachoron, for some reason.

When I was growing up, science fiction writers of a certain ilk called the 4-cube a tesseract. That name still persists in some quarters, though not among mathematicians.


The term ‘tesseract’ has an interesting origin. It’s from a rarely used prefix tessara- meaning ‘four’ and the Ancient Greek word ἀκτίς meaning ‘ray’ (as in the element ‘actinium’). So, the term means ‘four rays’, and it comes from the fact that there are 4 edges coming out of each vertex of the tesseract.

But that’s not the interesting part. What’s interesting is that it was coined in 1888 by Charles Howard Hinton, in a book called A New Era of Thought, which espoused some curious ideas about the spiritual benefits of visualizing 4-dimensional geometry. In his later book The Fourth Dimension, he developed a system of colored cubes to help people do this.

Rumors subsequently arose that these cubes had driven more than one person insane. The only substance to this that I can find is this letter appearing in Martin Gardner’s column after he wrote about Hinton:

Dear Mr. Gardner:

A shudder ran down my spine when I read your reference to Hinton’s cubes. I nearly got hooked on them myself in the nineteen-twenties. Please believe me when I say that they are completely mind-destroying. The only person I ever met who had worked with them seriously was Francis Sedlak, a Czech neo-Hegelian Philosopher (he wrote a book called The Creation of Heaven and Earth) who lived in an Oneida-like community near Stroud, in Gloucestershire.

As you must know, the technique consists essentially in the sequential visualizing of the adjoint internal faces of the poly-colored unit cubes making up the larger cube. It is not difficult to acquire considerable facility in this, but the process is one of autohypnosis and, after a while, the sequences begin to parade themselves through one’s mind of their own accord. This is pleasurable, in a way, and it was not until I went to see Sedlak in 1929 that I realized the dangers of setting up an autonomous process in one’s own brain. For the record, the way out is to establish consciously a countersystem differing from the first in that the core cube shows different colored faces, but withdrawal is slow and I wouldn’t recommend anyone to play around with the cubes at all.

Hiram Barton

Hinton also wrote science fiction stories, which he called Scientific Romances. I haven’t read them, though they’re available online.

In 1880 Hinton married Mary Ellen Boole, daughter of the famous logician George Boole. Her sister Alicia Boole Stott became an expert on 4-dimensional geometry, and we will meet her later in this series! But Hinton also married another woman, perhaps because his father was an advocate of polygamy. He was convicted of bigamy and spent 3 days in prison for it. Not very long, but he also lost his job.

He later sailed to America, joined the mathematics faculty at Princeton, and developed a gunpowder-powered baseball pitching machine for the team there! If you don’t believe me, read this old article:

The new baseball-pitching device: Prof. Hinton of Princeton explains his invention to the alumni, New York Times, 12 March 1897.

It begins:

Prof. Charles Howard Hinton, who holds the chair of mathematics at Princeton University, explained last night to members of the Princeton Club of this city the baseball gun which he has invented. About 300 Princeton men were present and manifested great interest in the operation of the machine.

It was actually able to throw curve-balls!

I’ve read claims that Hinton later resigned from Princeton due to injuries caused by this device. However, he re-introduced the machine at the University of Minnesota, where he worked as an assistant professor until 1900. He then resigned to move to the U.S. Naval Observatory in Washington, D.C. (Why? More injured baseball players?) Near the end of his life, he worked as an examiner of chemical patents for the United States Patent Office. He died of a cerebral hemorrhage in 1907, at the age of 54.


For more on Hinton and his cubes see:

• Mark Blacklock, Cubic thought, The Fairyland of Geometry, 10 December 2009.

Hinton’s cubes redux, Banubula, 9 November 2006.

For a guide to making the cubes, see:

Hinton’s cubes.

Apparently these are based on color plates from Hinton’s book The Fourth Dimension. This book is reproduced in black and white here:

• Charles Howard Hinton, The Fourth Dimension, 1912 edition, Internet Archive.

but a modern edition including the color plates is freely available here:

• Charles Howard Hinton, The Fourth Dimension, to which is added The Language of Space, Celephaïs Press, 2004.

