Information Geometry (Part 10)

4 June, 2012

Last time I began explaining the tight relation between three concepts:

• entropy,

• information—or more precisely, lack of information,

and

• biodiversity.

The idea is to consider n different species of ‘replicators’. A replicator is any entity that can reproduce itself, like an organism, a gene, or a meme. A replicator can come in different kinds, and a ‘species’ is just our name for one of these kinds. If P_i is the population of the ith species, we can interpret the fraction

\displaystyle{ p_i = \frac{P_i}{\sum_j P_j} }

as a probability: the probability that a randomly chosen replicator belongs to the ith species. This suggests that we define entropy just as we do in statistical mechanics:

\displaystyle{ S = - \sum_i p_i \ln(p_i) }

In the study of statistical inference, entropy is a measure of uncertainty, or lack of information. But now we can interpret it as a measure of biodiversity: it’s zero when just one species is present, and small when a few species have much larger populations than all the rest, but gets big otherwise.

Our goal here is to play these viewpoints off against each other. In short, we want to think of natural selection, and even biological evolution, as a process of statistical inference—or in simple terms, learning.

To do this, let’s think about how entropy changes with time. Last time we introduced a simple model called the replicator equation:

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

where each population grows at a rate proportional to some ‘fitness functions’ f_i. We can get some intuition by looking at the pathetically simple case where these functions are actually constants, so

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

The equation then becomes trivial to solve:

\displaystyle{ P_i(t) = e^{t f_i } P_i(0)}

Last time I showed that in this case, the entropy will eventually decrease. It will go to zero as t \to +\infty whenever one species is fitter than all the rest and starts out with a nonzero population—since then this species will eventually take over.

But remember, the entropy of a probability distribution is its lack of information. So the decrease in entropy signals an increase in information. And last time I argued that this makes perfect sense. As the fittest species takes over and biodiversity drops, the population is acquiring information about its environment.

However, I never said the entropy is always decreasing, because that’s false! Even in this pathetically simple case, entropy can increase.

Suppose we start with many replicators belonging to one very unfit species, and a few belonging to various more fit species. The probability distribution p_i will start out sharply peaked, so the entropy will start out low:

Now think about what happens when time passes. At first the unfit species will rapidly die off, while the populations of the other species slowly grow:


So the probability distribution will, for a while, become less sharply peaked. Thus, for a while, the entropy will increase!
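Here's a minimal numerical sketch of this effect; the fitnesses and starting populations are invented for illustration, with one very unfit but initially dominant species and two fitter rare ones. Using the exact solution P_i(t) = e^{t f_i} P_i(0), we can watch the entropy rise and then fall:

```python
from math import exp, log

f  = [0.0, 1.0, 2.0]     # constant fitnesses (made up for illustration)
P0 = [0.90, 0.05, 0.05]  # initial populations: the unfit species dominates

def entropy(t):
    # exact solution of dP_i/dt = f_i P_i:  P_i(t) = exp(t f_i) P_i(0)
    P = [exp(t * fi) * Pi for fi, Pi in zip(f, P0)]
    total = sum(P)
    p = [Pi / total for Pi in P]
    return -sum(pi * log(pi) for pi in p if pi > 0)

for t in [0, 0.5, 1, 1.5, 2, 5, 10]:
    print(f"t = {t:4}:  S = {entropy(t):.3f}")
```

Running this, the entropy starts near 0.39, peaks around 0.95 near t = 1.5 as the unfit species dies off, and then falls toward zero as the fittest species takes over.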

This seems to conflict with our idea that the population’s entropy should decrease as it acquires information about its environment. But in fact this phenomenon is familiar in the study of statistical inference. If you start out with strongly held false beliefs about a situation, the first effect of learning more is to become less certain about what’s going on!

Get it? Say you start out by assigning a high probability to some wrong guess about a situation. The entropy of your probability distribution is low: you’re quite certain about what’s going on. But you’re wrong. When you first start suspecting you’re wrong, you become more uncertain about what’s going on. Your probability distribution flattens out, and the entropy goes up.

So, sometimes learning involves a decrease in information—false information. There’s nothing about the mathematical concept of information that says this information is true.

Given this, it’s good to work out a formula for the rate of change of entropy, which will let us see more clearly when it goes down and when it goes up. To do this, first let’s derive a completely general formula for the time derivative of the entropy of a probability distribution. Following Sir Isaac Newton, we’ll use a dot to stand for a time derivative:

\begin{array}{ccl} \displaystyle{  \dot{S}} &=& \displaystyle{ -  \frac{d}{dt} \sum_i p_i \ln (p_i)} \\   \\  &=& \displaystyle{ - \sum_i \Big( \dot{p}_i \ln (p_i) + \dot{p}_i \Big) }  \end{array}

In the last term we took the derivative of the logarithm and got a factor of 1/p_i which cancelled the factor of p_i. But since

\displaystyle{  \sum_i p_i = 1 }

we know

\displaystyle{ \sum_i \dot{p}_i = 0 }

so this last term vanishes:

\displaystyle{ \dot{S}= -\sum_i \dot{p}_i \ln (p_i) }

Nice! To go further, we need a formula for \dot{p}_i. For this we might as well return to the general replicator equation, dropping the pathetically special assumption that the fitness functions are actually constants. Then we saw last time that

\displaystyle{ \dot{p}_i = \Big( f_i(P) - \langle f(P) \rangle  \Big) \, p_i }

where we used the abbreviation

f_i(P) = f_i(P_1, \dots, P_n)

for the fitness of the ith species, and defined the mean fitness to be

\displaystyle{ \langle f(P) \rangle = \sum_i f_i(P) p_i  }

Using this cute formula for \dot{p}_i, we get the final result:

\displaystyle{ \dot{S} = - \sum_i \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i \ln (p_i) }

This is strikingly similar to the formula for entropy itself. But now each term in the sum includes a factor saying how much more fit than average, or less fit, that species is. The quantity - p_i \ln(p_i) is always nonnegative, since the graph of -x \ln(x) looks like this:

So, the ith term contributes positively to the change in entropy if the ith species is fitter than average, but negatively if it’s less fit than average.
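If you'd like to double-check this formula before tackling the puzzles below, here's a quick numerical sanity check; the fitnesses and populations are made up. It compares the formula for \dot{S} with a centered finite difference of S along an exact solution with constant fitnesses:

```python
from math import exp, log

f  = [1.0, 3.0, 4.0]   # constant fitnesses, chosen arbitrarily
P0 = [0.2, 0.5, 0.3]   # arbitrary initial populations

def probs(t):
    # exact solution of dP_i/dt = f_i P_i, normalized to probabilities
    P = [exp(t * fi) * Pi for fi, Pi in zip(f, P0)]
    total = sum(P)
    return [Pi / total for Pi in P]

def S(t):
    return -sum(pi * log(pi) for pi in probs(t))

t = 0.7
p = probs(t)
mean_f = sum(fi * pi for fi, pi in zip(f, p))

# dS/dt from the formula derived above
formula = -sum((fi - mean_f) * pi * log(pi) for fi, pi in zip(f, p))

# dS/dt from a centered finite difference
h = 1e-6
numeric = (S(t + h) - S(t - h)) / (2 * h)

print(formula, numeric)   # these should agree to many decimal places
```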

This sign pattern may seem counterintuitive!

Puzzle 1. How can we reconcile this fact with our earlier observations about the case when the fitness of each species is population-independent? Namely: a) if initially most of the replicators belong to one very unfit species, the entropy will rise at first, but b) in the long run, when the fittest species present take over, the entropy drops?

If this seems too tricky, look at some examples! The first illustrates observation a); the second illustrates observation b):

Puzzle 2. Suppose we have two species, one with fitness equal to 1 initially constituting 90% of the population, the other with fitness equal to 10 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 1, & & p_1(0) = 0.9 \\ \\ f_2 = 10, & & p_2(0) = 0.1 \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?

Puzzle 3. Suppose we have two species, one with fitness equal to 10 initially constituting 90% of the population, and the other with fitness equal to 1 initially constituting just 10% of the population:

\begin{array}{ccc} f_1 = 10, & & p_1(0) = 0.9 \\ \\ f_2 = 1, & & p_2(0) = 0.1 \end{array}

At what rate does the entropy change at t = 0? Which species is responsible for most of this change?

I had to work through these examples to understand what’s going on. Now I do, and it all makes sense.

Next time

Still, it would be nice if there were some quantity that always goes down with the passage of time, reflecting our naive idea that the population gains information from its environment, and thus loses entropy, as time goes by.

Often there is such a quantity. But it’s not the naive entropy: it’s the relative entropy. I’ll talk about that next time. In the meantime, if you want to prepare, please reread Part 6 of this series, where I explained this concept. Back then, I argued that whenever you’re tempted to talk about entropy, you should talk about relative entropy. So, we should try that here.

There’s a big idea lurking here: information is relative. How much information a signal gives you depends on your prior assumptions about what that signal is likely to be. If this is true, perhaps biodiversity is relative too.


Information Geometry (Part 9)

1 June, 2012


It’s time to continue this information geometry series, because I’ve promised to give the following talk at a conference on the mathematics of biodiversity in early July… and I still need to do some of the research!

Diversity, information geometry and learning

As is well known, some measures of biodiversity are formally identical to measures of information developed by Shannon and others. Furthermore, Marc Harper has shown that the replicator equation in evolutionary game theory is formally identical to a process of Bayesian inference, which is studied in the field of machine learning using ideas from information geometry. Thus, in this simple model, a population of organisms can be thought of as a ‘hypothesis’ about how to survive, and natural selection acts to update this hypothesis according to Bayes’ rule. The question thus arises to what extent natural changes in biodiversity can be usefully seen as analogous to a form of learning. However, some of the same mathematical structures arise in the study of chemical reaction networks, where the increase of entropy, or more precisely decrease of free energy, is not usually considered a form of ‘learning’. We report on some preliminary work on these issues.

So, let’s dive in! To some extent I’ll be explaining these two papers:

• Marc Harper, Information geometry and evolutionary game theory.

• Marc Harper, The replicator equation as an inference dynamic.

However, I hope to bring in some more ideas from physics, the study of biodiversity, and the theory of stochastic Petri nets, also known as chemical reaction networks. So, this series may start to overlap with my network theory posts. We’ll see. We won’t get far today: for now, I just want to review and expand on what we did last time.

The replicator equation

The replicator equation is a simplified model of how populations change. Suppose we have n types of self-replicating entity. I’ll call these entities replicators. I’ll call the types of replicators species, but they don’t need to be species in the biological sense. For example, the replicators could be genes, and the types could be alleles. Or the replicators could be restaurants, and the types could be restaurant chains.

Let P_i(t), or just P_i for short, be the population of the ith species at time t. Then the replicator equation says

\displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) \, P_i }

So, the population P_i changes at a rate proportional to P_i, but the ‘constant of proportionality’ need not be constant: it can be any smooth function f_i of the populations of all the species. We call f_i(P_1, \dots, P_n) the fitness of the ith species.

Of course this model is absurdly general, while still leaving out lots of important effects, like the spatial variation of populations, or the ability for the population of some species to start at zero and become nonzero—which happens thanks to mutation. Nonetheless this model is worth taking a good look at.

Using the magic of vectors we can write

P = (P_1, \dots , P_n)

and

f(P) = (f_1(P), \dots, f_n(P))

This lets us write the replicator equation a wee bit more tersely as

\displaystyle{ \frac{d P}{d t} = f(P) P}

where on the right I’m multiplying vectors componentwise, the way your teachers tried to brainwash you into never doing:

f(P) P = (f(P)_1 P_1, \dots, f(P)_n P_n)

In other words, I’m thinking of P and f(P) as functions on the set \{1, \dots, n\} and multiplying them pointwise. This will be a nice way of thinking if we want to replace this finite set by some more general space.

Why would we want to do that? Well, we might be studying lizards with different length tails, and we might find it convenient to think of the set of possible tail lengths as the half-line [0,\infty) instead of a finite set.

Or, just to get started, we might want to study the pathetically simple case where f(P) doesn’t depend on P. Then we just have a fixed function f and a time-dependent function P obeying

\displaystyle{ \frac{d P}{d t} = f P}

If we’re physicists, we might write P more suggestively as \psi and write the operator multiplying by f as - H. Then our equation becomes

\displaystyle{ \frac{d \psi}{d t} = - H \psi }

This looks a lot like Schrödinger’s equation, but since there’s no factor of \sqrt{-1}, and \psi is real-valued, it’s more like the heat equation or the ‘master equation’, the basic equation of stochastic mechanics.

For an explanation of Schrödinger’s equation and the master equation, try Part 12 of the network theory series. In that post I didn’t include a minus sign in front of the H. That’s no big deal: it’s just a different convention than the one I want today. A more serious issue is that in stochastic mechanics, \psi stands for a probability distribution. This suggests that we should get probabilities into the game somehow.

The replicator equation in terms of probabilities

Luckily, that’s exactly what people usually do! Instead of talking about the population P_i of the ith species, they talk about the probability p_i that one of our organisms will belong to the ith species. This amounts to normalizing our populations:

\displaystyle{  p_i = \frac{P_i}{\sum_j P_j} }

Don’t you love it when notations work out well? Our big Population P_i has gotten normalized to give little probability p_i.

How do these probabilities p_i change with time? Now is the moment for that least loved rule of elementary calculus to come out and take a bow: the quotient rule for derivatives!

\displaystyle{ \frac{d p_i}{d t} = \left(\frac{d P_i}{d t} \sum_j P_j \quad - \quad P_i \sum_j \frac{d P_j}{d t}\right) \big{/} \left(  \sum_j P_j \right)^2 }

Using our earlier version of the replicator equation, this gives:

\displaystyle{ \frac{d p_i}{d t} =  \left(f_i(P) P_i \sum_j P_j \quad - \quad P_i \sum_j f_j(P) P_j \right) \big{/} \left(  \sum_j P_j \right)^2 }

Using the definition of p_i, this simplifies to:

\displaystyle{ \frac{d p_i}{d t} =  f_i(P) p_i \quad - \quad \left( \sum_j f_j(P) p_j \right) p_i }

The stuff in parentheses actually has a nice meaning: it’s just the mean fitness. In other words, it’s the average, or expected, fitness of an organism chosen at random from the whole population. Let’s write it like this:

\displaystyle{ \langle f(P) \rangle = \sum_j f_j(P) p_j  }

So, we get the replicator equation in its classic form:

\displaystyle{ \frac{d p_i}{d t} = \Big( f_i(P) - \langle f(P) \rangle \Big) \, p_i }

This has a nice meaning: for the fraction of organisms of the ith type to increase, their fitness must exceed the mean fitness. If you’re trying to increase market share, what matters is not how good you are, but how much better than average you are. If everyone else is lousy, you’re in luck.
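Just to make this concrete, here's a minimal sketch that integrates the replicator equation in its classic form. The fitnesses here depend on the frequencies, f_i(p) = (A p)_i, as in evolutionary game theory; the 2×2 payoff matrix A is a made-up hawk–dove-style example, not anything canonical:

```python
# Crude Euler integration of  dp_i/dt = (f_i - <f>) p_i  with
# frequency-dependent fitness f_i(p) = (A p)_i.
A = [[0.0, 3.0],
     [1.0, 2.0]]       # invented payoff matrix

p = [0.99, 0.01]       # start with species 1 overwhelmingly common
dt = 0.01

for step in range(2000):
    f = [sum(A[i][j] * p[j] for j in range(2)) for i in range(2)]
    mean_f = sum(f[i] * p[i] for i in range(2))
    p = [p[i] + dt * (f[i] - mean_f) * p[i] for i in range(2)]

print(p)   # tends to the mixed equilibrium (0.5, 0.5), where f_1 = f_2
```

Even though species 1 starts out overwhelmingly common, it's less fit than average near p = (1, 0), so its share shrinks until the two fitnesses balance.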

Entropy

Now for something a bit new. Once we’ve gotten a probability distribution into the game, its entropy is sure to follow:

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

This says how ‘smeared-out’ the overall population is among the various different species. Alternatively, it says how much information it takes, on average, to say which species a randomly chosen organism belongs to. For example, if there are 2^N species, all with equal populations, the entropy S works out to N \ln 2. So in this case, it takes N bits of information to say which species a randomly chosen organism belongs to.
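Here's that arithmetic as a two-second check (N = 5 is an arbitrary choice):

```python
from math import log

N = 5
n = 2 ** N                       # 2^N species, all equally populated
p = [1 / n] * n
S = -sum(pi * log(pi) for pi in p)
print(S, N * log(2))             # equal: the entropy is N ln 2 nats
print(S / log(2))                # which is exactly N bits
```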

In biology, entropy is one of many ways people measure biodiversity. For a quick intro to some of the issues involved, try:

• Tom Leinster, Measuring biodiversity, Azimuth, 7 November 2011.

• Lou Jost, Entropy and diversity, Oikos 113 (2006), 363–375.

But we don’t need to understand this stuff to see how entropy is connected to the replicator equation. Marc Harper’s paper explains this in detail:

• Marc Harper, The replicator equation as an inference dynamic.

and I hope to go through quite a bit of it here. But not today! Today I just want to look at a pathetically simple, yet still interesting, example.

Exponential growth

Suppose the fitness of each species is independent of the populations of all the species. In other words, suppose each fitness f_i(P) is actually a constant, say f_i. Then the replicator equation reduces to

\displaystyle{ \frac{d P_i}{d t} = f_i \, P_i }

so it’s easy to solve:

P_i(t) = e^{t f_i} P_i(0)

You don’t need a detailed calculation to see what’s going to happen to the probabilities

\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)}}

The most fit species present will eventually take over! If one species, say the ith one, has a fitness greater than the rest, then the population of this species will eventually grow faster than all the rest, at least if its population starts out greater than zero. So as t \to +\infty, we’ll have

p_i(t) \to 1

and

p_j(t) \to 0 \quad \mathrm{for} \quad j \ne i

Thus the probability distribution p will become more sharply peaked, and its entropy will eventually approach zero.

With a bit more thought you can see that even if more than one species shares the maximum possible fitness, the entropy will eventually decrease, though not approach zero.

In other words, the biodiversity will eventually drop as all but the most fit species are overwhelmed. Of course, this is only true in our simple idealization. In reality, biodiversity behaves in more complex ways—in part because species interact, and in part because mutation tends to smear out the probability distribution p_i. We’re not looking at these effects yet. They’re extremely important… in ways we can only fully understand if we start by looking at what happens when they’re not present.

In still other words, the population will absorb information from its environment. This should make intuitive sense: the process of natural selection resembles ‘learning’. As fitter organisms become more common and less fit ones die out, the environment puts its stamp on the probability distribution p. So, this probability distribution should gain information.

While intuitively clear, this last claim also follows more rigorously from thinking of entropy as negative information. Admittedly, it’s always easy to get confused by minus signs when relating entropy and information. A while back I said the entropy

\displaystyle{ S(p) = - \sum_i p_i \, \ln(p_i) }

was the average information required to say which species a randomly chosen organism belongs to. If this entropy is going down, isn’t the population losing information?

No, this is a classic sign error. It’s like the concept of ‘work’ in physics. We can talk about the work some system does on its environment, or the work done by the environment on the system, and these are almost the same… except one is minus the other!

When you are very ignorant about some system—say, some rolled dice—your estimated probabilities p_i for its various possible states are very smeared-out, so the entropy S(p) is large. As you gain information, you revise your probabilities and they typically become more sharply peaked, so S(p) goes down. When you know as much as you possibly can, S(p) equals zero.

So, the entropy S(p) is the amount of information you have left to learn: the amount of information you lack, not the amount you have. As you gain information, this goes down. There’s no paradox here.

It works the same way with our population of replicators—at least in the special case where the fitness of each species is independent of its population. The probability distribution p is like a ‘hypothesis’ assigning to each species i the probability p_i that it’s the best at self-replicating. As some replicators die off while others prosper, they gather information about their environment, and this hypothesis gets refined. So, the entropy S(p) drops.

Next time

Of course, to make closer contact to reality, we need to go beyond the special case where the fitness of each species is a constant! Marc Harper does this, and I want to talk about his work someday, but first I have a few more remarks to make about the pathetically simple special case I’ve been focusing on. I’ll save these for next time, since I’ve probably strained your patience already.


Free Access to Taxpayer-Funded Research — Act Now!

31 May, 2012

If you’re a US citizen, your taxes pay for lots of scientific research. If you sign this White House petition, you may get to see the research you paid for!

Just click this:


We need a total of 25,000 signatures before June 19th for this to land on the president’s desk. That sounds hard. But:

• On May 29th, we only needed 5825 more.

• On May 30th we only needed 4765 more.

• Right now, on May 31st, we only need 3354.

We can do it! Sign it and pass it on!

The petition says:

Require free access over the Internet to scientific journal articles arising from taxpayer-funded research. We believe in the power of the Internet to foster innovation, research, and education. Requiring the published results of taxpayer-funded research to be posted on the Internet in human and machine readable form would provide access to patients and caregivers, students and their teachers, researchers, entrepreneurs, and other taxpayers who paid for the research. Expanding access would speed the research process and increase the return on our investment in scientific research. The highly successful Public Access Policy of the National Institutes of Health proves that this can be done without disrupting the research process, and we urge President Obama to act now to implement open access policies for all federal agencies that fund scientific research.

If you want more information, read about the Federal Research Public Access Act. This is a bill that would make taxpayer-funded research freely available, while still preserving the legitimate rights of publishing companies.


Fluid Flows and Infinite Dimensional Manifolds (Part 3)

30 May, 2012

Or: Twisting on the Borderline

guest post by Tim van Beek

flow around a symmetric airfoil, laminar to turbulent

In Part 2 of this series, I told you what ideal incompressible fluids are. Then I explained how the equation of motion for such fluids, Euler’s equation, can be seen as the equation for geodesics in \mathrm{SDiff}(M), the infinite dimensional manifold of volume preserving diffeomorphisms. Last time I promised to talk about the role of pressure in Euler’s equation. I also mentioned that Arnold used this geometric setup to put a bound on how good weather forecasts can be. I will try to fulfill both promises in the next blog post!

But last time I also mentioned that the ideal fluid has serious drawbacks as a model. This is an important topic, too, so I would like to explain this a little bit further first, in this blog post.

So, this time we will talk a little bit about how we can get viscosity, and therefore turbulence, back into the picture. This will lead us to the Navier–Stokes equation. Can ideal fluids, which solve Euler’s equation, model fluids with a very small viscosity? This depends on what happens to solutions when one lets the viscosity go to zero in the Navier–Stokes equation, so I will show you a result that answers this question in a specific context.

I’ll also throw in a few graphics that illustrate the transition from laminar flow to turbulent flow at boundaries, starting with the one above. These are all from:

• Milton Van Dyke, An Album of Fluid Motion, Parabolic Press, 12th edition, 1982.

flow around a circular cylinder, laminar to turbulent

Re-introducing viscosity: The Navier-Stokes equation

The motion of an incompressible, homogeneous, ideal fluid is described by Euler’s equation:

\displaystyle{ \partial_t u + u \cdot \nabla u = - \nabla p }

Ideal fluids are very nice mathematically. Nicest of all are potential flows, where the velocity vector field is the gradient of a potential. In two dimensions these can be studied using complex analysis! One could say that a whole ‘industry’ evolved around the treatment of these kinds of fluid flows. It was even taught to some extent to engineers, before computers took over. Here’s a very nice, somewhat nostalgic book to read about that:

• L. M. Milne-Thomson, Theoretical Aerodynamics, 4th edition, Dover Publications, New York, 2011. (Reprint of the 1958 edition.)

The assumption of ‘incompressibility’ is not restrictive for most applications involving fluid flows of water and air, for example. Maybe you are a little bit surprised that I mention air, because the compressibility of air is a part of everyday life, for example when you pump up a bicycle tire. It is, however, not necessary to include this property when you model air flowing at velocities that are significantly lower than the speed of sound in air. The rule of thumb for engineers seems to be that one needs to include compressibility for speeds around Mach 0.3 or more:

• Compressible aerodynamics, Wikipedia.

However, the concept of an ‘ideal’ fluid takes viscosity out of the picture—and therefore also turbulence, and the drag that a body immersed in the fluid feels. As I mentioned last time, this is called d’Alembert’s paradox.

The simplest way to introduce viscosity is by considering a Newtonian fluid. This is a fluid where the viscosity is a constant, and the relation between velocity differences and the resulting shear forces is strictly linear. This leads to the Navier–Stokes equation for incompressible fluids:

\displaystyle{ \partial_t u + u \cdot \nabla u - \nu \Delta u = - \nabla p }

If you think about molten plastic, or honey, you will notice that the viscosity actually depends on the temperature, and maybe also on the pressure and other parameters, of the fluid. The science that is concerned with the exploration of these effects is called rheology. This is an important research topic and the reason why producers of, say, plastic sheets, sometimes keep physicists around. But let’s stick to Newtonian fluids for now.

curved wall, laminar to turbulent

Sending Viscosity to Zero: Boundary Layers

Since we get Euler’s equation if we set \nu = 0 in the above equation, the question is whether, in some sense or another, solutions of the Navier–Stokes equation converge to a solution of Euler’s equation in the limit of vanishing viscosity. If you had asked me, I would have guessed: no. The mathematical reason is that we have a transition from a second order partial differential equation to a first order one. This is usually called a singular perturbation. The physical reason is that a nonzero viscosity will give rise to phenomena like turbulence and energy dissipation that cannot occur in an ideal fluid. Well, the last argument shows that we cannot expect convergence for long times if eddies are present, so there certainly is a loophole.

The precise formulation of the last statement depends on the boundary conditions one chooses. One way is this: let u be a smooth solution of Euler’s equation in \mathbb{R}^3 with sufficiently fast decay at infinity (this is our boundary condition). Then the kinetic energy E

\displaystyle{ E =  \frac{1}{2} \int \| u \|^2 \; \mathrm{d x} }

is conserved for all time. This is not the only conserved quantity for Euler’s equation, but it’s a very important one.

But now, suppose u is a smooth solution of the Navier–Stokes equation in \mathbb{R}^3 with sufficiently fast decay at infinity. In this case we have

\displaystyle{ \frac{d E}{d t} =  - \nu \int \| \nabla \times u \|^2 \mathrm{d x} }

So, the presence of viscosity turns a conserved quantity into a decaying quantity! Since the 20th century, engineers have taken these effects into account following the idea of ‘boundary layers’ introduced by Ludwig Prandtl, as I already mentioned last time. Actually the whole technique of singular perturbation theory has been developed following this ansatz. This has become a mathematical technique to get asymptotic expansions of solutions of complicated nonlinear partial differential equations.
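Before moving on, here's a toy numerical illustration of that energy identity. This is not a Navier–Stokes solver: it's just the 1D periodic heat equation u_t = \nu u_{xx}, the simplest equation containing the viscous term, for which the analogous computation gives dE/dt = -\nu \int (u_x)^2 \, dx. The grid size and initial data below are arbitrary choices:

```python
import numpy as np

# 1D periodic heat equation u_t = nu * u_xx on [0, 2*pi):
# its energy E = (1/2) \int u^2 dx decays at rate -nu \int (u_x)^2 dx.
nu, n = 0.1, 256
dx = 2 * np.pi / n
x = np.arange(n) * dx
u = np.sin(x) + 0.5 * np.sin(3 * x)    # arbitrary smooth initial data

ddx   = lambda v: (np.roll(v, -1) - np.roll(v, 1)) / (2 * dx)
d2dx2 = lambda v: (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / dx**2
E     = lambda v: 0.5 * np.sum(v**2) * dx

dt = 0.2 * dx**2 / nu                  # safely below the explicit stability limit

for step in range(500):
    E_old = E(u)
    u = u + dt * nu * d2dx2(u)         # explicit Euler step of the viscous term
    if step % 100 == 0:
        measured  = (E(u) - E_old) / dt
        predicted = -nu * np.sum(ddx(u)**2) * dx
        print(f"measured dE/dt = {measured:.4f},  predicted = {predicted:.4f}")
```

The two columns agree up to discretization error, and both shrink as the solution smooths out; with \nu = 0 the dissipation term vanishes and E would be conserved, matching the conservation of energy for Euler’s equation.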

The idea is that the concept of ‘ideal’ fluid is good except at boundaries, where effects due to viscosity cannot be ignored. This is true for a lot of fluids like air and water, which have a very low viscosity. Therefore one tries to match a solution describing an ideal fluid flow far away from the boundaries with a specific solution for a viscous fluid with prescribed boundary conditions valid in a thin layer on the boundaries. This works quite well in applications. One of the major textbooks about this topic has been around for over 60 years now and has reached its 10th edition in German. It is:

• H. Schlichting and K. Gersten: Boundary-Layer Theory, 8th edition, Springer, Berlin, 2000.

Since I am also interested in numerical models and applications in engineering, I should probably read it. (I don’t know when the 10th edition will be published in English.)

flow on a sphere, laminar to turbulent

Sending Viscosity to Zero: Convergence Results

However, this approach does not tell us under what circumstances we can expect convergence of solutions u_{\nu} to the viscous Navier–Stokes equations with viscosity \nu > 0 to a solution u_{0} of Euler’s equation with zero viscosity. That is, are there such solutions, boundary and initial conditions and a topology on an appropriate topological vector space, such that

\lim_{\nu \to 0} u_{\nu} = u_{0} \; \text{?}

I asked this question over on Mathoverflow and got some interesting answers. Obviously, a lot of brain power has gone into this question, and there are both interesting positive and negative results. As an example, let me describe a very interesting positive result. I learned about it from this book:

• Andrew J. Majda and Andrea L. Bertozzi, Vorticity and Incompressible Flow, Cambridge University Press, Cambridge, 2001.

It’s Proposition 3.2 in this book. There are three assumptions that we need to make in order for things to work out:

• First, we need to fix an interval [0, T] for the time. As mentioned above, we should not expect that we can get convergence for an unbounded time interval like [0, \infty).

• Secondly, we need to assume that solutions u_{\nu} of the Navier–Stokes equation and a solution u_0 of Euler’s equation exist and are smooth.

• Thirdly we will dodge the issue of boundary layers by assuming that the solutions exist on the whole of \mathbb{R}^3 with sufficiently fast decay. As I already mentioned above, a viscous fluid will of course show very different behavior at a boundary than an ideal (that is, nonviscous) one. Our third assumption means that there is no such boundary layer present.

We will denote the L^{\infty} norm on vector fields by \| \cdot \| and use the big O notation.

Given our three assumptions, Proposition 3.2 says that:

\displaystyle{ \sup_{0 \le t \le T} \; \| u_{\nu} - u_0 \| = O(\nu) }

It also gives the convergence of the derivatives:

\displaystyle{ \int_0^T \| \nabla (u_{\nu} - u_0) \| \; d t =  O(\nu^{1/2}) }

This result illustrates that the boundary layer ansatz may work, because the ideal fluid approximation could be a good one away from boundaries and for fluids with low viscosity like water and air.

flow around a symmetric airfoil, laminar to turbulent

So, when John von Neumann joked that “ideal water” is like “dry water”, I would have said: “Well, that is half right”.


Symmetry and the Fourth Dimension (Part 2)

27 May, 2012

The Coxeter group of the cube

Coxeter groups are a huge amount of fun. Normally their delights are reserved for people who have already studied group theory. But I don’t think that’s fair. You don’t need to know music theory to enjoy a piece by Bach. And you don’t need to know group theory to enjoy Coxeter groups.

In fact, it’s probably better to learn theories after falling in love with some examples. Imagine a world where you had to learn music theory before listening to music. A world where everyone studied music theory in elementary school, high school and college, but only people who majored in music were allowed to listen to the stuff. In that world, people might say they hate music… just as in this world they say they hate math.

So, here goes:

Last time I showed you that any Platonic solid has a bunch of symmetries where we reflect it across planes. These planes, called mirrors, all intersect at the center of the solid. If we take a sphere and slice it with these mirrors, it gets chopped up into triangles, and we get a pattern called a Coxeter complex.

If we start with the cube, here’s the Coxeter complex we get:

For artistic reasons, half the triangles are colored blue and half are colored black. But it’s not just pretty: there’s also math here. If we take any black triangle and reflect it across any mirror, we get a blue triangle… and vice versa.

Instead of taking a sphere and slicing it with mirrors, we can start with the cube itself. Here’s what we get:

It’s not quite as pretty (especially because I drew it), but it makes certain games easier to play. These games involve picking one triangle and calling it our favorite. It doesn’t matter which. But we have to pick one… so how about this:

Each different symmetry of the cube sends this triangle to a different triangle. This instantly lets us count the symmetries: there are 48, since each of the cube’s 6 faces has been chopped into 8 triangles.

But even better, we get a vivid picture of the symmetries of the cube! Let’s see how this works.

Any triangle in the Coxeter complex has three corners:

• one corner is a vertex of the cube,

• one corner is the center of an edge of the cube,

• one corner is the center of a face of the cube.

Here’s how it works for our favorite triangle:

Now the real fun starts. We can move from any triangle to a neighboring one in three ways:

1) We can change which vertex of the cube our triangle contains. Starting from our favorite triangle, we get this blue triangle:

Note: the blue triangle touches the same edge of the cube as the black one. It also lies on the same face. Only the vertex has changed!

What have we actually done here? We’ve reflected our triangle across a mirror in a way that changes which vertex of the cube it contains. Let’s call this way of flipping a triangle V.

2) We can change which edge of the cube our triangle touches. Starting from our favorite triangle, we get this yellow triangle:

Note: the yellow triangle contains the same vertex of the cube. It also lies on the same face. Only the edge has changed!

What have we actually done here? We’ve reflected our triangle across a mirror in a way that changes which edge of the cube it touches. Let’s call this way of flipping a triangle E.

3) We can change which face of the cube our triangle lies on. Starting from our favorite triangle, we get this green triangle:

Note: the green triangle contains the same vertex of the cube. It also touches the same edge. Only the face has changed.

What have we actually done here? If you can’t guess, you must be asleep: we’ve reflected our triangle across a mirror in a way that changes which face of the cube it lies on! Let’s call this way of flipping a triangle F.

By repeating these three operations—changing the vertex, edge or face—we can get to any triangle starting from our favorite one. We can even use this trick to label all the triangles. Let’s call our favorite triangle 1:

Let’s call its neighbors F, E and V, since we use those three reflections to get to these new triangles:

Starting with these, we can get more triangles by changing the vertex, edge or face:

See what I’m doing? We get the triangle VE by starting with the triangle V and then changing which edge of the cube it contains. We get EF by starting with E and then changing which face it lies on. And so on.

However, there’s a ‘problem’. See the triangle VF? We got there from the triangle V by changing which face it lies on. But we could also get there another way! We could start at F and then change which vertex this triangle contains. So we could equally well call this triangle FV.

Luckily, in math nothing is really a problem once you understand it. This is why math is more fun than real life: merely understanding a problem makes it go away. We’ll just say that

VF = FV

So, we can use either label for this triangle: it doesn’t matter.

More deeply, if you start with any triangle, change the vertex it contains and then change the face it lies on, you get the same triangle as if you first change the face and then the vertex. That’s what the equation VF = FV really means. It’s a fact of geometry: a general fact about Platonic solids.

Let’s go a bit further:

I’m using the same rules; check to make sure I did everything right! There’s another little ‘problem’, though: see the triangle labelled FEF? We got there from FE by changing which face of the cube our triangle lies on. But we could also get there starting from EF by changing the edge. So really we have

FEF = EFE

But this is not a general fact about Platonic solids: it shows up because in the cube, three faces and three edges meet at each vertex. That’s why the equation has three F’s and three E’s.

We can go on even further, but you can already see where the next problem will show up. See that unlabelled triangle in the front face of the cube? At the next stage we’ll want to label it VEVE, but we’ll also want to label it EVEV. So:

VEVE = EVEV

Again, this is not a general fact about Platonic solids! It shows up because the cube has square faces, so four vertices and four edges touch each face. That’s why the equation has four V’s and four E’s.

We almost have enough equations to avoid all future problems. But there are a few more that are so obvious you may have overlooked them. Suppose we change which vertex of the cube our triangle contains, and then do this again. We get back where we started! For example, first we go from the black one to the blue one:

and then we go back to black.

So, we have

VV = 1

This says switching vertices twice gets us back where we started. Similarly, we have

EE = 1

and

FF = 1

And now, although I haven’t proved it to you, we have a complete set of equations to give each triangle an unambiguous name… or more precisely, an unambiguous element of the ‘Coxeter group’ of the cube. Two different expressions, like EFE and FEF, give the same element of the Coxeter group if we can get from one to the other using our equations. For example, in the Coxeter group we have

FEFVVEFE = FEFEFE = FEFFEF = FEEF = FF = 1
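If you enjoy checking such things by computer, here's a little sketch. It picks one concrete set of mirror planes for the cube (my own choice of coordinates and favorite triangle, so the assignment of V, E and F to particular planes is an assumption), closes the three reflections under matrix multiplication, and confirms both the relations above and the count of 48 symmetries:

```python
import numpy as np

# Concrete reflections for the cube [-1,1]^3, with the favorite triangle
# chosen to have corners at the vertex (1,1,1), the edge midpoint (1,1,0)
# and the face center (1,0,0) -- a choice made for illustration.
V = np.diag([1, 1, -1])                    # plane z = 0: changes the vertex
E = np.array([[1,0,0],[0,0,1],[0,1,0]])    # plane y = z: changes the edge
F = np.array([[0,1,0],[1,0,0],[0,0,1]])    # plane x = y: changes the face

key = lambda M: tuple(map(tuple, M))

# close {V, E, F} under multiplication to get the whole Coxeter group
group = {key(V): V, key(E): E, key(F): F}
frontier = list(group.values())
while frontier:
    fresh = []
    for A in frontier:
        for B in (V, E, F):
            C = A @ B
            if key(C) not in group:
                group[key(C)] = C
                fresh.append(C)
    frontier = fresh

I = np.eye(3, dtype=int)
print(len(group))                                        # 48
print(np.array_equal(V @ F, F @ V))                      # VF = FV
print(np.array_equal(F @ E @ F, E @ F @ E))              # FEF = EFE
print(np.array_equal(V @ E @ V @ E, E @ V @ E @ V))      # VEVE = EVEV
print(all(np.array_equal(M @ M, I) for M in (V, E, F)))  # VV = EE = FF = 1
```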

Coxeter groups of Platonic solids

We can do this stuff for other Platonic solids, too. The Coxeter group of the octahedron is secretly the same as that of the cube, since they’re dual. The only difference is that the names F and V get switched, because faces of the cube correspond to vertices of the octahedron, and vice versa! Similarly for the icosahedron and dodecahedron. So, I mainly have two puzzles for you:

Puzzle 1: Find equations defining the Coxeter group of the tetrahedron.

Puzzle 2: Find equations defining the Coxeter group of the dodecahedron.

If these seem hard, let’s reflect a bit more on what we did for the cube. For the cube we have

VF = FV

because of this picture:

Similarly, we have

EFE = FEF

because of this picture:

And finally, we have

VEVE = EVEV

because of this picture:

This should make it easy to solve the puzzles. We can also phrase the solutions in a different way:

Puzzle 3: Show that the Coxeter groups of the tetrahedron, cube and dodecahedron can be completely described by the equations

V^2 = E^2 = F^2 = 1

and

(VE)^a = (VF)^b = (EF)^c = 1

for some integers a, b, and c.

The story doesn’t stop here—far from it! Later we’ll meet Coxeter groups for the higher-dimensional analogues of Platonic solids, which are called regular polytopes. And we’ll use them to classify so-called uniform polytopes obtained by chopping vertices, edges, faces and so on off the regular ones. For example, the cuboctahedron:

can be gotten either by chopping the corners off a cube, or chopping the corners of an octahedron! We can classify such shapes using Coxeter diagrams, which are based on Coxeter groups. So, there’s no shortage of fun stuff to do… in fact, there’s way too much!

Actually, that’s the main problem with mathematics, once you start actually doing it. There’s just too much fun stuff.


Symmetry and the Fourth Dimension (Part 1)

21 May, 2012

Coxeter complexes

Though I’m shifting toward applied math, I find myself unable to quit explaining pure math to people—stuff that’s fun purely for its own sake. So, I’ve been posting about symmetry and the fourth dimension over on Google+. Now I want to take those posts, polish them up a bit, and combine them into blog articles.

The idea is to start with something very familiar and then take it a little further than most people have seen… without getting so technical that only people with PhDs understand what’s going on. I’m more interested in communicating with ordinary folks than in wowing the experts.

So, I’ll assume you know and love the five Platonic solids:

The tetrahedron, with 4 triangular faces, 6 edges and 4 vertices:

The cube, with 6 square faces, 12 edges and 8 vertices:


The octahedron, with 8 triangular faces, 12 edges and 6 vertices:

The dodecahedron, with 12 pentagonal faces, 30 edges and 20 vertices:

The icosahedron, with 20 triangular faces, 30 edges and 12 vertices:

Starting from these, we’ll build the six Platonic solids that exist in 4 dimensions, and the various fancier shapes we can get from these by cutting off corners, edges and so on.

Luckily, a lot of heroic mathematicians and programmers have made pictures of these shapes freely available online. For example, the rotating Platonic solids above were made by Tom Ruen, who put them on Wiki Commons. It wouldn’t be so bad if all I did is show you lots of these pictures and explain them. But there are some underlying themes that make the story deeper, so I thought I’d reveal those now. As the series marches on, I’ll try to make it easy to ignore these themes or pay attention to them, depending on what you want.

One theme is the quaternions. This is a number system introduced by the famous mathematician William Rowan Hamilton back in 1843. A typical quaternion looks like this:

a + bi + cj + dk

where a,b,c,d are ordinary real numbers and i, j, k are square roots of -1 that ‘anticommute’:

i^2 = j^2 = k^2 = - 1

ij = -ji = k

jk = -kj = i

ki = -ik = j
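If you want to play with these rules, here's a tiny sketch: it multiplies quaternions by expanding (a + bi + cj + dk)(e + fi + gj + hk) with the relations above, and then checks all of them.

```python
# a quaternion a + bi + cj + dk, stored as the 4-tuple (a, b, c, d)
def qmul(q, r):
    a, b, c, d = q
    e, f, g, h = r
    # expand the product using i^2 = j^2 = k^2 = -1, ij = k, and so on
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

assert qmul(i, i) == qmul(j, j) == qmul(k, k) == minus_one
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)   # ij = -ji = k
assert qmul(j, k) == i and qmul(k, j) == (0, -1, 0, 0)   # jk = -kj = i
assert qmul(k, i) == j and qmul(i, k) == (0, 0, -1, 0)   # ki = -ik = j
print("all the quaternion relations hold")
```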

As their name indicates, the quaternions are a 4-dimensional number system. We can use them to relate rotations in 3 dimensions to rotations in 4 dimensions… and this establishes links between 3d Platonic solids and 4d Platonic solids: special links that just don’t exist in higher dimensions.

For example, the dodecahedron has 60 rotational symmetries, and this fact gives a 4d Platonic solid—or as mathematicians say, a 4d regular polytope—with 120 dodecahedral faces. Getting to understand this in detail will be one of the high points of this series: it’s a really wonderful story!

Another theme is 5-fold symmetry. In 2 dimensions there’s an obvious polygon with 5-fold symmetry: the regular pentagon. In 3 dimensions we have a Platonic solid with pentagonal faces: the regular dodecahedron. In 4 dimensions, as I just mentioned, there’s a regular polytope with regular dodecahedra as faces. But then this pattern ends. There are no higher-dimensional regular polytopes with pentagons in them! Only squares and triangles show up.

But the biggest unifying theme is ‘finite reflection groups’. These show up as symmetry groups of Platonic solids and 4d regular polytopes. Technically, a finite reflection group is a finite group of transformations of n-dimensional Euclidean space that’s generated by reflections. Some examples in 3 dimensions will illustrate the idea: it’s not as scary as it might sound.

Take a regular dodecahedron, for example:


This has lots of rotations and reflections as symmetries—but a finite number of them. Each reflection corresponds to a mirror: a plane through the center of the dodecahedron. The reflection switches points on opposite sides of this mirror. We can get every symmetry by doing a bunch of these reflections, one after another: that’s what we mean by saying a group of symmetries is ‘generated by reflections’. So, the symmetry group of a dodecahedron is a finite reflection group.

But the fun starts when we take a sphere centered at the center of the dodecahedron, and slice it with all these mirrors. We get a picture like this, called a Coxeter complex:

The great circles in this picture are where the mirrors intersect the sphere.

You’ll notice there are 120 triangles in this picture: each of the 12 pentagonal faces of the dodecahedron has been subdivided into 10 right triangles. You should be able to see that if we pick one of these triangles, there’s a symmetry carrying it to any other. So, the symmetry group of the dodecahedron has 120 elements!

By the way: earlier I mentioned the 60 rotational symmetries of the dodecahedron; now I’m talking about 120 symmetries including rotations and reflections. There’s no contradiction. If we start by picking a black triangle, a rotation will take it to another black triangle, while a reflection will take it to a blue one. There are 60 of each, for a total of 120.

We can play the same game starting with any other Platonic solid. If we start with the icosahedron, nothing really new happens. It has the same symmetry group, so we get the same Coxeter complex. Indeed, if you look carefully here:

you can see a bunch of equilateral triangles, each containing 6 right triangles. There are 20 of these equilateral triangles, and they’re the faces of an icosahedron:


The corners of the icosahedron are located at the centers of the faces of a dodecahedron, and vice versa. So we say these Platonic solids are dual to each other. Dual polyhedra, or more generally dual polytopes, have the same symmetry group and the same Coxeter complex.

But we get something different if we start with the cube:

This gives a Coxeter complex with 48 triangles, formed by subdividing each of the 6 square faces of the cube into 8 right triangles:

There’s a symmetry of the cube carrying any of these right triangles to any other, so its symmetry group has 48 elements.

The octahedron has the same symmetry group as the cube, because they’re duals. But we get something different if we start with the regular tetrahedron:

This gives a Coxeter complex with 24 triangles, formed by subdividing each of the 4 triangular faces of the tetrahedron into 6 right triangles:

There’s a symmetry of the tetrahedron carrying any of these right triangles to any other, so its symmetry group has 24 elements.

The tetrahedron is its own dual. So the Platonic solids only give us three finite reflection groups in 3 dimensions. There are also some infinite sequences of less interesting ones, like the symmetry groups of the hosohedra, which are funny degenerate Platonic solids whose faces have just 2 sides… these faces need to be curved:

But in general, the possibilities are quite restricted. So, finite reflection groups are not only beautiful: they’re a bit rare. This makes them doubly precious. People have written books about them:

• James E. Humphreys, Reflection Groups and Coxeter Groups, Cambridge U. Press, 1992.

And as we’re beginning to see, the Coxeter complex is a vivid picture of the finite reflection group it comes from. We can already see that in 3 dimensions, it has one black triangle for each rotation in the group, and one blue triangle for each orientation-reversing symmetry. But it contains much more information than this, in neatly visible form… and it works well not just in 3 dimensions, but any dimension.

That’s enough for now: I want to keep these blog articles bite-sized, rather than letting them grow jaw-breakingly big. But if you’re hungry for more right now, try this:

• John Baez, Platonic solids in all dimensions.

I’ll also leave you with this:

Puzzle: How many great circles are there in these Coxeter complexes?

• The Coxeter complex of the tetrahedral finite reflection group, also known to mathematicians as A3:

• The Coxeter complex of the octahedral finite reflection group, also known as B3:

• The Coxeter complex of the icosahedral finite reflection group, also known as H3:

The answers will say how many reflections there are in the finite reflection groups we’ve met today.


Five Books About Our Future

16 May, 2012

Jordan Peacock has suggested interviewing me for Five Books, a website where people talk about five books they’ve read.

It’s probably going against the point of this site to read books especially for the purpose of getting interviewed about them. But I like the idea of talking about books that paint different visions of our future, and the issues we face. And I may need to read some more to carry out this plan.

So: what are your favorite books on this subject?

I’d like to pick books with different visions, preferably focused on the relatively near-term future, and preferably somewhat plausible—though I don’t expect every book to seem convincing to all reasonable people.

Here are some options that leap to mind.

Whole Earth Discipline

• Stewart Brand, Whole Earth Discipline: An Ecopragmatist Manifesto, Viking Penguin, 2009.

I’ve been meaning to write about this one for a long time! Brand argues that changes in this century will be dominated by global warming, urbanization and biotechnology. He advocates new thinking on topics that traditional environmentalists have rather set negative opinions about, like nuclear power, genetic engineering, and the advantages of urban life. This is on my list for sure.

Limits to Growth

• Donella Meadows, Jørgen Randers, and Dennis Meadows, Limits to Growth: The 30-Year Update, Chelsea Green Publishing Company, 2004.

Sad to say, I’ve never read the original 1972 book The Limits to Growth—or the 1974 edition which among other things presented a simple computer model of world population, industrialization, pollution, food production and resource depletion. Both the book and the model (called World3) have been much criticized over the years. But recently some have argued its projections—which were intended to illustrate ideas, not predict the future—are not doing so badly:

• Graham Turner, A comparison of The Limits to Growth with thirty years of reality, Commonwealth Scientific and Industrial Research Organisation (CSIRO).

It would be interesting to delve into this highly controversial topic. By the way, the model is now available online:

• Brian Hayes, Limits to Growth.

with an engaging explanation here:

• Brian Hayes, World3, the public beta, Bit-Player: An Amateur’s Look at Computation and Mathematics, 15 April 2012.

It runs on your web-browser, and it’s easy to take a copy for yourself and play around with it.

The Ecotechnic Future

John Michael Greer believes that ‘peak oil’—or more precisely, the slow decline of fossil fuel production—will spell the end to our modern technological civilization. He spells this out here:

• John Michael Greer, The Long Descent, New Society Publishers, 2008.

I haven’t read this book, but I’ve read the sequel, which begins to imagine what comes afterwards:

• John Michael Greer, The Ecotechnic Future, New Society Publishers, 2009.

Here he argues that in the next century or three we will pass through ‘scarcity economies’ and then ‘salvage economies’ before reaching sustainable economies that use much less energy than we do now.

Both these books seem to outrage everyone who envisages our future as a story of technological progress continuing more or less along the lines we’ve already staked out.

The Singularity is Near

In the opposite direction, we have:

• Ray Kurzweil, The Singularity is Near, Penguin Books, 2005.

I’ve only read bits of this. According to Wikipedia, the main premises of the book are:

• A technological-evolutionary point known as “the singularity” exists as an achievable goal for humanity. (What exactly does Kurzweil mean by “the singularity”? I think I know what other people, like Vernor Vinge and Eliezer Yudkowsky, mean by it. But what does he mean?)

• Through a law of accelerating returns, technology is progressing toward the singularity at an exponential rate. (What in the world does it mean to progress toward a singularity at an exponential rate? I know that Kurzweil provides evidence that lots of things are growing exponentially… but if they keep doing that, that’s not what I’d call a singularity.)

• The functionality of the human brain is quantifiable in terms of technology that we can build in the near future.

• Medical advances make it possible for a significant number of Kurzweil’s generation (Baby Boomers) to live long enough for the exponential growth of technology to intersect and surpass the processing of the human brain.

If you think you know a better book that advocates a roughly similar thesis, let me know.

A Prosperous Way Down

• Howard T. Odum and Elisabeth C. Odum, A Prosperous Way Down: Principles and Policies, Columbia University Press, 2001.

Howard T. Odum is the father of ‘systems ecology’, and developed an interesting graphical language for describing energy flows in ecosystems. According to George Mobus:

In this book he and Elisabeth take on the situation regarding social ecology under the conditions of diminishing energy flows. Taking principles from systems ecology involving systems suffering from the decline of energy (e.g. deciduous forests in fall), showing how such systems have adapted or respond to those conditions, they have applied these to the human social system. The Odums argued that if we humans were wise enough to apply these principles through policy decisions to ourselves, we might find similar ways to adapt with much less suffering than is potentially implied by sudden and drastic social collapse.

This seems to be a more scholarly approach to some of the same issues:

• Howard T. Odum, Environment, Power, and Society for the Twenty-First Century: The Hierarchy of Energy, Columbia U. Press, 2007.

More?

There are plenty of other candidates I know less about. These two seem to be free online:

• Lester Brown, World on the Edge: How to Prevent Environmental and Economic Collapse, W. W. Norton & Company, 2011.

• Richard Heinberg, The End of Growth: Adapting to Our New Economic Reality, New Society Publishers, 2011.

I would really like even more choices—especially books by thoughtful people who do think we can solve the problems confronting us… but do not think all problems will automatically be solved by human ingenuity and leave it to the rest of us to work out the, umm, details.

