Rényi Entropy and Free Energy

10 February, 2011

I want to keep telling you about information geometry… but I got sidetracked into thinking about something slightly different, thanks to some fascinating discussions here at the CQT.

There are a lot of people interested in entropy here, so some of us — Oscar Dahlsten, Mile Gu, Elisabeth Rieper, Wonmin Son and me — decided to start meeting more or less regularly. I call it the Entropy Club. I’m learning a lot of wonderful things, and I hope to tell you about them someday. But for now, here’s a little idea I came up with, triggered by our conversations:

• John Baez, Rényi entropy and free energy.

In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter. If p is a probability distribution on a finite set, its Rényi entropy of order \beta is defined to be

\displaystyle{ H_\beta = \frac{1}{1 - \beta} \ln \sum_i p_i^\beta }

where 0 \le \beta < \infty. This looks pretty weird at first, and we need \beta \ne 1 to avoid dividing by zero, but you can show that the Rényi entropy approaches the Shannon entropy as \beta approaches 1:

\lim_{\beta \to 1} H_\beta = -\sum_{i} p_i \ln p_i .

(A fun puzzle, which I leave to you.) So, it’s customary to define H_1 to be the Shannon entropy… and then the Rényi entropy generalizes the Shannon entropy by allowing an adjustable parameter \beta.

But what does it mean?

If you ask people what’s good about the Rényi entropy, they’ll usually say: it’s additive! In other words, when you combine two independent probability distributions into a single one, their Rényi entropies add. And that’s true — but there are other quantities that have the same property. So I wanted a better way to think about Rényi entropy, and here’s what I’ve come up with so far.

Any probability distribution (with all probabilities nonzero) can be seen as the state of thermal equilibrium for some Hamiltonian at some fixed temperature, say T = 1. And that Hamiltonian is unique. Starting with that Hamiltonian, we can then compute the free energy F at any temperature T, and up to a certain factor this free energy turns out to be the Rényi entropy H_\beta, where \beta = 1/T. More precisely:

F = (1 - T) H_\beta.

So, up to the fudge factor 1 - T, Rényi entropy is the same as free energy. It seems like a good thing to know — but I haven't seen anyone say it anywhere! Have you?

Let me show you why it’s true — the proof is pathetically simple. We start with our probability distribution p_i. We can always write

p_i = e^{- E_i}

for some real numbers E_i. Let’s think of these numbers E_i as energies. Then the state of thermal equilibrium at inverse temperature \beta, also known as the canonical ensemble or Gibbs state, is the probability distribution

\frac{e^{- \beta E_i}}{Z}

where Z is the partition function:

Z = \sum_i e^{-\beta E_i}

Since Z = 1 when \beta = 1, the Gibbs state reduces to our original probability distribution at \beta = 1.

Now in thermodynamics, the quantity

F = - \frac{1}{\beta} \ln Z

is called the free energy. It’s important, because it equals the total expected energy of our system, minus the energy in the form of heat. Roughly speaking, it’s the energy that you can use.

Let’s see how the Rényi entropy is related to the free energy. The proof is a trivial calculation:

- \beta F = \ln Z = \ln \sum_i e^{-\beta E_i} = \ln \sum_i p_i^\beta = (1 - \beta) H_\beta

so

H_\beta = -  \frac{\beta}{1 - \beta} F

at least for \beta \ne 1. But you can also check that both sides of this equation have well-defined limits as \beta \to 1.

The relation between free energy and Rényi entropy looks even neater if we solve for F and write the answer using T instead of \beta = 1/T:

F = (1 - T)H_\beta
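If you like to check this kind of identity numerically, here’s a minimal sketch in Python. The distribution and the temperatures are arbitrary choices of mine, just for illustration:

```python
import numpy as np

# Check F = (1 - T) * H_beta: start from a probability distribution p,
# read off energies E_i = -ln(p_i) so that p is the Gibbs state at T = 1,
# then compare the free energy with the Renyi entropy at other temperatures.

rng = np.random.default_rng(0)
p = rng.random(5)
p /= p.sum()                 # a random probability distribution
E = -np.log(p)               # energies chosen so that p_i = exp(-E_i)

for T in [0.5, 2.0, 3.7]:
    beta = 1.0 / T
    Z = np.exp(-beta * E).sum()                        # partition function
    F = -np.log(Z) / beta                              # free energy
    H_beta = np.log((p**beta).sum()) / (1.0 - beta)    # Renyi entropy
    print(T, F, (1.0 - T) * H_beta)                    # last two columns agree
```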

So, what’s this fact good for? I’m not sure yet! In my paper, I combine it with this equation:

F = \langle E \rangle - T S

Here \langle E \rangle is the expected energy in the Gibbs state at temperature T:

\langle E \rangle = \frac{1}{Z} \sum_i E_i \, e^{-\beta E_i}

while S is the usual Shannon entropy of this Gibbs state. I also show that all this stuff works quantum-mechanically as well as classically. But so far, it seems the main benefit is that Rényi entropy has become a lot less mysterious. It’s not a mutant version of Shannon entropy: it’s just a familiar friend in disguise.
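In case it helps, here’s the same kind of numerical sanity check for that equation (classical case only; again the distribution is a random one I picked):

```python
import numpy as np

# Check F = <E> - T*S in the Gibbs state, with the same conventions as above.

rng = np.random.default_rng(0)
p = rng.random(5)
p /= p.sum()
E = -np.log(p)                                # energies with p_i = exp(-E_i)

T = 2.0
beta = 1.0 / T
gibbs = np.exp(-beta * E)
gibbs /= gibbs.sum()                          # Gibbs state at temperature T

F = -T * np.log(np.exp(-beta * E).sum())      # free energy
mean_E = (gibbs * E).sum()                    # expected energy <E>
S = -(gibbs * np.log(gibbs)).sum()            # Shannon entropy of the Gibbs state
print(F, mean_E - T * S)                      # these agree
```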


Carbon Dioxide Puzzles

4 February, 2011

I like it when people do interesting calculations and help me put their results on this blog. Renato Iturriaga has plotted a graph that raises some interesting questions about carbon dioxide in the Earth’s atmosphere. Maybe you can help us out!

The atmospheric CO2 concentration, as measured at Mauna Loa in Hawaii, looks like it’s rising quite smoothly apart from seasonal variations:



However, if you take the annual averages from here:

• NOAA Earth System Laboratory, Global Monitoring Division, Recent Mauna Loa CO2.

and plot how much the average rises each year, the graph is pretty bumpy. You’ll see what I mean in a minute.

In comparison, if you plot the carbon dioxide emissions produced by burning fossil fuels, you get a rather smooth curve, at least according to these numbers:

• U.S. Energy Information Administration, Total carbon dioxide emissions from the consumption of energy, 1980-2008.

Renato decided to plot both of these curves and their difference. Here’s his result:



The blue curve shows how much CO2 we put into the atmosphere each year by burning fossil fuels, measured in parts per million.

The red curve shows the observed increase in atmospheric CO2.

The green curve is the difference.

The puzzle is to explain this graph. Why is the red curve roughly 40% lower than the blue one? Why is the red curve so jagged?

Of course, a lot of research has already been done on these issues. There are a lot of subtleties! So if you like, think of our puzzle as an invitation to read the existing literature and tell us how well it does at explaining this graph. You might start here, and then read the references, and then keep digging.

But first, let me explain exactly how Renato Iturriaga created this graph! If he’s making a mistake, maybe you can catch it.

The red curve is straightforward: he took the annual mean growth rate of CO2 from the NOAA website I mentioned above, and graphed it. Let me do a spot check to see if he did it correctly. I see a big spike in the red curve around 1998: it looks like the CO2 went up around 2.75 ppm that year. But then the next year it seems to have gone up just about 1 ppm. On the website it says 2.97 ppm for 1998, and 0.91 for 1999. So that looks roughly right, though I’m not completely happy about 1998.

[Note added later: as you’ll see below, he actually got his data from here; this explains the small discrepancy.]

Renato got the blue curve by taking the US Energy Information Administration numbers and converting them from gigatons of CO2 to parts per million moles. He assumed that the atmosphere weighs 5 × 10^15 tons and that CO2 gets well mixed with the whole atmosphere each year. Given this, we can simply say that one gigaton is 0.2 parts per million of the atmosphere’s mass.

But people usually measure CO2 in parts per million volume. Now, a mole is just a certain large number of molecules. Furthermore, the volume of a gas at fixed temperature and pressure is almost exactly proportional to the number of molecules, regardless of its composition. So parts per million volume is essentially the same as parts per million moles.

So we just need to do a little conversion. Remember:

• The molecular mass of N2 is 28, and about 79% of the atmosphere’s volume is nitrogen.

• The molecular mass of O2 is 32, and about 21% of the atmosphere’s volume is oxygen.

• By comparison, there’s very little of the other gases.

So, the average molecular mass of air is

28 × .79 + 32 × .21 = 28.84

On the other hand, the molecular mass of CO2 is 44. So one ppm mass of CO2 is less than one ppm volume: it’s just

28.84/44 = 0.655

parts per million volume. So, a gigaton of CO2 is about 0.2 ppm mass, but only about

0.2 × 0.655 = 0.13

parts per million volume (or moles).

So to get the blue curve, Renato took gigatons of CO2 and multiplied by 0.13 to get ppm volume. Let me do another spot check! The blue curve reaches about 4 ppm in 2008. Dividing 4 by 0.13 we get about 30, and that’s good, because energy consumption put about 30 gigatons of CO2 into the atmosphere in 2008.
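Here’s the whole conversion as a few lines of Python, using the rough figures above (the 5 × 10^15 ton atmosphere is the estimate I mentioned, not a precise value):

```python
# Convert gigatons of CO2 emitted to ppm of the atmosphere by volume,
# using the rough figures from the text above.

ATMOSPHERE_MASS_GIGATONS = 5.0e6      # 5 x 10^15 tons = 5 x 10^6 gigatons (rough estimate)
MEAN_MOLECULAR_MASS_AIR = 28.84       # 0.79 * 28 + 0.21 * 32
MOLECULAR_MASS_CO2 = 44.0

def gigatons_co2_to_ppm_volume(gigatons):
    ppm_mass = gigatons / ATMOSPHERE_MASS_GIGATONS * 1.0e6             # about 0.2 ppm per gigaton
    return ppm_mass * MEAN_MOLECULAR_MASS_AIR / MOLECULAR_MASS_CO2     # about 0.13 ppm per gigaton

print(gigatons_co2_to_ppm_volume(30.0))   # 2008 emissions: roughly 3.9 ppm by volume
```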

And then, of course, the green curve is the blue one minus the red one:



Now, more about the puzzles.

One puzzle is why the red curve is so much lower than the blue one. The atmospheric CO2 concentration is only going up by about 60% of the CO2 emitted, on average — though the fluctuations are huge. So, you might ask, where’s the rest of the CO2 going?

Probably into the ocean, plants, and soil:



But at first glance, the fact that only 60% stays in the atmosphere seems to contradict this famous graph:



This shows it taking many years for a dose of CO2 added to the atmosphere to decrease to 60% of its original level!

Is the famous graph wrong? There are other possible explanations!

Here’s a non-explanation. Humans are putting CO2 into the atmosphere in other ways besides burning fossil fuels. For example, deforestation and other changes in land use put somewhere between 0.5 and 2.7 gigatons of carbon into the atmosphere each year. There’s a lot of uncertainty here. But this doesn’t help solve our puzzle: it means there’s more carbon to account for.

Here’s a possible explanation. Maybe my estimate of 5 × 10^15 tons for the mass of the atmosphere is too high! That would change everything. I got my estimate off the internet somewhere — does anyone know a really accurate figure?

Renato came up with a more interesting possible explanation. It’s very important, and very well-known, that CO2 doesn’t leave the atmosphere in a simple exponential decay process. Imagine for simplicity that carbon stays in three boxes:

• Box A: the atmosphere.

• Box B: places that exchange carbon with the atmosphere quite rapidly.

• Box C: places that exchange carbon with the atmosphere and box B quite slowly.

As we pump CO2 into box A, a lot of it quickly flows into box B. It then slowly flows from boxes A and B into box C.

The quick flow from box A to box B accounts for the large amounts of ‘missing’ CO2 in Renato’s graph. But if we stop putting CO2 into box A, it will soon come into equilibrium with box B. At that point, we will not see the CO2 level continue to quickly drop. Instead, CO2 will continue to slowly flow from boxes A and B into box C. So, it can take many years for the atmospheric CO2 concentration to drop to 60% of its original level — as the famous graph suggests.

This makes sense to me. It shows that the red curve can be a lot lower than the blue one even if the famous graph is right.
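Here’s a toy numerical version of that three-box story. I made up the rate constants purely for illustration (they’re not fitted to any data), but the qualitative behaviour is the point: while we’re emitting, box A only keeps part of what we add; after we stop, the level in box A doesn’t keep dropping quickly, it just drains slowly into box C.

```python
# Toy three-box carbon model: A = atmosphere, B = fast-exchanging reservoirs,
# C = slow reservoirs.  All rate constants are made up for illustration.

A, B, C = 0.0, 0.0, 0.0
k_ab, k_ba = 0.5, 0.3        # fast exchange between A and B (per year)
k_ac, k_bc = 0.02, 0.02      # slow leakage from A and B into C (per year)
dt = 0.1                     # time step in years

for step in range(int(200 / dt)):
    t = step * dt
    emission = 1.0 if t < 100 else 0.0          # stop emitting after year 100
    dA = emission - k_ab * A + k_ba * B - k_ac * A
    dB = k_ab * A - k_ba * B - k_bc * B
    dC = k_ac * A + k_bc * B
    A, B, C = A + dA * dt, B + dB * dt, C + dC * dt
    if step % int(25 / dt) == 0:
        print(f"year {t:5.0f}: A = {A:6.1f}, B = {B:6.1f}, C = {C:6.1f}")
```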

But I’m still puzzled by the dramatic fluctuations in the red curve! That’s the other puzzle.


Azimuth News (Part 1)

24 January, 2011

The world seems to be heading for tough times. From a recent New York Times article:

Over the next 100 years, many scientists predict, 20 percent to 30 percent of species could be lost if the temperature rises 3.6 degrees to 5.4 degrees Fahrenheit. If the most extreme warming predictions are realized, the loss could be over 50 percent, according to the United Nations climate change panel.

But when the going gets tough, the tough get going! The idea of the Azimuth Project is to create a place where scientists and engineers can meet and work together to help save the planet from global warming and other environmental threats. The first step was to develop a procedure for collecting reliable information and explaining it clearly. That means: not just a wiki, but a wiki with good procedures and a discussion forum to help us criticize and correct the articles.

Thanks to the technical wizardry of Andrew Stacey, and a lot of sage advice and help from Eric Forgy, the wiki and forum officially opened their doors about four months ago.

That seems like ages ago. For months a small band of us worked hard to get things started. With the beginning of the new year, we seem to be entering a phase transition: we’re getting a lot of new members. So, it’s time to give you an update!

There’s a lot going on now. If you’ve been reading this blog and clicking some of the links, you’ve probably seen some of our pages on sea level rise, coral reefs, El Niño, biochar, photovoltaic solar power, peak oil, energy return on energy invested, and dozens of other topics. If you haven’t, check them out!

But that’s just the start of it. If you haven’t been reading the Azimuth Forum, you probably don’t know most of what’s going on. Let me tell you what we’re doing.

I’ll also tell you some things you can do to help.

Azimuth Project Pages

By far the easiest thing is to go to any Azimuth Project page, think of some information or reference that it’s missing, and add it! Go to the home page, click on a category, find an interesting article in that category and give it a try. Or, if you want to start a new page, do that. We desperately need more help from people in the life sciences, to build up our collection of pages on biodiversity.

If you need help, start here:

How to get started.

Plans of Action

We’re working through various plans for dealing with peak oil, global warming, and various environmental problems. You can see our progress here:

Plans of action, Azimuth Project.

So far it goes like this. First we write summaries of these plans. Then I blog about them. Then Frederik De Roo distills your criticisms and comments and adds them to the Azimuth Project. The idea is to build up a thorough comparison of many different plans.

We’re the furthest along when it comes to Pacala and Socolow’s plan:

Stabilization wedges, Azimuth Project.

You don’t need to be an expert in any particular discipline to help here! You just need to be able to read plans of action and write crisp, precise summaries, as above. We also need help finding the most important plans of action.

In addition to plans of action, we’re also summarizing various ‘reports’. The idea is that a report presents facts, while a plan of action advocates a course of action. See:

Reports, Azimuth Project.

In practice the borderline between plans of action and reports is a bit fuzzy, but that’s okay.

Plan C

Analyzing plans of action is just the first step in a more ambitious project: we’d like to start formulating our own plans. Our nickname for this project is Plan C.

Why Plan C? Many other plans, like Lester Brown’s Plan B, are too optimistic. They assume that most people will change their behavior in dramatic ways before problems become very serious. We want a plan that works with actual humans.

In other words: while optimism is a crucial part of any successful endeavor, we also need plans that assume plausibly suboptimal behavior on the part of the human race. It would be best if we did everything right in the first place. It would be second best to catch problems before they get very bad — that’s the idea of Plan B. But realistically, we’ll be lucky if we do the third best thing: muddle through when things get bad.

Azimuth Code Project

Some people on the Azimuth Project, most notably Tim van Beek, are writing software that illustrates ideas from climate physics and quantitative ecology. Full-fledged climate models are big and tough to develop; it’s a lot easier to start with simple models, which are good for educational purposes. I’m starting to use these in This Week’s Finds.

If you have a background in programming, we need your help! We have people writing programs in R and Sage… but Tim is writing code in Java for a systematic effort he calls the Azimuth Code Project. The idea is that over time, the results will become a repository of open-source modelling software. As a side effect, he’ll try to show that clean, simple, open-source, well-managed and up-to-date code handling is possible at a low cost — and he’ll explain how it can be done.

So far most of our software is connected to stochastic differential equations:

• Software for investigating the Hopf bifurcation and its stochastic version: see week308 of This Week’s Finds.

• Software for studying predator-prey models, including stochastic versions: see the page on quantitative ecology. Ultimately it would be nice to have some software to simulate quite general stochastic Petri nets.

• Software for studying stochastic resonance: see the page on stochastic resonance. We need a lot more on this, leading up to software that takes publicly available data on Milankovitch cycles — cyclic changes in the Earth’s orbit — and uses it to make predictions of the glacial cycles. It’s not clear how good these predictions will be — the graphs I’ve seen so far don’t look terribly convincing — but the Milankovitch cycle theory of the ice ages is pretty popular, so it’ll be fun to see.

• We would like a program that simulates the delayed action oscillator, which is an interesting simple model for the El Niño / Southern Oscillation.

Graham Jones has proposed some more challenging projects:

• An open source version of FESA, the Future Energy Scenario Assessment. FESA, put out by Orion Innovations, is proprietary software that models energy systems scenarios, including meteorological data, economic analysis and technology performance.

• An automated species-identification system. See the article Time to automate identification in the journal Nature. The authors say that taxonomists should work with specialists in pattern recognition, machine learning and artificial intelligence to increase accuracy and reduce drudgery.

David Tweed, who is writing a lot of our pages on the economics of energy, has suggested some others:

• Modeling advanced strategies for an electrical smart grid.

• Modeling smartphone or website based car- or ride-sharing schemes.

• Modeling supply routing systems for supermarkets that attempt to reduce their ecological footprint.

All these more challenging projects will only take off if we find some energetic people and get access to good data.

This Week’s Finds

I’m interviewing people for This Week’s Finds: especially scientists who have switched from physics to environmental issues, and people with big ideas about how to save the planet. The goal here is to attract people, especially students, into working on these subjects.

Here’s my progress so far:

Nathan Urban — climate change. Done.

Tim Palmer — weather prediction. Done.

Eliezer Yudkowsky — friendly AI. Interviewed.

Thomas Fischbacher — sustainability. Interviewed.

Gregory Benford — geoengineering. Underway.

David Ellerman — helping people, economics. Underway.

Eric Drexler — nanotechnology. Agreed to do it.

Chris Lee — bioinformatics. Agreed to do it.

If you’re a scientist or engineer doing interesting things on the topics we’re interested in at the Azimuth Project, and you’d like me to interview you, let me know! Of course, your ego should be tough enough to handle it if I say no.

Alternatively: if you know somebody like this, and you’re good at interviewing people, this is another place you might help. You could either send them to me, or interview them yourself! I’m already trying to subcontract out one interview to a mathematician friend.

Blog articles

While I’ve been writing most of the articles on this blog so far, I don’t want it to stay that way. If you want to write articles, let me know! I might or might not agree… but if you read this blog, you know what I like, so you can guess ahead of time whether I’ll like your article or not.

In fact, the next two articles here will be written by Curtis Faith, a new member of the Azimuth Forum.

More

There’s also a lot more you can do. For suggestions, try:

Things to do, Azimuth Project.

Open projects, Azimuth Project.


Information Geometry (Part 6)

21 January, 2011

So far, my thread on information geometry hasn’t said much about information. It’s time to remedy that.

I’ve been telling you about the Fisher information metric. In statistics this is a nice way to define a ‘distance’ between two probability distributions. But it also has a quantum version.

So far I’ve shown you how to define the Fisher information metric in three equivalent ways. I also showed that in the quantum case, the Fisher information metric is the real part of a complex-valued thing. The imaginary part is related to the uncertainty principle.

You can see it all here:

Part 1     • Part 2     • Part 3     • Part 4     • Part 5

But there’s yet another way to define the Fisher information metric, which really involves information.

To explain this, I need to start with the idea of ‘information gain’, or ‘relative entropy’. And it looks like I should do a whole post on this.

So:

Suppose that \Omega is a measure space — that is, a space you can do integrals over. By a probability distribution on \Omega, I’ll mean a nonnegative function

p : \Omega \to \mathbb{R}

whose integral is 1. Here d \omega is my name for the measure on \Omega. Physicists might call \Omega the ‘phase space’ of some classical system, but probability theorists might call it a space of ‘events’. Today I’ll use the probability theorist’s language. The idea here is that

\int_A \; p(\omega) \; d \omega

gives the probability that when an event happens, it’ll be one in the subset A \subseteq \Omega. That’s why we want

p \ge 0

Probabilities are supposed to be nonnegative. And that’s also why we want

\int_\Omega \; p(\omega) \; d \omega = 1

This says that the probability of some event happening is 1.

Now, suppose we have two probability distributions on \Omega, say p and q. The information gain as we go from q to p is

S(p,q) = \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

We also call this the entropy of p relative to q. It says how much information you learn if you discover that the probability distribution of an event is p, if before you had thought it was q.

I like relative entropy because it’s related to the Bayesian interpretation of probability. The idea here is that you can’t really ‘observe’ probabilities as frequencies of events, except in some unattainable limit where you repeat an experiment over and over infinitely many times. Instead, you start with some hypothesis about how likely things are: a probability distribution called the prior. Then you update this using Bayes’ rule when you gain new information. The updated probability distribution — your new improved hypothesis — is called the posterior.

And if you don’t do the updating right, you need a swift kick in the posterior!

So, we can think of q as the prior probability distribution, and p as the posterior. Then S(p,q) measures the amount of information that caused you to change your views.

For example, suppose you’re flipping a coin, so your set of events is just

\Omega = \{ \mathrm{heads}, \mathrm{tails} \}

In this case all the integrals are just sums with two terms. Suppose your prior assumption is that the coin is fair. Then

q(\mathrm{heads}) = 1/2, \; q(\mathrm{tails}) = 1/2

But then suppose someone you trust comes up and says “Sorry, that’s a trick coin: it always comes up heads!” So you update your probability distribution and get this posterior:

p(\mathrm{heads}) = 1, \; p(\mathrm{tails}) = 0

How much information have you gained? Or in other words, what’s the relative entropy? It’s this:

S(p,q) = \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega = 1 \cdot \log(\frac{1}{1/2}) + 0 \cdot \log(\frac{0}{1/2}) = 1

Here I’m doing the logarithm in base 2, and you’re supposed to know that in this game 0 \log 0 = 0.

So: you’ve learned one bit of information!

That’s supposed to make perfect sense. On the other hand, the reverse scenario takes a bit more thought.

You start out feeling sure that the coin always lands heads up. Then someone you trust says “No, that’s a perfectly fair coin.” If you work out the amount of information you learned this time, you’ll see it’s infinite.

Why is that?

The reason is that something that you thought was impossible — the coin landing tails up — turned out to be possible. In this game, it counts as infinitely shocking to learn something like that, so the information gain is infinite. If you hadn’t been so darn sure of yourself — if you had just believed that the coin almost always landed heads up — your information gain would be large but finite.
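If you want to play with these numbers yourself, here’s a little sketch in Python. The ‘almost sure’ prior of 0.999 is just a number I picked:

```python
import math

# Relative entropy (information gain) in bits for distributions on a finite set,
# with the convention 0 * log 0 = 0.

def relative_entropy(p, q):
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue                  # convention: 0 * log 0 = 0
        if q[outcome] == 0.0:
            return math.inf           # something you thought impossible happened
        total += p_x * math.log2(p_x / q[outcome])
    return total

fair   = {'heads': 0.5,   'tails': 0.5}
trick  = {'heads': 1.0,   'tails': 0.0}
almost = {'heads': 0.999, 'tails': 0.001}

print(relative_entropy(trick, fair))    # 1.0  -- one bit
print(relative_entropy(fair, trick))    # inf  -- infinitely shocking
print(relative_entropy(fair, almost))   # about 4 bits: large but finite
```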

The Bayesian philosophy is built into the concept of information gain, because information gain depends on two things: the prior and the posterior. And that’s just as it should be: you can only say how much you learned if you know what you believed beforehand!

You might say that information gain depends on three things: p, q and the measure d \omega. And you’d be right! Unfortunately, the notation S(p,q) is a bit misleading. Information gain really does depend on just two things, but these things are not p and q: they’re p(\omega) d\omega and q(\omega) d\omega. These are called probability measures, and they’re ultimately more important than the probability distributions p and q.

To see this, take our information gain:

\int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

and juggle it ever so slightly to get this:

\int_\Omega \;  \log(\frac{p(\omega) d\omega}{q(\omega)d \omega}) \; p(\omega) d \omega

Clearly this depends only on p(\omega) d\omega and q(\omega) d\omega. Indeed, it’s good to work directly with these probability measures and give them short names, like

d\mu = p(\omega) d \omega

d\nu = q(\omega) d \omega

Then the formula for information gain looks more slick:

\int_\Omega \; \log(\frac{d\mu}{d\nu}) \; d\mu

And by the way, in case you’re wondering, the d here doesn’t actually mean much: we’re just so brainwashed into wanting a d x in our integrals that people often use d \mu for a measure even though the simpler notation \mu might be more logical. So, the function

\frac{d\mu}{d\nu}

is really just a ratio of probability measures, but people call it a Radon-Nikodym derivative, because it looks like a derivative (and in some important examples it actually is). So, if I were talking to myself, I could have shortened this blog entry immensely by working directly with probability measures, leaving out the d‘s, and saying:

Suppose \mu and \nu are probability measures; then the entropy of \mu relative to \nu, or information gain, is

S(\mu, \nu) =  \int_\Omega \; \log(\frac{\mu}{\nu}) \; \mu

But I’m under the impression that people are actually reading this stuff, and that most of you are happier with functions than measures. So, I decided to start with

S(p,q) =  \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

and then gradually work my way up to the more sophisticated way to think about relative entropy! But having gotten that off my chest, now I’ll revert to the original naive way.

As a warmup for next time, let me pose a question. How much is this quantity

S(p,q) =  \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

like a distance between probability distributions? A distance function, or metric, is supposed to satisfy some axioms. Alas, relative entropy satisfies some of these, but not the most interesting one!

• If you’ve got a metric, the distance between points should always be nonnegative. Indeed, this holds:

S(p,q) \ge 0

So, we never learn a negative amount when we update our prior, at least according to this definition. It’s a fun exercise to prove this inequality, at least if you know some tricks involving inequalities and convex functions — otherwise it might be hard.

• If you’ve got a metric, the distance between two points should only be zero if they’re really the same point. In fact,

S(p,q) = 0

if and only if

p d\omega = q d \omega

It’s possible to have p d\omega = q d \omega even if p \ne q, because d \omega can be zero somewhere. But this is just more evidence that we should really be talking about the probability measure p d \omega instead of the probability distribution p. If we do that, we’re okay so far!

• If you’ve got a metric, the distance from your first point to your second point is the same as the distance from the second to the first. Alas,

S(p,q) \ne S(q,p)

in general. We already saw this in our example of the flipped coin. This is a slight bummer, but I could live with it, since Lawvere has already shown that it’s wise to generalize the concept of metric by dropping this axiom.

• If you’ve got a metric, it obeys the triangle inequality. This is the really interesting axiom, and alas, this too fails. Later we’ll see why.
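Here’s a quick numerical illustration of that failure, with three coin-flip distributions I picked more or less at random:

```python
import math

# Relative entropy (natural log) for distributions on {heads, tails},
# showing that the triangle inequality can fail.

def S(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = (0.5, 0.5)
q = (0.9, 0.1)
r = (0.99, 0.01)

print(S(p, q) + S(q, r))   # about 0.66
print(S(p, r))             # about 1.61 -- bigger, so the triangle inequality fails
```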

So, relative entropy does a fairly miserable job of acting like a distance function. People call it a divergence. In fact, they often call it the Kullback-Leibler divergence. I don’t like that, because ‘the Kullback-Leibler divergence’ doesn’t really explain the idea: it sounds more like the title of a bad spy novel. ‘Relative entropy’, on the other hand, makes a lot of sense if you understand entropy. And ‘information gain’ makes sense if you understand information.

Anyway: how can we save this miserable attempt to get a distance function on the space of probability distributions? Simple: take its matrix of second derivatives and use that to define a Riemannian metric g_{ij}. This Riemannian metric in turn defines a metric of the more elementary sort we’ve been discussing today.

And this Riemannian metric is the Fisher information metric I’ve been talking about all along!
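In case you want a preview of the formula (just a sketch; the precise statement is among the details being postponed): if p_\theta is a family of probability distributions depending smoothly on parameters \theta^i, the relative entropy between nearby members vanishes to first order in d\theta, and to second order

S(p_{\theta + d\theta}, p_\theta) \approx \frac{1}{2} \, g_{ij}(\theta) \, d\theta^i \, d\theta^j

where g_{ij} is the Fisher information metric.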

More details later, I hope.


Petri Nets

18 January, 2011

I’m trying to build bridges between mathematics and practical subjects like ecology and engineering — subjects that might help us save the planet.

As part of this, I have a project to explain how some ideas from electrical engineering, control theory, systems ecology, systems biology and the like can be formalized — and to some extent unified — using “symmetric monoidal categories”. Whenever you have a setup with:

• abstract gadgets that have ‘input wires’ and ‘output wires’,

and you can

• hook up these gadgets by connecting outputs to inputs,

and

• the wires can cross over each other, and

• it doesn’t matter whether one wire crosses over or under another

then you’ve probably got a symmetric monoidal category! For a precise definition, and lots of examples, try:

• John Baez and Mike Stay, Physics, topology, logic and computation: a Rosetta Stone, in New Structures for Physics, ed. Bob Coecke, Lecture Notes in Physics vol. 813, Springer, Berlin, 2011, pp. 95-174.

Back then I was excited about examples from ‘topological quantum field theory’, and ‘linear logic’, and ‘programming semantics’, and ‘quantum circuit theory’. But now I’m trying to come down to earth and think about examples of a more everyday nature. And they turn out to be everywhere!

For example, when you see these diagrams:

you may see a 10-watt amplifier with bass boost, and a 16-volt power supply with power-on indicator light. But I see morphisms in symmetric monoidal categories!

Back in week296 I explained a symmetric monoidal category where the morphisms are electrical circuits made only from linear resistors. It’s easy to generalize this to ‘passive linear circuits’ where we include linear inductors and capacitors… and we can keep going further in this direction. I’m writing a paper on this stuff now.

But today I want to head in another direction, and tell you about something I’m beginning to work on with Jacob Biamonte: something called ‘Petri nets’.

Before I do, I have to answer a question that I can see forming in your forehead: what’s the use of this stuff?

I don’t actually think electrical engineers are going to embrace category theory — at least not until it’s routinely taught as part of college math. And I don’t even think they should! I don’t claim that category theory is going to help them do anything they want to do.

What it will do is help organize our understanding of systems made of parts.

In mathematics, whenever you see a pattern that keeps showing up in different places, you should try to precisely define it and study it — and whenever you see it showing up somewhere, you should note that fact. Eventually, over time, your store of patterns grows, and you start seeing connections that weren’t obvious before. And eventually, really cool things will happen!

It’s hard to say what these things will be before they happen. It’s almost not worth trying. For example, the first people who started trying to compute the perimeter of an ellipse could never have guessed that the resulting math would, a century or so later, be great for cryptography. The necessary chain of ideas was too long and twisty to possibly envision ahead of time.

But in the long run, having mathematicians develop and investigate a deep repertoire of patterns tends to pay off.

Petri nets – the definition

So: what’s a Petri net? Here’s my quick explanation, using graphics that David Tweed kindly provided for this article:

Petri net, Azimuth Project.

A Petri net is a kind of diagram for describing processes that arise in systems with many components. They were invented in 1939 by Carl Adam Petri — when he was 13 years old — in order to describe chemical reactions. They are a widely used model of concurrent processes in theoretical computer science. They are also used to describe interactions between organisms (e.g. predation, death, and birth), manufacturing processes, supply chains, and so on.

Here is an example from chemistry:

The circles in a Petri net denote so-called states, which in this example are chemical compounds. The rectangular boxes denote transitions, which in this example are chemical reactions. Every transition has a finite set of input states, with wires coming in from those, and a finite set of output states, with wires going out. All this information can equally well be captured by the usual notation for chemical reactions, as follows:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

One advantage of Petri nets is that we can also label each state by some number (0,1,2,3,…) of black dots, called tokens. In our example, each black dot represents a molecule of a given kind. Thus

describes a situation where one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide, and one molecule of hydrochloric acid are present. No molecules of any other kind are present at this time.

We can then describe processes that occur with the passage of time by moving the tokens around. If the carbon reacts with oxygen we obtain:

If the carbon dioxide combines with the sodium hydroxide to form sodium bicarbonate, we then obtain:

Finally, if the sodium bicarbonate and hydrochloric acid react, we get:

Note that in each case the following rule holds: for any given transition, we can delete one token for each of its input states and simultaneously add one token for each of its output states.
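If it helps to see this rule in code, here’s a minimal sketch in Python, with the chemistry example encoded by hand (the transition names are just labels I chose):

```python
# A Petri net as data: each transition has a list of input states and output states.
# Firing a transition deletes one token per input state and adds one per output state.

transitions = {
    'C + O2 -> CO2':        (['C', 'O2'],       ['CO2']),
    'CO2 + NaOH -> NaHCO3': (['CO2', 'NaOH'],   ['NaHCO3']),
    'NaHCO3 + HCl -> ...':  (['NaHCO3', 'HCl'], ['H2O', 'NaCl', 'CO2']),
}

tokens = {'C': 1, 'O2': 1, 'NaOH': 1, 'HCl': 1,
          'CO2': 0, 'NaHCO3': 0, 'H2O': 0, 'NaCl': 0}

def fire(name):
    inputs, outputs = transitions[name]
    if any(tokens[s] < 1 for s in inputs):
        raise ValueError(f"transition {name!r} is not enabled")
    for s in inputs:
        tokens[s] -= 1
    for s in outputs:
        tokens[s] += 1

for name in transitions:      # fire the three reactions in order
    fire(name)
    print(tokens)
```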

Here is a simpler example, taken from this article:

Petri net, Wikipedia.



Here the transitions are denoted by black rectangles. This example is somewhat degenerate, because there are no transitions with more than one input or more than one output. However, it illustrates the possibility of having more than one token in a given state. It also illustrates the possibility of a transition with no inputs, or no outputs. In chemistry this is useful for modelling a process where molecules of a given sort are added to or removed from the environment.

Symmetric monoidal categories

Now, what about symmetric monoidal categories? If you really want the definition, either read the Rosetta Stone paper or the nLab article. For now, you can probably fake it.

A symmetric monoidal category is, very roughly, a category with a tensor product. If we ignore the tokens, a Petri net is a way of specifying a symmetric monoidal category by giving a set of objects x_1, \dots, x_n and a set of morphisms between tensor products of these objects, for example

f_1 : x_1 \otimes x_2 \to x_2 \otimes x_2

f_2 : x_2 \to 1

f_3 : 1 \to x_1

where 1 denotes the tensor product of no objects. For example, the objects might be molecules and the morphisms might be chemical reactions; in this case the tensor product symbol is written ‘+‘ rather than ‘\otimes‘.
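For instance, if we name the three reactions in our chemistry example (these labels are mine, just for illustration)

r_1 : \mathrm{C} \otimes \mathrm{O}_2 \to \mathrm{CO}_2

r_2 : \mathrm{CO}_2 \otimes \mathrm{NaOH} \to \mathrm{NaHCO}_3

r_3 : \mathrm{NaHCO}_3 \otimes \mathrm{HCl} \to \mathrm{H}_2\mathrm{O} \otimes \mathrm{NaCl} \otimes \mathrm{CO}_2

then the whole process we traced out with tokens becomes a single composite morphism

r_3 \circ (r_2 \otimes 1_{\mathrm{HCl}}) \circ (r_1 \otimes 1_{\mathrm{NaOH}} \otimes 1_{\mathrm{HCl}}) : \mathrm{C} \otimes \mathrm{O}_2 \otimes \mathrm{NaOH} \otimes \mathrm{HCl} \to \mathrm{H}_2\mathrm{O} \otimes \mathrm{NaCl} \otimes \mathrm{CO}_2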

The kind of symmetric monoidal category we get this way is a ‘free strict symmetric monoidal category’. It’s ‘free’ in the sense that it’s ‘freely generated’ from some objects and morphisms, without any relations. It’s ‘strict’ because we’re assuming the tensor product is precisely associative, not just associative ‘up to isomorphism’.

For more on how Petri nets describe free symmetric monoidal categories, see:

Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

Here Sassone describes a category of Petri nets and sketches the description of a functor from that category to the category of strict symmetric monoidal categories. Sassone also has some other papers on Petri nets and category theory.

As I mentioned, Petri nets have been put to work in many ways. I won’t even start trying to explain that today — for that, try the references on the Azimuth Project page. My only goal today was to convince you that Petri nets are a pathetically simple idea, nothing to be scared of. And if you happen to be a category theorist, it should also be pathetically simple to see how they describe free strict symmetric monoidal categories. If you’re not… well, never mind!


Stabilization Wedges (Part 4)

16 January, 2011

Okay, here are the last two of Pacala and Socolow’s stabilization wedges. Remember, these wedges are ways to reduce carbon emissions. Each one is supposed to ramp up from 2004 to 2054, so that by the end it reduces carbon emissions by 1 gigaton per year. They claimed that seven wedges would be enough to keep emissions flat:



In Part 1 of this series we talked about four wedges involving increased efficiency and conservation. In Part 2 we covered one about shifting from coal to natural gas, and three about carbon capture and storage. In Part 3 we discussed five involving nuclear power and renewable energy. The last two wedges involve forests and agriculture:

14. Stop deforestation, start reforestation. They say we could stop half a gigaton of carbon emissions per year if we completely stopped clear-cutting tropical forests over 50 years, instead of just halving the rate at which they’re getting cut down. For another half gigaton, plant 250 million hectares of new forests in the tropics, or 400 million hectares in the temperate zone!

To get a sense of the magnitude here, note that current areas of tropical and temperate forests are 1500 and 700 million hectares, respectively.

Pacala and Socolow also say that another half gigaton of carbon emissions could be prevented by establishing approximately 300 million hectares of plantations on nonforested land.

15. Soil management. When forest or grassland is converted to cropland, up to one-half of the soil carbon gets converted to CO2, mainly because tilling increases the rate of decomposition by aerating undecomposed organic matter. Over the course of history, they claim, 55 gigatons of carbon has gone into the atmosphere this way. That’s the equivalent of two wedges. (Note that one wedge, ramping up linearly to 1 gigaton/year for 50 years, adds up to 25 gigatons of carbon by 2054.)
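(In case that 25 looks mysterious: a wedge ramps up linearly from 0 to 1 gigaton per year over 50 years, so the total carbon saved is just the area of a triangle, \int_0^{50} \frac{t}{50} \, dt = 25 gigatons.)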

However, good agricultural practices like no-till farming can reverse these losses — and also reduce erosion! By 1995, these practices had been adopted on 110 million of the world’s 1600 million hectares of cropland. If this could be extended to all cropland, accompanied by a verification program that enforces practices that actually work as advertised, somewhere between half and one gigaton of carbon per year could be stored in this way. So: maybe half a wedge, maybe a whole wedge!

I’ve seen a lot of argument about both these topics, and I’d love to learn more facts. Some of the controversy concerns the UN’s REDD+ program, which got a big boost in Cancún. “REDD” stands for Reducing Emissions from Deforestation and Forest Degradation — while the plus sign hints at the role of conservation, sustainable management of forests, and enhancement of forest carbon stocks.

Some people think REDD+ is great, while others think it could actually hurt. The Wikipedia article says, among other things:

REDD is presented as an “offset” scheme of the carbon markets and thus, will produce carbon credits. Carbon offsets are “emissions-saving projects” that in theory “compensate” for the polluters’ emissions. Offsets allow polluting governments and corporations, which have the historical responsibility to clean up the atmosphere, to buy their way out of the problem with cheap projects that exacerbate social and environmental conflicts in the South. Moreover, it delays any real domestic action where a historical responsibility lies and allows the expansion of more fossil fuel explorations and extractions. The “carbon credits” generated by these projects can be used by industrialised governments and corporations to meet their targets and/or to be traded within the carbon markets.

There’s also a lot of argument about just how much long-term impact on atmospheric CO2 a standing forest has, though everyone seems to agree that cutting one down releases a lot of CO2.

For a bit more, try:

About REDD+, United Nations.

REDD+: Reducing Emissions from Deforestation and Forest Degradation, Center for International Forestry Research.

Next time I’ll give you an update on the stabilization wedges from Stephen Pacala himself, based on a talk he gave in 2008. It’s a bit scary…


Postdoc Positions in Climate Mathematics

13 January, 2011

Young mathematicians interested in climate issues, check this out! There are some postdoc positions available where you can work with people who are doing really cool stuff. Click on their names or photos for more. And apply soon — they’ll start reviewing applications on the 20th of January!

The Mathematics and Climate Research Network is a nation-wide NSF funded initiative. Our goal is to define, develop and solve mathematical problems that arise in climate science. A number of postdoctoral positions will be available starting in Summer or Fall, 2011. The successful applicants will be known as Ed Lorenz Postdoctoral Fellows in the Mathematics of Climate and will have an affiliation with one of the network nodes. The topics of research will range from sea-ice processes to paleoclimate studies and data assimilation in climate models. The network has twelve funded nodes and a number of other collaborating institutions. For more details, see www.mathclimate.org.

The postdoctoral fellows will be based at the nodes indicated below. There will be considerable interaction possible with other network members through weekly web-based seminars and working groups. The network encourages and will support extended visits by postdocs to other nodes.

All interested recent PhDs are encouraged to apply. There are two steps necessary for a complete application: (1) posting materials to mathjobs.org (cover letter, CV, research statement and 3 letters of recommendation), and (2) completing a short questionnaire to be found at: http://jobs.mathclimate.org.

The specific positions with areas of focus, primary locations and postdoctoral mentors as well as institution relevant information are given below. Salaries will be competitive. The postdocs are multi-year and starting times will all be sometime Summer or Fall, 2011. Teaching one course per year will be an option in most positions.

1. Arizona State University (School of Mathematical and Statistical Sciences), Data assimilation and large complex models of the atmosphere. Mentors: Eric Kostelich and Alex Mahalov.

2. Bowdoin College (Department of Mathematics), Dynamical systems in climate process models and paleoclimate. Mentors: Mary-Lou Zeeman and Dick McGehee (Minnesota).

3. New York University (Courant Institute of Mathematical Sciences), Southern Ocean, sea ice, Antarctic ice sheet, and regional atmospheric modeling. Mentor: David Holland.

4. University of Chicago (Department of Geosciences), Modeling and analysis of climate processes such as water vapor and cloud feedback, atmospheric circulation, land and sea ice including applications to past climate, and modeling of carbon cycle fluctuations on varying time scales. Mentors: Pam Martin, Ray Pierrehumbert, Dorian Abbot and Mary Silber (Northwestern).

5. University of Utah (Department of Mathematics), Analysis of sea ice through modeling, computation, and methods of applied mathematics and physics. Field trips to the Arctic or Antarctic potentially part of postdoctoral work. Mentor: Ken Golden.

6. University of Vermont (Department of Mathematics and Statistics), Development of data assimilation methods and implementation on climate models, both conceptual and global. Mentors: Chris Danforth and Chris Jones (UNC-CH).

7. University of Washington (Department of Applied Mathematics), Analysis of historical climate data using linear and nonlinear time series techniques. Mentors: Ka-Kit Tung and Dave Camp (Calpoly-SLO).

Each of the universities involved is an Affirmative Action/Equal Opportunity employer and welcomes applications from women, underrepresented ethnic, racial and cultural groups, and from people with disabilities. Reviewing of applications will begin on Jan 20, 2011 but applications will continue to be accepted until the positions are filled.

