Stabilization Wedges (Part 4)

16 January, 2011

Okay, here are the last two of Pacala and Socolow’s stabilization wedges. Remember, these wedges are ways to reduce carbon emissions. Each one is supposed to ramp up from 2004 to 2054, so that by the end it reduces carbon emissions by 1 gigaton per year. They claimed that seven wedges would be enough to keep emissions flat:



In Part 1 of this series we talked about four wedges involving increased efficiency and conservation. In Part 2 we covered one about shifting from coal to natural gas, and three about carbon capture and storage. In Part 3 we discussed five involving nuclear power and renewable energy. The last two wedges involve forests and agriculture:

14. Stop deforestation, start reforestation. They say we could stop half a gigaton of carbon emissions per year if we completely stopped clear-cutting tropical forests over 50 years, instead of just halving the rate at which they’re getting cut down. For another half gigaton, plant 250 million hectares of new forests in the tropics, or 400 million hectares in the temperate zone!

To get a sense of the magnitude here, note that current areas of tropical and temperate forests are 1500 and 700 million hectares, respectively.

Pacala and Socolow also say that another half gigaton of carbon emissions could be prevented by establishing approximately 300 million hectares of plantations on nonforested land.

15. Soil management. When forest or grassland is converted to cropland, up to one-half of the soil carbon gets converted to CO2, mainly because tilling increases the rate of decomposition by aerating undecomposed organic matter. Over the course of history, they claim, 55 gigatons of carbon has gone into the atmosphere this way. That’s the equivalent of two wedges. (Note that one wedge, ramping up linearly to 1 gigaton/year for 50 years, adds up to 25 gigatons of carbon by 2054.)
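
To spell out that arithmetic: the wedge ramps linearly from 0 to 1 gigaton of carbon per year over 50 years, so the total saved is the area of a triangle:

\tfrac{1}{2} \times 50 \, \mathrm{yr} \times 1 \, \mathrm{GtC/yr} = 25 \, \mathrm{GtC}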

However, good agricultural practices like no-till farming can reverse these losses — and reduce erosion too! By 1995, these practices had been adopted on 110 million of the world’s 1600 million hectares of cropland. If this could be extended to all cropland, accompanied by a verification program that enforces practices that actually work as advertised, somewhere between half and one gigaton of carbon per year could be stored in this way. So: maybe half a wedge, maybe a whole wedge!

I’ve seen a lot of argument about both these topics, and I’d love to learn more facts. Some of the controversy concerns the UN’s REDD+ program, which got a big boost in Cancún. “REDD” stands for Reducing Emissions from Deforestation and Forest Degradation — while the plus sign hints at the role of conservation, sustainable management of forests, and enhancement of forest carbon stocks.

Some people think REDD+ is great, while others think it could actually hurt. The Wikipedia article says, among other things:

REDD is presented as an “offset” scheme of the carbon markets and thus, will produce carbon credits. Carbon offsets are “emissions-saving projects” that in theory “compensate” for the polluters’ emissions. Offsets allow polluting governments and corporations, which have the historical responsibility to clean up the atmosphere, to buy their way out of the problem with cheap projects that exacerbate social and environmental conflicts in the South. Moreover, it delays any real domestic action where a historical responsibility lies and allows the expansion of more fossil fuel explorations and extractions. The “carbon credits” generated by these projects can be used by industrialised governments and corporations to meet their targets and/or to be traded within the carbon markets.

There’s also a lot of argument about just how much long-term impact on atmospheric CO2 a standing forest has, though everyone seems to agree that cutting one down releases a lot of CO2.

For a bit more, try:

About REDD+, United Nations.

REDD+: Reducing Emissions from Deforestation and Forest Degradation, Center for International Forestry Research.

Next time I’ll give you an update on the stabilization wedges from Stephen Pacala himself, based on a talk he gave in 2008. It’s a bit scary…


Postdoc Positions in Climate Mathematics

13 January, 2011

Young mathematicians interested in climate issues, check this out! There are some postdoc positions available where you can work with people who are doing really cool stuff. Click on their names or photos for more. And apply soon — they’ll start reviewing applications on the 20th of January!

The Mathematics and Climate Research Network is a nation-wide NSF funded initiative. Our goal is to define, develop and solve mathematical problems that arise in climate science. A number of postdoctoral positions will be available starting in Summer or Fall, 2011. The successful applicants will be known as Ed Lorenz Postdoctoral Fellows in the Mathematics of Climate and will have an affiliation with one of the network nodes. The topics of research will range from sea-ice processes to paleoclimate studies and data assimilation in climate models. The network has twelve funded nodes and a number of other collaborating institutions. For more details, see www.mathclimate.org.

The postdoctoral fellows will be based at the nodes indicated below. There will be considerable interaction possible with other network members through weekly web-based seminars and working groups. The network encourages and will support extended visits by postdocs to other nodes.

All interested recent PhDs are encouraged to apply. There are two steps necessary for a complete application: (1) posting materials to mathjobs.org (cover letter, CV, research statement and 3 letters of recommendation), and (2) completing a short questionnaire to be found at: http://jobs.mathclimate.org.

The specific positions with areas of focus, primary locations and postdoctoral mentors as well as institution relevant information are given below. Salaries will be competitive. The postdocs are multi-year and starting times will all be sometime Summer or Fall, 2011. Teaching one course per year will be an option in most positions.

1. Arizona State University (School of Mathematical and Statistical Sciences), Data assimilation and large complex models of the atmosphere. Mentors: Eric Kostelich and Alex Mahalov.

2. Bowdoin College (Department of Mathematics), Dynamical systems in climate process models and paleoclimate. Mentors: Mary-Lou Zeeman and Dick McGehee (Minnesota).

3. New York University (Courant Institute of Mathematical Sciences), Southern Ocean, sea ice, Antarctic ice sheet, and regional atmospheric modeling. Mentor: David Holland.

4. University of Chicago (Department of Geosciences), Modeling and analysis of climate processes such as water vapor and cloud feedback, atmospheric circulation, land and sea ice including applications to past climate, and modeling of carbon cycle fluctuations on varying time scales. Mentors: Pam Martin, Ray Pierrehumbert, Dorian Abbot and Mary Silber (Northwestern).

5. University of Utah (Department of Mathematics), Analysis of sea ice through modeling, computation, and methods of applied mathematics and physics. Field trips to the Arctic or Antarctic potentially part of postdoctoral work. Mentor: Ken Golden.

6. University of Vermont (Department of Mathematics and Statistics), Development of data assimilation methods and implementation on climate models, both conceptual and global. Mentors: Chris Danforth and Chris Jones (UNC-CH).

7. University of Washington (Department of Applied Mathematics), Analysis of historical climate data using linear and nonlinear time series techniques. Mentors: Ka-Kit Tung and Dave Camp (Calpoly-SLO).

Each of the universities involved is an Affirmative Action/Equal Opportunity employer and welcomes applications from women, underrepresented ethnic, racial and cultural groups, and from people with disabilities. Reviewing of applications will begin on Jan 20, 2011 but applications will continue to be accepted until the positions are filled.


Quantum Information Processing 2011 (Day 2)

12 January, 2011

Here are some very fragmentary notes on the second day of QIP 2011. You can see arXiv references, slides, and videos of the talks here. I’ll just give links to the slides, and again I’ll only mention 3 talks.

Stephanie Wehner gave a talk on the relation between the uncertainty principle and nonlocality in quantum theory. There’s a general framework for physical theories, called “generalized probabilistic theories”, which includes classical and quantum mechanics as special cases. In this framework we can see that while quantum theory is “nonlocal” in the sense made famous by John Bell, even more nonlocal theories are logically possible!

For example, while quantum theory violates the Clauser–Horne–Shimony–Holt inequality, which is obeyed by any local hidden variables theory, it doesn’t violate it to the maximum possible extent. There is a logically conceivable gadget, the Popescu–Rohrlich box, which violates the CHSH inequality to the maximum extent allowed by a theory that prohibits faster-than-light signalling. However, such a gadget would give us implausibly godlike computational powers.
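
To put numbers on “doesn’t violate it to the maximum possible extent”: writing S for the usual CHSH combination of correlations, S = E(a,b) + E(a,b') + E(a',b) - E(a',b'), the bounds are

|S| \le 2 \quad \textrm{(local hidden variable theories)}

|S| \le 2\sqrt{2} \quad \textrm{(quantum theory: Tsirelson's bound)}

|S| \le 4 \quad \textrm{(no-signalling theories, attained by the Popescu–Rohrlich box)}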

In Wehner’s talk, she explained another reason not to hope for more nonlocality than quantum theory provides. Namely, given the “steering” ability we have in quantum theory — that is, our ability to prepare a state at one location by doing a measurement at another — the theory cannot be more nonlocal than it is while still obeying the uncertainty principle.

Jérémie Roland gave a talk on generalizations of Grover’s search algorithm. Grover’s algorithm is one of the implausibly godlike powers that quantum computers might give us, if only we could build the bloody things: it’s a way to search a list of N items for a given item in a time that’s only on the order of N^{1/2}. This algorithm assumes we can freely jump from place to place on the list, so instead of a linearly ordered “list” it’s probably better to visualize this data structure as a complete graph with N vertices. Roland’s talk explained a way to generalize this idea to arbitrary graphs.

He began by considering a Markov chain on the graph — that is, a way to blunder randomly from vertex to vertex along the graph, where you can only go from one vertex to another in one step if there’s an edge connecting them. He assumed that it’s reversible and ergodic. Then, starting from this, he described how to fashion a quantum process that finds the desired vertex (or vertices) in about the square root of the time that the Markov chain would take.
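
To get a feel for the classical baseline, here is a tiny Python simulation (my own toy example, not taken from the talk) of a random walk on the complete graph with N vertices: on average it takes about N - 1 steps to blunder onto the marked vertex, while the quantized walk finds it in roughly the square root of that time.

import random

def hitting_time(num_vertices, marked, start, rng):
    """Number of steps a random walk on the complete graph takes to reach
    the marked vertex, starting from the given vertex."""
    v, steps = start, 0
    while v != marked:
        # From any vertex you can hop to any of the other vertices.
        v = rng.choice([u for u in range(num_vertices) if u != v])
        steps += 1
    return steps

rng = random.Random(0)
N = 200
trials = [hitting_time(N, marked=0, start=1, rng=rng) for _ in range(100)]
print(sum(trials) / len(trials))    # close to N - 1 on average, here about 199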

Robin Kothari gave a talk on using quantum computation to efficiently detect certain properties of graphs. He considered “minor-closed properties” of graphs. Let me just tell you what those properties are, and tell you about a fascinating older result about them.

The word graph means many slightly different things, but in this blog entry I mean a finite set V whose elements are called vertices, together with a set E of unordered pairs of distinct vertices, which we call edges. So, these are undirected finite graphs without self-loops or multiple edges connecting vertices.

A minor of a graph is another graph that can be obtained from the first one by repeatedly:

1) removing edges,

2) removing vertices that have no edges connected to them, and

3) contracting edges, that is, shrinking them to nothing and then identifying the vertices at both ends, like this:



For example, this graph:



is a minor of this one:



A property of graphs is minor-closed if whenever one graph has it, all its minors also have it.
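
In case a concrete version helps, here is a minimal Python sketch of the three operations above, with a graph stored as a set of vertices and a set of two-element frozenset edges (the function names are mine, purely for illustration):

def delete_edge(vertices, edges, e):
    """Operation 1: remove the edge e."""
    return vertices, edges - {e}

def delete_isolated_vertex(vertices, edges, v):
    """Operation 2: remove a vertex that has no edges connected to it."""
    assert all(v not in edge for edge in edges)
    return vertices - {v}, edges

def contract_edge(vertices, edges, e):
    """Operation 3: shrink the edge e = {u, w} to nothing and identify
    its two endpoints (both become u), dropping any self-loop this creates."""
    u, w = tuple(e)
    new_edges = set()
    for f in edges - {e}:
        g = frozenset(u if x == w else x for x in f)
        if len(g) == 2:
            new_edges.add(g)
    return vertices - {w}, new_edges

# A triangle with one extra pendant vertex:
V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(contract_edge(V, E, frozenset({1, 2})))   # the triangle becomes a single edge; the pendant edge survives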

What are some minor-closed properties? An obvious one is lacking cycles, that is, being a forest. You can get rid of cycles by getting rid of edges and vertices, or contracting edges, but you can’t create them!

A more interesting minor-closed property is planarity. If you can draw a graph on the plane, you can clearly also draw the graph you get by removing an edge, or removing a lone vertex, or contracting an edge.

Now, Kuratowski showed that planar graphs are precisely those that don’t have this graph:



or this one:



as minors.

Similarly, graphs that lack cycles are precisely those that don’t have a triangle as a minor!

So, we could ask if this pattern generalizes. Given a minor-closed property of graphs, is it equivalent to a property saying that there’s some finite set of graphs that don’t show up as minors?

This is called Wagner’s conjecture, after Klaus Wagner. He published it in 1970.

Wagner’s conjecture is true! It was proved by Neil Robertson and Paul Seymour in 2004. But the really interesting thing, to me, is that their proof takes about 400 or 500 pages!

I find this quite surprising… but then, I wouldn’t have guessed the four-color theorem was so hard to prove, either.

To make sure you understand Wagner’s conjecture, check that if we dropped the word “finite”, it would have a one-sentence proof.


Quantum Information Processing 2011 (Day 1)

11 January, 2011

This year’s session of the big conference on quantum computation, quantum cryptography, and so on is being held in Singapore:

QIP 2011, the 14th Workshop on Quantum Information Processing, 8-14 January 2011, Singapore.

Because the battery on my laptop is old and not very energetic, and I can’t find any sockets in the huge ballroom where the talks are being delivered, I can’t live-blog the talks. So, my reports here will be quite incomplete.

Here are microscopic summaries of just three talks from Monday’s session. You can see arXiv references, slides, and videos of the talks here. I’ll just give links to the slides.

Christian Kurtsiefer gave a nice talk on how to exploit the physics of photodetectors to attack quantum key distribution systems! By cutting the optical fiber and shining a lot of light down both directions, evil Eve can ‘blind’ Alice and Bob’s photodetectors. Then, by shining a quick even brighter pulse of light, she can fool one of their photodetectors into thinking it’s seen a single photon. She can even fool them into thinking they’ve seen a violation of Bell’s inequality, by purely classical means, thanks to the fact that only a small percentage of photons make it down the cable in the first place. Christian and his collaborators have actually done this trick in an experiment here at the CQT!

Tzu-Chieh Wei and Akimasa Miyake gave a two-part joint talk on how the AKLT ground state is universal for measurement-based quantum computation. The AKLT ground state works like this: you’ve got a hexagonal lattice with three spin-1/2 particles at each vertex. Think of each particle as attached to one of the three edges coming out of that vertex. In the ground state, you start by putting the pair of particles at the ends of each edge in the spin-0 (also known as “singlet”, or antisymmetrized) state, and then you project the three particles at each vertex down to the spin-3/2 (completely symmetrized) state. This is indeed the ground state of a cleverly chosen antiferromagnetic Hamiltonian. But has anyone ever prepared this sort of system in the lab?

David Poulin gave a talk on how to efficiently compute time evolution given a time-dependent quantum Hamiltonian. The trickiness arises from Hamiltonians that change very rapidly with time. In a naive evenly spaced discretization of the time-ordered exponential, this would require you to use lots of tiny time steps to get a good approximation. But using a clever randomly chosen discretization you can avoid this problem, at least for uniformly bounded Hamiltonians, those obeying:

\| H(t) \| \le K

for all times t. The key is that the high-frequency part of a time-dependent Hamiltonian only couples faraway energy levels, but a uniformly bounded Hamiltonian doesn’t have faraway energy levels.

A couple more things — really just notes to myself:

Steve Flammia told me about this paper relating the Cramer-Rao bound (which involves Fisher information) to the time-energy uncertainty principle:

• Sergio Boixo, Steven T. Flammia, Carlton M. Caves, and J.M. Geremia, Generalized limits for single-parameter quantum estimation.

Markus Müller told me about a paper mentioning relations between Maxwell’s demon and algorithmic entropy. I need to get some references on this work — it might help me make progress on algorithmic thermodynamics. It’s probably one of these:

• Markus Müller, Quantum Kolmogorov complexity and the quantum Turing machine (PhD thesis).

• Markus Müller, On the quantum Kolmogorov complexity of classical strings, Int. J. Quant. Inf. 7 (2009), 701-711.

Hmm — the first one says:

A concrete proposal for an application of quantum Kolmogorov complexity is to analyze a quantum version of the thought experiment of Maxwell’s demon. In one of the versions of this thought experiment, some microscopic device tries to decrease the entropy of some gas in a box, without the expense of energy, by intelligently opening or closing some little door that separates both halves of the box. It is clear that a device like this cannot work as described, since its existence would violate the second law of thermodynamics. But then, the question is what prevents such a little device (or “demon”) from operating.

Roughly, the answer is that the demon has to make observations to decide whether to close or open the door, and these observations accumulate information. From time to time, the demon must erase this additional information, which is only possible at the expense of energy, due to Landauer’s principle. In Li and Vitanyi’s book An Introduction to Kolmogorov Complexity and Its Applications, this cost of energy is analyzed under very weak assumptions with the help of Kolmogorov complexity. Basically, the energy that the demon can extract from the gas is limited by the difference of the entropy of the gas, plus the difference of the Kolmogorov complexity of the demon’s memory before and after the demon’s actions. The power of this analysis is that it even encloses the case that the demon has a computer to do clever calculations, e.g. to compress the accumulated information before erasing it.

So, I guess I need to reread Li and Vitanyi’s book!


Mathematical Economics

8 January, 2011

My student Miguel Carrión Álvarez got his Ph.D. in 2004, back when I was still working on loop quantum gravity. So, he decided to work on a rigorous loop quantization of the electromagnetic field. I like his thesis a lot:

• Miguel Carrión Álvarez, Loop Quantization versus Fock Quantization of p-Form Electromagnetism on Static Spacetimes.

However, he decided to leave mathematical physics when he got his degree… and he switched to finance.

There’s a lot of math in common between quantum field theory and mathematical finance. When you take quantum fluctuations in quantum fields, and replace time by imaginary time, you get random fluctuations in the stock market!

Or at least in some models of the stock market. One difference between quantum field theory and mathematical finance is that the former is famous for predicting certain quantities with many decimal digits of accuracy, while the latter is famous for predicting certain quantities with no digits of accuracy at all! I’m talking about the recent financial crisis.

Miguel and I share an interest in the failures of neoclassical economics. My interest comes from the hope — quite possibly a futile hope — that correcting some mistakes in economic theory could help show us a way out of some problems civilization now finds itself in. In fact, the Azimuth Project has its origins in my old economics diary.

Right now we’re having a little conversation about mathematical economics. Maybe you’d like to join in!

Miguel wrote:

I’ll be teaching parts of two courses on mathematical finance and financial risk management in a ‘Mathematical Engineering’ MSc course at the Universidad Complutense here in Madrid.

I replied:

Cool! Has the way people teach these subjects changed any since the economic crisis? I would hope so…

Miguel replied:

I don’t think it has.

First of all, these courses are mostly technical, as part of a Master’s programme intended to teach people what people do in practice. I don’t think criticizing the foundations is part of the program.

But you may have noticed (for instance, if you follow Krugman in the NYT) that the economic establishment has been very resistant to recognizing that the crisis is an empirical refutation of neoclassical economics. This crisis doesn’t fit within the conceptual framework of NCE, but that’s not a problem as they can just call it an “external shock” and continue pretending that the economy will trend to equilibrium from its new perturbed position. Related jokes of mine include that the recession part of the economic cycle is considered an outlier.

And this is not to speak of mathematical finance, where the situation is even worse. Academics still think that the best way to manage a new risk is to quantify it, define an index, and then create derivatives markets to trade it. In other words, turn all risk into market price risk and then push it all to the least solvent participants on the periphery of the financial system.

I think there is good progress being made by non-mainstream economists. Notably Steve Keen – see here:

• ARGeezer, Steve Keen’s dynamic model of the economy, European Tribune, 23 September 2009.

One of the problems with economics that Steve Keen complains about is that economists generally don’t know much about dynamical systems. I doubt they know what the Lotka-Volterra equation is, let alone understand it (if you discretize it, the predator-prey model displays chaos like the logistic equation). I also doubt economists know about chaos in the logistic equation:

even if they know about logistic growth models, which may not generally be the case either. Their model of the economy seems to be basically the Clausius-Clapeyron equation.

I believe the proper way to look at macroeconomics is as a flow network and as such ideas from category theory may be useful at least to organize one’s thinking.

The economy is a network whose nodes are “agents” and links are “economic relations”. Economic relations are flows, of goods and services in one direction and of money in the other direction (opposite categories via arrow reversal?).

Each node also has a balance sheet: assets and liabilities, and it’s the liabilities that are the key here, because they are mostly monetary. But they are also intertemporal. When you buy on credit, you get a physical asset today in exchange for the promise of cash at a later date. Presumably, the discounted present value of that future cash covers the value of the asset today, and that’s what you book as an asset or a liability. But the asset/liability accrues interest over time so its value changes on the balance sheet, and every so often you still need to make an actual transfer of cash along an edge of the network. And when IOUs become tradeable (you issue me a bearer-IOU which I can then give to someone else who trusts your credit) they become money, too. And the relative variations in the prices of all these different kinds of money, their liquidity, etc, are key in understanding a recession like the one we’re in, or the growth (ehm, bubble) phase of a business cycle.

I don’t have a theory of this, but I keep thinking in these terms and one thing that seems to come out of it is a sort of “fluctuation-dissipation” relation between market fluctuations in prices and trade volumes, the creation of money-as-debt, and inflation. Because nodes abhor insolvency, fluctuations in cash flows lead to the creation of IOUs, which inflate the money mass. With the analogy inflation ~ dissipation, you get a sense that the more intense the market-induced cash flow fluctuations, the higher the inflation rate of the money mass, through the creation of money-as-tradeable-credit.

But none of this is mainstream economics, though there is an active community around “modern monetary theory”, “chartalism” or “monetary circuit theory” working on similar ideas.

But “Dynamic Stochastic General Equilibrium” is not going to cut it.

Anyway, maybe I could do a guest post on your blog about these things… It’s all very inchoate as you can see.

I also think the parallels between these economic flow networks and ecosystems and climates as matter/energy flow networks are important and such dynamical systems are a relatively unexplored area of mathematical physics – it’s just too difficult to say anything general about them!

Best,
Miguel

Miguel allowed me to post this exchange, noting that he could fill in gaps or moderate excessive claims in the comments. It would be nice if together we could all figure out how to take his thoughts and make them a bit more precise.

In week309 I plan to give an explanation of the Lotka-Volterra equation, based on work Graham Jones has done here:

Quantitative ecology, Azimuth Project.

I’m also dying to talk about flow networks in ecology, so it’s nice to hear that Miguel has been thinking about them in economics.

But here’s a basic question: in what sense do economists model the economy using the Clausius-Clapeyron equation? Is the idea that we can take this equation and use it to model economic equilibrium, somehow? How, exactly?


Algorithmic Thermodynamics (Part 2)

6 January, 2011

Here are some notes for a talk tomorrow at the Centre for Quantum Technologies. You can think of this as a kind of followup to my first post on this subject.

Introduction

The idea of entropy arose in thermodynamics, and its connection to probability theory is fundamental to statistical mechanics. Its connection to information was developed later by Shannon. We now think of entropy and information as two sides of the same coin.

But there’s another concept, called algorithmic information, which was developed in work on logic and computer science. This concept is related to Shannon’s earlier notion of information, but it also seems rather different.

For example, Shannon’s ideas let us compute the information of a probability distribution on bit strings. So if we detect a radio signal from an extraterrestrial life form that sends us a string of a dozen 0’s and 1’s each day, and it seems the string is chosen randomly according to some probability distribution, we can calculate the information per message. But if I just show you a single bit string, like this:

011101100001

it makes no sense to ask what its information is. On the other hand, we can define its algorithmic information.

Nonetheless, my goal here is to show you that algorithmic information can really be seen as a special case of the old probabilistic concept of information. This lets us take all the familiar tools from statistical mechanics and thermodynamics, and apply them to algorithmic information theory. Mike Stay and I started to do this here:

• John Baez and Mike Stay, Algorithmic thermodynamics.

Unfortunately, I’ll only have time to sketch a bit of what we did!

Algorithmic information – first definition

Here’s a definition of algorithmic information. Later we’ll see a better one that’s almost equivalent.

Fix a programming language where a program is a finite bit string and its output, if it halts, is again a finite bit string. Then the algorithmic information of a finite bit string is the length of the shortest program that has that bit string as output.

So, for example, the algorithmic information of a string of a trillion 0’s is low, because you can write a short program that prints this out. On the other hand, the algorithmic information of a long “random” string of bits will be high, because the shortest program to print it out will be a program that says “print out this string: 01011100101100…” — so the program is approximately as long as the string itself: slightly longer, but only by a fixed amount.

Of course you may ask what “random” means here. I haven’t defined it! But the point is, people have used these ideas to give a definition of what it means for a string of bits to be random. There are different ways to do it. For example: a bit string is ε-random if the shortest program that prints it out is at least ε times as long as the string itself.

You may also wonder: “Doesn’t the definition of algorithmic information depend on the details of our programming language?” And the answer is, “Yes, but not very much.” More precisely, for any two universal programming languages, there’s a constant K such that for any finite bit string, the algorithmic information of that string will differ by at most K depending on which language we use.
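
If you want to play with this, an off-the-shelf compressor gives a crude, computable stand-in: the compressed length of a string is (up to the fixed size of the decompressor) an upper bound on the length of one program that prints it. Here is a little Python illustration using zlib; it is emphatically not algorithmic information, just a rough proxy:

import random
import zlib

def compressed_length(bits: str) -> int:
    """Length in bytes after zlib compression: a crude upper bound on the
    description length of the string, not its true algorithmic information."""
    return len(zlib.compress(bits.encode()))

random.seed(0)
regular = "0" * 1_000_000                                       # a million 0's
noisy = "".join(random.choice("01") for _ in range(1_000_000))  # a "random" string

print(compressed_length(regular))   # small: a short description suffices
print(compressed_length(noisy))     # much larger: randomness resists compression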

Ordinary information

Let’s see how algorithmic information compares to the usual concept of information introduced by Shannon. The usual concept works like this:

If an event of probability p occurs, we define the information gained by learning this event has occurred to be - \log p. We take the logarithm because probabilities of independent events multiply, and we want information to add. The minus sign makes the information positive. We can use any base we want for the logarithm: physicists like e, while computer scientists favor 2.

If there’s a set X of possible outcomes, where the outcome x \in X occurs with probability p_x, then the average or expected amount of information we gain by learning the outcome is

S(p) = - \sum_{x \in X} p_x \log p_x

We call this the entropy of the probability distribution p.
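
In code, using base 2 so the answer comes out in bits (a quick sketch, not from the talk notes):

import math

def entropy(p):
    """Shannon entropy -sum_x p_x log2 p_x of a probability distribution,
    given as a list of probabilities summing to 1."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))      # 1 bit: a fair coin flip
print(entropy([0.99, 0.01]))    # about 0.08 bits: hardly any surprise on average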

Now, ordinary information doesn’t look very much like algorithmic information! But there’s a second definition of algorithmic information, almost equivalent to the first one, that makes the relation clearer.

Algorithmic information – second definition

First, a minor shift in viewpoint. Before we defined algorithmic information for a finite bit string. Now let’s define it for a natural number. This change in viewpoint is no big deal, since we can set up a one-to-one correspondence between natural numbers and finite bit strings that’s easy to compute.

We’ll still think of our programs as finite bit strings. Not every such string gives a program that halts. We also may demand that bit strings have special features to count as programs. For example, we may demand that programs end in the string 11111, just like a Fortran program must end with the word END.

Let X be the set of programs that halt. Without loss of generality, we can assume that X is prefix-free. This means that if x \in X, no other bit string starting with the string x is also in X. For example, if the string 11111 marks the end of a program, and is only allowed to show up at the end of the program, X will be prefix-free.

Let V(x) be the length of the program x \in X. It’s easy to see that if X is prefix-free,

\sum_{x \in X} 2^{-V(x)} \le 1

Making this sum finite is the reason we want the prefix-free condition — you’ll see exactly why in a minute.
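
Here is a toy check of that inequality in Python; the little prefix-free set below is invented purely for illustration and has nothing to do with any real programming language:

# A small prefix-free set: no string is a proper prefix of another.
programs = ["0", "10", "110", "1110", "1111"]

def is_prefix_free(ps):
    """True if no string in ps is a proper prefix of another string in ps."""
    return not any(a != b and b.startswith(a) for a in ps for b in ps)

kraft_sum = sum(2 ** -len(p) for p in programs)
print(is_prefix_free(programs))     # True
print(kraft_sum)                    # 1/2 + 1/4 + 1/8 + 1/16 + 1/16 = 1.0, which is <= 1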

Let N(x) be the output of the program x \in X. Remember, we now think of the output as a natural number. So, we have functions

V, N : X \to \mathbb{N}

We define the algorithmic information of a number n \in \mathbb{N} to be

S(n) = - \log \sum_{x \in X, N(x) = n} 2^{-V(x)}

So: we do a sum over all programs x having the number n as output. We sum up 2^{-V(x)}, where V(x) is the length of the program x. The sum converges because

\sum_{x \in X} 2^{-V(x)} \le 1

That’s where the prefix-free condition comes into play. Finally, we take minus the logarithm of this sum.

This seems a bit complicated! But suppose there’s just one program x having the number n as output. Then the sum consists of one term, the logarithm cancels out the exponential, and the minus signs cancel too, so we get

S(n) = V(x)

This matches our first definition: the algorithmic information of a number is the length of the shortest program having that number as output.

Of course in this case the shortest program is the only program having that output. If we have more than one program with n as output, the second definition of algorithmic entropy gives a smaller answer than the first definition:

S_{\mathrm{second}}(n) \le S_{\mathrm{first}}(n)

However, there’s a famous theorem, proved by Leonid Levin in 1974, that says the new definition is not very different from the old one. The difference is bounded by a constant. More precisely: for any universal prefix-free programming language, there’s a constant K such that for every n \in \mathbb{N},

S_{\mathrm{second}}(n) \ge S_{\mathrm{first}}(n) - K

Since the algorithmic information depends on our programming language, it was only well-defined within an additive constant in the first place. So, switching from the first definition to the second one doesn’t significantly change the concept. But, it makes the relation to ordinary information a lot easier to see!

From now on we’ll use the second definition.

Algorithmic information versus ordinary information

To relate algorithmic information to ordinary information, we need to get logarithms and probabilities into the game. We see a logarithm here:

S(n) = - \log \sum_{x \in X, N(x) = n} 2^{-V(x)}

but it’s in a funny place, outside the sum — and where are the probabilities?

To solve both these puzzles, let’s define a probability distribution p on the set of natural numbers by

p_n = \frac{1}{Z} \sum_{x \in X, N(x) = n} 2^{-V(x)}

Here Z is a normalizing constant, to make the probabilities sum to 1. Taking

Z = \sum_{x \in X} 2^{-V(x)}

ensures that

\sum_{n \in \mathbb{N}} p_n = 1

Now, notice that

S(n) = - \log p_n \; - \; \log Z

So, up to an additive constant, the algorithmic entropy of the number n is -\log p_n. But -\log p_n is just the information gained upon learning the number n, if we started out knowing only that n was chosen randomly according to the probability distribution p.

What’s the meaning of the probability distribution p? Simple: p_n is the probability that the number n is the output of a randomly chosen program… where ‘randomly chosen’ means chosen according to a certain specific rule. The rule is that increasing the length of a program by 1 makes it half as probable. In other words, the probability q_x of the program x is

q_x = \frac{1}{Z} 2^{-V(x)}

where Z is the normalization factor chosen to make these probabilities sum to 1.

So, the relation between algorithmic information and ordinary information is this:

The algorithmic information of a number is the information gained by learning that number, if beforehand we only knew that it was the output of a randomly chosen program.
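
To see all the pieces fit together, here is a toy Python calculation; the little table of program lengths and outputs is completely made up, just for illustration:

import math

# Invented toy data: each halting program x is listed by its length V(x)
# and its output N(x).
programs = [
    {"V": 2, "N": 5},
    {"V": 3, "N": 5},
    {"V": 3, "N": 7},
    {"V": 4, "N": 9},
]

Z = sum(2 ** -p["V"] for p in programs)     # the partition function

def S(n):
    """Algorithmic information of n (second definition), in bits."""
    return -math.log2(sum(2 ** -p["V"] for p in programs if p["N"] == n))

def prob(n):
    """Probability p_n that a randomly chosen program outputs n."""
    return sum(2 ** -p["V"] for p in programs if p["N"] == n) / Z

print(S(5))                                 # about 1.415 bits
print(-math.log2(prob(5)) - math.log2(Z))   # the same number: S(n) = -log p_n - log Z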

Algorithmic thermodynamics

Why should the probability of the program x \in X be defined as

q_x = \frac{1}{Z} 2^{-V(x)} ?

There is no reason we need to use this probability distribution. However, there’s something good about it. It’s an example of a Gibbs state, meaning a probability distribution that maximizes entropy subject to a constraint on the expected value of some observable. Nature likes to maximize entropy, so Gibbs states are fundamental to statistical mechanics. So is the quantity Z: it’s called the partition function.

The idea works like this: suppose we want to maximize the entropy of a probability distribution p on the set of programs, subject to a constraint on the expected value of the length of the program. Then the answer is

p_x = \frac{1}{Z} e^{-\gamma V(x)}

where \gamma is some number, and the partition function Z ensures that the probabilities sum to 1:

Z = \sum_{x \in X} e^{- \gamma V(x)}

So, p equals our previous probability distribution q when

\gamma = \ln 2
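
Indeed, at this value of \gamma the two distributions agree term by term, since

e^{-\gamma V(x)} = e^{-(\ln 2) V(x)} = 2^{-V(x)}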

However, it is also interesting to consider other values of \gamma, and in fact Tadaki has already done so:

• K. Tadaki, A generalization of Chaitin’s halting probability and halting self-similar sets, Hokkaido Math. J. 31 (2002), 219–253.

The partition function converges for \gamma \ge \ln 2 but diverges otherwise.

More generally, we can look for the probability distribution that maximizes entropy subject to constraints on the expected value of several observables. For example, let:

E(x) = the logarithm of the runtime of the program x

V(x) = the length of the program x

N(x) = the output of the program x

Then the probability distribution that maximizes entropy subject to constraints on the expected values of these three quantities is

p_x = \frac{1}{Z} e^{-\beta E(x) - \gamma V(x) - \delta N(x)}

where now the partition function is

Z = \sum_{x \in X} e^{-\beta E(x) - \gamma V(x) - \delta N(x)}

We’ve chosen our notation to remind experts on statistical mechanics that they’ve seen formulas like this before. The exact same formulas describe a piston full of gas in thermal equilibrium! From a formal, purely mathematical viewpoint:

E is analogous to the internal energy of the gas, and \beta is analogous to 1/T where T is the temperature in units where Boltzmann’s constant is 1.

V is analogous to the volume of the gas, and \gamma is analogous to P/T where P is the pressure.

N is analogous to the number of molecules in the gas, and \delta is analogous to -\mu/T where \mu is the chemical potential.

The analogy here is quite arbitrary. I’m not saying that the length of a program is profoundly similar to the volume of a cylinder of gas; we could have chosen the letter E to stand for the length of a program and everything would still work.

But the analogy works. In other words: now we can take a lot of familiar facts about thermodynamics and instantly translate them into analogous facts about algorithmic information theory! For example, define the entropy of the probability distribution p in the usual way:

S(p) = - \sum_{x \in X} p_x \ln p_x

We can think of this as a function of either T, P, and \mu or the expected values of E, V and N. In thermodynamics we learn lots of equations like this:

\left. \frac{\partial S}{\partial E} \right|_{V,N} = \frac{1}{T}

where by standard abuse of notation we’re using E, V, N to stand for their expected values. And, all these equations remain true in algorithmic thermodynamics!
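
For readers who want the bookkeeping spelled out: the equation above is one partial derivative of the standard thermodynamic relation (with E, V and N again standing for expected values),

dS = \frac{1}{T} dE + \frac{P}{T} dV - \frac{\mu}{T} dN

whose other partial derivatives give \left. \frac{\partial S}{\partial V} \right|_{E,N} = \frac{P}{T} and \left. \frac{\partial S}{\partial N} \right|_{E,V} = - \frac{\mu}{T}.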

The interesting part is figuring out what all these equations really mean in the context of algorithmic thermodynamics. For a start on that, try our paper.

For example, we call T algorithmic temperature. If you allow programs to run longer, more of them will halt and give an answer. Thanks to the equation

\left. \frac{\partial S}{\partial E} \right|_{V,N} = \frac{1}{T}

the algorithmic temperature is roughly the number of times you have to double the runtime in order to double the number of programs that satisfy the constraints on length and output.

We also consider the algorithmic analogues of thermodynamic cycles, familiar from old work on steam engines. Charles Babbage described a computer powered by a steam engine; we describe a heat engine powered by programs! The significance of this line of thinking remains fairly mysterious. However, I’m hoping that it will help point the way toward a further synthesis of algorithmic information theory and thermodynamics.


Welcome to the Greenhouse

5 January, 2011

Happy New Year!

I just got back from Hanoi. I think I love it. It’s a fascinating city, full of street vendors and craftsmen and twisting back alleys and deep ‘tunnel houses’ and Buddhist temples and more. It’s over a thousand years old, layered with history. The Vietnamese have taken culture from all the more powerful countries that tried to invade them — the Chinese, the French, the Americans — and made it their own. You can see it all here.

At first I hated Hanoi. In the Old Quarter, the roads are packed with motorbikes zipping in all directions — and the sidewalks are all used either as motorbike parking lots, or restaurants with people sitting on little plastic chairs, so you pretty much need to walk in the road a lot of the time. Traffic lights are few and rarely heeded.

It seems like a pedestrian’s worst nightmare. Try walking across this:


But on my first night there, I ran into a philosopher friend from Singapore, an expert on Moore’s paradox: John Williams. He was sitting in a bar, looking out the door, watching the world go by. Strange coincidence!

So we had a beer, and I relaxed a bit, and I noticed that the traffic is actually governed by a set of unwritten rules that you can learn by watching. Little old ladies or gaggles of laughing teenagers can stroll across the street, motorbikes whizzing past them in both directions, and not get hurt! And so can you, once you learn what to do.

And then the fun starts. There’s way, way, way too much to tell you about, so I won’t even try now. Maybe later, probably on my diary.

I think Jeremy Kressman said it right: “You don’t just visit Hanoi. Hanoi visits you.” The picture above is from his blog. I’ll replace it with one of my own as soon as I get around to downloading my photos.

But as I’ve already hinted, this blog entry is not really about Hanoi. It’s about a new collection of SF stories, all tackling the theme of global warming:

• Gordon Van Gelder, editor, Welcome to the Greenhouse, O/R Books, 2011.

It includes stories by Brian W. Aldiss, Jeff Carlson, Judith Moffett, Matthew Hughes, Gregory Benford, Michael Alexander, Bruce Sterling, Joseph Green, Pat MacEwen, Alan Dean Foster, David Prill, George Guthridge, Paul Di Filippo, Chris Lawson, Ray Vukcevich and M. J. Locke.

It’s long past time for SF writers to start envisioning the world we seem to be heading towards. Were some of these stories published a while ago? Have you read any of them? Are they any good?

This item was pointed out to me by David Roberts. And by the way: in case you missed it, check out the global warming game that Giampiero Campa brought to our attention:

Fate of the World, website.

It’s not free, but Giampiero says it’s fun, and I trust him. He wrote:

But don’t waste your time, winning is plain impossible!

but then later:

I take it back, I have finally won today, which basically means that I got to the year 2120 with less than 3 degrees increment without destroying civilization. I mean, Latin America has to be rebuilt and there’s war raging in China, but other than that, it’s an excellent result :-)

