Rényi Entropy and Free Energy

10 February, 2011

I want to keep telling you about information geometry… but I got sidetracked into thinking about something slightly different, thanks to some fascinating discussions here at the CQT.

There are a lot of people interested in entropy here, so some of us — Oscar Dahlsten, Mile Gu, Elisabeth Rieper, Wonmin Son and me — decided to start meeting more or less regularly. I call it the Entropy Club. I’m learning a lot of wonderful things, and I hope to tell you about them someday. But for now, here’s a little idea I came up with, triggered by our conversations:

• John Baez, Rényi entropy and free energy.

In 1960, Alfred Rényi defined a generalization of the usual Shannon entropy that depends on a parameter. If p is a probability distribution on a finite set, its Rényi entropy of order \beta is defined to be

\displaystyle{ H_\beta = \frac{1}{1 - \beta} \ln \sum_i p_i^\beta }

where 0 \le \beta < \infty. This looks pretty weird at first, and we need \beta \ne 1 to avoid dividing by zero, but you can show that the Rényi entropy approaches the Shannon entropy as \beta approaches 1:

\lim_{\beta \to 1} H_\beta = -\sum_{i} p_i \ln p_i .

(A fun puzzle, which I leave to you.) So, it’s customary to define H_1 to be the Shannon entropy… and then the Rényi entropy generalizes the Shannon entropy by allowing an adjustable parameter \beta.

But what does it mean?

If you ask people what’s good about the Rényi entropy, they’ll usually say: it’s additive! In other words, when you combine two independent probability distributions into a single one, their Rényi entropies add. And that’s true — but there are other quantities that have the same property. So I wanted a better way to think about Rényi entropy, and here’s what I’ve come up with so far.

Any probability distribution with all probabilities nonzero can be seen as the state of thermal equilibrium for some Hamiltonian at some fixed temperature, say T = 1. And that Hamiltonian is unique. Starting with that Hamiltonian, we can then compute the free energy F at any temperature T, and up to a certain factor this free energy turns out to be the Rényi entropy H_\beta, where \beta = 1/T. More precisely:

F = (1 - T) H_\beta.

So, up to the fudge factor 1 - T, Rényi entropy is the same as free energy. It seems like a good thing to know — but I haven't seen anyone say it anywhere! Have you?

Let me show you why it’s true — the proof is pathetically simple. We start with our probability distribution p_i. We can always write

p_i = e^{- E_i}

for some real numbers E_i. Let’s think of these numbers E_i as energies. Then the state of thermal equilibrium, also known as the canonical ensemble or Gibbs state, at inverse temperature \beta is the probability distribution

\frac{e^{- \beta E_i}}{Z}

where Z is the partition function:

Z = \sum_i e^{-\beta E_i}

Since Z = 1 when \beta = 1, the Gibbs state reduces to our original probability distribution at \beta = 1.

Now in thermodynamics, the quantity

F = - \frac{1}{\beta} \ln Z

is called the free energy. It’s important, because it equals the total expected energy of our system, minus the energy in the form of heat. Roughly speaking, it’s the energy that you can use.

Let’s see how the Rényi entropy is related to the free energy. The proof is a trivial calculation:

- \beta F = \ln Z = \ln \sum_i e^{-\beta E_i} = \ln \sum_i p_i^\beta = (1 - \beta) H_\beta

so

H_\beta = -  \frac{\beta}{1 - \beta} F

at least for \beta \ne 1. But you can also check that both sides of this equation have well-defined limits as \beta \to 1.

The relation between free energy and Rényi entropy looks even neater if we solve for F and write the answer using T instead of \beta = 1/T:

F = (1 - T)H_\beta

So, what’s this fact good for? I’m not sure yet! In my paper, I combine it with this equation:

F = \langle E \rangle - T S

Here \langle E \rangle is the expected energy in the Gibbs state at temperature T:

\langle E \rangle = \frac{1}{Z} \sum_i E_i \, e^{-\beta E_i}

while S is the usual Shannon entropy of this Gibbs state. I also show that all this stuff works quantum-mechanically as well as classically. But so far, it seems the main benefit is that Rényi entropy has become a lot less mysterious. It’s not a mutant version of Shannon entropy: it’s just a familiar friend in disguise.
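
Here’s a quick numerical check of everything above, as a little Python sketch; the five-outcome distribution is random and the variable names are mine, not anything from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    p = rng.random(5)
    p /= p.sum()                       # a random probability distribution
    E = -np.log(p)                     # energies chosen so that p_i = exp(-E_i)

    def renyi(p, beta):
        return np.log(np.sum(p**beta)) / (1 - beta)

    shannon = -np.sum(p * np.log(p))
    print(renyi(p, 0.999), shannon)    # Renyi entropy -> Shannon entropy as beta -> 1

    for T in [0.5, 2.0, 5.0]:
        beta = 1 / T
        Z = np.sum(np.exp(-beta * E))          # partition function
        F = -np.log(Z) / beta                  # free energy
        gibbs = np.exp(-beta * E) / Z          # Gibbs state at temperature T
        expected_E = np.sum(gibbs * E)
        S = -np.sum(gibbs * np.log(gibbs))     # Shannon entropy of the Gibbs state
        print(F, (1 - T) * renyi(p, beta), expected_E - T * S)   # all three agree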


Carbon Dioxide Puzzles

4 February, 2011

I like it when people do interesting calculations and help me put their results on this blog. Renato Iturriaga has plotted a graph that raises some interesting questions about carbon dioxide in the Earth’s atmosphere. Maybe you can help us out!

The atmospheric CO2 concentration, as measured at Mauna Loa in Hawaii, looks like it’s rising quite smoothly apart from seasonal variations:

[Graph: monthly mean CO2 concentration at Mauna Loa.]

However, if you take the annual averages from here:

• NOAA Earth System Laboratory, Global Monitoring Division, Recent Mauna Loa CO2.

and plot how much the average rises each year, the graph is pretty bumpy. You’ll see what I mean in a minute.

In comparison, if you plot the carbon dioxide emissions produced by burning fossil fuels, you get a rather smooth curve, at least according to these numbers:

• U.S. Energy Information Administration, Total carbon dioxide emissions from the consumption of energy, 1980-2008.

Renato decided to plot both of these curves and their difference. Here’s his result:

[Graph: Renato’s comparison of the three curves described below.]

The blue curve shows how much CO2 we put into the atmosphere each year by burning fossil fuels, measured in parts per million.

The red curve shows the observed increase in atmospheric CO2.

The green curve is the difference.

The puzzle is to explain this graph. Why is the red curve roughly 40% lower than the blue one? Why is the red curve so jagged?

Of course, a lot of research has already been done on these issues. There are a lot of subtleties! So if you like, think of our puzzle as an invitation to read the existing literature and tell us how well it does at explaining this graph. You might start here, and then read the references, and then keep digging.

But first, let me explain exactly how Renato Iturriaga created this graph! If he’s making a mistake, maybe you can catch it.

The red curve is straightforward: he took the annual mean growth rate of CO2 from the NOAA website I mentioned above, and graphed it. Let me do a spot check to see if he did it correctly. I see a big spike in the red curve around 1998: it looks like the CO2 went up around 2.75 ppm that year. But then the next year it seems to have gone up just about 1 ppm. On the website it says 2.97 ppm for 1998, and 0.91 for 1999. So that looks roughly right, though I’m not completely happy about 1998.

[Note added later: as you’ll see below, he actually got his data from here; this explains the small discrepancy.]

Renato got the blue curve by taking the US Energy Information Administration numbers and converting them from gigatons of CO2 to parts per million moles. He assumed that the atmosphere weighs 5 × 10^15 tons and that CO2 gets well mixed with the whole atmosphere each year. Given this, we can simply say that one gigaton is 0.2 parts per million of the atmosphere’s mass.

But people usually measure CO2 in parts per million volume. Now, a mole is just a certain large number of molecules. Furthermore, the volume of a gas at fixed temperature and pressure is almost exactly proportional to the number of molecules, regardless of its composition. So parts per million volume is essentially the same as parts per million moles.

So we just need to do a little conversion. Remember:

• The molecular mass of N2 is 28, and about 79% of the atmosphere’s volume is nitrogen.

• The molecular mass of O2 is 32, and about 21% of the atmosphere’s volume is oxygen.

• By comparison, there’s very little of the other gases.

So, the average molecular mass of air is

28 × .79 + 32 × .21 = 28.84

On the other hand, the molecular mass of CO2 is 44. So one ppm mass of CO2 is less than one ppm volume: it’s just

28.84/44 = 0.655

parts per million volume. So, a gigaton of CO2 is about 0.2 ppm mass, but only about

0.2 × 0.655 = 0.13

parts per million volume (or moles).

So to get the blue curve, Renato took gigatons of CO2 and multiplied by 0.13 to get ppm volume. Let me do another spot check! The blue curve reaches about 4 ppm in 2008. Dividing 4 by 0.13 we get about 30, and that’s good, because energy consumption put about 30 gigatons of CO2 into the atmosphere in 2008.
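
For concreteness, here’s the whole conversion as a few lines of Python (my own sketch, using only the numbers quoted above):

    ATMOSPHERE_MASS_TONS = 5e15                # the rough estimate discussed below
    GIGATON_TONS = 1e9

    ppm_mass_per_gigaton = GIGATON_TONS / ATMOSPHERE_MASS_TONS * 1e6   # = 0.2

    mean_air_mass = 28 * 0.79 + 32 * 0.21      # = 28.84
    mass_to_volume = mean_air_mass / 44        # ~ 0.655, since CO2 has molecular mass 44

    ppm_volume_per_gigaton = ppm_mass_per_gigaton * mass_to_volume

    print(ppm_volume_per_gigaton)              # ~ 0.13 ppm volume per gigaton of CO2
    print(30 * ppm_volume_per_gigaton)         # ~ 3.9 ppm for the ~30 gigatons in 2008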

And then, of course, the green curve is the blue one minus the red one.

Now, more about the puzzles.

One puzzle is why the red curve is so much lower than the blue one. The atmospheric CO2 concentration is only going up by about 60% of the CO2 emitted, on average — though the fluctuations are huge. So, you might ask, where’s the rest of the CO2 going?

Probably into the ocean, plants, and soil.

But at first glance, the fact that only 60% stays in the atmosphere seems to contradict this famous graph:

[Graph: the decay over many years of a pulse of CO2 added to the atmosphere.]

This shows it taking many years for a dose of CO2 added to the atmosphere to decrease to 60% of its original level!

Is the famous graph wrong? There are other possible explanations!

Here’s a non-explanation. Humans are putting CO2 into the atmosphere in other ways besides burning fossil fuels. For example, deforestation and other changes in land use put somewhere between 0.5 and 2.7 gigatons of carbon into the atmosphere each year. There’s a lot of uncertainty here. But this doesn’t help solve our puzzle: it means there’s more carbon to account for.

Here’s a possible explanation. Maybe my estimate of 5 × 10^15 tons for the mass of the atmosphere is too high! That would change everything. I got my estimate off the internet somewhere — does anyone know a really accurate figure?

Renato came up with a more interesting possible explanation. It’s very important, and very well-known, that CO2 doesn’t leave the atmosphere in a simple exponential decay process. Imagine for simplicity that carbon stays in three boxes:

• Box A: the atmosphere.

• Box B: places that exchange carbon with the atmosphere quite rapidly.

• Box C: places that exchange carbon with the atmosphere and box B quite slowly.

As we pump CO2 into box A, a lot of it quickly flows into box B. It then slowly flows from boxes A and B into box C.

The quick flow from box A to box B accounts for the large amounts of ‘missing’ CO2 in Renato’s graph. But if we stop putting CO2 into box A, it will soon come into equilibrium with box B. At that point, we will not see the CO2 level continue to quickly drop. Instead, CO2 will continue to slowly flow from boxes A and B into box C. So, it can take many years for the atmospheric CO2 concentration to drop to 60% of its original level — as the famous graph suggests.

This makes sense to me. It shows that the red curve can be a lot lower than the blue one even if the famous graph is right.
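
Here’s a toy simulation of the three-box story in Python. The rate constants and the emission rate are invented just to show the qualitative behavior; nothing here is calibrated to real carbon-cycle data:

    k_ab, k_ba = 0.5, 0.5        # fast exchange between atmosphere (A) and box B
    k_ac, k_bc = 0.02, 0.02      # slow flow from A and B into box C

    A, B, C = 0.0, 0.0, 0.0      # excess carbon above the pre-industrial level
    history = []
    for year in range(200):
        emission = 1.0 if year < 50 else 0.0   # emit for 50 years, then stop
        dA = emission - k_ab*A + k_ba*B - k_ac*A
        dB = k_ab*A - k_ba*B - k_bc*B
        dC = k_ac*A + k_bc*B
        A, B, C = A + dA, B + dB, C + dC
        history.append(A)

    # While we emit, about half of each year's pulse quickly moves from A to B,
    # so A rises by well under 1 unit per year. After emissions stop, A and B
    # stay in equilibrium with each other and drain only slowly into C.
    print(history[49], history[60], history[199])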

But I’m still puzzled by the dramatic fluctuations in the red curve! That’s the other puzzle.


Seven Rules for Risk and Uncertainty

31 January, 2011

Curtis Faith

Saving the planet will not be easy. We know what the most urgent problems are, but there is too much uncertainty to predict any particular outcomes or events with precision.

So what are we to do? How do we formulate strategy and plans to avert or mitigate disasters? How do we plan for problems when we don’t know exactly what form they will take?

Seven rules

As I noted in my previous blog post, from my experience as a trader and an entrepreneur, and from what I learned about how emergency room (ER) doctors manage risk and uncertainty, those who are confronted with uncertainty as part of their daily lives use a similar strategy for managing that uncertainty. Further, the way they make decisions and develop plans for the future is very relevant to the Azimuth Project.

My second book, Inside the Mind of the Turtles, described this strategy in detail. In the book, I outlined seven rules for managing risk and uncertainty. They are:

  • Overcome fear,
  • Remain flexible,
  • Take reasoned risks,
  • Prepare to be wrong,
  • Actively seek reality,
  • Respond quickly to change, and
  • Focus on decisions, not outcomes.

Most of you are familiar with many of the aspects of life-or-death emergencies, having experienced them when you or a loved one has been seriously sick or injured. So it may be a little easier to understand these rules if you examine them from the perspective of an ER doctor.

Overcome fear

Risk is everywhere in the ER. You can’t avoid it. Do nothing, and the patient may die. Do the wrong thing, and the patient may die. Do the right thing, and the patient still may die. You can’t avoid risk.

At times, there may be so many patients requiring assistance that it becomes impossible to give them all care. Yet decisions must be made. The time element is critical to emergency care, and this greatly increases the risk associated with delay or making the wrong decisions. The doctor must make decisions quickly when the ER is busy, and these decisions are extremely important. Unlike in trading or in a startup, in the ER, mistakes can kill someone.

To be successful as an ER doctor, you must be able to handle life-or-death decisions every day. You must have the confidence in your own abilities and your own judgment to act quickly when there is very little time. No doctor who is afraid to make life-or-death decisions stays in the ER for very long.

Remain flexible

One of the hallmarks of an ER facility is the ability to act very quickly to address virtually any type of critical medical need. A well-equipped ER will have diagnostic and surgical facilities onsite, defibrillators for heart attack victims, and even surgical tools for those times when a patient may not survive the trip up the elevator to a full surgical suite.

Another way that an ER facility organizes for flexibility is by making sure that there are sufficient doctors with a broad range of specialties available. ERs don’t staff for the average workload; they staff for the maximum expected workload. They keep a strategic reserve of doctors and nurses available to assist in case things get extremely busy.

Take reasoned risks

Triage is one way of managing the risks associated with the uncertainty of medical diagnoses and treatments. Triage is a way of sorting patients so those who require immediate assistance are helped first, those in potentially critical situations next, and those in no imminent danger of further damage are helped last. For example, if you go to the ER with a broken leg, you may or may not be the first person in line for treatment. If a shooting victim comes in, you will be shuffled back in line. Your injury, while serious, can wait because you are in little danger of dying, and a few hours’ delay in setting a bone is unlikely to cause permanent damage.

Diagnosis itself is one of the most important aspects of emergency treatment. The wrong diagnosis can kill a patient. The right diagnosis can save a life. Yet diagnosis is messy. There are no right answers, only probable answers.

Doctors weigh the probability of particular diseases or injuries against the seriousness of outcomes for the likely conditions and the time sensitivity of a given treatment. Some problems require immediate care, whereas some are less urgent. Good doctors can quickly evaluate the symptoms and results of diagnostic tests to deliver the best diagnosis. The diagnosis may be wrong, but a good doctor will evaluate the factors to determine the most likely one and will continue to run tests to eliminate rarer but potentially more serious problems in time to effect relevant treatment.

Prepare to be wrong

A preliminary diagnosis may be wrong; the onset of more serious symptoms may indicate that a problem is more urgent than anticipated initially. Doctors know this. This is why they and their staff continuously monitor the health status of their patients.

Often while the initial diagnosis is being treated doctors will order additional tests to verify the correctness of that initial diagnosis. They know they can be wrong in their assessment. So they allow for this by checking for alternatives even while treating for the current diagnosis.

More than perhaps any other experts in uncertainty, doctors understand the ways that uncertainty can manifest itself. As a profession, doctors have almost completely mapped the current thinking in medicine into a large tree of objective and even subjective tests that can be run to confirm or eliminate a particular diagnosis. So a doctor knows exactly how to tell if she is wrong and what to do in that event almost every time she makes a diagnosis. Doctors also know which other, less common medical problems also can exhibit the same symptoms that the previous diagnosis did.

For example, if a patient comes in with a medium-grade fever, a doctor normally will check the ears, nose, sinuses, lymph nodes, and breathing to eliminate organ-specific issues and then probably issue a diagnosis of a flu infection. If the fever rises above 102 degrees (39C), the doctor probably will start running some tests to eliminate more serious problems, such as a bacterial infection or viral meningitis.

Actively seek reality

Since doctors are not 100 percent certain that the diagnosis they have made for a given patient is correct, they continue to monitor that patient’s health. If the patient is in serious danger, he will be monitored continuously. Anyone who visits a hospital emergency room will notice all the various monitors and diagnostic machines. There are ECG monitors to check the general health of the heart, pulse monitors, blood oxygenation testers, etc. The ER staff always has up-to-the-second status for their patients. These immediate readings alert doctors and nurses quickly to changes indicating a worsening condition.

Once the monitors have been set up (generally by the nursing staff), ER doctors double-check their diagnosis by running tests to rule out more serious illnesses or injuries that may be less common. The more serious the patient’s condition, the more tests will be run. A small error in diagnosis may cost a patient’s life if she suffers from a serious condition with poor vital signs such as very low blood pressure or an erratic pulse. A large error in diagnosis may not matter for a patient who is relatively healthy. So more time and effort are spent to verify the diagnoses of patients with more serious conditions, and less time and effort are spent verifying the diagnoses of stable patients.

Actively seeking reality is extremely important in emergency medicine because initial diagnoses are likely to be in error to some degree a significant percentage of the time. Since misdiagnoses can kill people, much time and effort are spent to verify and check a diagnosis and to make sure that a patient does not regress.

Respond quickly to change

If caught early, a misdiagnosis or a significant change in a patient’s condition need not be cause for worry. If caught late, it can mean serious complications, extended hospitalization, or even death. For critical illness and injury, time is very important.

The entire point of closely monitoring a patient is to enable the doctor to quickly determine if there is something more serious wrong than was first evident. A doctor’s initial diagnosis comes from the symptoms that are readily apparent. A good doctor knows that there may be a more serious condition causing those symptoms. More serious conditions often warrant different treatment. Sometimes a patient’s condition is serious enough that a few hours can mean the difference between life and death or between full recovery and permanent brain damage.

For example, a mother comes into the ER with her preteen son, who is running a fever of 102 degrees (39C), has a headache, and is vomiting. These are most likely symptoms from a flu infection that is not particularly emergent. The treatment for the flu is normally just bed rest and drinking lots of fluids. So, if the ER is busy, the flu patient normally will wait as patients with more urgent problems get care.

The addition of one more symptom may change the treatment completely. If the patient who may have been sitting in the ER waiting room starts complaining of a stiff painful neck in addition to the flu symptoms, this may be indicative of spinal meningitis, which is a life-threatening disease if not treated quickly. The attending physician likely will order an immediate lumbar puncture (also called a spinal tap) to examine the spinal fluid to see if it is infected with the organisms that cause spinal meningitis. If it is a bacterial infection, treatment with antibiotics will begin right away. A few hours difference can save a life in the case of bacterial spinal meningitis.

The important thing to remember is that a good doctor knows what to look for that will indicate a more serious condition than was indicated initially. She also will respond very quickly to administer appropriate treatment when the symptoms or tests indicate a more serious condition. A good doctor is not afraid of being wrong. A good doctor is looking for any sign that she might have been wrong so that she can help the patient who has a more serious disease in time to treat it so the patient can recover completely.

Focus on decisions, not outcomes

One of the difficulties facing ER doctors because of the uncertainty of medical diagnoses and treatments is the fact that a doctor can do everything correctly, and the patient still may die or suffer permanent damage. The doctor might perform perfectly and still lose the patient.

At times, a patient may require risky surgery to save his life. The doctor will weigh the risk of the surgery itself against the risk of alternative treatments. If the surgery will increase the chances of the patient surviving, then the doctor will order the surgery or perform it herself in cases of extreme emergency.

A doctor may make the best decision under the circumstances using the very best information available, and still the patient may die. A good doctor will evaluate the decision not on the basis of how it turns out but according to the relative probabilities of the outcomes themselves. An outcome of a dead patient does not mean that surgery was a mistake. Likewise, it may be that the surgery should not have been performed even when it has a successful outcome.

If ER doctors evaluated their decisions on the basis of outcomes, then it would lead to bad medicine. For example, if a particular surgery has a 10 percent mortality rate, meaning that 10 percent of the patients who have the surgery die soon after, this is risky surgery. If a patient has an injury that will kill the patient 60 percent of the time without that surgery, then the correct action is to have the surgery performed, because the patient will be six times more likely to die without it than with it. If an ER doctor orders the surgery and it is performed without error, the patient still may die. This does not change the fact that absent any new information, the decision to have the surgery still was correct.

The inherent uncertainty of diagnosis and treatment means that many times the right treatment will have a bad outcome. A good doctor knows this and will continue prescribing the best possible treatment even when a few rare examples cross her path.

Relevance for Azimuth

Like an ER doctor trying to diagnose a patient in critical condition, we don’t have much time. We need to prepare ourselves so that when problems arise and disaster strikes, we can quickly determine what’s wrong, stabilize the patient, make sure we have found all the problems, monitor progress, and maintain vigilance until the patient has recovered.

The sheer complexity of the issues, and the scope of the problems that endanger the planet and life on it, ensure that there will never be enough information to make a “correct” analysis, or one single foolproof plan of action. Except in the very broadest terms, we can’t know what the future will bring so we need to build plans that acknowledge that very real limitation.

Rather than pretend that we know more than is possible to know we should embrace the uncertainty. We need to build flexible organizations and structures so that we are prepared to act no matter what happens. We need to build flexible plans that can accommodate change.

We need to build the capability to acquire and assimilate an understanding of reality as it unfolds. We need to seek the truth about our condition and likely prospects for the future.

And we need to be willing to change our minds when circumstances indicate that our judgments have been wrong.

Being ready for any potential scenario will not be easy. It will require a tremendous effort on the part of a global network of scientists, engineers, and others who are interested in saving the planet.

I hope that you consider joining our effort.


Curtis Faith on the Azimuth Project

27 January, 2011

Hi, I’m Curtis Faith. I’m very excited to be helping with the Azimuth Project.

A few weeks ago, I read John’s exhortation for blog readers to join in the discussion on the Azimuth Forum, so I decided to check it out. I was surprised at the amount of work that has been done in the last six months. I was inspired by the project’s goals and decided to commit to helping.

Since we need help, and hope that other blog readers might pitch in to help too, John and I thought it would be a good idea for me to explain a little about myself, why I think the Azimuth Project is so important, and how I think I can help.

I just turned 47 on Sunday. I am a first-time father with a 9-month-old daughter. She is amazing. I don’t want her to grow up and wonder why our generation let things get so bad and why I didn’t do anything to help make the world better.

I’m a real optimist by nature. But ignoring the very clear trends of the last 30 to 40 years is no longer an option. Our generation must stand up and do something about this.

A few years back I thought that politics might be the answer. I spent a lot of time learning the ins and outs of politics. My wife and I even followed the 2008 U.S. election and filmed the campaigns of Obama and Ron Paul in the process of learning. It is clear to me—having seen the way the last few years have unfolded—that political solutions will not avert the coming crisis.

In the last few years, my wife and I have lived in southeast Asia for 4 months, and in South America for a few years. I wanted to understand the world from outside the U.S. perspective, to get to know people in other countries as individuals, as humans. This has made it even more clear what the major problems are, and that the solutions won’t be implemented until a major crisis strikes.

So I’ve been working on learning relevant technology and science for the last few years as a backup plan. Trying to see where I might be able to help out in the most effective way possible. I have also spent a lot of time investigating the various other efforts working on the major global problems. None of them appear to me to be facing reality. In contrast, the Azimuth Project fits what I’ve seen with my own eyes.

But most of all, the reason that I’m excited about the Azimuth Project is that it has the loftiest of goals and the Earth needs saving. We’ve screwed it up and we’re running out of time.

A bit about me

I’m best known for something that started 27 years ago, in the fall of 1983, when I was just 19 years old.

I dropped out of college because I was bored and joined a small group of traders who later became famous in the trading world because of our subsequent success and how we learned to trade. Some of the lessons I learned in that group about managing risk and uncertainty are very relevant to the Azimuth Project goals for saving the world.

A famous Chicago trader, Richard Dennis, took out large ads in the New York Times, Barron’s, and the Wall Street Journal announcing trainee positions. After only two weeks of training we were given money to trade, and at the end of the first month of practice with a small account, I was given a $2 million account to trade. Over the next 4-plus years, I turned that $2 million into more than $33 million, more than doubling the money each year. Most of the other trainees were also successful, and this story became legend in trading circles as the group made more than $100 million for Richard during the life of the program. Our group was known as the Turtles, and I wrote a book about this experience, Way of the Turtle, that became a bestseller in finance a few years back.

After Rich disbanded the Turtles, I got bored with trading. I was more interested in software and wanted to do something to make the world a better place. I started a few companies, built innovative software, tried to solve challenging problems and eventually found my way to Silicon Valley at the latter half of the Internet Boom.

Chaos, and risk and uncertainty

The sheer complexity of the issues and the scope of the problems that endanger the planet and life on it ensure that there will never be enough information to make a “correct” analysis. Except in the very broadest terms, we can’t know what the future will bring so we need to build plans that acknowledge that very real limitation.

We could pretend that we know more than is possible to know, or we can embrace the uncertainty and adapt to it. If we do this, we can concentrate on building flexibility and responsiveness along with an ability to assimilate and acquire an understanding of reality as it unfolds.

As a trader and entrepreneur, I learned about managing risk and uncertainty and how to develop flexible plans that will work when you can’t predict the future. Over time I came to see that other professionals who were forced to plan and make decisions under conditions of uncertainty used similar strategies.

But first, some background. While in Silicon Valley, I met a couple of guys who were forming a new hedge fund in the Virgin Islands, Simon Olsen and Bruce Tizes. In early 2001, it was obvious to most people that the Internet party was over in Silicon Valley. Pink slips were flying everywhere. So I thought it might be a good time to do something new for a few years.

I had often thought about getting back into trading so I could build up enough money to fund my own projects. I didn’t like the way that all the funding in software was focused on money. Most investors didn’t care about building cool software, and certainly not about doing positive things for the world. If those things came, they were secondary to profits. So for a while I thought it best to go make my own money, so I wouldn’t be restricted to only those strategies that optimized profits for investors.

So I decided to join Bruce and Simon in their hedge fund venture, and Bruce and I subsequently became good friends. Bruce had a very interesting background. He is one of the rare true polymaths that I’ve run into. He is incredibly bright, with a very flexible mind. He graduated high school at 15 years of age, college at 16, and medical school at 20. He later made a lot of money investing in real estate and trading stocks.

For most of the time since he had become a doctor, Bruce had been practicing emergency medicine at Mount Sinai Hospital in Chicago. Mount Sinai is the inspiration for the television series E.R., which also takes place in Chicago, and the hospital is a major destination for accident and gunshot victims in the downtown Chicago area.

So in various discussions over lunch or dinner over the few years we worked together, I came to learn a bit about the life of an emergency room doctor. Over time, Bruce showed me that there were similarities in how ER doctors and traders approached risk and uncertainty.

From my experience with software entrepreneurs and venture capitalists, I knew that they too handle risks and uncertainty in similar ways. It seemed like everyone who was forced to deal with uncertainty in the normal course of business followed similar general principles, and that these principles would be very useful even for those who didn’t learn them on the job.

Since my first book sold very well, the publisher was interested in getting me to write another book. I agreed to write one. But this time I wanted to write a book about these important principles for managing risk and uncertainty rather than a trading book.

This became my second book, Inside the Mind of the Turtles. Unfortunately, against my wishes and better judgment, it was marketed as a trading book. The truth is that it is a much more general book written for times of chaos and uncertainty, even for the emergency room doctors for a planet in peril. It contains ideas that are very relevant to the Azimuth Project.

In my next post here, I’ll outline the Seven Rules for Risk I develop in the book and show how they are relevant for the Azimuth Project because of the tremendous uncertainty inherent in environmental and sustainability issues.

In the meantime, I urge you to join in and help with the Azimuth Project. Read some articles in the Azimuth Library and join the discussions on the related Azimuth Forum.

The Azimuth Project is multidisciplinary so there are opportunities for all different kinds of people to help out. For example, I have been interested in low-energy transportation alternatives. So I plan on doing more research, adding to the Azimuth library of articles for advanced transportation, and finding some of the best experts to see if they will help on the Azimuth Project itself. I am also good at simplifying and explaining complicated problems. So I plan to take some of the more complicated sustainability issues and summarize them for non-experts. This will make it easier for people of diverse talents to grasp the full scope of the problems Azimuth is tackling.

I’ve spent much of the last 10 years trying to figure out how I can best help make the world a better place.

For me, the Azimuth Project is that answer. Come check it out.


Azimuth News (Part 1)

24 January, 2011

The world seems to be heading for tough times. From a recent New York Times article:

Over the next 100 years, many scientists predict, 20 percent to 30 percent of species could be lost if the temperature rises 3.6 degrees to 5.4 degrees Fahrenheit. If the most extreme warming predictions are realized, the loss could be over 50 percent, according to the United Nations climate change panel.

But when the going gets tough, the tough get going! The idea of the Azimuth Project is to create a place where scientists and engineers can meet and work together to help save the planet from global warming and other environmental threats. The first step was to develop a procedure for collecting reliable information and explaining it clearly. That means: not just a wiki, but a wiki with good procedures and a discussion forum to help us criticize and correct the articles.

Thanks to the technical wizardry of Andrew Stacey, and a lot of sage advice and help from Eric Forgy, the wiki and forum officially opened their doors about four months ago.

That seems like ages ago. For months a small band of us worked hard to get things started. With the beginning of the new year, we seem to be entering a phase transition: we’re getting a lot of new members. So, it’s time to give you an update!

There’s a lot going on now. If you’ve been reading this blog and clicking some of the links, you’ve probably seen some of our pages on sea level rise, coral reefs, El Niño, biochar, photovoltaic solar power, peak oil, energy return on energy invested, and dozens of other topics. If you haven’t, check them out!

But that’s just the start of it. If you haven’t been reading the Azimuth Forum, you probably don’t know most of what’s going on. Let me tell you what we’re doing.

I’ll also tell you some things you can do to help.

Azimuth Project Pages

By far the easiest thing is to go to any Azimuth Project page, think of some information or reference that it’s missing, and add it! Go to the home page, click on a category, find an interesting article in that category and give it a try. Or, if you want to start a new page, do that. We desperately need more help from people in the life sciences, to build up our collection of pages on biodiversity.

If you need help, start here:

How to get started.

Plans of Action

We’re working through various plans for dealing with peak oil, global warming, and various environmental problems. You can see our progress here:

Plans of action, Azimuth Project.

So far it goes like this. First we write summaries of these plans. Then I blog about them. Then Frederik De Roo distills your criticisms and comments and adds them to the Azimuth Project. The idea is to build up a thorough comparison of many different plans.

We’re the furthest along when it comes to Pacala and Socolow’s plan:

Stabilization wedges, Azimuth Project.

You don’t need to be an expert in any particular discipline to help here! You just need to be able to read plans of action and write crisp, precise summaries, as above. We also need help finding the most important plans of action.

In addition to plans of action, we’re also summarizing various ‘reports’. The idea is that a report presents facts, while a plan of action advocates a course of action. See:

Reports, Azimuth Project.

In practice the borderline between plans of action and reports is a bit fuzzy, but that’s okay.

Plan C

Analyzing plans of action is just the first step in a more ambitious project: we’d like to start formulating our own plans. Our nickname for this project is Plan C.

Why Plan C? Many other plans, like Lester Brown’s Plan B, are too optimistic. They assume that most people will change their behavior in dramatic ways before problems become very serious. We want a plan that works with actual humans.

In other words: while optimism is a crucial part of any successful endeavor, we also need plans that assume plausibly suboptimal behavior on the part of the human race. It would be best if we did everything right in the first place. It would be second best to catch problems before they get very bad — that’s the idea of Plan B. But realistically, we’ll be lucky if we do the third best thing: muddle through when things get bad.

Azimuth Code Project

Some people on the Azimuth Project, most notably Tim van Beek, are writing software that illustrates ideas from climate physics and quantitative ecology. Full-fledged climate models are big and tough to develop; it’s a lot easier to start with simple models, which are good for educational purposes. I’m starting to use these in This Week’s Finds.

If you have a background in programming, we need your help! We have people writing programs in R and Sage… but Tim is writing code in Java for a systematic effort he calls the Azimuth Code Project. The idea is that over time, the results will become a repository of open-source modelling software. As a side effect, he’ll try to show that clean, simple, open-source, well-managed and up-to-date code handling is possible at a low cost — and he’ll explain how it can be done.

So far most of our software is connected to stochastic differential equations:

• Software for investigating the Hopf bifurcation and its stochastic version: see week308 of This Week’s Finds.

• Software for studying predator-prey models, including stochastic versions: see the page on quantitative ecology. Ultimately it would be nice to have some software to simulate quite general stochastic Petri nets. (There’s a minimal sketch of a predator-prey model just after this list.)

• Software for studying stochastic resonance: see the page on stochastic resonance. We need a lot more on this, leading up to software that takes publicly available data on Milankovitch cycles — cyclic changes in the Earth’s orbit — and uses it to make predictions of the glacial cycles. It’s not clear how good these predictions will be — the graphs I’ve seen so far don’t look terribly convincing — but the Milankovitch cycle theory of the ice ages is pretty popular, so it’ll be fun to see.

• We would like a program that simulates the delayed action oscillator, which is an interesting simple model for the El Niño / Southern Oscillation.
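
To give the flavor of the predator-prey models mentioned above, here’s a minimal deterministic Lotka-Volterra sketch in Python. All the parameter values are made up for illustration; the actual Azimuth Code Project software is in Java and also handles stochastic versions:

    a, b, c, d = 1.0, 0.1, 1.5, 0.075   # prey growth, predation, predator death, conversion
    x, y = 10.0, 5.0                    # initial prey and predator populations
    dt = 0.001

    for step in range(10_000):          # simulate 10 time units by forward Euler
        dx = (a*x - b*x*y) * dt         # prey reproduce and get eaten
        dy = (-c*y + d*x*y) * dt        # predators die off and convert eaten prey
        x, y = x + dx, y + dy
        if step % 1_000 == 0:
            print(f"t={step*dt:4.1f}  prey={x:7.2f}  predators={y:6.2f}")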

Graham Jones has proposed some more challenging projects:

• An open source version of FESA, the Future Energy Scenario Assessment. FESA, put out by Orion Innovations, is proprietary software that models energy systems scenarios, including meteorological data, economic analysis and technology performance.

• An automated species-identification system. See the article Time to automate identification in the journal Nature. The authors say that taxonomists should work with specialists in pattern recognition, machine learning and artificial intelligence to increase accuracy and reduce drudgery.

David Tweed, who is writing a lot of our pages on the economics of energy, has suggested some others:

• Modeling advanced strategies for an electrical smart grid.

• Modeling smartphone or website based car- or ride-sharing schemes.

• Modeling supply routing systems for supermarkets that attempt to reduce their ecological footprint.

All these more challenging projects will only take off if we find some energetic people and get access to good data.

This Week’s Finds

I’m interviewing people for This Week’s Finds: especially scientists who have switched from physics to environmental issues, and people with big ideas about how to save the planet. The goal here is to attract people, especially students, into working on these subjects.

Here’s my progress so far:

Nathan Urban — climate change. Done.

Tim Palmer — weather prediction. Done.

Eliezer Yudkowsky — friendly AI. Interviewed.

Thomas Fischbacher — sustainability. Interviewed.

Gregory Benford — geoengineering. Underway.

David Ellerman — helping people, economics. Underway.

Eric Drexler — nanotechnology. Agreed to do it.

Chris Lee — bioinformatics. Agreed to do it.

If you’re a scientist or engineer doing interesting things on the topics we’re interested in at the Azimuth Project, and you’d like me to interview you, let me know! Of course, your ego should be tough enough to handle it if I say no.

Alternatively: if you know somebody like this, and you’re good at interviewing people, this is another place you might help. You could either send them to me, or interview them yourself! I’m already trying to subcontract out one interview to a mathematician friend.

Blog articles

While I’ve been writing most of the articles on this blog so far, I don’t want it to stay that way. If you want to write articles, let me know! I might or might not agree… but if you read this blog, you know what I like, so you can guess ahead of time whether I’ll like your article or not.

In fact, the next two articles here will be written by Curtis Faith, a new member of the Azimuth Forum.

More

There’s also a lot more you can do. For suggestions, try:

Things to do, Azimuth Project.

Open projects, Azimuth Project.


Information Geometry (Part 6)

21 January, 2011

So far, my thread on information geometry hasn’t said much about information. It’s time to remedy that.

I’ve been telling you about the Fisher information metric. In statistics this is a nice way to define a ‘distance’ between two probability distributions. But it also has a quantum version.

So far I’ve shown you how to define the Fisher information metric in three equivalent ways. I also showed that in the quantum case, the Fisher information metric is the real part of a complex-valued thing. The imaginary part is related to the uncertainty principle.

You can see it all here:

• Part 1     • Part 2     • Part 3     • Part 4     • Part 5

But there’s yet another way to define the Fisher information metric, which really involves information.

To explain this, I need to start with the idea of ‘information gain’, or ‘relative entropy’. And it looks like I should do a whole post on this.

So:

Suppose that \Omega is a measure space — that is, a space you can do integrals over. By a probability distribution on \Omega, I’ll mean a nonnegative function

p : \Omega \to \mathbb{R}

whose integral is 1. Here d \omega is my name for the measure on \Omega. Physicists might call \Omega the ‘phase space’ of some classical system, but probability theorists might call it a space of ‘events’. Today I’ll use the probability theorist’s language. The idea here is that

\int_A \; p(\omega) \; d \omega

gives the probability that when an event happens, it’ll be one in the subset A \subseteq \Omega. That’s why we want

p \ge 0

Probabilities are supposed to be nonnegative. And that’s also why we want

\int_\Omega \; p(\omega) \; d \omega = 1

This says that the probability of some event happening is 1.

Now, suppose we have two probability distributions on \Omega, say p and q. The information gain as we go from q to p is

S(p,q) = \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

We also call this the entropy of p relative to q. It says how much information you learn if you discover that the probability distribution of an event is p, if before you had thought it was q.

I like relative entropy because it’s related to the Bayesian interpretation of probability. The idea here is that you can’t really ‘observe’ probabilities as frequencies of events, except in some unattainable limit where you repeat an experiment over and over infinitely many times. Instead, you start with some hypothesis about how likely things are: a probability distribution called the prior. Then you update this using Bayes’ rule when you gain new information. The updated probability distribution — your new improved hypothesis — is called the posterior.

And if you don’t do the updating right, you need a swift kick in the posterior!

So, we can think of q as the prior probability distribution, and p as the posterior. Then S(p,q) measures the amount of information that caused you to change your views.

For example, suppose you’re flipping a coin, so your set of events is just

\Omega = \{ \mathrm{heads}, \mathrm{tails} \}

In this case all the integrals are just sums with two terms. Suppose your prior assumption is that the coin is fair. Then

q(\mathrm{heads}) = 1/2, \; q(\mathrm{tails}) = 1/2

But then suppose someone you trust comes up and says “Sorry, that’s a trick coin: it always comes up heads!” So you update your probability distribution and get this posterior:

p(\mathrm{heads}) = 1, \; p(\mathrm{tails}) = 0

How much information have you gained? Or in other words, what’s the relative entropy? It’s this:

S(p,q) = \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega = 1 \cdot \log(\frac{1}{1/2}) + 0 \cdot \log(\frac{0}{1/2}) = 1

Here I’m doing the logarithm in base 2, and you’re supposed to know that in this game 0 \log 0 = 0.

So: you’ve learned one bit of information!

That’s supposed to make perfect sense. On the other hand, the reverse scenario takes a bit more thought.

You start out feeling sure that the coin always lands heads up. Then someone you trust says “No, that’s a perfectly fair coin.” If you work out the amount of information you learned this time, you’ll see it’s infinite.

Why is that?

The reason is that something that you thought was impossible — the coin landing tails up — turned out to be possible. In this game, it counts as infinitely shocking to learn something like that, so the information gain is infinite. If you hadn’t been so darn sure of yourself — if you had just believed that the coin almost always landed heads up — your information gain would be large but finite.
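
Here’s the coin calculation in a few lines of Python, a sketch for the discrete case, using base 2 and the conventions above:

    from math import log2, inf

    def information_gain(p, q):        # S(p,q): posterior p, prior q
        total = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue               # the convention 0 log 0 = 0
            if qi == 0:
                return inf             # the 'infinitely shocking' case
            total += pi * log2(pi / qi)
        return total

    fair  = [0.5, 0.5]
    trick = [1.0, 0.0]
    print(information_gain(trick, fair))   # 1.0 bit: fair prior -> always heads
    print(information_gain(fair, trick))   # inf: 'always heads' prior -> fair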

The Bayesian philosophy is built into the concept of information gain, because information gain depends on two things: the prior and the posterior. And that’s just as it should be: you can only say how much you learned if you know what you believed beforehand!

You might say that information gain depends on three things: p, q and the measure d \omega. And you’d be right! Unfortunately, the notation S(p,q) is a bit misleading. Information gain really does depend on just two things, but these things are not p and q: they’re p(\omega) d\omega and q(\omega) d\omega. These are called probability measures, and they’re ultimately more important than the probability distributions p and q.

To see this, take our information gain:

\int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

and juggle it ever so slightly to get this:

\int_\Omega \;  \log(\frac{p(\omega) d\omega}{q(\omega)d \omega}) \; p(\omega) d \omega

Clearly this depends only on p(\omega) d\omega and q(\omega) d\omega. Indeed, it’s good to work directly with these probability measures and give them short names, like

d\mu = p(\omega) d \omega

d\nu = q(\omega) d \omega

Then the formula for information gain looks more slick:

\int_\Omega \; \log(\frac{d\mu}{d\nu}) \; d\mu

And by the way, in case you’re wondering, the d here doesn’t actually mean much: we’re just so brainwashed into wanting a d x in our integrals that people often use d \mu for a measure even though the simpler notation \mu might be more logical. So, the function

\frac{d\mu}{d\nu}

is really just a ratio of probability measures, but people call it a Radon-Nikodym derivative, because it looks like a derivative (and in some important examples it actually is). So, if I were talking to myself, I could have shortened this blog entry immensely by working directly with probability measures, leaving out the d‘s, and saying:

Suppose \mu and \nu are probability measures; then the entropy of \mu relative to \nu, or information gain, is

S(\mu, \nu) =  \int_\Omega \; \log(\frac{\mu}{\nu}) \; \mu

But I’m under the impression that people are actually reading this stuff, and that most of you are happier with functions than measures. So, I decided to start with

S(p,q) =  \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

and then gradually work my way up to the more sophisticated way to think about relative entropy! But having gotten that off my chest, now I’ll revert to the original naive way.

As a warmup for next time, let me pose a question. How much is this quantity

S(p,q) =  \int_\Omega \; p(\omega) \log(\frac{p(\omega)}{q(\omega)}) \; d \omega

like a distance between probability distributions? A distance function, or metric, is supposed to satisfy some axioms. Alas, relative entropy satisfies some of these, but not the most interesting one!

• If you’ve got a metric, the distance between points should always be nonnegative. Indeed, this holds:

S(p,q) \ge 0

So, we never learn a negative amount when we update our prior, at least according to this definition. It’s a fun exercise to prove this inequality, at least if you know some tricks involving inequalities and convex functions — otherwise it might be hard.

• If you’ve got a metric, the distance between two points should only be zero if they’re really the same point. In fact,

S(p,q) = 0

if and only if

p d\omega = q d \omega

It’s possible to have p d\omega = q d \omega even if p \ne q, because d \omega can be zero somewhere. But this is just more evidence that we should really be talking about the probability measure p d \omega instead of the probability distribution p. If we do that, we’re okay so far!

• If you’ve got a metric, the distance from your first point to your second point is the same as the distance from the second to the first. Alas,

S(p,q) \ne S(q,p)

in general. We already saw this in our example of the flipped coin. This is a slight bummer, but I could live with it, since Lawvere has already shown that it’s wise to generalize the concept of metric by dropping this axiom.

• If you’ve got a metric, it obeys the triangle inequality. This is the really interesting axiom, and alas, this too fails. Later we’ll see why.

So, relative entropy does a fairly miserable job of acting like a distance function. People call it a divergence. In fact, they often call it the Kullback-Leibler divergence. I don’t like that, because ‘the Kullback-Leibler divergence’ doesn’t really explain the idea: it sounds more like the title of a bad spy novel. ‘Relative entropy’, on the other hand, makes a lot of sense if you understand entropy. And ‘information gain’ makes sense if you understand information.

Anyway: how can we save this miserable attempt to get a distance function on the space of probability distributions? Simple: take its matrix of second derivatives and use that to define a Riemannian metric g_{ij}. This Riemannian metric in turn defines a metric of the more elementary sort we’ve been discussing today.

And this Riemannian metric is the Fisher information metric I’ve been talking about all along!
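
Here’s a quick numerical sanity check of that claim for a biased coin, a sketch in nats rather than bits. For a coin with bias t, the Fisher information works out to 1/(t(1-t)), and the relative entropy between nearby distributions should be quadratic with that coefficient:

    from math import log

    def S(p, q):   # relative entropy of two coin distributions, in nats
        return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

    t, eps = 0.3, 1e-3
    p = [t + eps, 1 - t - eps]     # posterior: a slightly different bias
    q = [t, 1 - t]                 # prior
    fisher = 1 / (t * (1 - t))
    print(S(p, q))                 # ~ 2.381e-06
    print(0.5 * eps**2 * fisher)   # ~ 2.381e-06, matching to leading order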

More details later, I hope.


Petri Nets

18 January, 2011

I’m trying to build bridges between mathematics and practical subjects like ecology and engineering — subjects that might help us save the planet.

As part of this, I have a project to explain how some ideas from electrical engineering, control theory, systems ecology, systems biology and the like can be formalized — and to some extent unified — using “symmetric monoidal categories”. Whenever you have a setup with:

• abstract gadgets that have ‘input wires’ and ‘output wires’,

and you can

• hook up these gadgets by connecting outputs to inputs,

and

• the wires can cross over each other, and

• it doesn’t matter whether one wire crosses over or under another

then you’ve probably got a symmetric monoidal category! For a precise definition, and lots of examples, try:

• John Baez and Mike Stay, Physics, topology, logic and computation: a Rosetta Stone, in New Structures for Physics, ed. Bob Coecke, Lecture Notes in Physics vol. 813, Springer, Berlin, 2011, pp. 95-174.

Back then I was excited about examples from ‘topological quantum field theory’, and ‘linear logic’, and ‘programming semantics’, and ‘quantum circuit theory’. But now I’m trying to come down to earth and think about examples of a more everyday nature. And they turn out to be everywhere!

For example, when you see these diagrams:

[Circuit diagrams: a 10-watt amplifier and a 16-volt power supply.]

you may see a 10-watt amplifier with bass boost, and a 16-volt power supply with power-on indicator light. But I see morphisms in symmetric monoidal categories!

Back in week296 I explained a symmetric monoidal category where the morphisms are electrical circuits made only from linear resistors. It’s easy to generalize this to ‘passive linear circuits’ where we include linear inductors and capacitors… and we can keep going further in this direction. I’m writing a paper on this stuff now.

But today I want to head in another direction, and tell you about something I’m beginning to work on with Jacob Biamonte: something called ‘Petri nets’.

Before I do, I have to answer a question that I can see forming in your forehead: what’s the use of this stuff?

I don’t actually think electrical engineers are going to embrace category theory — at least not until it’s routinely taught as part of college math. And I don’t even think they should! I don’t claim that category theory is going to help them do anything they want to do.

What it will do is help organize our understanding of systems made of parts.

In mathematics, whenever you see a pattern that keeps showing up in different places, you should try to precisely define it and study it — and whenever you see it showing up somewhere, you should note that fact. Eventually, over time, your store of patterns grows, and you start seeing connections that weren’t obvious before. And eventually, really cool things will happen!

It’s hard to say what these things will be before they happen. It’s almost not worth trying. For example, the first people who started trying to compute the perimeter of an ellipse could never have guessed that the resulting math would, a century or so later, be great for cryptography. The necessary chain of ideas was too long and twisty to possibly envision ahead of time.

But in the long run, having mathematicians develop and investigate a deep repertoire of patterns tends to pay off.

Petri nets – the definition

So: what’s a Petri net? Here’s my quick explanation, using graphics that David Tweed kindly provided for this article:

Petri net, Azimuth Project.

A Petri net is a kind of diagram for describing processes that arise in systems with many components. They were invented in 1939 by Carl Adam Petri — when he was 13 years old — in order to describe chemical reactions. They are a widely used model of concurrent processes in theoretical computer science. They are also used to describe interactions between organisms (e.g. predation, death, and birth), manufacturing processes, supply chains, and so on.

Here is an example from chemistry:

[Petri net for the three reactions listed below.]

The circles in a Petri net denote so-called states, which in this example are chemical compounds. The rectangular boxes denote transitions, which in this example are chemical reactions. Every transition has a finite set of input states, with wires coming in from those, and a finite set of output states, with wires going out. All this information can equally well be captured by the usual notation for chemical reactions, as follows:

C + O2 → CO2

CO2 + NaOH → NaHCO3

NaHCO3 + HCl → H2O + NaCl + CO2

One advantage of Petri nets is that we can also label each state by some number (0,1,2,3,…) of black dots, called tokens. In our example, each black dot represents a molecule of a given kind. Thus, a net with one token on each of the states C, O2, NaOH and HCl describes a situation where one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide, and one molecule of hydrochloric acid are present. No molecules of any other kind are present at this time.

We can then describe processes that occur with the passage of time by moving the tokens around. If the carbon reacts with oxygen, the tokens on C and O2 disappear and a token appears on CO2. If the carbon dioxide then combines with the sodium hydroxide to form sodium bicarbonate, the tokens on CO2 and NaOH are replaced by a token on NaHCO3. Finally, if the sodium bicarbonate and hydrochloric acid react, the tokens on NaHCO3 and HCl are replaced by tokens on H2O, NaCl, and CO2.

Note that in each case the following rule holds: for any given transition, we can delete one token for each of its input states and simultaneously add one token for each of its output states.
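
Here’s that rule as a tiny Python sketch of the ‘token game’ for our chemistry example; the transition names are mine:

    from collections import Counter

    transitions = {                         # name: (inputs, outputs)
        "burn":       ({"C": 1, "O2": 1},       {"CO2": 1}),
        "neutralize": ({"CO2": 1, "NaOH": 1},   {"NaHCO3": 1}),
        "fizz":       ({"NaHCO3": 1, "HCl": 1}, {"H2O": 1, "NaCl": 1, "CO2": 1}),
    }

    tokens = Counter({"C": 1, "O2": 1, "NaOH": 1, "HCl": 1})

    def fire(name):
        inputs, outputs = transitions[name]
        if any(tokens[s] < n for s, n in inputs.items()):
            raise ValueError(name + " is not enabled")
        for s, n in inputs.items():        # delete one token per input state
            tokens[s] -= n
        for s, n in outputs.items():       # add one token per output state
            tokens[s] += n

    for name in ["burn", "neutralize", "fizz"]:
        fire(name)
        print(name, dict(+tokens))         # +tokens drops the zero entries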

Here is a simpler example, taken from this article:

Petri net, Wikipedia.

[Petri net diagram from the Wikipedia article.]

Here the transitions are denoted by black rectangles. This example is somewhat degenerate, because there are no transitions with more than one input or more than one output. However, it illustrates the possibility of having more than one token in a given state. It also illustrates the possibility of a transition with no inputs, or no outputs. In chemistry this is useful for modelling a process where molecules of a given sort are added to or removed from the environment.

Symmetric monoidal categories

Now, what about symmetric monoidal categories? If you really want the definition, either read the Rosetta Stone paper or the nLab article. For now, you can probably fake it.

A symmetric monoidal category is, very roughly, a category with a tensor product. If we ignore the tokens, a Petri net is a way of specifying a symmetric monoidal category by giving a set of objects x_1, \dots, x_n and a set of morphisms between tensor products of these objects, for example

f_1 : x_1 \otimes x_2 \to x_2 \otimes x_2

f_2 : x_2 \to 1

f_3 : 1 \to x_1

where 1 denotes the tensor product of no objects. For example, the objects might be molecules and the morphisms might be chemical reactions; in this case the tensor product symbol is written ‘+‘ rather than ‘\otimes‘.

The kind of symmetric monoidal category we get this way is a ‘free strict symmetric monoidal category’. It’s ‘free’ in the sense that it’s ‘freely generated’ from some objects and morphisms, without any relations. It’s ‘strict’ because we’re assuming the tensor product is precisely associative, not just associative ‘up to isomorphism’.

For more on how Petri nets describe free symmetric monoidal categories, see:

• Vladimiro Sassone, On the category of Petri net computations, 6th International Conference on Theory and Practice of Software Development, Proceedings of TAPSOFT ’95, Lecture Notes in Computer Science 915, Springer, Berlin, pp. 334-348.

Here Sassone describes a category of Petri nets and sketches the description of a functor from that category to the category of strict symmetric monoidal categories. Sassone also has some other papers on Petri nets and category theory.

As I mentioned, Petri nets have been put to work in many ways. I won’t even start trying to explain that today — for that, try the references on the Azimuth Project page. My only goal today was to convince you that Petri nets are a pathetically simple idea, nothing to be scared of. And if you happen to be a category theorist, it should also be pathetically simple to see how they describe free strict symmetric monoidal categories. If you’re not… well, never mind!

