Azimuth on Google Plus (Part 3)

21 October, 2011

I’ve been spending a lot of time on Google+ lately, trying to drum up interest in the Azimuth Project. Unsurprisingly, my ‘fun’ posts have attracted more attention than those dealing with serious issues. This confirms my suspicion that computers were invented so we could goof off while it looks like we’re working.

My most popular contribution was this eye-catching image:

63 people shared it with others, despite my warning that it causes brain damage. By the way, there’s another cool illusion at the end of this post, but it’s only visible to people who read the whole thing.

The second most popular tidbit was this movie of Alvin Lucier’s “Music for solo performer”. If you enjoy puzzles, watch it before reading my explanation, and try to figure out what’s going on:

This piece exploits the fact that the brain’s alpha waves—which only start when you’re relaxed with eyes closed—have a frequency of 8-12 hertz. Thus, if amplified enormously, they can be made audible! To perform this piece, you put electrodes on your head and route the signal through an amplifier to loudspeakers coupled to percussion instruments. The performer here wrote:

I welcomed the challenge to reduce my performative activities to a minimum. While working out my interpretation I slowly learned to be aware of my mental activities. I acquired a sensitivity for subtle changes in tension and the ability to switch the state of my brain from beta to alpha and back again. Nevertheless, the outcome is not completely controllable. This makes the live act quite thrilling.

For more, read the text on YouTube.

But I posted about some deadly serious issues, too!

Is the Earth’s surface warming?

In 2010, a Berkeley physicist named Richard Muller decided to launch the Berkeley Earth Surface Temperature (BEST) project to independently check what the Earth’s surface temperature has been doing. The team included physicists, statisticians, and the climatologist Judith Curry, noted for “challenging the IPCC consensus” (her words).

The Charles G. Koch Foundation, which helps bankroll those who support inaction on climate change, gave Muller’s project $150,000. Anthony Watts, one of the big climate skeptic bloggers, wrote:

I’m prepared to accept whatever result they produce…

On the other side of the aisle, some who believe in global warming pre-emptively pooh-poohed the project.

Now BEST has released a bunch of papers on their results. Here’s their summary:

Global warming is real, according to a major study released today. Despite issues raised by climate change skeptics, the Berkeley Earth Surface Temperature study finds reliable evidence of a rise in the average world land temperature of approximately 1 °C since the mid-1950s.

Analyzing temperature data from 15 sources, in some cases going back as far as 1800, the Berkeley Earth study directly addressed scientific concerns raised by skeptics, including the urban heat island effect, poor station quality, and the risk of selection bias.

On the basis of its analysis, according to Berkeley Earth’s founder and scientific director, the group concluded that earlier studies based on more limited data by teams in the United States and Britain had accurately estimated the extent of land surface warming.

“Our biggest surprise was that the new results agreed so closely with the warming values published previously by other teams in the U.S. and U.K.,” Muller said. “This confirms that these studies were done carefully and that potential biases identified by climate change skeptics did not seriously affect their conclusions.”

Anthony Watts’ response is here. As you might have guessed, he’s not “accepting whatever result they produce”.

Is it even possible for someone to back down from a position they’re deeply invested in? It may require a bit of help—an act of kindness. In Brian Merchant’s article Do climate skeptics change their minds?, he writes:

I asked Anthony Watts, the meteorologist who runs what may be the most popular climate-skeptic blog, Watts Up With That, what could lead him to accept climate science. A “starting point for the process,” he said, wouldn’t begin with more facts but instead with a public apology from the high profile scientists who have labeled him and his colleagues “deniers.”

Should we study geoengineering?

Should we study our options for fighting global warming by deliberately manipulating the Earth’s climate? This is called geoengineering—and not surprisingly, it makes lots of people nervous. There are plenty of things to worry about. But can we afford to completely ignore it?

An organization called the Bipartisan Policy Center, set up by four famous US senators, two Democratic and two Republican (Daschle, Mitchell, Baker and Dole), has released a report on this question.

Written by a panel of 18 experts on the natural sciences, social sciences, science policy, foreign policy, national security, and environmental issues, the report concludes that the U.S. government should start a “focused and systematic program of research into climate remediation.” They emphasize that it’s “far too premature to contemplate deployment of any climate remediation technology”, and note that:

Most climate remediation concepts proposed to date involve some combination of risks, financial costs, and/or physical limitations that make them inappropriate to pursue except as complementary or emergency measures—for example, if the climate system reaches a “tipping point” and swift remedial action is required.

But, they point out that even if the U.S. decides not to engage in geoengineering, it “needs to evaluate steps others might take and be able to effectively participate in—and lead—the important international conversations”.

Climate science report

The World Resources Institute has put out a 48-page report called Climate Science 2009-2010, reviewing recent work. For example:

• 2000-2009 was the warmest decade on record since 1880 (NASA).

• The area of Arctic ice that’s been around for many years decreased by 42 percent between 2005 and 2008. This ice has gotten about 0.6 meters thinner during that time. The average thickness of the seasonal ice in midwinter is about 2 meters. (Kwok et al.).

• Ocean acidification—caused by the buildup of carbon dioxide concentrations—is a threat to coral in areas such as the Great Barrier Reef, and is happening much more quickly than anticipated (De’ath et al.). It is now recognized as having implications for the entire ocean food web which is critical to whales, fish, and mollusks (Munday et al., Gooding et al. and Comeau et al.).

• A global average temperature increase of 7 °C, which is toward the extreme upper part of the range of current projections, would make large portions of the world uninhabitable to humans (Sherwood et al.). For more, see my article How Hot is Too Hot?

• Recent literature (Yin et al.) suggests that sea level rise will likely not be even around the globe. In other words, sea level rise does not occur just like water being added to a bathtub. As a result, the northeast coast of the United States may be especially affected by changes in sea level due to changes in ocean circulation.

• The latest research (Francis et al. and Petoukhov et al.) also suggests that recent winter weather in the temperate Northern Hemisphere could be connected to climate change. As winter sea ice cover in the Arctic Ocean disappears, it can create a pressure and temperature gradient that sucks heat out of Europe. Therefore, recent extreme winter weather is not inconsistent with increases in global average temperature.

Shrinking Arctic lakes

Some Arctic lakes are shrinking. Why? One possibility: warmer temperatures and higher winds could cause more evaporation. Another: melting permafrost could let lake water soak into thawed soil. The scientist involved, Mark Carroll at the University of Maryland in College Park, “is not aware of any evidence that the permafrost in the far north is melting yet”. Hmm—compare my article Melting Permafrost.

Experiments in deforestation

Starting in December, a Malaysian state-owned company will start chopping down 75,000 hectares of rainforest on Borneo, to create yet another palm oil plantation. Unfortunately, that doesn’t count as newsworthy! What’s news is that a team led by Rob Ewers at Imperial College London will do an experiment based on this. Working to Ewers’s design, the loggers will leave patches of rainforest of different sizes, and at different distances from other patches of rainforest, to determine the effects of different levels of deforestation.

Permian-Triassic extinction

About 251 million years ago, our Earth suffered its biggest mass extinction event ever: the Permian-Triassic extinction. As many as 96% of all marine species and 70% of terrestrial vertebrates went extinct! Here’s what the sea bottom looked like before:

and after:

It took 50 million years for the Earth to completely recover its biodiversity!

Naturally there’s a lot of interest in figuring out what happened. The CO2 concentration soared to 2000 parts per million, and the temperature rose about 8 °C, but other things may have been at work too. I won’t attempt to discuss all this here!—just one little bit of news. Gregory Brennecka and others from Arizona State University and the University of Cincinnati found that the ocean was low in oxygen for at most tens of thousands of years before the Permian-Triassic extinction. That’s shorter than previous estimates.

They saw a big shift in the ratio of 238U to 235U in carbonate rocks immediately prior to the mass extinction, which they claim signals an increase in oceanic anoxia—apparently a new technique. The team also found higher Th/U ratios in the same interval, indicating a decrease in the uranium content of seawater, which they consider another sign of ocean anoxia.

Planet 3.0

Azimuth has joined Planet 3.0, an organization of climate-related blogs that also features blog articles of its own. It has an editorial team consisting of Michael Tobis and Dan Moutal, and a scientific advisory team consisting of Steve Easterbrook, Arthur Smith, Michael Tobis, and Bart Verheggen.

I don’t know much about it yet, but it could be good. I’ve been wanting more people to join me blogging here on Azimuth, to build up more of a community and a higher level of energy, but maybe this is a better solution: keep Azimuth as is, but also put climate-related blog articles on Planet 3.0. We’ll see.

The part you’ve been waiting for

As with the picture at the top of this article, if you focus on any small patch, strange things seem to start happening everywhere else. Your eyes get curious, and it’s hard to avoid looking. As your eyes flicker back and forth, the horizontal lines seem to twitch and bend.

At least, that’s what I see!


A Bet Concerning Neutrinos (Part 3)

19 October, 2011

As you’ve probably heard, an experiment called OPERA measured how fast neutrinos go from a particle accelerator in Switzerland to a detector in Italy. They got a speed slightly faster than light. This got a lot of people excited.
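To get a feel for the size of the claimed effect, here’s a back-of-the-envelope sketch in Python. The 730 km baseline and the ~60 nanosecond early arrival are round numbers from press coverage of the OPERA preprint, so treat them as approximations:

```python
c = 299_792_458.0   # speed of light in m/s
baseline = 730e3    # approximate CERN-to-Gran Sasso distance, in meters
early = 60e-9       # reported early arrival, in seconds (approximate)

light_time = baseline / c             # time light would take
neutrino_time = light_time - early    # measured neutrino flight time
excess = light_time / neutrino_time - 1   # fractional speed excess (v - c)/c

print(f"light travel time: {light_time * 1e3:.3f} ms")
print(f"fractional excess (v - c)/c: {excess:.2e}")   # roughly 2.5e-5
```

A 60 ns head start over a 2.4 millisecond trip: tiny in absolute terms, but about 25 parts per million faster than light, which is anything but tiny for physics.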

As a conservative old fart, I made a bet with Frederik De Roo saying that no, neutrinos do not go faster than light.

Since then, various reports have been zipping across the internet at near light-speed, claiming that neutrinos don’t go faster than light. But I think they’re a bit premature. Much as I’d like to win, I don’t think I’ve won just yet.

For example, last week someone who works on artificial intelligence at a university in the Netherlands said that the OPERA team made a mistake in their use of special relativity—a mistake that explains away their result:

• Ronald A. J. van Elburg, Times of flight between a source and a detector observed from a GPS satellite, 12 October 2011.

Two days later, a pseudonymous blogger who works for MIT’s Technology Review said the argument was “convincing”:

• KentuckyFC, Faster-than-light neutrino puzzle claimed solved by special relativity, The Physics arXiv Blog, 14 October 2011.

The popular news media got all excited! But they may have been getting ahead of themselves. After all, the OPERA team includes a bunch of particle physicists. Special relativity is child’s play for them. Would they really screw up that badly, after years of checking and rechecking their work? Chad Orzel suggests not:

• Chad Orzel, Experimentalists aren’t idiots: The neutrino saga continues, Uncertain Principles, 16 October 2011.

And none of the physicists I know find van Elburg’s argument very convincing.

But that’s not all! A couple weeks earlier, Cohen and Glashow did a calculation:

• Andrew G. Cohen, Sheldon L. Glashow, New constraints on neutrino velocities, 29 September 2011.

According to this, faster-than-light neutrinos would lose energy by emitting lots of electron-positron pairs, a bit like how a supersonic jet makes a sonic boom. Two days ago, another team of physicists doing experiments on neutrinos at Gran Sasso claimed that together with their experiment, this result refutes the existence of faster-than-light neutrinos:

• ICARUS team, A search for the analogue to Cherenkov radiation by high energy neutrinos at superluminal speeds in ICARUS, 17 October 2011.

At least one good physics blogger has taken this work as “definitive”:

• Tommaso Dorigo, ICARUS refutes Opera’s superluminal neutrinos, A Quantum Diaries Survivor, 18 October 2011.

He says:

The saga of the superluminal neutrinos took a dramatic turn today, with the publication of a very simple yet definitive study by ICARUS…

And so, the news media are getting excited again, saying that now the OPERA result is really dead, like a vampire with two stakes through its heart.

But how “definitive” is this result, really? I’m a bit disappointed that the Cohen–Glashow paper doesn’t clearly state the assumptions that go into their argument. They zip through the calculation in an offhand way that suggests they’re using standard principles of physics to their heart’s content—in particular, special relativity. Normally that’s fine. But not here. After all, if faster-than-light neutrinos were signalling a breakdown of any of these principles, their calculation might be invalid.

Of course I don’t believe neutrinos are going faster than light: that’s why I made that bet! If you don’t want to believe it either, that’s fine. But if you want to entertain this possibility, in order to disprove it, you’d better be clear on the logic you’re using.

Without actually measuring the speed of neutrinos, the best you can hope for is something like this: “If theoretical principles X and Y and Z are true, then our experiment shows neutrinos don’t go faster than light.” So neutrinos could still go faster than light… but only if X or Y or Z is false.

Maybe X, Y and Z are principles we hold sacred—maybe even more sacred than the principle that nothing goes faster than light! But shocking discoveries can have shocking consequences. Sacred truths can fall like dominoes.

Given this, I think the only truly definitive way to hammer the nail in the coffin of the OPERA experiment is to either

1) find a mistake in the experiment that convincingly explains its result

or

2) do more measurements of the speed of neutrinos.

And maybe Dorigo acknowledges this, in a way. He says:

So, forget superluminal neutrinos. Or maybe not: what remains to be seen is whether other experiments will find results consistent with v=c or not. That’s right: regardless of the tight ICARUS bound, every nerd with a neutrino detector in his or her garage is already set up to produce an independent confirmation of the startling OPERA result… We’ll soon see measurements by MINOS and Borexino, for instance. Interesting times to be a neutrino expert are these!

(Emphasis mine.)

So, I’m going to wait and see what happens. I want to win my bet fair and square.


The Decline Effect

18 October, 2011

I bumped into a surprising article recently:

• Jonah Lehrer, Is there something wrong with the scientific method?, New Yorker, 13 December 2010.

It starts with a bit of a bang:

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the community enforces itself. It’s a safeguard against the creep of subjectivity. Most of the time, scientists know what results they want, and that can influence the results they get. The premise of replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.

This phenomenon does have a name now: it’s called the decline effect. The article tells some amazing stories about it. If you’re in the mood for some fun, I suggest going to your favorite couch or café now, and reading them!

For example: John Ioannidis is the author of the most heavily downloaded paper in the open-access journal PLoS Medicine. It’s called Why most published research findings are false.

In it, Ioannidis took three prestigious medical journals and looked at the 49 most cited clinical research studies. 45 of them reported positive results. But of the 34 that people tried to replicate, 41% were either directly contradicted or had their effect sizes significantly downgraded.

For more examples, read the article or listen to this radio show:

• Cosmic Habituation, Radiolab, May 3, 2011.

It’s a bit sensationalistic… but it’s fun. It features Jonathan Schooler, who discovered a famous effect in psychology, called verbal overshadowing. It doesn’t really matter what this effect is. What matters is that it showed up very strongly in his first experiments… but as he and others continued to study it, it gradually diminished! He got freaked out. Then he looked around, and saw that this sort of decline happened all over the place.

What could cause this ‘decline effect’? There are lots of possible explanations.

At one extreme, maybe the decline effect doesn’t really exist. Maybe this sort of decline just happens sometimes purely by chance. Maybe there are equally many cases where effects seem to get stronger each time they’re measured!

At the other extreme, a very disturbing possibility has been proposed by Jonathan Schooler. He suggests that somehow the laws of reality change when they’re studied, in such a way that initially strong effects gradually get weaker.

I don’t believe this. It’s logically possible, but there are lots of less radical explanations to rule out first.

But if it were true, maybe we could make the decline effect go away by studying it. The decline effect would itself decline!

Unless of course, you started studying the decline of the decline effect.

Okay. On to some explanations that are interesting but less far-out.

One plausible explanation is significance chasing. Scientists work really hard to find something that’s ‘statistically significant’ according to the widely-used criterion of having a p-value of less than 0.05.

That sounds technical, but basically all it means is this: if the ‘expected situation’ were true, there would be at most a 5% chance of finding a deviation from it as big as the one you found, or bigger.

(To play this game, you have to say ahead of time what the ‘expected situation’ is: this is your null hypothesis.)

Why is significance chasing dangerous? How can it lead to the decline effect?

Well, here’s how to write a paper with a statistically significant result. Go through 20 different colors of jelly bean and see if people who eat them have more acne than average. There’s a good chance that at least one of your experiments will say ‘yes’ with a p-value of less than 0.05, just because 0.05 = 1/20. If so, this experiment gives a statistically significant result!
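We can check the odds directly. Here’s a minimal simulation of the jelly-bean scenario, assuming the 20 experiments are independent and the beans truly do nothing:

```python
import random

random.seed(42)

# The jelly beans have no effect, so each of the 20 experiments has an
# independent 5% chance of producing a 'significant' fluke.
analytic = 1 - 0.95**20
print(f"chance at least one color looks significant: {analytic:.2f}")  # about 0.64

# Monte Carlo check of the same number:
runs = 100_000
hits = sum(any(random.random() < 0.05 for _ in range(20)) for _ in range(runs))
print(f"empirical estimate: {hits / runs:.2f}")
```

So even with nothing going on at all, you get a ‘significant’ jelly-bean color almost two times out of three.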

I took this example from Randall Munroe’s cartoon strip xkcd:

It’s funny… but it’s actually sad: some testing of drugs is not much better than this! Clearly a result obtained this way is junk, so when you try to replicate it, the ‘decline effect’ will kick in.

Another possible cause of the decline effect is publication bias: scientists and journals prefer positive results over null results, where no effect is found. And surely there are other explanations, too: for starters, all the ways people can fool themselves into thinking they’ve discovered something interesting.
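Publication bias alone can manufacture a decline effect. Here’s a toy simulation (all the numbers are invented for illustration): many labs measure the same small true effect with noise, but only ‘significant’ results get published, so the published record wildly overstates the truth. Unbiased replications then look like a mysterious decline:

```python
import random
import statistics

random.seed(0)

# Toy model: lots of labs measure the same small true effect,
# each with unit standard error. All numbers here are made up.
true_effect = 0.5
studies = [random.gauss(true_effect, 1.0) for _ in range(100_000)]

# Publication bias: only results clearing z = 1.96 ('significant')
# make it into print.
published = [z for z in studies if z > 1.96]

print(f"true effect:              {true_effect}")
print(f"average published effect: {statistics.mean(published):.2f}")
print(f"average of all studies:   {statistics.mean(studies):.2f}")
```

The average published effect comes out several times larger than the true effect, while the average over all studies sits right at the truth. No spooky physics required: just a filter on what gets printed.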

For suggestions on how to avoid the evils of ‘publication bias’, try these:

• Jonathan Schooler, Unpublished results hide the decline effect, Nature 470 (2011), 437.

Putting an end to ‘significance chasing’ may require people to learn more about statistics:

• Geoff Cumming, Significant does not equal important: why we need the new statistics, 9 October 2011.

He explains the problem in simple language:

Consider a psychologist who’s investigating a new therapy for anxiety. She randomly assigns anxious clients to the therapy group, or a control group. You might think the most informative result would be an estimate of the benefit of therapy: the average improvement, as a number of points on the anxiety scale, together with a confidence interval around that average. But psychology typically uses significance testing rather than estimation.

Introductory statistics books often introduce significance testing as a step-by-step recipe:

Step 1. Assume the new therapy has zero effect. You don’t believe this and you fervently hope it’s not true, but you assume it.

Step 2. You use that assumption to calculate a strange thing called a ‘p value’, which is the probability that, if the therapy really has zero effect, the experiment would have given a difference as large as you observed, or even larger.

Step 3. If the p value is small, in particular less than the hallowed criterion of .05 (that’s 1 chance in 20), you are permitted to reject your initial assumption—which you never believed anyway—and declare that the therapy has a ‘significant’ effect.

If that’s confusing, you’re in good company. Significance testing relies on weird backward logic. No wonder countless students every year are bamboozled by their introduction to statistics! Why this strange ritual, they ask, and what does a p value actually mean? Why don’t we focus on how large an improvement the therapy gives, and whether people actually find it helpful? These are excellent questions, and estimation gives the best answers.

For half a century distinguished scholars have published damning critiques of significance testing, and explained how it hampers research progress. There’s also extensive evidence that students, researchers, and even statistics teachers often don’t understand significance testing correctly. Strangely, the critiques of significance testing have hardly prompted any defences by its supporters. Instead, psychology and other disciplines have simply continued with the significance testing ritual, which is now deeply entrenched. It’s used in more than 90% of published research in psychology, and taught in every introductory textbook.
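Cumming’s three-step recipe, and the estimation alternative he prefers, can be sketched with invented data (scores, group sizes, and the normal approximation are all my choices here, using only the Python standard library):

```python
import math
import random
import statistics

random.seed(7)

# Invented anxiety-scale scores: 50 clients per group, lower is better.
therapy = [random.gauss(48, 10) for _ in range(50)]
control = [random.gauss(53, 10) for _ in range(50)]

improvement = statistics.mean(control) - statistics.mean(therapy)
se = math.sqrt(statistics.variance(therapy) / 50
               + statistics.variance(control) / 50)

# Steps 1-3: assume zero effect, compute p, compare with .05.
z = improvement / se
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, normal approximation
verdict = "significant" if p < 0.05 else "not significant"
print(f"p = {p:.4f} ({verdict})")

# The estimation alternative: report the effect with a 95% confidence interval.
lo, hi = improvement - 1.96 * se, improvement + 1.96 * se
print(f"improvement = {improvement:.1f} points, 95% CI [{lo:.1f}, {hi:.1f}]")
```

The second print line tells you how big the benefit is and how uncertain you should be about it; the first tells you only whether you may perform the ritual rejection of an assumption you never believed.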

For more discussion and references, try my co-blogger:

• Tom Leinster, Fetishizing p-values, n-Category Café.

He gives some good examples of how significance testing can lead us astray. Anyone who uses the p-test should read these! He also discusses this book:

• Stephen T. Ziliak and Deirdre N. McCloskey, The Cult of Statistical Significance, University of Michigan Press, Ann Arbor, 2008. (Online summary here.)

Now, back to the provocative title of that New Yorker article: “Is there something wrong with the scientific method?”

The answer is yes if we mean science as actually practiced, now. Lots of scientists are using cookbook recipes they learned in statistics class without understanding them or investigating the alternatives. Worse, some are treating statistics as a necessary but unpleasant piece of bureaucratic red tape, and then doing whatever it takes to achieve the appearance of a significant result!

This is a bit depressing. There’s a student I know, who is taking an introductory statistics course. After she read about this stuff she said:

So, what I’m gleaning here is that what I’m studying is basically bull. It struck me as bull to start with, admittedly, but since my grade depended on it, I grinned and swallowed. At least my eyes are open now, I guess.

But there’s some good news, buried in her last sentence. Science has the marvelous ability to notice and correct its own mistakes. It’s scientists who noticed the decline effect and significance chasing. They’ll eventually figure out what’s going on, and learn how to fix any mistakes that they’ve been making. So ultimately, I don’t find this story depressing. It’s actually inspiring!

The scientific method is not a fixed rulebook handed down from on high. It’s a work in progress. It’s only been around for a few centuries—not very long, in the grand scheme of things. The widespread use of statistics in science has been around for less than one century. And computers, which make heavy-duty number-crunching easy, have only been cheap for 30 years! No wonder people still use primitive cookbook methods for analyzing data, when they could do better.

So science is still evolving. And I think that’s fun, because it means we can help it along. If you see someone claim their results are statistically significant, you can ask them what they mean, exactly… and what they had to do to get those results.


I thank a lot of people on Google+ for discussions on this topic, including (but not limited to) John Forbes, Roko Mijic, Heather Vandagriff, and Willie Wong.


The Science Code Manifesto

15 October, 2011

There’s a manifesto that you can sign, calling for a more sensible approach to the use of software in science. It says:

Software is a cornerstone of science. Without software, twenty-first century science would be impossible. Without better software, science cannot progress.

But the culture and institutions of science have not yet adjusted to this reality. We need to reform them to address this challenge, by adopting these five principles:

Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

Copyright: The copyright ownership and license of any released source code must be clearly stated.

Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.

Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.

Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

The founding signatories are:

• Nick Barnes and David Jones of the Climate Code Foundation,

• Peter Norvig, the director of research at Google,

• Cameron Neylon of Science in the Open,

• Rufus Pollock of the Open Knowledge Foundation,

• Joseph Jackson of the Open Science Foundation.

I was the 312th person to sign. How about joining?

There’s a longer discussion of each point of the manifesto here. It ties in nicely with the philosophy of the Azimuth Code Project, namely:

Many papers in climate science present results that cannot be reproduced. The authors present a pretty diagram, but don’t explain which software they used to make it, don’t make this software available, and don’t really explain how they did what they did. This needs to change! Scientific results need to be reproducible. Therefore, any software used should be versioned and published alongside any scientific results.

All of this is true for large climate models such as General Circulation Models as well—but the problem becomes much more serious, because these models have long outgrown the point where a single developer was able to understand all the code. This is a kind of phase transition in software development: it necessitates a different toolset and a different approach to software development.

As Nick Barnes points out, these ideas

… are simply extensions of the core principle of science: publication. Publication is what distinguishes science from alchemy, and is what has propelled science—and human society—so far and so fast in the last 300 years. The Manifesto is the natural application of this principle to the relatively new, and increasingly important, area of science software.


Network Theory (Part 14)

15 October, 2011

We’ve been doing a lot of hard work lately. Let’s take a break and think about a fun example from chemistry!

The ethyl cation

Suppose you start with a molecule of ethane, which has 2 carbons and 6 hydrogens arranged like this:

Then suppose you remove one hydrogen. The result is a positively charged ion, or ‘cation’. When I was a kid, I thought the opposite of a cation should be called a ‘dogion’. Alas, it’s not.

This particular cation, formed from removing one hydrogen from an ethane molecule, is called an ‘ethyl cation’. People used to think it looked like this:

They also thought a hydrogen could hop from the carbon with 3 hydrogens attached to it to the carbon with 2. So, they drew a graph with a vertex for each way the hydrogens could be arranged, and an edge for each hop. It looks really cool:

The red vertices come from arrangements where the first carbon has 2 hydrogens attached to it, and the blue vertices come from those where the second carbon has 2 hydrogens attached to it. So, each edge goes between a red vertex and a blue vertex.

This graph has 20 vertices, which are arrangements or ‘states’ of the ethyl cation. It has 30 edges, which are hops or ‘transitions’. Let’s see why those numbers are right.

First I need to explain the rules of the game. The rules say that the 2 carbon atoms are distinguishable: there’s a ‘first’ one and a ‘second’ one. The 5 hydrogen atoms are also distinguishable. But, all we care about is which carbon atom each hydrogen is bonded to: we don’t care about further details of its location. And we require that 2 of the hydrogens are bonded to one carbon, and 3 to the other.

If you’re a physicist, you may wonder why the rules work this way: after all, at a fundamental level, identical particles aren’t really distinguishable. I’m afraid I can’t give a fully convincing explanation right now: I’m just reporting the rules as they were told to me!

Given these rules, there are 2 choices of which carbon has two hydrogens attached to it. Then there are

\displaystyle{ \binom{5}{2} = \frac{5 \times 4}{2 \times 1} = 10}

choices of which two hydrogens are attached to it. This gives a total of 2 × 10 = 20 states. These are the vertices of our graph: 10 red and 10 blue.

The edges of the graph are transitions between states. Any hydrogen in the group of 3 can hop over to the group of 2. There are 3 choices for which hydrogen atom makes the jump. So, starting from any vertex in the graph there are 3 edges. This means there are 3 × 20 / 2 = 30 edges.

Why divide by 2? Because each edge touches 2 vertices. We have to avoid double-counting them.
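The count is easy to verify by brute force. Here’s a sketch in Python that enumerates the states and transitions directly (labeling the hydrogens 0–4 and the carbons 0–1, which is my own convention):

```python
from itertools import combinations

hydrogens = set(range(5))

# A state says which carbon (0 or 1) holds just 2 hydrogens, and which
# pair of hydrogens it holds; the other 3 sit on the other carbon.
states = [(c, frozenset(pair))
          for c in (0, 1)
          for pair in combinations(hydrogens, 2)]
print(len(states))   # 20

# A transition: one of the 3 hydrogens in the big group hops over,
# leaving the *other* carbon as the one with only 2 hydrogens.
edges = set()
for c, pair in states:
    big_group = hydrogens - pair
    for h in big_group:
        target = (1 - c, frozenset(big_group - {h}))
        edges.add(frozenset([(c, pair), target]))
print(len(edges))    # 30
```

Storing each edge as an unordered pair of states is what takes care of the division by 2: the set automatically discards the double-counted copies.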

The Desargues graph

The idea of using this graph in chemistry goes back to this paper:

• A. T. Balaban, D. Fǎrcaşiu and R. Bǎnicǎ, Graphs of multiple 1,2-shifts in carbonium ions and related systems, Rev. Roum. Chim. 11 (1966), 1205.

This paper is famous because it was the first to use graphs in chemistry to describe molecular transitions, as opposed to using them as pictures of molecules!

But this particular graph was already famous for other reasons. It’s called the Desargues-Levi graph, or Desargues graph for short:

Desargues graph, Wikipedia.

Later I’ll say why it’s called this.

There are lots of nice ways to draw the Desargues graph. For example:

The reason why we can draw such pretty pictures is that the Desargues graph is very symmetrical. Clearly any permutation of the 5 hydrogens acts as a symmetry of the graph, and so does any permutation of the 2 carbons. This gives a symmetry group S_5 \times S_2, which has 5! \times 2! = 240 elements. And in fact this turns out to be the full symmetry group of the Desargues graph.
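We can at least check by brute force that all 240 elements of S_5 \times S_2 act as distinct symmetries, using the same encoding of states as before (checking that these exhaust the full automorphism group would take more work):

```python
from itertools import combinations, permutations

hydrogens = (0, 1, 2, 3, 4)
states = [(c, frozenset(p)) for c in (0, 1) for p in combinations(hydrogens, 2)]

def neighbors(state):
    c, pair = state
    rest = set(hydrogens) - pair
    return [(1 - c, frozenset(rest - {h})) for h in rest]

edges = {frozenset([s, t]) for s in states for t in neighbors(s)}

def is_automorphism(f):
    # f maps states to states; check it maps edges to edges
    return {frozenset(map(f, e)) for e in edges} == edges

autos = set()
for sigma in permutations(hydrogens):      # permute the 5 hydrogens
    for swap in (False, True):             # optionally swap the 2 carbons
        def f(state, sigma=sigma, swap=swap):
            c, pair = state
            return ((1 - c) if swap else c, frozenset(sigma[h] for h in pair))
        if is_automorphism(f):
            autos.add(tuple(f(s) for s in states))

print(len(autos))  # 240 distinct symmetries
```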

The Desargues graph, its symmetry group, and its applications to chemistry are discussed here:

• Milan Randić, Symmetry properties of graphs of interest in chemistry: II: Desargues–Levi graph, Int. Jour. Quantum Chem. 15 (1979), 663–682.

The ethyl cation, revisited

We can try to describe the ethyl cation using probability theory. If at any moment its state corresponds to some vertex of the Desargues graph, and it hops randomly along edges as time goes by, it will trace out a random walk on the Desargues graph. This is a nice example of a Markov process!

We could also try to describe the ethyl cation using quantum mechanics. Then, instead of having a probability of hopping along an edge, it has an amplitude of doing so. But as we’ve seen, a lot of similar math will still apply.

It should be fun to compare the two approaches. But I bet you’re wondering which approach is correct. This is a somewhat tricky question, at least for me. The answer would seem to depend on how much the ethyl cation is interacting with its environment—for example, bouncing off other molecules. When a system is interacting a lot with its environment, a probabilistic approach seems to be more appropriate. The relevant buzzword is ‘environmentally induced decoherence’.

However, there’s something much more basic I have to tell you about.

After the paper by Balaban, Fǎrcaşiu and Bǎnicǎ came out, people gradually realized that the ethyl cation doesn’t really look like the drawing I showed you! It’s what chemists call a ‘nonclassical’ ion. What they mean is this: its actual structure is not what you get by taking the traditional ball-and-stick model of an ethane molecule and ripping off a hydrogen. The ethyl cation really looks like this:

For more details, and pictures that you can actually rotate, see:

• Stephen Bacharach, Ethyl cation, Computational Organic Chemistry.

So, if we stubbornly insist on applying the Desargues graph to realistic chemistry, we need to find some other molecule to apply it to.

Trigonal bipyramidal molecules

Luckily, there are lots of options! They’re called trigonal bipyramidal molecules. They look like this:

The 5 balls on the outside are called ‘ligands’: they could be atoms or bunches of atoms. In chemistry, ‘ligand’ just means something that’s stuck onto a central thing. For example, in phosphorus pentachloride the ligands are chlorine atoms, all attached to a central phosphorus atom:

It’s a colorless solid, but as you might expect, it’s pretty nasty stuff: it’s not flammable, but when heated or brought into contact with water it produces toxic chemicals like hydrogen chloride.

Another example is iron pentacarbonyl, where 5 carbon-oxygen ligands are attached to a central iron atom:

You can make this stuff by letting powdered iron react with carbon monoxide. It’s a straw-colored liquid with a pungent smell!

Whenever you’ve got a molecule of this shape, the ligands come in two kinds. There are the 2 ‘axial’ ones, and the 3 ‘equatorial’ ones:

And the molecule has 20 states… but only if we count the states a certain way. We have to treat all 5 ligands as distinguishable, but think of two arrangements of them as the same if we can rotate one to get the other. The trigonal bipyramid has a rotational symmetry group with 6 elements. So, there are 5! / 6 = 20 states.
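Here's a quick Python check of this orbit count. Taking positions 0 and 1 to be axial and listing the 6 rotations by hand is my own bookkeeping:

```python
from itertools import permutations

# Positions 0, 1 are axial; 2, 3, 4 are equatorial.
# The rotation group of the trigonal bipyramid has 6 elements:
# the identity, two rotations about the axial axis, and three
# 180-degree flips, one through each equatorial position.
rotations = [
    (0, 1, 2, 3, 4),   # identity
    (0, 1, 3, 4, 2),   # 120-degree rotation
    (0, 1, 4, 2, 3),   # 240-degree rotation
    (1, 0, 2, 4, 3),   # flip through equatorial position 2
    (1, 0, 4, 3, 2),   # flip through equatorial position 3
    (1, 0, 3, 2, 4),   # flip through equatorial position 4
]

def canonical(arrangement):
    # arrangement[p] = ligand at position p; take the minimum over rotations
    return min(tuple(arrangement[r[p]] for p in range(5)) for r in rotations)

states = {canonical(a) for a in permutations(range(5))}
print(len(states))  # 5!/6 = 20
```

Since the 5 ligands are distinguishable, no nontrivial rotation fixes an arrangement, so every orbit has exactly 6 elements.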

The transitions between states are devilishly tricky. They’re called pseudorotations, and they look like this:

If you look very carefully, you’ll see what’s going on. First the 2 axial ligands move towards each other to become equatorial. Now the equatorial ones are no longer in the horizontal plane: they’re in the plane facing us! Then 2 of the 3 equatorial ones swing out to become axial. This fancy dance is called the Berry pseudorotation mechanism.

To get from one state to another this way, we have to pick 2 of the 3 equatorial ligands to swing out and become axial. There are 3 choices here. So, if we draw a graph with states as vertices and transitions as edges, it will have 20 vertices and 20 × 3 / 2 = 30 edges. That sounds suspiciously like the Desargues graph!

Puzzle 1. Show that the graph with states of a trigonal bipyramidal molecule as vertices and pseudorotations as edges is indeed the Desargues graph.

I think this fact was first noticed here:

• Paul C. Lauterbur and Fausto Ramirez, Pseudorotation in trigonal-bipyramidal molecules, J. Am. Chem. Soc. 90 (1968), 6722–6726.

Okay, enough for now! Next time I’ll say more about the Markov process or quantum process corresponding to a random walk on the Desargues graph. But since the Berry pseudorotation mechanism is so hard to visualize, I’ll pretend that the ethyl cation looks like this:

and I’ll use this picture to help us think about the Desargues graph.

That’s okay: everything we’ll figure out can easily be translated to apply to the real-world situation of a trigonal bipyramidal molecule. The virtue of math is that when two situations are ‘mathematically the same’, or ‘isomorphic’, we can talk about either one, and the results automatically apply to the other. This is true even if the one we talk about doesn’t actually exist in the real world!


Network Theory (Part 13)

11 October, 2011

Unlike some recent posts, this will be very short. I merely want to show you the quantum and stochastic versions of Noether’s theorem, side by side.

Having made my sacrificial offering to the math gods last time by explaining how everything generalizes when we replace our finite set X of states by an infinite set or an even more general measure space, I’ll now relax and state Noether’s theorem only for a finite set. If you’re the sort of person who finds that unsatisfactory, you can do the generalization yourself.

Two versions of Noether’s theorem

Let me write the quantum and stochastic Noether’s theorem so they look almost the same:

Theorem. Let X be a finite set. Suppose H is a self-adjoint operator on L^2(X), and let O be an observable. Then

[O,H] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value of O in the state \psi(t) does not change with t.

Theorem. Let X be a finite set. Suppose H is an infinitesimal stochastic operator on L^1(X), and let O be an observable. Then

[O,H] =0

if and only if for all states \psi(t) obeying the master equation

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

the expected values of O and O^2 in the state \psi(t) do not change with t.

This makes the big difference stick out like a sore thumb: in the quantum version we only need the expected value of O, while in the stochastic version we need the expected values of O and O^2!

Brendan Fong proved the stochastic version of Noether’s theorem in Part 11. Now let’s do the quantum version.

Proof of the quantum version

My statement of the quantum version was silly in a couple of ways. First, I spoke of the Hilbert space L^2(X) for a finite set X, but any finite-dimensional Hilbert space will do equally well. Second, I spoke of the “self-adjoint operator” H and the “observable” O, but in quantum mechanics an observable is the same thing as a self-adjoint operator!

Why did I talk in such a silly way? Because I was attempting to emphasize the similarity between quantum mechanics and stochastic mechanics. But they’re somewhat different. For example, in stochastic mechanics we have two very different concepts: infinitesimal stochastic operators, which generate symmetries, and functions on our set X, which are observables. But in quantum mechanics something wonderful happens: self-adjoint operators both generate symmetries and are observables! So, my attempt was a bit strained.

Let me state and prove a less silly quantum version of Noether’s theorem, which implies the one above:

Theorem. Suppose H and O are self-adjoint operators on a finite-dimensional Hilbert space. Then

[O,H] = 0

if and only if for all states \psi(t) obeying Schrödinger’s equation

\displaystyle{ \frac{d}{d t} \psi(t) = -i H \psi(t) }

the expected value of O in the state \psi(t) does not change with t:

\displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle = 0 }

Proof. The trick is to compute the time derivative I just wrote down. Using Schrödinger’s equation, the product rule, and the fact that H is self-adjoint we get:

\begin{array}{ccl}  \displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle } &=&   \langle -i H \psi(t) , O \psi(t) \rangle + \langle \psi(t) , O (- i H \psi(t)) \rangle \\  \\  &=& i \langle \psi(t) , H O \psi(t) \rangle -i \langle \psi(t) , O H \psi(t) \rangle \\  \\  &=& - i \langle \psi(t), [O,H] \psi(t) \rangle  \end{array}

So, if [O,H] = 0, clearly the above time derivative vanishes. Conversely, if this time derivative vanishes for all states \psi(t) obeying Schrödinger’s equation, we know

\langle \psi, [O,H] \psi \rangle = 0

for all states \psi and thus all vectors in our Hilbert space. Does this imply [O,H] = 0? Yes, because i times the commutator of two self-adjoint operators is self-adjoint, and for any self-adjoint operator A we have

\forall \psi  \; \; \langle \psi, A \psi \rangle = 0 \qquad \Rightarrow \qquad A = 0

This is a well-known fact whose proof goes like this. Assume \langle \psi, A \psi \rangle = 0 for all \psi. Then to show A = 0, it is enough to show \langle \phi, A \psi \rangle = 0 for all \phi and \psi. But we have a marvelous identity:

\begin{array}{ccl} \langle \phi, A \psi \rangle &=& \frac{1}{4} \left( \langle \phi + \psi, \, A (\phi + \psi) \rangle \; - \; \langle \psi - \phi, \, A (\psi - \phi) \rangle \right. \\ && \left. +i \langle \psi + i \phi, \, A (\psi + i \phi) \rangle \; - \; i\langle \psi - i \phi, \, A (\psi - i \phi) \rangle \right) \end{array}

and all four terms on the right vanish by our assumption.   █
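Here's a small numerical illustration of all this, a sketch of my own using NumPy and a random self-adjoint matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_hermitian(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

H = random_hermitian(n)
evals, V = np.linalg.eigh(H)

def U(t):
    # time evolution exp(-itH), built from the eigendecomposition of H
    return V @ np.diag(np.exp(-1j * t * evals)) @ V.conj().T

psi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
psi0 /= np.linalg.norm(psi0)

def expected(O, t):
    psi = U(t) @ psi0
    return np.vdot(psi, O @ psi).real

# An observable commuting with H (any function of H works):
O_good = H @ H
vals_good = [expected(O_good, t) for t in np.linspace(0, 5, 50)]
print(np.ptp(vals_good))   # ~0: the expected value is conserved

# A generic observable does not commute with H, and then the time
# derivative of its expected value is -i <psi, [O,H] psi>:
O_bad = random_hermitian(n)
t, dt = 1.0, 1e-6
deriv = (expected(O_bad, t + dt) - expected(O_bad, t - dt)) / (2 * dt)
psi = U(t) @ psi0
comm = O_bad @ H - H @ O_bad
print(abs(deriv - (-1j * np.vdot(psi, comm @ psi)).real))  # ~0
```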

The marvelous identity up there is called the polarization identity. In plain English, it says: if you know the diagonal entries of a self-adjoint matrix in every basis, you can figure out all the entries of that matrix in every basis.

Why is it called the ‘polarization identity’? I think because it shows up in optics, in the study of polarized light.
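The polarization identity is also easy to check numerically. In fact the identity holds for an arbitrary operator A, though we only need the self-adjoint case:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
phi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)

def q(chi):
    # the 'diagonal' value <chi, A chi>; np.vdot conjugates its first slot
    return np.vdot(chi, A @ chi)

lhs = np.vdot(phi, A @ psi)
rhs = 0.25 * (q(phi + psi) - q(psi - phi)
              + 1j * q(psi + 1j * phi) - 1j * q(psi - 1j * phi))
print(abs(lhs - rhs))  # ~0, up to rounding
```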

Comparison

In both the quantum and stochastic cases, the time derivative of the expected value of an observable O is expressed in terms of its commutator with the Hamiltonian. In the quantum case we have

\displaystyle{ \frac{d}{d t} \langle \psi(t), O \psi(t) \rangle = - i \langle \psi(t), [O,H] \psi(t) \rangle }

and for the right side to always vanish, we need [O,H] = 0, thanks to the polarization identity. In the stochastic case, a perfectly analogous equation holds:

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

but now the right side can always vanish even without [O,H] = 0. We saw a counterexample in Part 11. There is nothing like the polarization identity to save us! To get [O,H] = 0 we need a supplementary hypothesis, for example the vanishing of

\displaystyle{ \frac{d}{d t} \int O^2 \psi(t) }
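Here's a tiny concrete illustration of my own devising (not necessarily the counterexample from Part 11): a 3-state system whose middle state hops to either neighbor at equal rates. The expected value of O is conserved even though [O,H] \ne 0, but the expected value of O^2 is not:

```python
import numpy as np

# States 0, 1, 2 with observable O(x) = x. State 1 hops to 0 or 2
# at equal rates; states 0 and 2 are absorbing. Columns sum to zero
# and off-diagonal entries are nonnegative: H is infinitesimal stochastic.
H = np.array([[0.,  1., 0.],
              [0., -2., 0.],
              [0.,  1., 0.]])
O = np.diag([0., 1., 2.])

print(np.allclose(O @ H, H @ O))   # False: [O,H] != 0

# Evolve psi(t) by Euler steps of the master equation, up to t = 2:
psi = np.array([0.2, 0.5, 0.3])
dt = 0.001
for _ in range(2000):
    psi = psi + dt * (H @ psi)

x = np.array([0., 1., 2.])
print(np.dot(x, psi))      # expected O: stays ~1.1 (conserved)
print(np.dot(x**2, psi))   # expected O^2: grows from 1.7 to ~2.19
```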

Okay! Starting next time we’ll change gears and look at some more examples of stochastic Petri nets and Markov processes, including some from chemistry. After some more of that, I’ll move on to networks of other sorts. There’s a really big picture here, and I’m afraid I’ve been getting caught up in the details of a tiny corner.


Network Theory (Part 12)

9 October, 2011

Last time we proved a version of Noether’s theorem for stochastic mechanics. Now I want to compare that to the more familiar quantum version.

But to do this, I need to say more about the analogy between stochastic mechanics and quantum mechanics. And whenever I try, I get pulled toward explaining some technical issues involving analysis: whether sums converge, whether derivatives exist, and so on. I’ve been trying to avoid such stuff—not because I dislike it, but because I’m afraid you might. But the more I put off discussing these issues, the more they fester and make me unhappy. In fact, that’s why it’s taken so long for me to write this post!

So, this time I will gently explore some of these issues. But don’t be scared: I’ll mainly talk about some simple big ideas. Next time I’ll discuss Noether’s theorem. I hope that by getting the technicalities out of my system, I’ll feel okay about hand-waving whenever I want.

And if you’re an expert on analysis, maybe you can help me with a question.

Stochastic mechanics versus quantum mechanics

First, we need to recall the analogy we began sketching in Part 5, and push it a bit further. The idea is that stochastic mechanics differs from quantum mechanics in two big ways:

• First, instead of complex amplitudes, stochastic mechanics uses nonnegative real probabilities. The complex numbers form a ring; the nonnegative real numbers form a mere rig, which is a ‘ring without negatives’. Rigs are much neglected in the typical math curriculum, but unjustly so: they’re almost as good as rings in many ways, and there are lots of important examples, like the natural numbers \mathbb{N} and the nonnegative real numbers, [0,\infty). For probability theory, we should learn to love rigs.

But there are, alas, situations where we need to subtract probabilities, even when the answer comes out negative: namely when we’re taking the time derivative of a probability. So sometimes we need \mathbb{R} instead of just [0,\infty).

• Second, while in quantum mechanics a state is described using a ‘wavefunction’, meaning a complex-valued function obeying

\int |\psi|^2 = 1

in stochastic mechanics it’s described using a ‘probability distribution’, meaning a nonnegative real function obeying

\int \psi = 1

So, let’s try our best to present the theories in close analogy, while respecting these two differences.

States

We’ll start with a set X whose points are states that a system can be in. Last time I assumed X was a finite set, but this post is so mathematical I might as well let my hair down and assume it’s a measure space. A measure space lets you do integrals, but a finite set is a special case, and then these integrals are just sums. So, I’ll write things like

\int f

and mean the integral of the function f over the measure space X, but if X is a finite set this just means

\sum_{x \in X} f(x)

Now, I’ve already defined the word ‘state’, but both quantum and stochastic mechanics need a more general concept of state. Let’s call these ‘quantum states’ and ‘stochastic states’:

• In quantum mechanics, the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\int | \psi |^2 = 1

We call \psi: X \to \mathbb{C} obeying this equation a quantum state.

• In stochastic mechanics, the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\int \psi = 1

We call \psi: X \to [0,\infty) obeying this equation a stochastic state.

In quantum mechanics we often use this abbreviation:

\langle \phi, \psi \rangle = \int \overline{\phi} \psi

so that a quantum state has

\langle \psi, \psi \rangle = 1

Similarly, we could introduce this notation in stochastic mechanics:

\langle \psi \rangle = \int \psi

so that a stochastic state has

\langle \psi \rangle = 1

But this notation is a bit risky, since angle brackets of this sort often stand for expectation values of observables. So, I’ve been writing \int \psi, and I’ll keep on doing this.

In quantum mechanics, \langle \phi, \psi \rangle is well-defined whenever both \phi and \psi live in the vector space

L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int |\psi|^2 < \infty \}

In stochastic mechanics, \langle \psi \rangle is well-defined whenever \psi lives in the vector space

L^1(X) =  \{ \psi: X \to \mathbb{R} \; : \; \int |\psi| < \infty \}

You’ll notice I wrote \mathbb{R} rather than [0,\infty) here. That’s because in some calculations we’ll need functions that take negative values, even though our stochastic states are nonnegative.

Observables

A state is a way our system can be. An observable is something we can measure about our system. They fit together: we can measure an observable when our system is in some state. If we repeat this we may get different answers, but there’s a nice formula for the average or ‘expected’ answer.

• In quantum mechanics, an observable is a self-adjoint operator A on L^2(X). The expected value of A in the state \psi is

\langle \psi, A \psi \rangle

Here I’m assuming that we can apply A to \psi and get a new vector A \psi \in L^2(X). This is automatically true when X is a finite set, but in general we need to be more careful.

• In stochastic mechanics, an observable is a real-valued function A on X. The expected value of A in the state \psi is

\int A \psi

Here we’re using the fact that we can multiply A and \psi and get a new vector A \psi \in L^1(X), at least if A is bounded. Again, this is automatic if X is a finite set, but not otherwise.

Symmetries

Besides states and observables, we need ‘symmetries’, which are transformations that map states to states. We use these to describe how our system changes when we wait a while, for example.

• In quantum mechanics, an isometry is a linear map U: L^2(X) \to L^2(X) such that

\langle U \phi, U \psi \rangle = \langle \phi, \psi \rangle

for all \psi, \phi \in L^2(X). If U is an isometry and \psi is a quantum state, then U \psi is again a quantum state.

• In stochastic mechanics, a stochastic operator is a linear map U: L^1(X) \to L^1(X) such that

\int U \psi = \int \psi

and

\psi \ge 0 \; \; \Rightarrow \; \; U \psi \ge 0

for all \psi \in L^1(X). If U is stochastic and \psi is a stochastic state, then U \psi is again a stochastic state.

In quantum mechanics we are mainly interested in invertible isometries, which are called unitary operators. There are lots of these, and their inverses are always isometries. There are, however, very few stochastic operators whose inverses are stochastic:

Puzzle 1. Suppose X is a finite set. Show that every isometry U: L^2(X) \to L^2(X) is invertible, and its inverse is again an isometry.

Puzzle 2. Suppose X is a finite set. Which stochastic operators U: L^1(X) \to L^1(X) have stochastic inverses?

This is why we usually think of time evolution as being reversible in quantum mechanics, but not in stochastic mechanics! In quantum mechanics we often describe time evolution using a ‘1-parameter group’, while in stochastic mechanics we describe it using a 1-parameter semigroup… meaning that we can run time forwards, but not backwards.

But let’s see how this works in detail!

Time evolution in quantum mechanics

In quantum mechanics there’s a beautiful relation between observables and symmetries, which goes like this. Suppose that for each time t we want a unitary operator U(t) :  L^2(X) \to L^2(X) that describes time evolution. Then it makes a lot of sense to demand that these operators form a 1-parameter group:

Definition. A collection of linear operators U(t) (t \in \mathbb{R}) on some vector space forms a 1-parameter group if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s,t \in \mathbb{R}.

Note that these conditions force all the operators U(t) to be invertible.

Now suppose our vector space is a Hilbert space, like L^2(X). Then we call a 1-parameter group a 1-parameter unitary group if the operators involved are all unitary.

It turns out that 1-parameter unitary groups are either continuous in a certain way, or so pathological that you can’t even prove they exist without the axiom of choice! So, we always focus on the continuous case:

Definition. A 1-parameter unitary group is strongly continuous if U(t) \psi depends continuously on t for all \psi, in this sense:

t_i \to t \;\; \Rightarrow \; \;\|U(t_i) \psi - U(t) \psi \| \to 0

Then we get a classic result proved by Marshall Stone back in the early 1930s. You may not know him, but he was so influential at the University of Chicago during this period that it’s often called the “Stone Age”. And here’s one reason why:

Stone’s Theorem. There is a one-to-one correspondence between strongly continuous 1-parameter unitary groups on a Hilbert space and self-adjoint operators on that Hilbert space, given as follows. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(-i t H)

for a unique self-adjoint operator H. Conversely, any self-adjoint operator determines a strongly continuous 1-parameter group this way. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }

When U(t) = \exp(-i t H) describes the evolution of a system in time, H is called the Hamiltonian, and it has the physical meaning of ‘energy’. The equation I just wrote down is then called Schrödinger’s equation.

So, simply put, in quantum mechanics we have a correspondence between observables and nice one-parameter groups of symmetries. Not surprisingly, our favorite observable, energy, corresponds to our favorite symmetry: time evolution!

However, if you were paying attention, you noticed that I carefully avoided explaining how we define \exp(- i t H). I didn’t even say what a self-adjoint operator is. This is where the technicalities come in: they arise when H is unbounded, and not defined on all vectors in our Hilbert space.

Luckily, these technicalities evaporate for finite-dimensional Hilbert spaces, such as L^2(X) for a finite set X. Then we get:

Stone’s Theorem (Baby Version). Suppose we are given a finite-dimensional Hilbert space. In this case, a linear operator H on this space is self-adjoint iff it’s defined on the whole space and

\langle \phi , H \psi \rangle = \langle H \phi, \psi \rangle

for all vectors \phi, \psi. Given a strongly continuous 1-parameter unitary group U(t) we can always write

U(t) = \exp(- i t H)

for a unique self-adjoint operator H, where

\displaystyle{ \exp(-i t H) \psi = \sum_{n = 0}^\infty \frac{(-i t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any self-adjoint operator on our space determines a strongly continuous 1-parameter group this way. For all vectors \psi in our space we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = -i H \psi }

and if we set

\psi(t) = \exp(-i t H) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = - i H \psi(t) }
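In this finite-dimensional setting everything above can be checked numerically. Here's a sketch of my own using NumPy, summing the power series for \exp(-itH) directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2                  # a self-adjoint operator

def U(t, terms=60):
    # exp(-itH), summed as the power series in the theorem
    result = np.zeros((n, n), dtype=complex)
    term = np.eye(n, dtype=complex)
    for k in range(terms):
        result += term
        term = term @ (-1j * t * H) / (k + 1)
    return result

print(np.allclose(U(0), np.eye(n)))            # U(0) = 1
print(np.allclose(U(1.2) @ U(0.7), U(1.9)))    # U(s)U(t) = U(s+t)
print(np.allclose(U(1.0).conj().T @ U(1.0), np.eye(n)))  # unitarity

# Schrödinger's equation, checked by a centered finite difference:
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
deriv = (U(1e-6) @ psi - U(-1e-6) @ psi) / 2e-6
print(np.linalg.norm(deriv + 1j * H @ psi))    # ~0
```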

Time evolution in stochastic mechanics

We’ve seen that in quantum mechanics, time evolution is usually described by a 1-parameter group of operators that comes from an observable: the Hamiltonian. Stochastic mechanics is different!

First, since stochastic operators aren’t usually invertible, we typically describe time evolution by a mere ‘semigroup’:

Definition. A collection of linear operators U(t) (t \in [0,\infty)) on some vector space forms a 1-parameter semigroup if

U(0) = 1

and

U(s+t) = U(s) U(t)

for all s, t \ge 0.

Now suppose this vector space is L^1(X) for some measure space X. We want to focus on the case where the operators U(t) are stochastic and depend continuously on t in the same sense we discussed earlier.

Definition. A 1-parameter strongly continuous semigroup of stochastic operators U(t) : L^1(X) \to L^1(X) is called a Markov semigroup.

What’s the analogue of Stone’s theorem for Markov semigroups? I don’t know a fully satisfactory answer! If you know, please tell me.

Later I’ll say what I do know—I’m not completely clueless—but for now let’s look at the ‘baby’ case where X is a finite set. Then the story is neat and complete:

Theorem. Suppose we are given a finite set X. In this case, a linear operator H on L^1(X) is infinitesimal stochastic iff it’s defined on the whole space,

\int H \psi = 0

for all \psi \in L^1(X), and the matrix of H in terms of the obvious basis obeys

H_{i j} \ge 0

for all j \ne i. Given a Markov semigroup U(t) on L^1(X), we can always write

U(t) = \exp(t H)

for a unique infinitesimal stochastic operator H, where

\displaystyle{ \exp(t H) \psi = \sum_{n = 0}^\infty \frac{(t H)^n}{n!} \psi }

with the sum converging for all \psi. Conversely, any infinitesimal stochastic operator on our space determines a Markov semigroup this way. For all \psi \in L^1(X) we then have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

and if we set

\psi(t) = \exp(t H) \psi

we have the master equation:

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }
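Again, in this baby case everything can be checked numerically. Here's a NumPy sketch with a small made-up rate matrix; it also shows what goes wrong if we try to run time backwards: \exp(-tH) has negative entries, so it's not stochastic.

```python
import numpy as np

# A 3-state infinitesimal stochastic operator: columns sum to zero,
# off-diagonal entries are nonnegative (rates of hopping j -> i).
H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -1.5,  2.0],
              [ 0.0,  1.0, -2.0]])

def exp_tH(t, terms=50):
    # exp(tH), summed as a power series
    result = np.zeros((3, 3))
    term = np.eye(3)
    for k in range(terms):
        result += term
        term = term @ (t * H) / (k + 1)
    return result

U = exp_tH(0.7)
print(np.allclose(U.sum(axis=0), 1.0))  # columns sum to 1
print((U >= 0).all())                   # entries nonnegative: U is stochastic

# Running time backwards fails: exp(-tH) has negative entries,
# so we get a semigroup, not a group.
print((exp_tH(-0.1) < 0).any())         # True
```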

In short, time evolution in stochastic mechanics is a lot like time evolution in quantum mechanics, except it’s typically not invertible, and the Hamiltonian is typically not an observable.

Why not? Because we defined an observable to be a function A: X \to \mathbb{R}. We can think of this as giving an operator on L^1(X), namely the operator of multiplication by A. That’s a nice trick, which we used to good effect last time. However, at least when X is a finite set, this operator will be diagonal in the obvious basis consisting of functions that equal 1 at one point of X and zero elsewhere. So, it can only be infinitesimal stochastic if it’s zero!

Puzzle 3. If X is a finite set, show that any operator on L^1(X) that’s both diagonal and infinitesimal stochastic must be zero.

The Hille–Yosida theorem

I’ve now told you everything you really need to know… but not everything I want to say. What happens when X is not a finite set? What are Markov semigroups like then? I can’t abide letting this question go unresolved! Unfortunately I only know a partial answer.

We can get a certain distance using the Hille–Yosida theorem, which is much more general.

Definition. A Banach space is a vector space with a norm such that any Cauchy sequence converges.

Examples include Hilbert spaces like L^2(X) for any measure space, but also other spaces like L^1(X) for any measure space!

Definition. If V is a Banach space, a 1-parameter semigroup of operators U(t) : V \to V is called a contraction semigroup if it’s strongly continuous and

\| U(t) \psi \| \le \| \psi \|

for all t \ge 0 and all \psi \in V.

Examples include strongly continuous 1-parameter unitary groups, but also Markov semigroups!

Puzzle 4. Show any Markov semigroup is a contraction semigroup.

The Hille–Yosida theorem generalizes Stone’s theorem to contraction semigroups. In my misspent youth, I spent a lot of time carrying around Yosida’s book Functional Analysis. Furthermore, Einar Hille was the advisor of my thesis advisor, Irving Segal. Segal generalized the Hille–Yosida theorem to nonlinear operators, and I used this generalization a lot back when I studied nonlinear partial differential equations. So, I feel compelled to tell you this theorem:

Hille–Yosida Theorem. Given a contraction semigroup U(t) we can always write

U(t) = \exp(t H)

for some densely defined operator H such that H - \lambda I has an inverse and

\displaystyle{ \| (H - \lambda I)^{-1} \psi \| \le \frac{1}{\lambda} \| \psi \| }

for all \lambda > 0 and \psi \in V. Conversely, any such operator determines a contraction semigroup. For all vectors \psi for which H \psi is well-defined, we have

\displaystyle{ \left.\frac{d}{d t} U(t) \psi \right|_{t = 0} = H \psi }

Moreover, for any of these vectors, if we set

\psi(t) = U(t) \psi

we have

\displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

If you like, you can take the stuff at the end of this theorem to be what we mean by saying U(t) = \exp(t H). When U(t) = \exp(t H), we say that H generates the semigroup U(t).

But now suppose V = L^1(X). Besides the conditions in the Hille–Yosida theorem, what extra conditions on H are necessary and sufficient for it to generate a Markov semigroup? In other words, what’s a definition of ‘infinitesimal stochastic operator’ that’s suitable not only when X is a finite set, but an arbitrary measure space?

I asked this question on MathOverflow a few months ago, and so far the answers have not been completely satisfactory.

Some people mentioned the Hille–Yosida theorem, which is surely a step in the right direction, but not the full answer.

Others discussed the special case when \exp(t H) extends to a bounded self-adjoint operator on L^2(X). When X is a finite set, this special case happens precisely when the matrix H_{i j} is symmetric: the probability of hopping from j to i equals the probability of hopping from i to j. This is a fascinating special case, not least because when H is both infinitesimal stochastic and self-adjoint, we can use it as a Hamiltonian for both stochastic mechanics and quantum mechanics! Someday I want to discuss this. However, it’s just a special case.

After grabbing people by the collar and insisting that I wanted to know the answer to the question I actually asked—not some vaguely similar question—the best answer seems to be Martin Gisser’s reference to this book:

• Zhi-Ming Ma and Michael Röckner, Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer, Berlin, 1992.

This book provides a very nice self-contained proof of the Hille–Yosida theorem. On the other hand, it does not answer my question in general, but only when the skew-symmetric part of H is dominated (in a certain sense) by the symmetric part.

So, I’m stuck on this front, but that needn’t bring the whole project to a halt. We’ll just sidestep this question.

For a good well-rounded introduction to Markov semigroups and what they’re good for, try:

• Ryszard Rudnicki, Katarzyna Pichór and Marta Tyran-Kamińska, Markov semigroups and their applications.

